parsing many documents takes too long

2011-08-12 Thread Tri Nguyen
Hi,
 
My Solr query returns about 982 documents, and I use JAXB to parse them into 
Java objects, which takes about 469 ms, over my 150-200 ms threshold.
 
Is there a solution around this?  Can I store the Java objects in the index, 
return them in the Solr response, and then deserialize them back into Java 
objects?  Would this take less time?
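
A common JAXB pitfall worth ruling out first (an aside, not from the thread): 
recreating the JAXBContext for every response is expensive, while creating 
Unmarshallers from a shared context is cheap. A minimal sketch, assuming a 
hypothetical MyDoc binding class:

    import java.io.StringReader;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.JAXBException;
    import javax.xml.bind.Unmarshaller;
    import javax.xml.bind.annotation.XmlRootElement;

    @XmlRootElement(name = "doc")
    class MyDoc { } // hypothetical binding class standing in for the real one

    public class DocParser {
        // Build the (expensive) context once and share it across all parses.
        private static final JAXBContext CTX = initContext();

        private static JAXBContext initContext() {
            try {
                return JAXBContext.newInstance(MyDoc.class);
            } catch (JAXBException e) {
                throw new RuntimeException(e);
            }
        }

        public MyDoc parse(String xml) throws JAXBException {
            // Unmarshallers are cheap but not thread-safe: one per call.
            Unmarshaller u = CTX.createUnmarshaller();
            return (MyDoc) u.unmarshal(new StringReader(xml));
        }
    }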
 
Any other ideas?
 
Thanks,
 
Tri

Re: Strip special chars like -

2011-08-12 Thread roySolr
Erick, you're right. It's working, my schema looks like this:

<fieldType name="name_type" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1" splitOnCaseChange="0"
            splitOnNumerics="0" stemEnglishPossessive="0"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="0" catenateNumbers="0"
            splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="0"/>
  </analyzer>
</fieldType>

Thanks for helping me!!



Clustering not working when using 'text' field as snippet.

2011-08-12 Thread Pablo Queixalos
Hi,

I am using solr-3.3.0 and carrot² clustering, which works fine out of the box 
with the example docs and default Solr configuration (the 'features' field is 
used as the snippet).

I indexed my own documents using the embedded ExtractingRequestHandler, which 
by default stores content in the 'text' field. When configuring clustering 
with 'text' as the snippet, carrot doesn't work properly and only shows 
'Other topics' with all the documents within. It looks like carrot doesn't 
get the 'text' field's stored content.

If I store the documents' content in the 'features' field and go back to the 
original configuration, clustering works fine.

The only difference I see between the 'text' and 'features' fields in 
schema.xml is that some copyFields are defined for 'text'.

I didn't debug solr.clustering.ClusteringComponent or CarrotClusteringEngine 
yet, but am I misunderstanding something about the 'text' field?

Thanks,

Pablo.



Last successful build of Solr 4.0 and Near Realtime Search

2011-08-12 Thread Vadim Kisselmann
Hi folks,

I'm writing here again (besides Jira: SOLR-2565); perhaps someone here can
help:

I tested the nightly build #1595 with a new patch (SOLR-2565), but NRT doesn't
work in my case.

I index 10 docs/sec, and it takes 1-30 sec. to see the results.
Same behavior when I update an existing document.

My addedDate field is a timestamp (default=NOW). In the worst case I can see
that a document I indexed has already been in the index for more than 30
seconds, but I still can't see it in search results.

My settings:

<autoCommit>
  <maxDocs>1000</maxDocs>
  <maxTime>6</maxTime>
</autoCommit>

<autoSoftCommit>
  <maxDocs>1</maxDocs>
  <maxTime>1000</maxTime>
</autoSoftCommit>

Are my settings wrong, or do you need more details?
Should I use useColdSearcher (default=false)? Or set maxWarmingSearchers
higher than 2?

UPDATE:
If I only use autoSoftCommit and comment out autoCommit, it works.
But I should use the hard autoCommit, right?
Mark said yes, because only with hard commits are my docs in stable storage:
http://www.lucidimagination.com/blog/2011/07/11/benchmarking-the-new-solr-‘near-realtime’-improvements/

Regards
Vadim


Re: Clustering not working when using 'text' field as snippet.

2011-08-12 Thread Stanislaw Osinski
Hi Pablo,

The reason clustering doesn't work with the text field is that the field
is not stored:

 <field name="text" type="text_general" indexed="true" stored="false"
        multiValued="true"/>

For clustering to work, you'll need to keep your documents' titles and
content in stored fields.

Staszek


On Fri, Aug 12, 2011 at 10:28, Pablo Queixalos pablo.queixa...@polyspot.com
 wrote:

 Hi,

 I am using solr-3.3.0 and carrot² clustering, which works fine out of the
 box with the example docs and default Solr configuration (the 'features'
 field is used as the snippet).

 I indexed my own documents using the embedded ExtractingRequestHandler,
 which by default stores content in the 'text' field. When configuring
 clustering with 'text' as the snippet, carrot doesn't work properly and
 only shows 'Other topics' with all the documents within. It looks like
 carrot doesn't get the 'text' field's stored content.

 If I store the documents' content in the 'features' field and go back to
 the original configuration, clustering works fine.

 The only difference I see between the 'text' and 'features' fields in
 schema.xml is that some copyFields are defined for 'text'.

 I didn't debug solr.clustering.ClusteringComponent or
 CarrotClusteringEngine yet, but am I misunderstanding something about the
 'text' field?

 Thanks,

 Pablo.




Re: SEVERE: org.apache.solr.common.SolrException: Error loading class 'solr.ICUTokenizerFactory'

2011-08-12 Thread Péter Király
Hi Satish,

 :   I also added the following files to my apache-solr-3.3.0\example\lib
 :  folder:

I use ICU, and I copied the jar files not into example/lib as you did, but
into example/solr/lib. First I had to create that directory. It works for me
under 3.1, 3.2, and 3.3. In a multicore setup I put the lib directory under
each core's directory.

Hope it helps.

Péter
-- 
eXtensible Catalog
http://drupal.org/project/xc


Re: Timeout trying to index from nutch

2011-08-12 Thread Markus Jelsma
Firewall? Proxy?

 I am a new user and I have Solr installed. I can use the admin page and
 query the example data.
 However, I was using Nutch to load the index with intranet web pages, and
 I got this message.
 
 SolrIndexer: starting at 2011-08-12 16:52:44
 org.apache.solr.client.solrj.SolrServerException:
 java.net.ConnectException: Connection timed out
 
 The timeout happened after about 12 minutes. I can't seem to find this
 message in an archive search. Can anyone give me some clues?
 


SOLR 3.3.0 multivalued field sort problem

2011-08-12 Thread johnnyisrael
Hi,

I am currently using Solr 1.4.1. With this version, sorting works fine even
on a multivalued field.

Now I am planning to upgrade from 1.4.1 to 3.3.0. In this latest version,
sorting does not work on multivalued fields.

So I am unable to upgrade my Solr due to this drawback.

Is there a work around available to fix this problem?

Thanks,

Johnny



Fuzzy search with sort combination - drawback

2011-08-12 Thread johnnyisrael
Hi,

I am having a problem while using fuzzy search in a query.

I have two fields in my Solr output: one field is edge-ngrammed and the other
is a normal integer field which holds my customized score for that document.

I have a handler [myhandler] which by default sorts the documents on my
customized score field and returns the response.

Here is a sample fuzzy search query:

http://localhost:8080/solr/core0/select/?qt=myhandler&q=apple~0.7

I am getting the following result for the above query:

{
  "Term": "tool academy application",
  "MyScore": 1152},
{
  "Term": "fiona apple",
  "MyScore": 928},
{
  "Term": "apple bottom jeans",
  "MyScore": 637},
{
  "Term": "apply for reality show",
  "MyScore": 606},
{
  "Term": "tool academy 3 application",
  "MyScore": 203}]
 
 
Here less relevant content is coming out on top [due to my customized sort].

If I remove that sorting criterion and do the same search, I get the
following result:

{
  "Term": "fiona apple",
  "MyScore": 928},
{
  "Term": "apple bottom jeans",
  "MyScore": 637},
{
  "Term": "apply for reality show",
  "MyScore": 606},
{
  "Term": "tool academy application",
  "MyScore": 1152},
{
  "Term": "tool academy 3 application",
  "MyScore": 203}]
 
Is there a way to achieve the customized sort while still keeping the most
relevant content on top in this scenario?

Can anyone advise?

Thanks,

Johnny



Re: SOLR 3.3.0 multivalued field sort problem

2011-08-12 Thread Péter Király
Hi,

There is no direct solution; you have to create single-valued field(s) to
sort on. I am aware of two workarounds:

- you can use a random or a given (e.g. the first) instance of the multiple
values of the field, and that becomes your sortable field.
- you can create two sortable fields, <field>_min and <field>_max, which
contain the minimal and maximal values of the given field's values.

At least, that's what I do. Probably there are other solutions as well.
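
A minimal sketch of the second workaround at index time, using SolrJ; the
'price', 'price_min', and 'price_max' field names are hypothetical
placeholders (the companions must be single-valued in the schema):

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class MinMaxIndexer {
        public static void main(String[] args) throws Exception {
            SolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

            List<Integer> prices = Arrays.asList(12, 7, 30);
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            for (Integer p : prices) {
                doc.addField("price", p);  // multivalued, for searching
            }
            doc.addField("price_min", Collections.min(prices)); // sort asc on this
            doc.addField("price_max", Collections.max(prices)); // sort desc on this
            server.add(doc);
            server.commit();
        }
    }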

Péter
-- 
eXtensible Catalog
http://drupal.org/project/xc


2011/8/12 johnnyisrael johnnyi.john...@gmail.com:
 Hi,

 I am currently using Solr 1.4.1. With this version, sorting works fine
 even on a multivalued field.

 Now I am planning to upgrade from 1.4.1 to 3.3.0. In this latest version,
 sorting does not work on multivalued fields.

 So I am unable to upgrade my Solr due to this drawback.

 Thanks,

 Johnny




RE: Clustering not working when using 'text' field as snippet.

2011-08-12 Thread Pablo Queixalos
Thanks for your reply, Staszek.


Of course, the field has to be stored. I forgot to mention that I had already 
updated the schema for that. I also checked that the data was indeed stored 
in that field. 

Anyway, I tried to reproduce it on a fresh Solr install, and clustering works 
well. ;-)


Pablo.

-----Original Message-----
From: stac...@gmail.com [mailto:stac...@gmail.com] On behalf of Stanislaw 
Osinski
Sent: Friday, August 12, 2011 11:00
To: solr-user@lucene.apache.org
Subject: Re: Clustering not working when using 'text' field as snippet.

Hi Pablo,

The reason clustering doesn't work with the text field is that the field is 
not stored:

 <field name="text" type="text_general" indexed="true" stored="false"
        multiValued="true"/>

For clustering to work, you'll need to keep your documents' titles and content 
in stored fields.

Staszek


On Fri, Aug 12, 2011 at 10:28, Pablo Queixalos pablo.queixa...@polyspot.com
 wrote:

 Hi,

 I am using solr-3.3.0 and carrot² clustering, which works fine out of the
 box with the example docs and default Solr configuration (the 'features'
 field is used as the snippet).

 I indexed my own documents using the embedded ExtractingRequestHandler,
 which by default stores content in the 'text' field. When configuring
 clustering with 'text' as the snippet, carrot doesn't work properly and
 only shows 'Other topics' with all the documents within. It looks like
 carrot doesn't get the 'text' field's stored content.

 If I store the documents' content in the 'features' field and go back to
 the original configuration, clustering works fine.

 The only difference I see between the 'text' and 'features' fields in
 schema.xml is that some copyFields are defined for 'text'.

 I didn't debug solr.clustering.ClusteringComponent or
 CarrotClusteringEngine yet, but am I misunderstanding something about the
 'text' field?

 Thanks,

 Pablo.




Not update on duplicate key

2011-08-12 Thread Rohit
Hi All,

Please correct me if I am wrong, but when I try to insert a document into
Solr that was previously indexed, it overwrites the existing document with
that key.

Is there a way to change this behaviour?

1. I don't want Solr to overwrite; instead, it should ignore the new entry.

2. Also, I'd like to be able to change the behaviour on the fly: update based
on one flag and ignore based on another.

Thanks and Regards,

Rohit

 



Post content to be indexed to Solr

2011-08-12 Thread rahul
Hi,

Currently I am indexing documents by directly adding files with
'req.addFile(fi);' or by sending the content of the file with
'req.addContentStream(stream);' using SolrJ.

Assume the SolrJ client and the Solr server are on different networks (i.e.,
the Solr server is in a remote location); then I need to transfer the entire
file content to Solr. I believe the content that actually gets indexed should
be smaller than the actual file.

Hence, is there a way to extract, on the client side, just the content to be
indexed (instead of simply sending the entire file content; I believe the
content to be indexed should be 1 to 10% of the original file, please correct
me if I am wrong) using any Lucene API, and then post that specific content
to the remote server?

Is there any way to achieve this? Please correct me if I have misunderstood
anything.
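
One possible shape for this, as a minimal sketch: extract plain text on the
client with Apache Tika (an assumption; the thread doesn't name a library)
and post only the extracted text with SolrJ. The file path, URL, and field
names are hypothetical:

    import java.io.File;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.tika.Tika;

    public class ClientSideExtract {
        public static void main(String[] args) throws Exception {
            // Extract locally so only the text crosses the network.
            String text = new Tika().parseToString(new File("/path/to/report.pdf"));

            SolrServer server =
                new CommonsHttpSolrServer("http://remote-host:8983/solr");
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "report-1");
            doc.addField("text", text); // assumes a 'text' field in the schema
            server.add(doc);
            server.commit();
        }
    }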

Thanks in advance.



Nutch related issue: URL Ignore

2011-08-12 Thread Pawan Darira
hi

I am using Nutch 1.2. In my crawl-urlfilter.txt I am specifying URLs to be
skipped. I am giving some patterns that need to be skipped, but it is not
working.

e.g.

-^http://([a-z0-9]*\.)*domain.com
+^http://([a-z0-9]*\.)*domain.com/([0-9-a-z])*.html
-^http://([a-z0-9]*\.)*domain.com/([a-z/])*
-^http://([a-z0-9]*\.)*domain.com/top-ads.php

I want only the second URL pattern to be included while crawling, and all
other patterns to be excluded, but it is crawling all of them. Please suggest
where the issue might be.
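
One thing to check: if the regex URL filter behaves here as Nutch documents
(rules are applied in order, and the first matching rule wins), the broad
exclusion on the first line already rejects every URL before the include rule
is consulted. A sketch of the file reordered under that assumption (also
escaping the dots, so "domain.com" doesn't match "domainXcom"):

    +^http://([a-z0-9]*\.)*domain\.com/([0-9-a-z])*\.html
    -^http://([a-z0-9]*\.)*domain\.com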

thanks
Pawan


Re: Unbuffered entity enclosing request can not be repeated Invalid chunk header

2011-08-12 Thread Vadim Kisselmann
Hi Markus,

thanks for your answer.
I'm using Solr 4.0 and Jetty now; I'll observe the behavior and my error logs
next week. Tomcat could be a reason, we will see; I'll report back.

I'm indexing WITHOUT batches, one doc after another. But I would like to try
out batch indexing, as well as retrying faulty docs.
If you index one batch and one doc in the batch is corrupt, what happens to
the other 249 docs (250/batch in total)? Are they indexed and updated when
you retry the batch, or does the complete batch fail?
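
As an illustration of the retry-the-whole-batch pattern Markus describes
below (a sketch assuming SolrJ, not from the thread): re-sending a batch is
safe because a re-added document simply overwrites the copy with the same
uniqueKey.

    import java.util.Collection;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class BatchRetry {
        // Try the whole batch a few times before giving up on it.
        static void addWithRetry(SolrServer server,
                                 Collection<SolrInputDocument> batch,
                                 int maxRetries) throws Exception {
            for (int attempt = 1; ; attempt++) {
                try {
                    server.add(batch);
                    return;
                } catch (Exception e) { // SolrServerException or IOException
                    if (attempt >= maxRetries) {
                        throw e;
                    }
                }
            }
        }
    }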

Regards
Vadim




2011/8/11 Markus Jelsma markus.jel...@openindex.io

 Hi,

 We see these errors too once in a while, but there is no real answer on
 the mailing list here, except one user suspecting Tomcat is responsible
 (connection timeouts).

 Another user proposed limiting the number of documents per batch but that,
 of course, increases the number of connections made. We do only 250
 docs/batch to limit RAM usage on the client, and we start to see these
 errors very occasionally. There may be a coincidence.. or not.

 Anyway, it's really hard to reproduce, if not impossible. It happens when
 connecting directly as well as when connecting through a proxy.

 What you can do is simply retry the batch, and it usually works out fine.
 At least you don't lose a batch in the process. We retry all failures at
 least a couple of times before giving up an indexing job.

 Cheers,

  Hello folks,
 
  I use Solr 1.4.1, and every 2 to 6 hours I have indexing errors in my log
  files.
 
  on the client side:
  2011-08-04 12:01:18,966 ERROR [Worker-242] IndexServiceImpl - Indexing
  failed with SolrServerException.
  Details: org.apache.commons.httpclient.ProtocolException: Unbuffered
 entity
  enclosing request can not be repeated.:
  Stacktrace:
 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHtt
  pSolrServer.java:469) .
  .
  on the server side:
  INFO: [] webapp=/solr path=/update params={wt=javabinversion=1} status=0
  QTime=3
  04.08.2011 12:01:18 org.apache.solr.update.processor.LogUpdateProcessor
  finish
  INFO: {} 0 0
  04.08.2011 12:01:18 org.apache.solr.common.SolrException log
  SCHWERWIEGEND: org.apache.solr.common.SolrException: java.io.IOException:
  Invalid chunk header
  .
  .
  .
  I'm indexing ONE document per call, 15-20 documents per second, 24/7.
  What may be the problem?
 
  best regards
  vadim



sorting issue with solr 3.3

2011-08-12 Thread Bernd Fehling

It turned out that there is a sorting issue with Solr 3.3.
As far as I could trace it down so far:

4 docs in the index and a search for *:*

sorting on field dccreator_sort in descending order

http://localhost:8983/solr/select?fsv=true&sort=dccreator_sort%20desc&indent=on&version=2.2&q=*%3A*&start=0&rows=10&fl=dccreator_sort

result is:
--
<lst name="sort_values">
  <arr name="dccreator_sort">
    <str>convertitovistitutonazionaled</str>
    <str>莊國鴻chuangkuohung</str>
    <str>zyywwwxxx</str>
    <str>abdelhadiyasserabdelfattah</str>
  </arr>
</lst>

fieldType:
--
<fieldType name="alphaOnlySortLim" class="solr.TextField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="([\x20-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E])"
            replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="(.{1,30})(.{31,})" replacement="$1" replace="all"/>
  </analyzer>
</fieldType>

field:
--
<field name="dccreator_sort" type="alphaOnlySortLim" indexed="true"
       stored="true"/>


According to the documentation the sorting is UTF-8, but _why_ is the first
string at position 1 and _not_ at position 3, as it should be?


Following sorting through the code is somewhat difficult.
Any hint where to look for or where to start debugging?

Regards
Bernd


Re: how to integrate solr with web page?

2011-08-12 Thread Ahmet Arslan
 Hi, I have queried Solr to retrieve information from a database; now I
 have to integrate it with a web page. I don't know how to implement this,
 please help me...

 Actually, I have one JSP page which has a search field and a search
 button; now I need to get the results from Solr into the JSP page. Please
 help me.
 

Consider using Solritas http://wiki.apache.org/solr/VelocityResponseWriter


JEE servlet mapping, security and multiple Solr cores

2011-08-12 Thread Jaeger, Jay - DOT
This is both an FYI for the list, so the issue gets documented, and a 
suggestion for the developers.  I thought about a JIRA, and would be happy to 
submit one, but the issue is pretty environment-specific, so I have not done 
so at this point.

In testing Solr 3.3 under WebSphere Application Server 7 (WAS 7), we 
discovered that WAS 7 and multiple cores did not get along (at least in cases 
where security is applied).  The result was that any URL request for a core 
(e.g., /solr/our-core-name/select, /solr/our-core-name/update and, in 
particular, /solr/our-core-name/replication) resulted in 404 errors.

Investigation revealed that WAS 7 ignored the filter definition when deciding 
whether a particular resource existed.  So index pages worked fine (because 
it knew about them from the WAR) and the default core worked fine (because 
the servlet definitions in web.xml define /select/* and /update/*), but 
replication did not work at all, even for the default core (because 
/replication is not defined in web.xml), and select and update requests for 
cores other than the default did not work.

To fix it, we added the following for *each* *core* name we had:

<servlet-mapping>
  <servlet-name>SolrServer</servlet-name>
  <url-pattern>/our-core-name/*</url-pattern>
  <!-- Note: This mapping is not actually used, but tells the container the
       resource is not a static web page -->
</servlet-mapping>

That way WAS 7 knew that the resource existed.  Because of the filter, this 
servlet mapping never actually gets used, so I expect it really does not 
matter which servlet it points to, but this seemed a likely choice.  Putting 
this in web.xml was enough to tell WAS that the resource existed.  It was 
then able to properly apply security to the resource as well.

(Note:  We tried <url-pattern>/*</url-pattern>.  That works, after a fashion, 
but because these servlet definitions take priority over static web resources 
(e.g. /admin/index.jsp), it broke all of that stuff, so we had to go core by 
core.  8^) ).

(FYI:  We then added individual security-constraint entries with associated 
web-resource-collection and auth-constraint entries for /core-name/* [for 
select and update], and one such entry with multiple URLs to handle 
security for replication).
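
A sketch of what one such web.xml entry might look like (illustrative only; 
the core name and role name are placeholders, not taken from the message):

    <security-constraint>
      <web-resource-collection>
        <web-resource-name>our-core-name</web-resource-name>
        <url-pattern>/our-core-name/*</url-pattern>
      </web-resource-collection>
      <auth-constraint>
        <role-name>solr-user</role-name>
      </auth-constraint>
    </security-constraint>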

A suggestion for the developers would be to add some sort of comment in web.xml 
or at least in the documentation about this need to provide a servlet-mapping 
entry for each core for WAS 7 (and possibly other web application servers).  It 
took more than a day of research and testing to figure it all out.

Jay R. Jaeger
State of Wisconsin, 
Dept. of Transportation



Re: how to integrate solr with web page?

2011-08-12 Thread Nicholas Chase

On 8/12/2011 12:52 AM, nagarjuna wrote:

Hi, I have queried Solr to retrieve information from a database; now I have
to integrate it with a web page. I don't know how to implement this, please
help me...

Actually, I have one JSP page which has a search field and a search button;
now I need to get the results from Solr into the JSP page. Please help me.


This video: 
http://www.lucidimagination.com/devzone/videos-podcasts/how-to/adding-search-your-web-application-lucidworks-enterprise 
explains how to do it in PHP, but the concepts are the same using 
SolrJ.  (The video is about LucidWorks Enterprise, but the section on 
retrieving data, which starts at about 5:40, applies just as well to 
Solr.)  You can get more information on SolrJ here: 
http://wiki.apache.org/solr/Solrj


Hope that helps...

  Nick


Re: SOLR 3.3.0 multivalued field sort problem

2011-08-12 Thread Martijn v Groningen
Hi Johnny,

Sorting on a multivalued field has never really worked in Solr.
Solr versions <= 1.4.1 allowed it, but there was a chance that an error
occurred, and the sort order might not be what you expect.
From Solr 3.1 and up, sorting on a multivalued field isn't allowed and an
HTTP 400 is returned.

Duplicating documents or fields (what Péter describes) is, as far as I know,
the only option until Lucene supports sorting on multivalued fields
properly.
Martijn

2011/8/12 Péter Király kirun...@gmail.com

 Hi,

 There is no direct solution; you have to create single-valued field(s) to
 sort on. I am aware of two workarounds:

 - you can use a random or a given (e.g. the first) instance of the
 multiple values of the field, and that becomes your sortable field.
 - you can create two sortable fields, <field>_min and <field>_max, which
 contain the minimal and maximal values of the given field's values.

 At least, that's what I do. Probably there are other solutions as well.

 Péter
 --
 eXtensible Catalog
 http://drupal.org/project/xc


 2011/8/12 johnnyisrael johnnyi.john...@gmail.com:
  Hi,
 
  I am currently using Solr 1.4.1. With this version, sorting works fine
  even on a multivalued field.
 
  Now I am planning to upgrade from 1.4.1 to 3.3.0. In this latest version,
  sorting does not work on multivalued fields.
 
  So I am unable to upgrade my Solr due to this drawback.
 
  Thanks,
 
  Johnny
 
 




-- 
Met vriendelijke groet,

Martijn van Groningen


Exception DirectSolrSpellChecker when using spellcheck.q

2011-08-12 Thread O. Klein
The spellchecker works fine, but when using spellcheck.q it throws the
following exception (queryAnalyzerFieldType is defined, if that matters).

Is it a bug, or am I doing something wrong?

2011-08-12 17:30:54,368 java.lang.NullPointerException
at
org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:476)
at
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:131)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:202)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1401)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)




Tomcat7 with Solr closes at fixed hours, every time another hour

2011-08-12 Thread Adrian Fita
Hello.

I have Solr running within Tomcat 7, and Tomcat is shutting down at fixed
hours, every time a different hour. catalina.log doesn't show anything other
than a clean Tomcat shutdown (no exception or anything). I would really
appreciate some advice on how to debug this. Tomcat doesn't run anything
other than Solr.

The Context XML definition for the solr application is the following:

<Context path="/solr" docBase="/opt/solr/webapp">
  <Environment name="solr/home" type="java.lang.String"
               value="/opt/solr/solr-home/" override="true"/>
</Context>

Here are some relevant messages from catalina.log:

Aug 12, 2011 5:53:41 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {optimize=} 0 6255
Aug 12, 2011 5:53:41 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/update params={wt=json} status=0 QTime=6255
Aug 12, 2011 6:00:01 PM org.apache.coyote.AbstractProtocol pause
INFO: Pausing ProtocolHandler [http-bio-8081]
Aug 12, 2011 6:00:02 PM org.apache.coyote.AbstractProtocol pause
INFO: Pausing ProtocolHandler [ajp-bio-8010]
Aug 12, 2011 6:00:03 PM org.apache.catalina.core.StandardService stopInternal
INFO: Stopping service Catalina
Aug 12, 2011 6:00:03 PM org.apache.solr.core.SolrCore close
INFO: []  CLOSING SolrCore org.apache.solr.core.SolrCore@16d7894
Aug 12, 2011 6:00:03 PM org.apache.solr.update.DirectUpdateHandler2 close
INFO: closing 
DirectUpdateHandler2{commits=9408,autocommits=0,optimizes=9408,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=7500,cumulative_deletesById=1908,cumulative_deletesByQuery=0,cumulative_errors=0}
Aug 12, 2011 6:00:03 PM org.apache.solr.update.DirectUpdateHandler2 close
INFO: closed 
DirectUpdateHandler2{commits=9408,autocommits=0,optimizes=9408,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=7500,cumulative_deletesById=1908,cumulative_deletesByQuery=0,cumulative_errors=0}
Aug 12, 2011 6:00:03 PM org.apache.solr.core.SolrCore closeSearcher
INFO: [] Closing main searcher on request.
Aug 12, 2011 6:00:03 PM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing Searcher@1b0f1ab main

fieldValueCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=0,cumulative_evictions=0}

queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=7958,cumulative_hits=25,cumulative_hitratio=0.00,cumulative_inserts=7935,cumulative_evictions=0}

documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,warmupTime=0,cumulative_lookups=736,cumulative_hits=442,cumulative_hitratio=0.60,cumulative_inserts=294,cumulative_evictions=0}
Aug 12, 2011 6:00:04 PM org.apache.catalina.loader.WebappClassLoader
checkThreadLocalMapForLeaks
SEVERE: The web application [/solr] created a ThreadLocal with key of
type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
[org.apache.solr.schema.DateField$ThreadLocalDateFormat@e2d63f]) and a
value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
(value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
but failed to remove it when the web application was stopped. Threads
are going to be renewed over time to try and avoid a probable memory
leak.
Aug 12, 2011 6:00:04 PM org.apache.catalina.loader.WebappClassLoader
checkThreadLocalMapForLeaks
SEVERE: The web application [/solr] created a ThreadLocal with key of
type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
[org.apache.solr.schema.DateField$ThreadLocalDateFormat@e2d63f]) and a
value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
(value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
but failed to remove it when the web application was stopped. Threads
are going to be renewed over time to try and avoid a probable memory
leak.
Aug 12, 2011 6:00:04 PM org.apache.coyote.AbstractProtocol stop
INFO: Stopping ProtocolHandler [http-bio-8081]
Aug 12, 2011 6:00:04 PM org.apache.coyote.AbstractProtocol stop
INFO: Stopping ProtocolHandler [ajp-bio-8010]
Aug 12, 2011 6:00:04 PM org.apache.coyote.AbstractProtocol destroy
INFO: Destroying ProtocolHandler [http-bio-8081]
Aug 12, 2011 6:00:04 PM org.apache.coyote.AbstractProtocol destroy
INFO: Destroying ProtocolHandler [ajp-bio-8010]

--
Fita Adrian


Re: sorting issue with solr 3.3

2011-08-12 Thread Yonik Seeley
On Fri, Aug 12, 2011 at 9:53 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:
 It turned out that there is a sorting issue with Solr 3.3.
 As far as I could trace it down so far:

 4 docs in the index and a search for *:*

 sorting on field dccreator_sort in descending order

 http://localhost:8983/solr/select?fsv=true&sort=dccreator_sort%20desc&indent=on&version=2.2&q=*%3A*&start=0&rows=10&fl=dccreator_sort

 result is:
 --
 <lst name="sort_values">
   <arr name="dccreator_sort">
     <str>convertitovistitutonazionaled</str>
     <str>莊國鴻chuangkuohung</str>
     <str>zyywwwxxx</str>
     <str>abdelhadiyasserabdelfattah</str>
   </arr>
 </lst>


Hmmm, are the docs sorted incorrectly too, or is it the sort_values
that are incorrect?
All variants of string sorting should be well tested... see TestSort.testSort()


 fieldType:
 --
 <fieldType name="alphaOnlySortLim" class="solr.TextField"
            sortMissingLast="true" omitNorms="true">
   <analyzer>
     <tokenizer class="solr.KeywordTokenizerFactory"/>
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.TrimFilterFactory"/>
     <filter class="solr.PatternReplaceFilterFactory"
             pattern="([\x20-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E])"
             replacement="" replace="all"/>
     <filter class="solr.PatternReplaceFilterFactory"
             pattern="(.{1,30})(.{31,})" replacement="$1" replace="all"/>
   </analyzer>
 </fieldType>

 field:
 --
 <field name="dccreator_sort" type="alphaOnlySortLim" indexed="true"
        stored="true"/>


 According to the documentation the sorting is UTF-8, but _why_ is the first
 string at position 1 and _not_ at position 3, as it should be?


 Following sorting through the code is somewhat difficult.
 Any hint where to look for or where to start debugging?


Sorting.getStringSortField()

Can you reproduce this with a smaller test that we could use to debug/fix?

-Yonik
http://www.lucidimagination.com


Re: LockObtainFailedException

2011-08-12 Thread Naveen Gupta
HI Peter

I found the issue.

Actually, we were getting this exception because of JVM heap space. I
allocated -Xms512m and -Xmx1024m, and also increased the time limit for the
write lock to 20 secs. Things seemed to work, but that alone did not help.

On closer analysis of the docs we were indexing, we saw we were using
commitWithin of 10 secs, which was the root cause of indexing taking so long,
because of the many segments being committed.

Issuing the commit as a separate command using curl solved the issue.

The performance improved from 3 mins to 1.5 secs :)

Thanks a lot
Naveen

On Thu, Aug 11, 2011 at 6:27 PM, Peter Sturge peter.stu...@gmail.comwrote:

 Optimizing indexing time is a very different question.
 I'm guessing your 3mins+ time you refer to is the commit time.

 There are a whole host of things to take into account regarding
 indexing, like: number of segments, schema, how many fields, storing
 fields, omitting norms, caching, autowarming, search activity etc. -
 the list goes on...
 The trouble is, you can look at 100 different Solr installations with
 slow indexing, and find 200 different reasons why each is slow.

 The best place to start is to get a full understanding of precisely
 how your data is being stored in the index, starting with adding docs,
 going through your schema, Lucene segments, solrconfig.xml etc,
 looking at caches, commit triggers etc. - really getting to know how
 each step is affecting performance.
 Once you really have a handle on all the indexing steps, you'll be
 able to spot the bottlenecks that relate to your particular
 environment.

 An index of 4.5GB isn't that big (but the number of documents tends to
 have more of an effect than the physical size), so the bottleneck(s)
 should be findable once you trace through the indexing operations.



 On Thu, Aug 11, 2011 at 1:02 PM, Naveen Gupta nkgiit...@gmail.com wrote:
  Yes this was happening because of JVM heap size
 
  But the real issue is that if our index size is growing (very high)
 
  then indexing time is taking very long (using streaming)
 
  earlier for indexing 15,000 docs at a time (commit after 15000 docs) , it
  was taking 3 mins 20 secs time,
 
  after deleting the index data, it is taking 9 secs
 
  What would be the approach to get better indexing performance while
  keeping the index size in check at the same time?
 
  The index size was around 4.5 GB
 
  Thanks
  Naveen
 
  On Thu, Aug 11, 2011 at 3:47 PM, Peter Sturge peter.stu...@gmail.com
 wrote:
 
  Hi,
 
  When you get this exception with no other error or explananation in
  the logs, this is almost always because the JVM has run out of memory.
  Have you checked/profiled your mem usage/GC during the stream operation?
 
 
 
  On Thu, Aug 11, 2011 at 3:18 AM, Naveen Gupta nkgiit...@gmail.com
 wrote:
   Hi,
  
   We are doing streaming update to solr for multiple user,
  
   We are getting
  
  
   Aug 10, 2011 11:56:55 AM org.apache.solr.common.SolrException log
  
   SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain
  timed
   out: NativeFSLock@/var/lib/solr/data/index/write.lock
  at org.apache.lucene.store.Lock.obtain(Lock.java:84)
  at
  org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
  at
   org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
  at
  
 
 org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
  at
  
 
 org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
  at
  
 
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
  at
  
 
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
  at
   org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
  at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
  at
  
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
  at
  
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
  at
  
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
  at
  
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
  at
  
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at
  
 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at
  
 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at
  
 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at
  
 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
  at
  
 
 

Re: Need help indexing/querying a particular type of hierarchy

2011-08-12 Thread Michael B. Klein
After a whole lot of facet-wrangling, I've come up with a practical
solution that suits my situation, which is to index each triple as a
series of paths. For example, if the shelve process of the
accessionWF workflow is completed, it gets indexed as:

<field name="wf_wps">accessionWF</field>
<field name="wf_wps">accessionWF:shelve</field>
<field name="wf_wps">accessionWF:shelve:completed</field>
<field name="wf_wsp">accessionWF</field>
<field name="wf_wsp">accessionWF:completed</field>
<field name="wf_wsp">accessionWF:completed:shelve</field>
<field name="wf_swp">completed</field>
<field name="wf_swp">completed:accessionWF</field>
<field name="wf_swp">completed:accessionWF:shelve</field>

(I could use PathHierarchyTokenizerFactory to eliminate 2/3 of those
field declarations, but doing it this way keeps me from having to
upgrade my Solr to 3.1 yet.)

That lets solr return a facet structure that looks like this:

<lst name="facet_fields">
  <lst name="wf_wps">
    <int name="accessionWF">554</int>
    <int name="accessionWF:shelve">554</int>
    <int name="accessionWF:shelve:completed">550</int>
    <int name="accessionWF:shelve:error">4</int>
  </lst>
  <lst name="wf_wsp">
    <int name="accessionWF">554</int>
    <int name="accessionWF:completed">554</int>
    <int name="accessionWF:completed:shelve">550</int>
    <int name="accessionWF:error">4</int>
    <int name="accessionWF:error:shelve">4</int>
  </lst>
  <lst name="wf_swp">
    <int name="completed">554</int>
    <int name="completed:accessionWF">554</int>
    <int name="completed:accessionWF:shelve">550</int>
    <int name="error">4</int>
    <int name="error:accessionWF">4</int>
    <int name="error:accessionWF:shelve">4</int>
  </lst>
</lst>

I then use some Ruby post-processing to turn it into:

{
wf_wps: {
accessionWF: [554, {
shelve: [554, {
completed: 550,
error: 4
}],
publish: [554, {
completed: 554
}]
}]
},
wf_swp: {
completed: [554, {
accessionWF: [554, {
shelve: 550,
publish: 554
}]
}],
error: [4, {
accessionWF: [4, {
shelve: 4
}]
}]
},
wf_wsp: {
accessionWF: [554, {
completed: [554, {
shelve: 550,
publish: 554
}],
error: [4, {
shelve: 4
}]
}]
}
}

Eventually I may try to code up something that does the restructuring
on the solr side, but for now, this suits my purposes.

Michael


Re: sorting issue with solr 3.3

2011-08-12 Thread Yonik Seeley
On Fri, Aug 12, 2011 at 1:04 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Fri, Aug 12, 2011 at 9:53 AM, Bernd Fehling
 bernd.fehl...@uni-bielefeld.de wrote:
 It turned out that there is a sorting issue with Solr 3.3.
 As far as I could trace it down so far:

 4 docs in the index and a search for *:*

 sorting on field dccreator_sort in descending order

 http://localhost:8983/solr/select?fsv=true&sort=dccreator_sort%20desc&indent=on&version=2.2&q=*%3A*&start=0&rows=10&fl=dccreator_sort

 result is:
 --
 <lst name="sort_values">
   <arr name="dccreator_sort">
     <str>convertitovistitutonazionaled</str>
     <str>莊國鴻chuangkuohung</str>
     <str>zyywwwxxx</str>
     <str>abdelhadiyasserabdelfattah</str>
   </arr>
 </lst>


 Hmmm, are the docs sorted incorrectly too, or is it the sort_values
 that are incorrect?
 All variants of string sorting should be well tested... see 
 TestSort.testSort()

OK, something is very wrong with that test - I purposely introduced an
error into MissingLastOrdComparator and the test isn't failing.
I'll dig.

-Yonik
http://www.lucidimagination.com


custom velocity tool

2011-08-12 Thread Stéphane Campinas

Hi,

I am working with the velocity response writer, and I want to develop a 
custom velocity tool.

To do so, I have written a JAVA class that looks like that:

   @DefaultKey("mytool")
   public class MyCustomTool {

       public MyCustomTool() {
       }

       public String doit(Object arg) {
           // Do something
           return "something";
       }
   }

Then in order to register my custom tool, I create the property file:

   tools.toolbox = application
   tools.application.mytool = org.my.custom.MyCustomTool

By default, this file is named velocity.properties and is located 
within the conf folder,
according to [1]. This is set by the velocity response writer in the 
v.properties parameter.


After copying the jar file into the lib folder of my solr application, 
I use my custom tool in

my template by calling:

   $mytool.doit($var)

However, this doesn't do anything: no exception, no processing of the $var
variable. It seems that the custom tool is not loaded.
Would you know what I am doing wrong?

Thanks

ps: By the way, I am using the 2.0-beta3 of the velocity tools.

[1] 
http://java.dzone.com/news/quick-look-%E2%80%93-solritas-and-gui?utm_source=feedburnerutm_medium=feedutm_campaign=Feed%253A+javalobby%252Ffrontpage+%28Javalobby+%252F+Java+Zone%29 
http://java.dzone.com/news/quick-look-%E2%80%93-solritas-and-gui?utm_source=feedburnerutm_medium=feedutm_campaign=Feed%253A+javalobby%252Ffrontpage+%28Javalobby+%252F+Java+Zone%29

--
Campinas Stéphane


Re: sorting issue with solr 3.3

2011-08-12 Thread Yonik Seeley
On Fri, Aug 12, 2011 at 2:08 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Fri, Aug 12, 2011 at 1:04 PM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Fri, Aug 12, 2011 at 9:53 AM, Bernd Fehling
 bernd.fehl...@uni-bielefeld.de wrote:
 It turned out that there is a sorting issue with Solr 3.3.
 As far as I could trace it down so far:

 4 docs in the index and a search for *:*

 sorting on field dccreator_sort in descending order

 http://localhost:8983/solr/select?fsv=true&sort=dccreator_sort%20desc&indent=on&version=2.2&q=*%3A*&start=0&rows=10&fl=dccreator_sort

 result is:
 --
 <lst name="sort_values">
   <arr name="dccreator_sort">
     <str>convertitovistitutonazionaled</str>
     <str>莊國鴻chuangkuohung</str>
     <str>zyywwwxxx</str>
     <str>abdelhadiyasserabdelfattah</str>
   </arr>
 </lst>


 Hmmm, are the docs sorted incorrectly too, or is it the sort_values
 that are incorrect?
 All variants of string sorting should be well tested... see 
 TestSort.testSort()

 OK, something is very wrong with that test - I purposely introduced an
 error into MissingLastOrdComparator and the test isn't failing.
 I'll dig.

Oops, scratch that.  It was a bug I just introduced into the test in
my local copy to try and reproduce your issue.

-Yonik
http://www.lucidimagination.com


Some questions about SolrJ

2011-08-12 Thread Shawn Heisey
I currently have a build system for my Solr index written in Perl.  I am 
in the process of rewriting it in Java.  I've reached the part of the 
project where I'm using SolrJ, and I have a bunch of questions.  All of 
the SolrJ examples I can find are too simple to answer them.


A note before I launch into the questions.  The wiki page for SolrJ says 
that a static instance of CommonsHttpSolrServer is recommended, but NONE 
of the examples I have been able to find actually use it that way.  I've 
since learned that our webapp is creating a new object for every query.  
I've brought it to the attention of our development team; they'll be 
fixing it.


1) I can't find any examples of using CoreAdmin with SolrJ.  There seems 
to be a general lack of examples of doing anything complicated at all.  
Can anyone point me at comprehensive and detailed examples of using 
SolrJ that do everything in accordance with SolrJ recommendations?


2) When constructing and using HTTP requests that you make yourself, you 
can use a POST request to issue a query.  I use this method in my Perl 
build system to check for the existence of a large quantity of 
documents, and if any of them do exist, I use the same query to delete 
those documents with another POST request.  Can I do the same thing with 
SolrJ, or is it limited to queries using GET requests only?


3) I'll need to access CoreAdmin as well as individual cores for 
updates, queries, etc.  The former uses a /solr/ URL, the latter 
/solr/corename/.  Will I need two CommonsHttpSolrServer instances to do 
this, or is there a way to specify a core through a parameter?
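
For what it's worth, a minimal sketch touching questions 2 and 3, assuming
SolrJ 3.x and CommonsHttpSolrServer (host, core name, and query are
placeholders):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrRequest;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class SolrJSketch {
        public static void main(String[] args) throws Exception {
            // One instance for core administration (/solr/), one per core.
            CommonsHttpSolrServer admin =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
            CommonsHttpSolrServer core =
                new CommonsHttpSolrServer("http://localhost:8983/solr/corename");

            // Queries can go out as POST requests, which helps with very
            // long query strings; the same query then drives a delete.
            SolrQuery q = new SolrQuery("id:(1 OR 2 OR 3)");
            QueryResponse rsp = core.query(q, SolrRequest.METHOD.POST);
            if (rsp.getResults().getNumFound() > 0) {
                core.deleteByQuery("id:(1 OR 2 OR 3)");
                core.commit();
            }

            // CoreAdmin operations go through the /solr/ instance.
            CoreAdminRequest.reloadCore("corename", admin);
        }
    }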


I am sure that I have more questions, but I may be able to answer a lot 
of them myself if I can see better examples.


Thanks,
Shawn



Re: Some questions about SolrJ

2011-08-12 Thread Shawn Heisey

On 8/12/2011 1:49 PM, Shawn Heisey wrote:
I am sure that I have more questions, but I may be able to answer a 
lot of them myself if I can see better examples.


Thought of another question.  My Perl build system uses DIH for all 
indexing, but with the Java rewrite I am planning to do all actions 
other than a full index rebuild using the /update handler.  I have 
autoCommit completely turned off in solrconfig.xml. Do I need to set any 
parameters to ensure that nothing gets committed until I do a 
server.commit() myself?
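
With autoCommit disabled, documents added through /update should stay
invisible until an explicit commit, as long as nothing on the request (e.g. a
commitWithin) triggers one. A minimal sketch of the pattern, assuming SolrJ
(core URL and field name are placeholders):

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class CommitControl {
        public static void main(String[] args) throws Exception {
            SolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr/corename");

            List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1"); // hypothetical uniqueKey
            batch.add(doc);

            // add() alone does not commit; the documents become searchable
            // only after the explicit commit below.
            server.add(batch);
            server.commit();
        }
    }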


Thanks,
Shawn



Re: sorting issue with solr 3.3

2011-08-12 Thread Yonik Seeley
I've checked in an improved TestSort that adds deleted docs and
randomizes things a lot more (and fixes the previous reliance on doc
ids not being reordered).
I still can't reproduce this error though.
Is this stock solr?  Can you verify that the documents are in the
wrong order also (and not just the field sort values)?

-Yonik
http://www.lucidimagination.com


dataimporthandler large dataset

2011-08-12 Thread Eric Myers
I recently started looking into Solr to solve a problem created before my
time. We have a dataset consisting of 390,000,000+ records that had a search
written for it using a simple query. The problem is that the dataset needs
additional indices to keep operating, and the DBA says no go: too large a
dataset.

I came to the very quick conclusion that they needed a search engine,
preferably one that can return some data.

My problem lies in the initial index creation. Using the DataImportHandler
with JDBC to import 390M records will, I am guessing, take far longer than I
would like and use up quite a few resources.

Is there any way to chunk this data with the DataImportHandler? If not, I
will just write some code to handle the initial import.

Thanks

-- 
Eric Myers




Re: dataimporthandler large dataset

2011-08-12 Thread Kyle Lee
We have a 200,000,000-record index with 14 fields, and we can re-index the
entire data set in about five hours. One thing to note is that the
DataImportHandler uses one thread per entity by default. If you have a
multi-core box, you can drastically speed up indexing by specifying a thread
count of n+1, where n is the number of cores at your disposal. See the
DataImportHandler wiki page for more information.

If that's still too slow, you may wish to consider setting up multiple Solr
instances on different machines. If you go this route, each Solr instance can
house a portion of the index rather than the whole, and these indexes may be
built concurrently. This is called sharding and has both benefits and
drawbacks. The wiki page on Distributed Search has a more thorough
explanation.

You can use whatever scheme you like to partition the data, but one of the
simplest approaches using the DataImportHandler is to simply mod the record
ID by the number of shards you intend to create.

For example:

SELECT your_columns
FROM your_table
WHERE (primary_key % numShards) = 0

On Fri, Aug 12, 2011 at 4:32 PM, Eric Myers emy...@nabancard.com wrote:
 Recently started looking into solr to solve a problem created before my
 time.  We have a dataset consisting of 390,000,000+ records that had a
 search written for it using a simple query.  The problem is that the
 dataset needs additional indices to keep operating.  The DBA says no go,
 too large a dataset.

 I came to the very quick conclusion that they needed a search engine,
 preferably one that can return some data.

 My problem lies in the initial index creation.  Using the
 DataImportHandler with JDBC to import 390m records will, I am guessing
 take far longer than I would like, and use up quite a few resources.

 Is there any way to chunk this data, with the DataImportHandler?  If not
 I will just write some code to handle the initial import.

 Thanks

 --
 Eric Myers





Re: dataimporthandler large dataset

2011-08-12 Thread Shawn Heisey

On 8/12/2011 3:32 PM, Eric Myers wrote:

Recently started looking into solr to solve a problem created before my
time.  We have a dataset consisting of 390,000,000+ records that had a
search written for it using a simple query.  The problem is that the
dataset needs additional indices to keep operating.  The DBA says no go,
too large a dataset.

I came to the very quick conclusion that they needed a search engine,
preferably one that can return some data.

My problem lies in the initial index creation.  Using the
DataImportHandler with JDBC to import 390m records will, I am guessing
take far longer than I would like, and use up quite a few resources.

Is there any way to chunk this data, with the DataImportHandler?  If not
I will just write some code to handle the initial import.


Eric,

You can pass variables into the DIH via the request URL, which you can 
then use in your DIH SQL.  For example, minDid=7000 on the URL can be 
accessed as ${dataimporter.request.minDid} in the dih-config.xml file 
(or whatever you called your dih config).  I know this works as far back 
as 1.4.0, but I've never used anything older than that.


Once you have variables passing in via the DIH url, use whatever SQL 
constraints you need on each DIH call to do it in chunks.  You can 
either issue delta-import commands or full-import with clean=false, to 
tell it not to delete the index before starting.
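
For example, a sketch of what that could look like, first the request URL and 
then the matching entity in the DIH config (table and column names are 
hypothetical):

    http://localhost:8983/solr/dataimport?command=full-import&clean=false&minDid=7000&maxDid=8000

    <entity name="item"
            query="SELECT did, title FROM items
                   WHERE did &gt;= ${dataimporter.request.minDid}
                     AND did &lt; ${dataimporter.request.maxDid}">
      ...
    </entity>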


To further speed up both indexing and searching, you should limit the 
amount of data *stored* (contrast with indexed) in Solr to the smallest 
subset that is required to display a search results grid without 
consulting the original datastore.  If someone opens an individual item, 
the original datastore is likely to be fast enough to retrieve full details.


Thanks,
Shawn



sorting distance in solr 1.4.1

2011-08-12 Thread Tri Nguyen
Hi,
 
We are using Solr 1.4.1 and we need to sort our results by distance. We have 
lat/lons for each document in the response, plus our reference point.
 
Is it possible? I read about the spatial plugin, but that does range 
searching:
 
http://blog.jayway.com/2010/10/27/geo-search-with-spatial-solr-plugin/
 
It doesn't talk about sorting the results by distance (as supported by Solr 
3.1).
 
Tri

Re: custom velocity tool

2011-08-12 Thread Erik Hatcher
Stephane -

Also - I don't think, even with v.properties=velocity.properties, that it 
would be picked up from the solr-home/conf directory, the way the code is 
loading it using SolrResourceLoader. The .properties file would need to be in 
the JAR file for your custom tool (or somewhere in the classpath seen by the 
SolrResourceLoader). This is something that hasn't really been tried out (to 
my knowledge), so we might need to tinker a little to get custom tools 
injected as smoothly as they ideally should be.

Erik

On Aug 12, 2011, at 14:17 , Stéphane Campinas wrote:

 Hi,
 
 I am working with the velocity response writer, and I want to develop a 
 custom velocity tool.
 To do so, I have written a JAVA class that looks like that:
 
   @DefaultKey("mytool")
   public class MyCustomTool {
 
       public MyCustomTool() {
       }
 
       public String doit(Object arg) {
           // Do something
           return "something";
       }
   }
 
 Then in order to register my custom tool, I create the property file:
 
   tools.toolbox = application
   tools.application.mytool = org.my.custom.MyCustomTool
 
 By default, this file is named velocity.properties and is located within 
 the conf folder,
 according to [1]. This is set by the velocity response writer in the 
 v.properties parameter.
 
 After copying the jar file into the lib folder of my solr application, I 
 use my custom tool in
 my template by calling:
 
   $mytool.doit($var)
 
 However, this doesn't do anything: no exception, no processing of the $var 
 variable. It seems
 that the custom tool is not loaded.
 Would you know what I am doing wrong ?
 
 Thanks
 
 ps: By the way, I am using the 2.0-beta3 of the velocity tools.
 
 [1] 
 http://java.dzone.com/news/quick-look-%E2%80%93-solritas-and-gui?utm_source=feedburnerutm_medium=feedutm_campaign=Feed%253A+javalobby%252Ffrontpage+%28Javalobby+%252F+Java+Zone%29
  
 http://java.dzone.com/news/quick-look-%E2%80%93-solritas-and-gui?utm_source=feedburnerutm_medium=feedutm_campaign=Feed%253A+javalobby%252Ffrontpage+%28Javalobby+%252F+Java+Zone%29
 -- 
 Campinas Stéphane



RE: need some guidance about how to configure a specific solr solution.

2011-08-12 Thread Jonathan Rochkind
I don't know anything about LifeRay (never heard of it), but it sounds like 
you've actually figured out what you need to know about LifeRay; all you've 
got left is how to replicate the writer Solr server's content into the 
readers.

This should tell you how: 
http://wiki.apache.org/solr/SolrReplication

You'll need to find and edit the configuration files for the Solrs involved 
-- if you don't normally do that because LifeRay hides them from you, you'll 
need to find them. But it's a straightforward Solr feature (since 1.4): 
replication.
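
In outline, that page boils down to a /replication request handler in each 
solrconfig.xml; a sketch of the two sides (host names, confFiles, and the 
poll interval are placeholders):

    <!-- on the write-server (master) -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
    </requestHandler>

    <!-- on each read-server (slave) -->
    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://write-host:8080/solr/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>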

From: Roman, Pablo [pablo.ro...@uhn.ca]
Sent: Thursday, August 11, 2011 12:10 PM
To: solr-user@lucene.apache.org
Subject: need some guidance about how to configure a specific solr solution.

Hi There,

I am in IT and work on a project based on Liferay 605, with solr-3.2 as the 
indexer/search engine.

I presently have only one server that is indexing and searching, but reading 
the Liferay Support suggestions, they point to the need for:
- 2 to n Solr read-servers, for searching from any member of the Liferay 
cluster
- 1 Solr write-server, where all Liferay cluster members write.

However, going down to the detail of implementing that on the Liferay side, I 
think I know how to do it, which is editing solr-spring.xml in the 
WEB-INF/classes/META-INF folder of the Solr plugin. Open this file in a text 
editor and you will see that there are two entries which define where the 
Solr server can be found by Liferay:

<bean id="indexSearcher"
      class="com.liferay.portal.search.solr.SolrIndexSearcherImpl">
  <property name="serverURL" value="http://localhost:8080/solr/select" />
</bean>
<bean id="indexWriter"
      class="com.liferay.portal.search.solr.SolrIndexWriterImpl">
  <property name="serverURL" value="http://localhost:8080/solr/update" />
</bean>

However, I don't know how to replicate the writer Solr server's content into 
the readers. Please can you provide advice about that?

Thanks,
Pablo
