Re: solr search

2009-11-03 Thread Noble Paul നോബിള്‍ नोब्ळ्
Start with the examples in the download. That should help.

On Wed, Nov 4, 2009 at 11:14 AM, manishkbawne  wrote:
>
> Thank you for your reply. I have corrected this error, but now I am getting
> this error --
>
> HTTP Status 500 - Bad version number in .class file
> java.lang.UnsupportedClassVersionError: Bad version number in .class file at
> java.lang.ClassLoader.defineClass1(Native Method) at
> java.lang.ClassLoader.defineClass
>
> I have checked java -version and javac -version. Both show the same
> version, 1.5.0_09.
>
> How can I remove this error?
>
>
>
> Lance Norskog-2 wrote:
>>
>> The problem is in db-dataconfig.xml. You should start with the example
>> DataImportHandler configuration files.
>>
>> The structure is wrong. First there is a datasource, then there are
>> 'entities' which fetch a document's fields from the datasource.
>>
>> On Fri, Oct 30, 2009 at 9:03 PM, manishkbawne 
>> wrote:
>>>
>>> Hi,
>>> I have made following changes in solrconfig.xml
>>>
>>> <requestHandler name="/dataimport"
>>> class="org.apache.solr.handler.dataimport.DataImportHandler">
>>>   <lst name="defaults">
>>>     <str name="config">C:/Apache-Tomcat/apache-tomcat-6.0.20/solr/conf/db-data-config.xml</str>
>>>   </lst>
>>> </requestHandler>
>>>
>>>
>>> in db-dataconfig.xml
>>> <dataConfig>
>>>   <dataSource
>>>     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>>>     url="jdbc:sqlserver://servername:1433/databasename"
>>>     user="sa"
>>>     password="p...@123"/>
>>>   <document>
>>>     [entity and field definitions stripped by the mail archive]
>>>   </document>
>>> </dataConfig>
>>>
>>> in schema.xml files
>>> [field definitions stripped by the mail archive]
>>>
>>> Please suggest the possible cause of this error?
>>>
>>>
>>>
>>>
>>> Lance Norskog-2 wrote:

 Please post your dataimporthandler configuration file.

 On Fri, Oct 30, 2009 at 4:17 AM, manishkbawne 
 wrote:
>
> Thanks for your reply .. I am trying to use the database for solr
> search
> but
> getting this error..
>
> false in null
> -
> java.lang.NullPointerException at
> org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:95)
> at
> org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
> at org.apache.solr.core.SolrResourceLoader
>
> Can you please suggest me some possible solution?
>
>
>
>
>
>
>
>
> Karsten F. wrote:
>>
>> hi manishkbawne,
>>
>> Unspecific ideas for search improvements are here:
>> http://wiki.apache.org/solr/SolrPerformanceFactors
>>
>> I really like the last idea in
>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>> :
>> Use a profiler and ask a more specific question in this forum.
>>
>> Best regards
>>   Karsten
>>
>>
>>
>> manishkbawne wrote:
>>>
>>> I am using solr search to search through xml files. As I am working
>>> with millions of documents, the results come back slowly. Can anyone
>>> please suggest some way to speed up the search results?
>>>
>>
>>
>
> --
> View this message in context:
> http://old.nabble.com/solr-search-tp26125183p26128341.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



 --
 Lance Norskog
 goks...@gmail.com


>>>
>>> --
>>> View this message in context:
>>> http://old.nabble.com/solr-search-tp26125183p26139946.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>>
>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>>
>
> --
> View this message in context: 
> http://old.nabble.com/solr-search-tp26125183p26191282.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>



-- 
-
Noble Paul | Principal Engineer | AOL | http://aol.com


RE: Highlighting performance between 1.3 and 1.4rc

2009-11-03 Thread Jake Brownell
Thanks Mark, that did bring the time back down. I'll have to investigate a 
little more, and weigh the pros of each to determine which best suits our needs.

Jake

-Original Message-
From: Mark Miller [mailto:markrmil...@gmail.com] 
Sent: Tuesday, November 03, 2009 11:23 PM
To: solr-user@lucene.apache.org
Cc: solr-user@lucene.apache.org
Subject: Re: Highlighting performance between 1.3 and 1.4rc

The 1.4 highlighter is now slower if you have multi-term queries or
phrase queries. You can get the old behavior (which is faster) if you
pass usePhraseHighlighter=false - but you will not get correct phrase
highlighting, and multi-term queries won't highlight - e.g.
prefix/wildcard/range.
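
For reference, a minimal SolrJ sketch of that switch (the "text" field
name and the server URL here are assumptions, not from this thread):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightSpeedTest {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("\"some exact title\"");
    q.setHighlight(true);
    q.set("hl.fl", "text");                // hypothetical highlighted field
    q.set("usePhraseHighlighter", false);  // old, faster behavior; phrase and
                                           // prefix/wildcard terms won't highlight
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getHighlighting());
  }
}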

- Mark

http://www.lucidimagination.com (mobile)

On Nov 3, 2009, at 8:18 PM, Jake Brownell  wrote:

> Hi,
>
> The fix MarkM provided yesterday for the problem I reported  
> encountering with the highlighter appears to be working--I installed  
> the Lucene 2.9.1 rc4 artifacts.
>
> Now I'm running into an oddity regarding performance. Our  
> integration test is running slower than it used to. I've placed some  
> average timings below. I'll try to describe what the test does in  
> the hopes that someone will have some insight.
>
> The indexing time represents the time it takes to load and index/ 
> commit ~43 books. The test then does two sets of searches.
>
> A basic search is a dismax search across several fields including  
> the text of the book. It searches either the exact title (in quotes)  
> or the ISBN. Highlighting is enabled on the field that holds the  
> text of the book.
>
> An advanced search uses a nested dismax (inside a normal Lucene query) to  
> search for either the exact title (in quotes) or the ISBN. The main  
> difference is that the title is only matched against fields related  
> to titles, not authors, text of the book, etc. Highlighting is  
> enabled against the text of the book.
>
> The indexing time remained fairly constant. I ran with and without  
> highlighting enabled, to see how much it was contributing. I am most  
> interested in the jumps in time between 1.3 and 1.4 for the  
> highlighting time.
>
> with highlighting enabled
> solr 1.3
> Indexing: 40161ms
> Basic: 12407ms
> Advanced: 1106ms
>
>
> solr 1.4 rc
> Indexing: 41734ms
> Basic: 26346ms
> Advanced: 17067ms
>
>
> without any highlighting
> solr 1.3
> Indexing: 41186ms
> Basic: 1024ms
> Advanced: 265ms
>
> solr 1.4 rc
> Indexing: 40981ms
> Basic: 883ms
> Advanced: 356ms
>
> FWIW, the integration test uses an embedded solr server.
>
> I suppose I should also ask if there are any general tips to speed  
> up highlighting?
>
> Thanks,
> Jake


Sending file to Solr via HTTP POST

2009-11-03 Thread Caroline Tan
Hi,
From the Solr wiki's ExtractingRequestHandler tutorial, when it comes to
the part about posting a file to Solr, it always uses the curl command, e.g.
curl 'http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true'
-F "myfile=@tutorial.html"

I have never used curl and I was wondering whether there is any replacement
for this method.

Is there any API that I can use to achieve the same thing in a Java
project without relying on curl? Does SolrJ have such a method? Thanks
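
Something like this SolrJ sketch should be equivalent to the curl call
above (the server URL is an assumption; ContentStreamUpdateRequest is
the SolrJ 1.4 class for streaming a file to a request handler):

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractPost {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    // point the request at the ExtractingRequestHandler
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("tutorial.html"));   // the file to send, as in the curl example
    req.setParam("literal.id", "doc1");       // same literal.id as the curl example
    req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);  // commit=true
    server.request(req);
  }
}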

~caroLine


Re: solr search

2009-11-03 Thread manishkbawne

Thank you for your reply. I have corrected this error, but now I am getting
this error --

HTTP Status 500 - Bad version number in .class file
java.lang.UnsupportedClassVersionError: Bad version number in .class file at
java.lang.ClassLoader.defineClass1(Native Method) at
java.lang.ClassLoader.defineClass

I have checked java -version and javac -version. Both show the same
version, 1.5.0_09.

How can I remove this error?



Lance Norskog-2 wrote:
> 
> The problem is in db-dataconfig.xml. You should start with the example
> DataImportHandler configuration files.
> 
> The structure is wrong. First there is a datasource, then there are
> 'entities' which fetch a document's fields from the datasource.
> 
> On Fri, Oct 30, 2009 at 9:03 PM, manishkbawne 
> wrote:
>>
>> Hi,
>> I have made following changes in solrconfig.xml
>>
>> <requestHandler name="/dataimport"
>> class="org.apache.solr.handler.dataimport.DataImportHandler">
>>   <lst name="defaults">
>>     <str name="config">C:/Apache-Tomcat/apache-tomcat-6.0.20/solr/conf/db-data-config.xml</str>
>>   </lst>
>> </requestHandler>
>>
>>
>> in db-dataconfig.xml
>> <dataConfig>
>>   <dataSource
>>     driver="com.microsoft.sqlserver.jdbc.SQLServerDriver"
>>     url="jdbc:sqlserver://servername:1433/databasename"
>>     user="sa"
>>     password="p...@123"/>
>>   <document>
>>     [entity and field definitions stripped by the mail archive]
>>   </document>
>> </dataConfig>
>>
>> in schema.xml files
>> [field definitions stripped by the mail archive]
>>
>> Please suggest the possible cause of this error?
>>
>>
>>
>>
>> Lance Norskog-2 wrote:
>>>
>>> Please post your dataimporthandler configuration file.
>>>
>>> On Fri, Oct 30, 2009 at 4:17 AM, manishkbawne 
>>> wrote:

 Thanks for your reply .. I am trying to use the database for solr
 search
 but
 getting this error..

 false in null
 -
 java.lang.NullPointerException at
 org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:95)
 at
 org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:106)
 at org.apache.solr.core.SolrResourceLoader

 Can you please suggest me some possible solution?








 Karsten F. wrote:
>
> hi manishkbawne,
>
> Unspecific ideas for search improvements are here:
> http://wiki.apache.org/solr/SolrPerformanceFactors
>
> I really like the last idea in
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
> :
> Use a profiler and ask a more specific question in this forum.
>
> Best regards
>   Karsten
>
>
>
> manishkbawne wrote:
>>
>> I am using solr search to search through xml files. As I am working
>> with millions of documents, the results come back slowly. Can anyone
>> please suggest some way to speed up the search results?
>>
>
>

 --
 View this message in context:
 http://old.nabble.com/solr-search-tp26125183p26128341.html
 Sent from the Solr - User mailing list archive at Nabble.com.


>>>
>>>
>>>
>>> --
>>> Lance Norskog
>>> goks...@gmail.com
>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/solr-search-tp26125183p26139946.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> Lance Norskog
> goks...@gmail.com
> 
> 

-- 
View this message in context: 
http://old.nabble.com/solr-search-tp26125183p26191282.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Highlighting performance between 1.3 and 1.4rc

2009-11-03 Thread Mark Miller
The 1.4 highlighter is now slower if you have multi-term queries or
phrase queries. You can get the old behavior (which is faster) if you
pass usePhraseHighlighter=false - but you will not get correct phrase
highlighting, and multi-term queries won't highlight - e.g.
prefix/wildcard/range.


- Mark

http://www.lucidimagination.com (mobile)

On Nov 3, 2009, at 8:18 PM, Jake Brownell  wrote:


Hi,

The fix MarkM provided yesterday for the problem I reported  
encountering with the highlighter appears to be working--I installed  
the Lucene 2.9.1 rc4 artifacts.


Now I'm running into an oddity regarding performance. Our  
integration test is running slower than it used to. I've placed some  
average timings below. I'll try to describe what the test does in  
the hopes that someone will have some insight.


The indexing time represents the time it takes to load and index/ 
commit ~43 books. The test then does two sets of searches.


A basic search is a dismax search across several fields including  
the text of the book. It searches either the exact title (in quotes)  
or the ISBN. Highlighting is enabled on the field that holds the  
text of the book.


An advanced search uses a nested dismax (inside a normal Lucene query) to  
search for either the exact title (in quotes) or the ISBN. The main  
difference is that the title is only matched against fields related  
to titles, not authors, text of the book, etc. Highlighting is  
enabled against the text of the book.


The indexing time remained fairly constant. I ran with and without  
highlighting enabled, to see how much it was contributing. I am most  
interested in the jumps in time between 1.3 and 1.4 for the  
highlighting time.


with highlighting enabled
solr 1.3
Indexing: 40161ms
Basic: 12407ms
Advanced: 1106ms


solr 1.4 rc
Indexing: 41734ms
Basic: 26346ms
Advanced: 17067ms


without any highlighting
solr 1.3
Indexing: 41186ms
Basic: 1024ms
Advanced: 265ms

solr 1.4 rc
Indexing: 40981ms
Basic: 883ms
Advanced: 356ms

FWIW, the integration test uses an embedded solr server.

I suppose I should also ask if there are any general tips to speed  
up highlighting?


Thanks,
Jake


Re: apply a patch on solr

2009-11-03 Thread michael8

Thanks, but my question here was not about the patch command itself (which I
already know), but about a simpler way (if any) to guarantee a
proper patch with the right file revisions needed by the patch.

Michael


Joe Calderon-2 wrote:
> 
> patch -p0 < /path/to/field-collapse-5.patch
> 
> On Tue, Nov 3, 2009 at 7:48 PM, michael8  wrote:
>>
>> Hmmm, perhaps I jumped the gun.  I just looked over the field collapse
>> patch
>> for SOLR-236 and each file listed in the patch has its own revision #.
>>
>> E.g. from field-collapse-5.patch:
>> --- src/java/org/apache/solr/core/SolrConfig.java       (revision 824364)
>> --- src/solrj/org/apache/solr/client/solrj/response/QueryResponse.java
>> (revision 816372)
>> --- src/solrj/org/apache/solr/client/solrj/SolrQuery.java       (revision
>> 823653)
>> --- src/java/org/apache/solr/search/SolrIndexSearcher.java      (revision
>> 794328)
>> --- src/java/org/apache/solr/search/DocSetHitCollector.java     (revision
>> 794328)
>>
>> Unless there is a better way, it seems like I would need to do "svn up
>> --revision ..." for each of the files to be patched and then apply the
>> patch?  This seems error prone and tedious.  Am I missing something
>> simpler
>> here?
>>
>> Michael
>>
>>
>> michael8 wrote:
>>>
>>> Perfect.  This is what I need to know instead of patching 'in the dark'.
>>> Good thing SVN revision cuts across all files like a tag.
>>>
>>> Thanks Mike!
>>>
>>> Michael
>>>
>>>
>>> cambridgemike wrote:

 You can see what revision the patch was written for at the top of the
 patch,
 it will look like this:

 Index: org/apache/solr/handler/MoreLikeThisHandler.java
 ===
 --- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437)
 +++ org/apache/solr/handler/MoreLikeThisHandler.java (working copy)

 now check out revision 772437 using the --revision switch in svn, patch
 away, and then svn up to make sure everything merges cleanly.  This is
 a
 good guide to follow as well:
 http://www.mail-archive.com/solr-user@lucene.apache.org/msg10189.html

 cheers,
 -mike

 On Mon, Nov 2, 2009 at 3:55 PM, michael8 
 wrote:

>
> Hi,
>
> First I like to pardon my novice question on patching solr (1.4).
>  What
> I
> like to know is, given a patch, like the one for collapse field, how
> would
> one go about knowing what solr source that patch is meant for since
> this
> is
> a source level patch?  Wouldn't the exact versions of a set of java
> files
> to
> be patched critical for the patch to work properly?
>
> So far what I have done is to pull the latest collapse field patch
> down
> from
> http://issues.apache.org/jira/browse/SOLR-236
> (field-collapse-5.patch),
> and
> then svn up the latest trunk from
> http://svn.apache.org/repos/asf/lucene/solr/trunk/, then patch and
> build.
> Intuitively I was thinking I should be doing svn up to a specific
> revision/tag instead of just latest.  So far everything seems fine,
> but
> I
> just want to make sure I'm doing the right thing and not just being
> lucky.
>
> Thanks,
> Michael
> --
> View this message in context:
> http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26157827.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


>>>
>>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26190563.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26191073.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: apply a patch on solr

2009-11-03 Thread Joe Calderon
Sorry, got cut off:
patch, then ant clean dist, will give you the modified solr war file.
If it doesn't apply cleanly (which I don't think is currently the case),
you can go back to the latest revision referenced in the patch.


On Tue, Nov 3, 2009 at 8:17 PM, Joe Calderon  wrote:
> patch -p0 < /path/to/field-collapse-5.patch
>
> On Tue, Nov 3, 2009 at 7:48 PM, michael8  wrote:
>>
>> Hmmm, perhaps I jumped the gun.  I just looked over the field collapse patch
>> for SOLR-236 and each file listed in the patch has its own revision #.
>>
>> E.g. from field-collapse-5.patch:
>> --- src/java/org/apache/solr/core/SolrConfig.java       (revision 824364)
>> --- src/solrj/org/apache/solr/client/solrj/response/QueryResponse.java
>> (revision 816372)
>> --- src/solrj/org/apache/solr/client/solrj/SolrQuery.java       (revision 
>> 823653)
>> --- src/java/org/apache/solr/search/SolrIndexSearcher.java      (revision 
>> 794328)
>> --- src/java/org/apache/solr/search/DocSetHitCollector.java     (revision
>> 794328)
>>
>> Unless there is a better way, it seems like I would need to do "svn up
>> --revision ..." for each of the files to be patched and then apply the
>> patch?  This seems error prone and tedious.  Am I missing something simpler
>> here?
>>
>> Michael
>>
>>
>> michael8 wrote:
>>>
>>> Perfect.  This is what I need to know instead of patching 'in the dark'.
>>> Good thing SVN revision cuts across all files like a tag.
>>>
>>> Thanks Mike!
>>>
>>> Michael
>>>
>>>
>>> cambridgemike wrote:

 You can see what revision the patch was written for at the top of the
 patch,
 it will look like this:

 Index: org/apache/solr/handler/MoreLikeThisHandler.java
 ===
 --- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437)
 +++ org/apache/solr/handler/MoreLikeThisHandler.java (working copy)

 now check out revision 772437 using the --revision switch in svn, patch
 away, and then svn up to make sure everything merges cleanly.  This is a
 good guide to follow as well:
 http://www.mail-archive.com/solr-user@lucene.apache.org/msg10189.html

 cheers,
 -mike

 On Mon, Nov 2, 2009 at 3:55 PM, michael8  wrote:

>
> Hi,
>
> First I like to pardon my novice question on patching solr (1.4).  What
> I
> like to know is, given a patch, like the one for collapse field, how
> would
> one go about knowing what solr source that patch is meant for since this
> is
> a source level patch?  Wouldn't the exact versions of a set of java
> files
> to
> be patched critical for the patch to work properly?
>
> So far what I have done is to pull the latest collapse field patch down
> from
> http://issues.apache.org/jira/browse/SOLR-236 (field-collapse-5.patch),
> and
> then svn up the latest trunk from
> http://svn.apache.org/repos/asf/lucene/solr/trunk/, then patch and
> build.
> Intuitively I was thinking I should be doing svn up to a specific
> revision/tag instead of just latest.  So far everything seems fine, but
> I
> just want to make sure I'm doing the right thing and not just being
> lucky.
>
> Thanks,
> Michael
> --
> View this message in context:
> http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26157827.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


>>>
>>>
>>
>> --
>> View this message in context: 
>> http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26190563.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
>


Highlighting performance between 1.3 and 1.4rc

2009-11-03 Thread Jake Brownell
Hi,

The fix MarkM provided yesterday for the problem I reported encountering with 
the highlighter appears to be working--I installed the Lucene 2.9.1 rc4 
artifacts.

Now I'm running into an oddity regarding performance. Our integration test is 
running slower than it used to. I've placed some average timings below. I'll 
try to describe what the test does in the hopes that someone will have some 
insight.

The indexing time represents the time it takes to load and index/commit ~43 
books. The test then does two sets of searches.

A basic search is a dismax search across several fields including the text of 
the book. It searches either the exact title (in quotes) or the ISBN. 
Highlighting is enabled on the field that holds the text of the book.

An advanced search uses a nested dismax (inside a normal Lucene query) to search for 
either the exact title (in quotes) or the ISBN. The main difference is that the 
title is only matched against fields related to titles, not authors, text of 
the book, etc. Highlighting is enabled against the text of the book.

The indexing time remained fairly constant. I ran with and without highlighting 
enabled, to see how much it was contributing. I am most interested in the jumps 
in time between 1.3 and 1.4 for the highlighting time.

with highlighting enabled
solr 1.3
Indexing: 40161ms
Basic: 12407ms
Advanced: 1106ms


solr 1.4 rc
Indexing: 41734ms
Basic: 26346ms
Advanced: 17067ms


without any highlighting
solr 1.3
Indexing: 41186ms
Basic: 1024ms
Advanced: 265ms

solr 1.4 rc
Indexing: 40981ms
Basic: 883ms
Advanced: 356ms

FWIW, the integration test uses an embedded solr server.

I suppose I should also ask if there are any general tips to speed up 
highlighting?

Thanks,
Jake


Re: apply a patch on solr

2009-11-03 Thread Joe Calderon
patch -p0 < /path/to/field-collapse-5.patch

On Tue, Nov 3, 2009 at 7:48 PM, michael8  wrote:
>
> Hmmm, perhaps I jumped the gun.  I just looked over the field collapse patch
> for SOLR-236 and each file listed in the patch has its own revision #.
>
> E.g. from field-collapse-5.patch:
> --- src/java/org/apache/solr/core/SolrConfig.java       (revision 824364)
> --- src/solrj/org/apache/solr/client/solrj/response/QueryResponse.java
> (revision 816372)
> --- src/solrj/org/apache/solr/client/solrj/SolrQuery.java       (revision 
> 823653)
> --- src/java/org/apache/solr/search/SolrIndexSearcher.java      (revision 
> 794328)
> --- src/java/org/apache/solr/search/DocSetHitCollector.java     (revision
> 794328)
>
> Unless there is a better way, it seems like I would need to do "svn up
> --revision ..." for each of the files to be patched and then apply the
> patch?  This seems error prone and tedious.  Am I missing something simpler
> here?
>
> Michael
>
>
> michael8 wrote:
>>
>> Perfect.  This is what I need to know instead of patching 'in the dark'.
>> Good thing SVN revision cuts across all files like a tag.
>>
>> Thanks Mike!
>>
>> Michael
>>
>>
>> cambridgemike wrote:
>>>
>>> You can see what revision the patch was written for at the top of the
>>> patch,
>>> it will look like this:
>>>
>>> Index: org/apache/solr/handler/MoreLikeThisHandler.java
>>> ===
>>> --- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437)
>>> +++ org/apache/solr/handler/MoreLikeThisHandler.java (working copy)
>>>
>>> now check out revision 772437 using the --revision switch in svn, patch
>>> away, and then svn up to make sure everything merges cleanly.  This is a
>>> good guide to follow as well:
>>> http://www.mail-archive.com/solr-user@lucene.apache.org/msg10189.html
>>>
>>> cheers,
>>> -mike
>>>
>>> On Mon, Nov 2, 2009 at 3:55 PM, michael8  wrote:
>>>

 Hi,

 First I like to pardon my novice question on patching solr (1.4).  What
 I
 like to know is, given a patch, like the one for collapse field, how
 would
 one go about knowing what solr source that patch is meant for since this
 is
 a source level patch?  Wouldn't the exact versions of a set of java
 files
 to
 be patched critical for the patch to work properly?

 So far what I have done is to pull the latest collapse field patch down
 from
 http://issues.apache.org/jira/browse/SOLR-236 (field-collapse-5.patch),
 and
 then svn up the latest trunk from
 http://svn.apache.org/repos/asf/lucene/solr/trunk/, then patch and
 build.
 Intuitively I was thinking I should be doing svn up to a specific
 revision/tag instead of just latest.  So far everything seems fine, but
 I
 just want to make sure I'm doing the right thing and not just being
 lucky.

 Thanks,
 Michael
 --
 View this message in context:
 http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26157827.html
 Sent from the Solr - User mailing list archive at Nabble.com.


>>>
>>>
>>
>>
>
> --
> View this message in context: 
> http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26190563.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: apply a patch on solr

2009-11-03 Thread michael8

Hmmm, perhaps I jumped the gun.  I just looked over the field collapse patch
for SOLR-236 and each file listed in the patch has its own revision #.  

E.g. from field-collapse-5.patch:
--- src/java/org/apache/solr/core/SolrConfig.java   (revision 824364)
--- src/solrj/org/apache/solr/client/solrj/response/QueryResponse.java
(revision 816372)
--- src/solrj/org/apache/solr/client/solrj/SolrQuery.java   (revision 
823653)
--- src/java/org/apache/solr/search/SolrIndexSearcher.java  (revision 
794328)
--- src/java/org/apache/solr/search/DocSetHitCollector.java (revision
794328)

Unless there is a better way, it seems like I would need to do "svn up
--revision ..." for each of the files to be patched and then apply the
patch?  This seems error prone and tedious.  Am I missing something simpler
here?

Michael


michael8 wrote:
> 
> Perfect.  This is what I need to know instead of patching 'in the dark'. 
> Good thing SVN revision cuts across all files like a tag.
> 
> Thanks Mike!
> 
> Michael
> 
> 
> cambridgemike wrote:
>> 
>> You can see what revision the patch was written for at the top of the
>> patch,
>> it will look like this:
>> 
>> Index: org/apache/solr/handler/MoreLikeThisHandler.java
>> ===
>> --- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437)
>> +++ org/apache/solr/handler/MoreLikeThisHandler.java (working copy)
>> 
>> now check out revision 772437 using the --revision switch in svn, patch
>> away, and then svn up to make sure everything merges cleanly.  This is a
>> good guide to follow as well:
>> http://www.mail-archive.com/solr-user@lucene.apache.org/msg10189.html
>> 
>> cheers,
>> -mike
>> 
>> On Mon, Nov 2, 2009 at 3:55 PM, michael8  wrote:
>> 
>>>
>>> Hi,
>>>
>>> First I like to pardon my novice question on patching solr (1.4).  What
>>> I
>>> like to know is, given a patch, like the one for collapse field, how
>>> would
>>> one go about knowing what solr source that patch is meant for since this
>>> is
>>> a source level patch?  Wouldn't the exact versions of a set of java
>>> files
>>> to
>>> be patched critical for the patch to work properly?
>>>
>>> So far what I have done is to pull the latest collapse field patch down
>>> from
>>> http://issues.apache.org/jira/browse/SOLR-236 (field-collapse-5.patch),
>>> and
>>> then svn up the latest trunk from
>>> http://svn.apache.org/repos/asf/lucene/solr/trunk/, then patch and
>>> build.
>>> Intuitively I was thinking I should be doing svn up to a specific
>>> revision/tag instead of just latest.  So far everything seems fine, but
>>> I
>>> just want to make sure I'm doing the right thing and not just being
>>> lucky.
>>>
>>> Thanks,
>>> Michael
>>> --
>>> View this message in context:
>>> http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26157827.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>
>>>
>> 
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26190563.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: apply a patch on solr

2009-11-03 Thread michael8

Perfect.  This is what I need to know instead of patching 'in the dark'. 
Good thing SVN revision cuts across all files like a tag.

Thanks Mike!

Michael


cambridgemike wrote:
> 
> You can see what revision the patch was written for at the top of the
> patch,
> it will look like this:
> 
> Index: org/apache/solr/handler/MoreLikeThisHandler.java
> ===
> --- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437)
> +++ org/apache/solr/handler/MoreLikeThisHandler.java (working copy)
> 
> now check out revision 772437 using the --revision switch in svn, patch
> away, and then svn up to make sure everything merges cleanly.  This is a
> good guide to follow as well:
> http://www.mail-archive.com/solr-user@lucene.apache.org/msg10189.html
> 
> cheers,
> -mike
> 
> On Mon, Nov 2, 2009 at 3:55 PM, michael8  wrote:
> 
>>
>> Hi,
>>
>> First I like to pardon my novice question on patching solr (1.4).  What I
>> like to know is, given a patch, like the one for collapse field, how
>> would
>> one go about knowing what solr source that patch is meant for since this
>> is
>> a source level patch?  Wouldn't the exact versions of a set of java files
>> to
>> be patched critical for the patch to work properly?
>>
>> So far what I have done is to pull the latest collapse field patch down
>> from
>> http://issues.apache.org/jira/browse/SOLR-236 (field-collapse-5.patch),
>> and
>> then svn up the latest trunk from
>> http://svn.apache.org/repos/asf/lucene/solr/trunk/, then patch and build.
>> Intuitively I was thinking I should be doing svn up to a specific
>> revision/tag instead of just latest.  So far everything seems fine, but I
>> just want to make sure I'm doing the right thing and not just being
>> lucky.
>>
>> Thanks,
>> Michael
>> --
>> View this message in context:
>> http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26157827.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/apply-a-patch-on-solr-tp26157827p26189573.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: question about collapse.type = adjacent

2009-11-03 Thread michael8

Hi Martijn,

This clarifies it all for me.  Thanks a lot!

Michael


Martijn v Groningen wrote:
> 
> Hi Michael,
> 
> Field collapsing is basically done in two steps. The first step is to
> get the uncollapsed, sorted (whether by score or a field value)
> documents, and the second step is to apply the collapse algorithm on
> the uncollapsed documents. So yes, when specifying
> collapse.type=adjacent the documents get collapsed after the sort
> has been applied, but this is also the case when not specifying
> collapse.type=adjacent.
> I hope this answers your question.
> 
> Cheers,
> 
> Martijn
> 
> 2009/11/2 michael8 :
>>
>> Hi,
>>
>> I would like to confirm if 'adjacent' in collapse.type means the
>> documents
>> (with the same collapse field value) are considered adjacent *after* the
>> 'sort' param from the query has been applied, or *before*?  I would think
>> it
>> would be *after* since collapse feature primarily is meant for
>> presentation
>> use.
>>
>> Thanks,
>> Michael
>> --
>> View this message in context:
>> http://old.nabble.com/question-about-collapse.type-%3D-adjacent-tp26157114p26157114.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 
> 
> -- 
> Met vriendelijke groet,
> 
> Martijn van Groningen
> 
> 

-- 
View this message in context: 
http://old.nabble.com/question-about-collapse.type-%3D-adjacent-tp26157114p26189401.html
Sent from the Solr - User mailing list archive at Nabble.com.



RE: Getting update/extract RequestHandler to work under Tomcat

2009-11-03 Thread Glock, Thomas
Follow-up - 

This is now working (sadly I'm not sure exactly why!) but I've
successfully used curl (under Windows) and the following examples to
parse content:

curl http://localhost:8080/apache-solr-1.4-dev/update/extract?extractOnly=true --data-binary @curl-config.pdf -H "Content-type:application/pdf"
curl http://localhost:8080/apache-solr-1.4-dev/update/extract?extractOnly=true --data-binary @curl-config.html -H "Content-type:text/html"
curl http://localhost:8080/apache-solr-1.4-dev/update/extract?extractOnly=true --data-binary @c:/EnterpriseSearchSummit.ppt -H "Content-type:application/vnd.ms-powerpoint"

The solr-cell jar is being loaded as well as other jars from the contrib
and dist directories; see the list below.

Regarding files being located in the webapps structure - I did that
because I wanted to try and keep 1.3 running under the same instance of
tomcat as 1.4 and thought there might be difficulties specifying Solr
Home via the tomcat java configuration.  I've since removed the 1.3
instance. 

(I removed the replaceClassLoader lines for readability)

.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/asm-3.1.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcmail-jdk14-136.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcprov-jdk14-136.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/commons-codec-1.3.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/commons-compress-1.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/commons-io-1.4.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/commons-lang-2.1.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/commons-logging-1.1.1.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/dom4j-1.6.1.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/fontbox-0.1.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/geronimo-stax-api_1.0_spec-1.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/icu4j-3.8.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/jempbox-0.2.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/log4j-1.2.14.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/nekohtml-1.9.9.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/ooxml-schemas-1.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/pdfbox-0.7.3.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/poi-3.5-beta6.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/poi-ooxml-3.5-beta6.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/poi-scratchpad-3.5-beta6.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/tika-core-0.4.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/tika-parsers-0.4.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/xercesImpl-2.8.1.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/xml-apis-1.0.b2.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/xmlbeans-2.3.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/dist/apache-solr-cell-1.4-dev.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/dist/apache-solr-clustering-1.4-dev.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/carrot2-mini-3.1.0.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/commons-lang-2.4.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/ehcache-1.6.2.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/google-collections-1.0-rc2.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/jackson-core-asl-0.9.9-6.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/jackson-mapper-asl-0.9.9-6.jar' to classloader
.../Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/clustering/lib/log4j-1.2.14.jar' to classloader
Nov 3, 2009 3:05:03 PM org.apache.solr.core.SolrConfig <init>
INFO: Loaded SolrConfig: solrconfig.xml
Nov 3, 2009 3:05:03 PM org.apache.solr.core.SolrCore <init>
INFO: Opening new SolrCore at .\webapps\apache-solr-1.4-dev\solr\, dataDir=.\webapps\apache-solr-1.4-dev\solr\data\
-----Original Message-----
From: Chris Hostetter

Re: TermsComponent results don't change after documents removed from index

2009-11-03 Thread Bill Au
Thanks for pointing that out.  The TermsComponent prefix query is running
much faster than the facet prefix query.  I guess there is yet another
reason to optimize the index.

Bill
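
For anyone wiring this up from SolrJ, a minimal sketch of the prefix
query described above (the /terms handler name and the "name" field are
assumptions; adjust to your config):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TermsPrefixSuggest {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery();
    q.setQueryType("/terms");        // a handler with TermsComponent registered
    q.set("terms", true);
    q.set("terms.fl", "name");       // hypothetical field to suggest from
    q.set("terms.prefix", "at");     // what the user has typed so far
    QueryResponse rsp = server.query(q);
    System.out.println(rsp.getResponse().get("terms"));
  }
}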

On Tue, Nov 3, 2009 at 5:09 PM, Koji Sekiguchi  wrote:

> Bill Au wrote:
>
>> Should the results of the TermsComponent change after documents have been
>> removed from the index?  I am thinking about using the prefix of
>> TermsComponent to implement auto-suggest.  But I noticed that the prefix
>> counts in TermsComponent don't change after documents have been deleted.
>> The deletes are done with the standard update handler using a
>> delete-by-query.  Since the TermsComponent is showing the number of
>> documents matching the terms, the number should be decreasing when
>> documents
>> are deleted.
>>
>> I can reproduce this using the sample in the tutorial and the
>> TermsComponent
>> prefix query in the Wiki:
>> http://wiki.apache.org/solr/TermsComponent
>>
>> The output of the TermsComponent prefix doesn't change even after I
>> removed
>> all the documents:
>>
>> java -Ddata=args -jar post.jar "<delete><query>id:*</query></delete>"
>>
>> What am I doing wrong?
>>
>> Bill
>>
>>
>>
> This is a feature of Lucene... docFreq is not changed until segments
> containing
> deletions are merged. You can do optimize to correct docFreq.
>
> Koji
>
> --
> http://www.rondhuit.com/en/
>
>


Re: adding and updating a lot of document to Solr, metadata extraction etc

2009-11-03 Thread Lance Norskog
The DIH has improved a great deal from Solr 1.3 to 1.4. You will be
much better off using the DIH from this release.

This is the current Solr release candidate binary:
http://people.apache.org/~gsingers/solr/1.4.0/

On Tue, Nov 3, 2009 at 8:08 AM, Eugene Dzhurinsky  wrote:
> On Mon, Nov 02, 2009 at 05:45:37PM -0800, Lance Norskog wrote:
>> About large XML files and http overhead: you can tell solr to load the
>> file directly from a file system. This will stream thousands of
>> documents in one XML file without loading everything in memory at
>> once.
>>
>> This is a new book on Solr. It will help you through this early learning 
>> phase.
>>
>> http://www.packtpub.com/solr-1-4-enterprise-search-server
>
> Thank you, but we have to prepare some proof of concept with the stable
> version. I haven't seen any 1.4.0 artifacts released to repo1.maven.org so far.
>
> Additionally, I've learned about http://wiki.apache.org/solr/DataImportHandler
> and looks like this way is preferred in my case.
>
> I do have a lot of HTML pages on disk storage, and some metadata being stored
> in SQL tables. What I seem to need is to provide some sort of EntityProcessor
> and DataSource to DataImportHandler. Additionally I will need to provide some
> sort of properties to instruct data source for data retrieval (table names
> etc).
>
> So maybe there is some tutorial or how-to describing the process of creating
> custom classes for importing the data into Solr 1.3.0?
>
> Thank you in advance!
>
> --
> Eugene N Dzhurinsky
>
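
For what it's worth, a custom DataSource boils down to three methods; a
bare-bones sketch (the class name and the "baseDir" property are made up,
and in 1.4 the built-in FileDataSource plus FileListEntityProcessor may
already cover the HTML-files-on-disk part):

import java.io.FileReader;
import java.io.Reader;
import java.util.Properties;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.DataSource;

public class HtmlFileDataSource extends DataSource<Reader> {
  private String baseDir;

  public void init(Context context, Properties initProps) {
    // hypothetical property set on the <dataSource> element
    baseDir = initProps.getProperty("baseDir", ".");
  }

  public Reader getData(String query) {
    try {
      // treat the entity's "query" as a path relative to baseDir
      return new FileReader(baseDir + "/" + query);
    } catch (Exception e) {
      throw new RuntimeException("could not read " + query, e);
    }
  }

  public void close() {
  }
}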



-- 
Lance Norskog
goks...@gmail.com


Re: can't find solr.xml

2009-11-03 Thread Chris Hostetter

: I have downloaded apache-solr-1.3.0.tgz for Linux and don't see solr.xml. can
: someone assist.

example/multicore/solr.xml is an example solr.xml file demonstrating the 
Multi-Core support in solr 1.3.  as mentioned on the wiki, you can run the 
multicore example using this command...

java -Dsolr.solr.home=multicore -jar start.jar

http://wiki.apache.org/solr/CoreAdmin

-Hoss



Re: Programmatically configuring SLF4J for Solr 1.4?

2009-11-03 Thread Chris Hostetter
: it sorted.  I'm using a DirectSolrConnection embedded in a JRuby
: application, and everything works great, except I can't seem to get it to do
: anything except log to the console.  I've tried pointing
: 'java.util.logging.config.file' to a properties file, as well as specifying

how have you tried pointing it at a properties file?

it's a system property that needs to be set before any code attempts to 
load any of the logging related classes, so in JRuby you may have to do 
something special to specify it before JRuby loads the JVM (normally it 
can only be set on the command line when executing java; 
System.setProperty doesn't work, in any JVM).

worst case scenario, you can modify the default logging.properties file 
that is in the JVM JRuby runs.

: What I'd like to do is programmatically direct the Solr logs to a logfile,
: so that I can have my app start up, parse its config, and throw the Solr
: logs where they need to go based on that.
: 
: So, I don't suppose anybody has a code snippet (in Java) that sets up SLF4J
: for Solr logging (and that doesn't reference an external properties file)?

SLF4J is just a proxy API, it doesn't know anything about log files or log 
levels or log filtering ... you can programmatically change the JUL Logging 
using the LogManager API...

http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/LogManager.html
http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/LogManager.html#getLogger(java.lang.String)
http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/Logger.html#addHandler(java.util.logging.Handler)
http://java.sun.com/j2se/1.4.2/docs/api/java/util/logging/FileHandler.html
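
putting those javadocs together, a minimal sketch (the log file path and
level are assumptions); it has to run in the same JVM before you start
issuing requests to Solr:

import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class SolrLogSetup {
  public static void redirectSolrLogs() throws Exception {
    // grab the parent logger for everything Solr logs
    Logger solrLogger = Logger.getLogger("org.apache.solr");
    FileHandler handler = new FileHandler("solr.log", true);  // append mode
    handler.setFormatter(new SimpleFormatter());
    solrLogger.addHandler(handler);
    solrLogger.setLevel(Level.INFO);
    solrLogger.setUseParentHandlers(false);  // stop the console output too
  }
}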



-Hoss



Re: CPU utilization and query time high on Solr slave when snapshot install

2009-11-03 Thread Mark Miller
Well, that's what I get for typing on my iPhone when I'm not sure of my
memory - Lance called me out on this - I put "Hmm...I think" because I
wasn't positive if I remembered right, but I had thought auto warming
just populates based on the old entries (probably because of the javadoc
for the CacheRegenerator) and that it was ignored for the query cache -
whoops - ignore my comment below - the query cache does regenerate by
reissuing the search.

Mark Miller wrote:
> Hmm...I think you have to setup warming queries yourself and that
> autowarm just copies entries from the old cache to the new cache,
> rather than issuing queries - the value is how many entries it will
> copy. Though that's still going to take CPU and time.
>
> - Mark
>
> http://www.lucidimagination.com (mobile)
>
> On Nov 2, 2009, at 12:47 PM, Walter Underwood 
> wrote:
>
>> If you are going to pull a new index every 10 minutes, try turning
>> off cache autowarming.
>>
>> Your caches are never more than 10 minutes old, so spending a minute
>> warming each new cache is a waste of CPU. Autowarm submits queries to
>> the new Searcher before putting it in service. This will create a
>> burst of query load on the new Searcher, often keeping one CPU pretty
>> busy for several seconds.
>>
>> In solrconfig.xml, set autowarmCount to 0.
>>
>> Also, if you want the slaves to always have an optimized index,
>> create the snapshot only in post-optimize. If you create snapshots in
>> both post-commit and post-optimize, you are creating a non-optimized
>> index (post-commit), then replacing it with an optimized one a few
>> minutes later. A slave might get a non-optimized index one time, then
>> an optimized one the next.
>>
>> wunder
>>
>> On Nov 2, 2009, at 1:45 AM, biku...@sapient.com wrote:
>>
>>> Hi Solr Gurus,
>>>
>>> We have solr in 1 master, 2 slave configuration. Snapshot is created
>>> post commit, post optimization. We have autocommit after 50
>>> documents or 5 minutes. Snapshot puller runs as a cron every 10
>>> minutes. What we have observed is that whenever snapshot is
>>> installed on the slave, the solrj client used to query the slave
>>> solr gets timed out and there is high CPU usage/load avg. on the slave
>>> server. If we stop the snapshot puller, then the slaves work with no issues.
>>> The system has been running for 2 months and this issue has
>>> started to occur only now, when load on the website is increasing.
>>>
>>> Following are some details:
>>>
>>> Solr Details:
>>> apache-solr Version: 1.3.0
>>> Lucene - 2.4-dev
>>>
>>> Master/Slave configurations:
>>>
>>> Master:
>>> - for indexing data HTTPRequests are made on Solr server.
>>> - autocommit feature is enabled for 50 docs and 5 minutes
>>> - caching params are disable for this server
>>> - mergeFactor of 10 is set
>>> - we were running optimize script after every 2 hours, but now have
>>> reduced the duration to twice a day but issue still persists
>>>
>>> Slave1/Slave2:
>>> - standard requestHandler is being used
>>> - default values of caching are set
>>> Machine Specifications:
>>>
>>> Master:
>>> - 4GB RAM
>>> - 1GB JVM Heap memory is allocated to Solr
>>>
>>> Slave1/Slave2:
>>> - 4GB RAM
>>> - 2GB JVM Heap memory is allocated to Solr
>>>
>>> Master and Slave1 (solr1)are on single box and Slave2(solr2) on
>>> different box. We use HAProxy to load balance query requests between
>>> 2 slaves. Master is only used for indexing.
>>> Please let us know if somebody has ever faced a similar kind of issue
>>> or has some insight into it, as we are literally stuck at the
>>> moment with a very unstable production environment.
>>>
>>> As a workaround, we have started running optimize on master every 7
>>> minutes. This seems to have reduced the severity of the problem but
>>> still issue occurs every 2days now. please suggest what could be the
>>> root cause of this.
>>>
>>> Thanks,
>>> Bipul
>>>
>>>
>>>
>>>
>>


-- 
- Mark

http://www.lucidimagination.com





lucid kstem group and artifact id to put in POM

2009-11-03 Thread darniz

Hello,
Right now we are using the Lucid KStemmer and it works fine, and the two jars
required, "lucid-kstem.jar" and "lucid-solr-kstem.jar", are present in our web
app. I am trying to get hold of the groupId and artifactId so that I can plug
them into Maven to download these two files from the POM.
I searched the Maven repo at http://repo2.maven.org/maven2 and can't find the
KStem definition.

Any advice?

darniz

-- 
View this message in context: 
http://old.nabble.com/lucid-kstem-group-and-artifact-id-to-put-in-POM-tp26163608p26163608.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Plugin Performance Issues

2009-11-03 Thread Chris Hostetter

: 
:   
: 
: 
: 
: 
:   
: 
...
: only do indexing on the master server.  However, with this schema in place
: on the slaves, as well as our custom.jar in the solrHome/lib directory, we
: run into these issues where the memory usage grows and grows without
: explanation.

...even if you only do indexing on the master, having a single analyzer 
defined for a field means it's used at both index and query time (even 
though you say 'type="index"'), so a memory leak in either of your custom 
factories could cause a problem on a query box.

This however concerns me...

: fact, in a previous try, we had simply dropped one of our custom plugin jars
: into the lib directory but forgot to deploy the new solrconfig or schema
: files that referenced the classes in there, and the issue still occurred.

...this i can't think of a rational explanation for.  Can you elaborate on 
what you can do to create this problem .. ie: does the memory usage grow 
even when solr doesn't get any requests? or does it happen when searches are 
executed? or when commits happen? etc...

If the problem is as easy to reproduce as you describe, can you please 
generate some heap dumps against a server that isn't processing any 
queries -- one from when the server first starts up, and one from when the 
server crashes from an OOM (there's a JVM option for generating heap dumps 
on OOM that i can't think of off the top of my head)



-Hoss



Re: Spell check suggestion and correct way of implementation and some Questions

2009-11-03 Thread darniz

Thanks

I included the buildOnCommit and buildOnOptimize options as true and indexed
some documents, and it automatically builds the dictionary.

Are there any performance issues we should be aware of with this approach?

Rashid
-- 
View this message in context: 
http://old.nabble.com/Spell-check-suggestion-and-correct-way-of-implementation-and-some-Questions-tp26096664p26162724.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: TermsComponent results don't change after documents removed from index

2009-11-03 Thread Koji Sekiguchi

Bill Au wrote:

Should the results of the TermsComponent change after documents have been
removed from the index?  I am thinking about using the prefix of
TermsComponent to implement auto-suggest.  But I noticed that the prefix
counts in TermsComponent don't change after documents have been deleted.
The deletes are done with the standard update handler using a
delete-by-query.  Since the TermsComponent is showing the number of
documents matching the terms, the number should be decreasing when documents
are deleted.

I can reproduce this using the sample in the tutorial and the TermsComponent
prefix query in the Wiki:
http://wiki.apache.org/solr/TermsComponent

The output of the TermsComponent prefix doesn't change even after I removed
all the documents:

java -Ddata=args -jar post.jar "<delete><query>id:*</query></delete>"

What am I doing wrong?

Bill

  
This is a feature of Lucene... docFreq is not changed until segments containing
deletions are merged. You can do optimize to correct docFreq.
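
For SolrJ users, that optimize is a one-liner; a sketch, with the server
URL as an assumption:

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class OptimizeIndex {
  public static void main(String[] args) throws Exception {
    // merges segments, expunging deletes so docFreq reflects them
    new CommonsHttpSolrServer("http://localhost:8983/solr").optimize();
  }
}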

Koji

--
http://www.rondhuit.com/en/



Re: MoreLikeThis support Dismax parameters

2009-11-03 Thread Nick Spacek
>
> As i said: that may be what you're looking for (it's hard to tell based on
> your email) but the other possibility is that you want to be able to
> specify bq (and maybe bf) type parrams to influence the MLT portion of the
> request (ie: apply a bias so docs matching a particular query/func are
> mosre likely to be suggested) ... this is an area that hasn't really been
> very well explored as far as i can remember.
>

Right, so I have a field with many terms in it and I want to find similar
documents using this against a number of other fields. In my situation, I
want to take the description field and look in description, city, and
province. I want the city and province fields to be "more important". I have
applied a boost to them, but even though they have higher values they are
not considered by Solr to be as "interesting", I think because they do not
occur as frequently. What ends up happening is that all of the matching
terms in the description field end up pushing the matching terms from city
and province to the bottom of the "interesting" list.

I think that's what you were saying in the second paragraph, right? There
currently doesn't seem to be a way to influence the ordering of the
interesting terms.
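
For reference, a sketch of the kind of request under discussion, using
the MoreLikeThisHandler (handler path, field names and boost values are
assumptions; whether mlt.qf boosts can reorder the interesting terms is
exactly the open question here):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class MltFieldBoosts {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("id:some-doc");   // hypothetical seed document
    q.setQueryType("/mlt");                       // assumes a MoreLikeThisHandler here
    q.set("mlt.fl", "description,city,province");
    q.set("mlt.qf", "city^5.0 province^5.0 description^1.0");  // per-field boosts
    q.set("mlt.interestingTerms", "details");     // return the terms with their boosts
    System.out.println(server.query(q).getResponse());
  }
}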

Thanks,
Nick


Re: Re : Solr - Plugin : QParserPlugin is not working..

2009-11-03 Thread Chris Hostetter

: plugin.query.parser.QueryParserPluginOne" in logs. I am sure that the
: request handler with which this query parser plugin is linked is
: working,Because I could find the results of System.out.println()s
: (those included in requesthandler) in log, but not query parser
: plugin's System.outs or other effects.

You're going to have to give us more info besides just a solrconfig.xml 
snippet to help you...

1) what does the code for your custom plugin(s) look like?
2) exactly what URL are you hitting?
3) what does solr log when you hit that url?
4) what does your plugin(s) log when you hit that url?



-Hoss


Re: Opaque replication failures

2009-11-03 Thread Michael
OK, I've solved this.  For posterity:

The master doesn't make anything available for replication unless you
set replicateAfter="startup", or unless you set
replicateAfter="commit" and then both add a document and execute a
commit.  If you don't do one of those, even manually clicking
"Replicate Now" on the slave will show failures without explaining
why.

With replicateAfter="startup" and "commit" I was able to get a slave
core in the same Solr instance to replicate upon startup and upon
add-doc-and-commit.

Michael

On Tue, Nov 3, 2009 at 11:53 AM, Michael  wrote:
> I just tried setting up replication between two cores with the Nov 2
> nightly, and got the same result as below: replication reports as
> failed but doesn't tell me why.
>
> Is replication not allowed from a master core to a slave core within
> the same Solr instance?  Or is there a way for me to find out if there
> is something wrong with my index (which otherwise appears OK)?
>
> Thanks,
> Michael
>
> On Wed, Oct 14, 2009 at 1:33 PM, Michael  wrote:
>> Hi,
>>
>> I have a multicore Solr 1.4 setup.  core_master is a 3.7G master for
>> replication, and core_slave is a 500 byte slave pointing to the
>> master.  I'm using the example replication configuration from
>> solrconfig.xml, with ${enable.master} and ${enable.slave} properties
>> so that the master and slave can use the same solrconfig.xml.
>>
>> When I attempt to replicate (every 60 seconds or by pressing the
>> button on the slave replication admin page), it doesn't work.
>> Unfortunately, neither the admin page nor the REST API "details"
>> command show anything useful, and the logs show no errors.
>>
>> How can I get insight into what is causing the failure?  I assume it's
>> some configuration problem but don't know where to start.
>>
>> Thanks in advance for any help!  Config files are below.
>> Michael
>>
>>
>>
>> Here is my solr.xml:
>>
>> <solr persistent="true">
>>  <cores adminPath="/admin/cores">
>>   <core name="core_master" instanceDir="...">
>>     <property name="enable.master" value="true" />
>>   </core>
>>   <core name="core_slave" instanceDir="...">
>>     <property name="enable.slave" value="true" />
>>   </core>
>>  </cores>
>> </solr>
>>
>> And here's the relevant chunk of my solrconfig.xml:
>>
>> <requestHandler name="/replication" class="solr.ReplicationHandler">
>>    <lst name="master">
>>        <str name="enable">${enable.master:false}</str>
>>        <str name="replicateAfter">commit</str>
>>    </lst>
>>    <lst name="slave">
>>        <str name="enable">${enable.slave:false}</str>
>>        <str name="masterUrl">http://localhost:31000/solr/core_master/replication</str>
>>        <str name="pollInterval">00:00:60</str>
>>     </lst>
>> </requestHandler>
>>
>> Here's what the "details" command on the slave has to say -- nothing
>> explanatory that I can see.  Is the "isReplicating=false" worrying?
>>
>> 
>>
>>  589 bytes
>>  /home/search/solr/data/1/index
>>  
>>  false
>>  true
>>  1254772638413
>>  2
>>
>>  
>>    
>>      3.75 GB
>>      /home/search/solr/data/5/index
>>      
>>      true
>>      false
>>      1254772639291
>>      156
>>    
>>    > name="masterUrl">http://localhost:31000/solr/core_master/replication
>>    00:00:60
>>    Wed Oct 14 14:25:22 EDT 2009
>>
>>    
>>      Wed Oct 14 14:25:22 EDT 2009
>>      Wed Oct 14 14:25:22 EDT 2009
>>      Wed Oct 14 14:25:21 EDT 2009
>>      Wed Oct 14 14:24:27 EDT 2009
>>      (etc)
>>    
>>    
>>      Wed Oct 14 14:25:22 EDT 2009
>>      Wed Oct 14 14:25:22 EDT 2009
>>      Wed Oct 14 14:25:21 EDT 2009
>>      Wed Oct 14 14:24:27 EDT 2009
>>      (etc)
>>    
>>
>>    1481
>>    0
>>    1481
>>    Wed Oct 14 14:25:22 EDT 2009
>>    0
>>    false
>>  
>>
>> 
>>
>


Re: Highlighting is very slow

2009-11-03 Thread Jaco
Hi,

We had a similar case once (although not with those really long response
times). Fixed by moving to JRE 1.6 and tuning garbage collection.

Bye,

Jaco.

2009/11/3 Andrew Clegg 

>
> Hi everyone,
>
> I'm experimenting with highlighting for the first time, and it seems
> shockingly slow for some queries.
>
> For example, this query:
>
>
> http://server:8080/solr/select/?q=transferase&qt=dismax&version=2.2&start=0&rows=10&indent=on
>
> takes 313ms. But when I add highlighting:
>
>
> http://server:8080/solr/select/?q=transferase&qt=dismax&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=*&fl=id
>
> it takes 305212ms = 5mins!
>
> Some of my documents are slightly large -- the 10 hits for that query
> contain between 362 bytes and 1.4 megabytes of text each. All fields are
> stored and indexed, and most are termvectored. But this doesn't seem
> excessively large!
>
> Has anyone else seen this sort of behaviour before? This is with a nightly
> from 2009-10-26.
>
> All suggestions would be appreciated. My schema and config files are
> attached...
>
> http://old.nabble.com/file/p26160217/schema.xml schema.xml
> http://old.nabble.com/file/p26160217/solrconfig.xml solrconfig.xml
>
> Thanks (once again),
>
> Andrew.
>
> --
> View this message in context:
> http://old.nabble.com/Highlighting-is-very-slow-tp26160217p26160217.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: Lock problems: Lock obtain timed out

2009-11-03 Thread Chris Hostetter

: 02-Nov-2009 10:35:27 org.apache.solr.update.SolrIndexWriter finalize
: SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates
: a bug -- POSSIBLE RESOURCE LEAK!!!

can you post some context showing what the logs look like just before 
these errors?

I'm not sure what might be causing lock collision but your guess about
commits taking too long and overlapping is a good one -- what do the log
messages about the commits say around the time these errors start? the
commit is logged when it finishes, along with how long it takes, so it's easy to spot.

increasing your writeLockTimeout is probably a good idea, but i'm still 
confused as to why the whole server would lock up until you delete the 
index and restart, at worst i would expect the update/commit attempts that 
time out getting the lock to complain loudly, but then the "slow" one 
would eventually finish and subsequent attempts would work ok.
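(in solrconfig.xml that would presumably be something like 
<writeLockTimeout>20000</writeLockTimeout> in the indexDefaults section -- 
the 20000 here is just an arbitrary illustrative value)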

...very odd.

-Hoss



Re: TermVectorComponent : Required / Optional Parameters

2009-11-03 Thread Grant Ingersoll


On Nov 3, 2009, at 7:59 AM, Chantal Ackermann wrote:


Hi Grant,

I'd be glad to help update the wiki.
But just to make sure I'm understanding correctly:
for TermVectorComponent to work correctly, all these three  
attributes (termVectors, termPositions, termOffsets) must be set to  
"true"?


No, you only need termVectors=true to get back results.  You need  
offsets and positions stored to get them back when requested.
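(In schema.xml terms that presumably means something like:
<field name="subject" type="text_ws" indexed="true" stored="true"
termVectors="true" termPositions="true" termOffsets="true"/>
-- the field name and type here are just illustrative.)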






No matter how extensive the termVector request looks?

Because I tried this request also and it doesn't return the
termVector part either:
.../solr/epg/select?q=*%3A*&version=2.2&start=0&rows=0&indent=on&tv=true&tv.tf=true&tv.df=true


I notice in your request, you were asking for rows=0.  TVC works off  
the rows returned by the search.




(I tried removing and adding the last two parameters, as well.)

Is there anything else I have to be aware of? How about
indexed
stored
multiValued
omitNorms

Is there a combination that does not work?

Once I get it to work, I'll update the wiki. But I don't want to  
publish my ignorance. ;-)


Thanks,
Chantal


Grant Ingersoll schrieb:

On Nov 3, 2009, at 6:37 AM, Chantal Ackermann wrote:

Hi all!

Are these attributes required to make TermVectorComponent requests
work?
termPositions="true" termOffsets="true"

I have quite a lot of fields with termVectors="true" (for
faceting), but I don't get any results when requesting:
.../solr/epg/select?q=*%3A*&version=2.2&start=0&rows=1&indent=on&tv=true&tv.tf=true&tv.df=true&tv.positions&tv.offsets=true

Indeed, you must have stored positions and offsets for the TVC to
return positions and offsets.

(I don't have a special RequestHandler configured for it. Using
1.4.0RC)

Would it be possible to add the use case "TermVectorComponent" to
that Wiki page?

Yep, please do.  Anyone can edit the wiki, you just need an account.

http://wiki.apache.org/solr/FieldOptionsByUseCase

(And also add that info to the TermVectorComponent wiki page.)

Thanks!
Chantal


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: TermVectorComponent : Required / Optional Parameters

2009-11-03 Thread Chris Hostetter

: Indeed, you must have stored positions and offsets for the TVC to return
: positions and offsets.

we should probably make the TermVectorComponent more resilient and 
actually assert these things about the field (using the schema metadata) 
... it can add warning/error info to the output if it's asked for 
something it can't generate.




-Hoss



Customizing Field Score (Multivalued Field)

2009-11-03 Thread Stephen Duncan Jr
We've had a customized score calculator for one of our fields that we
developed when we were using Lucene instead of Solr (lucene 2.4).  During
our switch to Solr, we simply continued to use that code.  However, as the
version of Lucene used in Solr changed to 2.9, somewhere along the way
(unfortunately during our last release of code), that customization broke.
I'd previously tried to keep it up to date by dealing with deprecation
warnings, but managed to break things.  Now I'm pretty lost with regards to
that code.

Our customization is conceptually pretty simple, so rather than try to fix
up our code, I'd like some advice on the best way to implement this with
Solr 1.4, starting fresh.

We have a multi-valued field, where each value is basically the id of a
category.  Along with the id, there's a score for how well the document fit
into that category (between 0.0 and 1.0).  I'm looking for that
category-score to affect the score of documents when searching on that
field.  Any suggestions on the best way to attack this in Solr 1.4?

Here's how we did it in Lucene: we had an extension of Query, with a custom
scorer.  In the index we stored the category ids as a single-valued
space-separated string.  We also stored a space-separated string of scores
in another field.  We made both of these fields stored.  We simply delegated
the search to the normal searcher; then, to calculate the score, we retrieved
the values of both fields for the document, turned the space-separated
strings into arrays, searched the id array for the index of the desired id,
looked up the matching score in the score array, and returned it.
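Roughly, the lookup looked like this (a minimal sketch, not our real code;
the field names categoryIds/categoryScores and the class wrapper are
hypothetical stand-ins):

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.search.IndexSearcher;

public class CategoryScoreLookup {
    // hedged sketch of the per-document lookup; field names are made up
    public static float categoryScore(IndexSearcher searcher, int docId, String desiredId)
            throws IOException {
        Document doc = searcher.doc(docId);                      // both fields are stored
        String[] ids    = doc.get("categoryIds").split(" ");     // space-separated category ids
        String[] scores = doc.get("categoryScores").split(" ");  // parallel list of scores
        for (int i = 0; i < ids.length; i++) {
            if (ids[i].equals(desiredId)) {
                return Float.parseFloat(scores[i]);              // score for the matched category
            }
        }
        return 0f;                                               // doc not in that category
    }
}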

-- 
Stephen Duncan Jr
www.stephenduncanjr.com


Re: CPU utilization and query time high on Solr slave when snapshot install

2009-11-03 Thread Walter Underwood
Optimizing an index takes CPU, but if you are doing it on a machine  
dedicated to indexing, that does not matter. It will make queries  
faster.


wunder

On Nov 3, 2009, at 2:12 AM, biku...@sapient.com wrote:


Hi Walter,

When the issue occurred, we did try to turn autowarming off, but it  
did not solve the problem. The only thing which worked was  
optimizing the slave index.

But, what you say is logical and I will try it again.

But the basic question I have is: our Solr index is not huge by any  
means. Secondly, I have read in the wiki etc. that optimize has an adverse  
impact on performance and hence should be done only about once a day. Then  
what is wrong in our case that causes the performance problem (we serve  
just 4 req/sec)? Why is optimize fixing the issue, contrary to the usual  
advice? And how will this workaround affect us as the index size  
increases?


Regds,
Bipul

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Monday, November 02, 2009 11:18 PM
To: solr-user@lucene.apache.org
Subject: Re: CPU utilization and query time high on Solr slave when  
snapshot install


If you are going to pull a new index every 10 minutes, try turning off
cache autowarming.

Your caches are never more than 10 minutes old, so spending a minute
warming each new cache is a waste of CPU. Autowarm submits queries to
the new Searcher before putting it in service. This will create a
burst of query load on the new Searcher, often keeping one CPU pretty
busy for several seconds.

In solrconfig.xml, set autowarmCount to 0.
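(For example -- the sizes here are just illustrative stock values, only
autowarmCount matters for this: <filterCache class="solr.LRUCache"
size="512" initialSize="512" autowarmCount="0"/>, and likewise for
queryResultCache.)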

Also, if you want the slaves to always have an optimized index, create
the snapshot only in post-optimize. If you create snapshots in both
post-commit and post-optimize, you are creating a non-optimized index
(post-commit), then replacing it with an optimized one a few minutes
later. A slave might get a non-optimized index one time, then an
optimized one the next.

wunder

On Nov 2, 2009, at 1:45 AM, biku...@sapient.com wrote:


Hi Solr Gurus,

We have solr in 1 master, 2 slave configuration. Snapshot is created
post commit, post optimization. We have autocommit after 50
documents or 5 minutes. Snapshot puller runs as a cron every 10
minutes. What we have observed is that whenever a snapshot is
installed on the slave, the solrj client used to query the slave
gets timed out, and there is high CPU usage/load avg. on the slave
server. If we stop the snapshot puller, the slaves work with no issues.
The system has been running for 2 months and this issue has
started to occur only now, as load on the website is increasing.

Following are some details:

Solr Details:
apache-solr Version: 1.3.0
Lucene - 2.4-dev

Master/Slave configurations:

Master:
- for indexing data HTTPRequests are made on Solr server.
- autocommit feature is enabled for 50 docs and 5 minutes
- caching params are disable for this server
- mergeFactor of 10 is set
- we were running the optimize script every 2 hours, but have now
reduced it to twice a day; the issue still persists

Slave1/Slave2:
- standard requestHandler is being used
- default values of caching are set
Machine Specifications:

Master:
- 4GB RAM
- 1GB JVM Heap memory is allocated to Solr

Slave1/Slave2:
- 4GB RAM
- 2GB JVM Heap memory is allocated to Solr

Master and Slave1 (solr1)are on single box and Slave2(solr2) on
different box. We use HAProxy to load balance query requests between
2 slaves. Master is only used for indexing.
Please let us know if somebody has ever faced a similar kind of issue
or has some insight into it, as we are literally stuck at the
moment with a very unstable production environment.

As a workaround, we have started running optimize on the master every 7
minutes. This seems to have reduced the severity of the problem, but the
issue still occurs, now about every 2 days. Please suggest what could be
the root cause of this.

Thanks,
Bipul










Re: logging options for 1.3

2009-11-03 Thread Chris Hostetter

: Is there any way to get 1.3 Solr to use something other than java logging?

Solr 1.3 is compiled directly against the JUL logging APIs, so no.

: Am running solr inside tomcat and would like logging for solr to be directed
: to one set of (rotated) log files and leave tomcat logging in its own log 
files.

that should be very easy if you just configure a separate FileHandler for 
the org.apache.solr logger ... 
http://java.sun.com/j2se/1.4.2/docs/guide/util/logging/overview.html
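a minimal programmatic sketch of that (the log path and rotation numbers
below are made-up values; the same thing can also be expressed
declaratively in a logging.properties file):

import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class SolrLogSetup {
    // call once at webapp startup, e.g. from a ServletContextListener
    public static void configure() throws IOException {
        Logger solrLogger = Logger.getLogger("org.apache.solr");
        // rotate at ~10MB, keep 5 files; path and sizes are illustrative
        FileHandler handler = new FileHandler("/var/log/solr/solr-%g.log", 10000000, 5, true);
        handler.setFormatter(new SimpleFormatter());
        solrLogger.addHandler(handler);
        solrLogger.setUseParentHandlers(false); // keep Solr entries out of Tomcat's own logs
    }
}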

: Also, with 1.4, I see it requires removal of jar and swapping in slf4j-log4j 
jar.
: Will I also have to copy in a log4j config file to the war?  Would not want to
: configure it at the tomcat server level as each app will have its own 
configuration.

then yes, i think you would ... but i'm not certain about that.




-Hoss



Re: Date Facet Giving Count more than actual

2009-11-03 Thread Chris Hostetter

: 
q=&facet=true&facet.date=daysForFilter&facet.date.start=2009-10-23T18:30:01Z&facet.date.gap=%2B1DAY&facet.date.end=2009-10-28T18:30:01Z

: For example I get total 18 documents for my query, and the facet count for
: date 2009-10-23T18:30:01Z is 11; whereas there are only 5 documents
: containing this field value. I have verified this in result. Also when I
: query for daysForFilter:2009-10-23T18:30:01Z, it gives me 5 results.

I think you are misunderstanding what date faceting does.  you have a 
facet.date.gap of +1DAY, which means the facet count is anything between 
2009-10-23T18:30:01Z and 2009-10-24T18:30:01Z inclusive.  you can verify 
this using a range query (not a term query) ...

 daysForFilter:[2009-10-23T18:30:01Z TO 2009-10-23T18:30:01Z+1DAY]

if you only want to facet on a unique moment in time (not a range) then 
you can use facet.query ... or you can set the facet gap smaller.
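for example (URL-escaping aside, this is just illustrative):

  facet=true&facet.query=daysForFilter:"2009-10-23T18:30:01Z"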

you should also take a look at facet.date.hardend...
http://wiki.apache.org/solr/SimpleFacetParameters#facet.date.hardend


-Hoss



Re: adding and updating a lot of document to Solr, metadata extraction etc

2009-11-03 Thread Eugene Dzhurinsky
On Mon, Nov 02, 2009 at 05:45:37PM -0800, Lance Norskog wrote:
> About large XML files and http overhead: you can tell solr to load the
> file directly from a file system. This will stream thousands of
> documents in one XML file without loading everything in memory at
> once.
> 
> This is a new book on Solr. It will help you through this early learning 
> phase.
> 
> http://www.packtpub.com/solr-1-4-enterprise-search-server

Thank you, but we have to prepare a proof of concept with the stable
version. I didn't see any 1.4.0 artifacts released to repo1.maven.org yet.

Additionally, I've learned about http://wiki.apache.org/solr/DataImportHandler
and looks like this way is preferred in my case.

I do have a lot of HTML pages on disk storage, and some metadata stored
in SQL tables. What I seem to need is to provide some sort of EntityProcessor
and DataSource to DataImportHandler. Additionally I will need to provide some
sort of properties to tell the data source how to retrieve the data (table
names etc).

So maybe there is some tutorial or how-to describing how to create
custom classes for importing data into Solr 1.3.0?
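From what I've read so far, the skeleton would presumably look something
like this (completely untested -- the class name and the custom attribute
are made up, and the exact API should be checked against the 1.3 DIH
source):

import java.util.Map;
import org.apache.solr.handler.dataimport.Context;
import org.apache.solr.handler.dataimport.EntityProcessorBase;

// hypothetical skeleton: reads HTML pages plus SQL metadata and emits one row per document
public class HtmlPageEntityProcessor extends EntityProcessorBase {

    @Override
    public void init(Context context) {
        super.init(context);
        // pick up custom attributes declared on the <entity> element,
        // e.g. context.getEntityAttribute("tableName")
    }

    @Override
    public Map<String, Object> nextRow() {
        // return one document's field-name -> value map per call;
        // return null when there is no more data
        return null;
    }
}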

Thank you in advance!

-- 
Eugene N Dzhurinsky


pgpN3WZoxS6be.pgp
Description: PGP signature


Highlighting is very slow

2009-11-03 Thread Andrew Clegg

Hi everyone,

I'm experimenting with highlighting for the first time, and it seems
shockingly slow for some queries.

For example, this query:

http://server:8080/solr/select/?q=transferase&qt=dismax&version=2.2&start=0&rows=10&indent=on

takes 313ms. But when I add highlighting:

http://server:8080/solr/select/?q=transferase&qt=dismax&version=2.2&start=0&rows=10&indent=on&hl=true&hl.fl=*&fl=id

it takes 305212ms = 5mins!

Some of my documents are slightly large -- the 10 hits for that query
contain between 362 bytes and 1.4 megabytes of text each. All fields are
stored and indexed, and most are termvectored. But this doesn't seem
excessively large!

Has anyone else seen this sort of behaviour before? This is with a nightly
from 2009-10-26.

All suggestions would be appreciated. My schema and config files are
attached...

http://old.nabble.com/file/p26160217/schema.xml schema.xml 
http://old.nabble.com/file/p26160217/solrconfig.xml solrconfig.xml 

Thanks (once again),

Andrew.

-- 
View this message in context: 
http://old.nabble.com/Highlighting-is-very-slow-tp26160217p26160217.html
Sent from the Solr - User mailing list archive at Nabble.com.




Re: TermVectorComponent : Required / Optional Parameters

2009-11-03 Thread Chantal Ackermann

Hi Grant,

I'd be glad to help update the wiki.
But just to make sure I'm understanding correctly:
for TermVectorComponent to work correctly, all these three attributes 
(termVectors, termPositions, termOffsets) must be set to "true"?


No matter how extensive the termVector request looks?

Because I tried this request also and it doesn't return the termVector 
part either:

.../solr/epg/select?q=*%3A*&version=2.2&start=0&rows=0&indent=on&tv=true&tv.tf=true&tv.df=true

(I tried removing and adding the last two parameters, as well.)

Is there anything else I have to be aware of? How about
indexed
stored
multiValued
omitNorms

Is there a combination that does not work?

Once I get it to work, I'll update the wiki. But I don't want to publish 
my ignorance. ;-)


Thanks,
Chantal


Grant Ingersoll schrieb:

On Nov 3, 2009, at 6:37 AM, Chantal Ackermann wrote:


Hi all!

Are these attributes required to make TermVectorComponent requests
work?
termPositions="true" termOffsets="true"

I have quite a lot of fields with termVectors="true" (for
faceting), but I don't get any results when requesting:
.../solr/epg/select?q=*%3A*&version=2.2&start=0&rows=1&indent=on&tv=true&tv.tf=true&tv.df=true&tv.positions&tv.offsets=true


Indeed, you must have stored positions and offsets for the TVC to
return positions and offsets.


(I don't have a special RequestHandler configured for it. Using
1.4.0RC)

Would it be possible to add the use case "TermVectorComponent" to
that Wiki page?


Yep, please do.  Anyone can edit the wiki, you just need an account.


http://wiki.apache.org/solr/FieldOptionsByUseCase

(And also add that info to the TermVectorComponent wiki page.)

Thanks!
Chantal





Re: apply a patch on solr

2009-11-03 Thread Chris Hostetter

: --- org/apache/solr/handler/MoreLikeThisHandler.java (revision 772437)
: +++ org/apache/solr/handler/MoreLikeThisHandler.java (working copy)
: 
: now check out revision 772437 using the --revision switch in svn, patch
: away, and then svn up to make sure everything merges cleanly.  This is a
: good guide to follow as well:

...that line only tells you which revision that specific file was last 
updated in ... if you check for all such lines in a patch, and look for 
the highest number, that will give you a good idea which version you can 
apply it against (but unfortunately, doesn't tell you anything about other 
files the patch generator may have had checked out, and what revision they 
were last modified in ... the patch may depend on code in other files that 
wasn't modified so it didn't make it into the patch)

generally speaking: most patches for Solr in Jira are against the trunk 
(we don't tend to have much branch development or backporting) and looking 
at the date the patch was attached to Jira will help you identify what it 
was generated against.

Ideally people will post the output of "svnversion" when they attach a 
patch, but that usually doesn't happen.
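(spelling out the workflow described above -- the issue number is a
placeholder and the trunk URL is from memory:
svn checkout -r 772437 http://svn.apache.org/repos/asf/lucene/solr/trunk solr-trunk
then, inside the checkout: patch -p0 -i SOLR-NNN.patch ; svn up)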

-Hoss



RE: Lucene FieldCache memory requirements

2009-11-03 Thread Fuad Efendi
Sorry Mike, Mark, I am confused again...

Yes, I need some more memory for processing ("while FieldCache is being
loaded"), obviously, but that was not the main subject...

With StringIndexCache, I have 10 arrays (cardinality of this field is 10)
storing  (int) Lucene Document ID.

> Except: as Mark said, you'll also need transient memory = pointer (4
> or 8 bytes) * (1+maxdoc), while the FieldCache is being loaded.

Ok, I see it:
  final int[] retArray = new int[reader.maxDoc()];
  String[] mterms = new String[reader.maxDoc()+1];

I can't trace it right now (limited in time); I think mterms is a local
variable and will be released once loading finishes...



So the correct formula is... a weird one... if you don't want unexpected
OOMs or an overloaded GC (WeakHashMaps...):

  [some heap] + [non-tokenized field count] x [maxdoc] x [4 bytes + 8 bytes]

(for 64-bit)
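As a quick worked example with made-up numbers: one such field over 100
million docs needs roughly 100,000,000 x 4 bytes ~= 380 MB for the int
array alone, plus up to 100,000,000 x 8 bytes ~= 760 MB of transient
pointer space on 64-bit while the cache loads.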


-Fuad


> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: November-03-09 5:00 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Lucene FieldCache memory requirements
> 
> On Mon, Nov 2, 2009 at 9:27 PM, Fuad Efendi  wrote:
> > I believe this is correct estimate:
> >
> >> C. [maxdoc] x [4 bytes ~ (int) Lucene Document ID]
> >>
> >>   same as
> >> [String1_Document_Count + ... + String10_Document_Count + ...]
> >> x [4 bytes per DocumentID]
> 
> That's right.
> 
> Except: as Mark said, you'll also need transient memory = pointer (4
> or 8 bytes) * (1+maxdoc), while the FieldCache is being loaded.  After
> it's done being loaded, this sizes down to the number of unique terms.
> 
> But, if Lucene did the basic int packing, which really we should do,
> since you only have 10 unique values, with a naive 4 bits per doc
> encoding, you'd only need 1/8th the memory usage.  We could do a bit
> better by encoding more than one document at a time...
> 
> Mike




Re: TermVectorComponent : Required / Optional Parameters

2009-11-03 Thread Grant Ingersoll


On Nov 3, 2009, at 6:37 AM, Chantal Ackermann wrote:


Hi all!

Are these attributes required to make TermVectorComponent requests  
work?

termPositions="true" termOffsets="true"

I have quite a lot of fields with termVectors="true" (for  
faceting), but I don't get any results when requesting:

.../solr/epg/select?q=*%3A*&version=2.2&start=0&rows=1&indent=on&tv=true&tv.tf=true&tv.df=true&tv.positions&tv.offsets=true


Indeed, you must have stored positions and offsets for the TVC to  
return positions and offsets.




(I don't have a special RequestHandler configured for it. Using  
1.4.0RC)


Would it be possible to add the use case "TermVectorComponent" to  
that Wiki page?


Yep, please do.  Anyone can edit the wiki, you just need an account.



http://wiki.apache.org/solr/FieldOptionsByUseCase

(And also add that info to the TermVectorComponent wiki page.)

Thanks!
Chantal





Re: tf*idf scoring

2009-11-03 Thread Grant Ingersoll


On Nov 3, 2009, at 5:54 AM, Markus Jelsma - Buyways B.V. wrote:



I see, but why not return the true values of Lucene?


I'm not sure what you mean by this.  The TVC returns the term  
frequency and the document frequency and TF/DF as reported by  
Lucene.   The actual raw values.   What you are asking for is for the  
TVC to return some other normalized values above and beyond the  
literal interpretation TF/IDF.  This can be done, it's not  
particularly hard, but it will require a patch or you can just do it  
in your application.   I personally don't think the TVC should do it b/c 
there are other calculations/interpretations that one might do  
beyond/besides what you propose, so I'd rather just give back the raw  
data and let the user decide.

Re: Multifield query parser and phrase query behaviour from 1.3 to 1.4

2009-11-03 Thread Chris Hostetter

: However, even when it's set to 'false' , the highlighting of a field
: continues to work even if the search doesn't.
: Does the highlighter use a different strategy to match the query terms
: in the fields?

if it has term vectors, it uses them; otherwise it re-analyzes the stored 
fields.


-Hoss



DIH : RegexTransformer with groupNames requires all groups to be not empty?

2009-11-03 Thread Chantal Ackermann
Ok, I can confirm that the following configuration for RegexTransformer 
works as I would expect:




regex="[^\|]+\|\d+,\d+,\d+,(.+)" />


To the multivalued fields participant and role, values are only added if 
their corresponding regex matches.




The following configuration does not add any matched value to any field 
if one (or more) of the groups is not matched. It only adds values to 
all fields in groupNames if all groups are matched:






Chantal


Chantal Ackermann schrieb:

follow-up:


regex="([^\|]+)\|\d+,\d+,\d+,(.+)"

is the version I chose after I had the following problems with
regex="([^\|]+)\|\d+,\d+,\d+,(.*)"
(changed * into + for the second group):

The role field contained empty values even if I added a
TrimFilterFactory with minimum length of 1. So, I changed the regular
expression to find only non-empty values. Well, it does now - but if it
cannot find a value for the second group it doesn't even add the value
for the first group.

Any help on getting this solved is greatly appreciated.
It boils down to this question:

- How can I make the RegexTransformer add a value only if
it is non-empty, while at the same time avoiding that it only
adds values when all of the groups match?

Maybe the configuration with groupNames is meant to work like that. If
that is the case, it's probably worth adding this information to the
Wiki. I will change back to using the sourceCol attribute as
https://issues.apache.org/jira/browse/SOLR-1498
should be fixed with this 1.4.0RC version, now.

Thanks!
Chantal

Chantal Ackermann schrieb:

Dear all,

my DIH config contains the following directive for the RegexTransformer:



(this is SOLR 1.4.0 RC downloaded yesterday from Grant's URL)

It expects input of the kind (version A):
Daniel Radcliffe|24897,1,1,Harry Potter

It should also work with (version B):
Daniel Radcliffe|24897,1,1,

In my index, however, I can only find documents that either contain
participant and role or neither. Of course, I didn't check all
documents. But for both fields, Luke shows the same number of documents:
Docs:  47015

(There are definitely datasets that contain participants without role.)

I'll check the code and try with a different configuration (using
sourceCol). But I thought I'd spread the news before the release is definite.

Thanks,
Chantal




Re: very slow add/commit time

2009-11-03 Thread Bruno
Try raising your ramBufferSizeMB (it helped a lot when my team had performance
issues).

And also try checking this link (helps a lot):
http://wiki.apache.org/solr/SolrPerformanceFactors

Regards

On Tue, Nov 3, 2009 at 12:38 PM, Marc Des Garets wrote:

> If you mean ramBufferSizeMB, I have it set on 512. The maxBufferedDocs
> is commented. If you mean queryResultMaxDocsCached, it is set on 200 but
> is it used when indexing?
>
> -Original Message-
> From: Bruno [mailto:brun...@gmail.com]
> Sent: 03 November 2009 14:27
> To: solr-user@lucene.apache.org
> Subject: Re: very slow add/commit time
>
> How many MB have you set of cache on your solrconfig.xml?
>
> On Tue, Nov 3, 2009 at 12:24 PM, Marc Des Garets
> wrote:
>
> > Hi,
> >
> >
> >
> > I am experiencing a problem with an index of about 80 millions
> documents
> > (41Gb). I am trying to update documents in this index using Solrj.
> >
> >
> >
> > When I do:
> >
> > solrServer.add(docs);  //docs is a List that
> contains
> > 1000 SolrInputDocument (takes 36sec)
> >
> > solrServer.commit(false,false); //either never ends with a OutOfMemory
> > error or takes forever
> >
> >
> >
> > I have -Xms4g -Xmx4g
> >
> >
> >
> > Any idea what could be the problem?
> >
> >
> >
> > Thanks for your help.
> >
> >
>
>
>
>
> --
> Bruno Morelli Vargas
> Mail: brun...@gmail.com
> Msn: brun...@hotmail.com
> Icq: 165055101
> Skype: morellibmv




-- 
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv


RE: Getting update/extract RequestHandler to work under Tomcat

2009-11-03 Thread Chris Hostetter

: I see the source - but no classes or jar that seems to fit the bill.  
: 
: I've had problems getting ant to build from the nightly trunk.  I'm of
...
: If there is an existing jar of the ExtractingRequestHandler classes that
: I might download - please point me to it.

If you are downloading a nightly (or a 1.4 release candidate) there is 
*nothing* you should need to build ... all of the compiled jars (including 
for all of the contribs) can be found in the "./dist" directory.  

(the only jars not included in the releases are the third-party 
clustering libraries not released under ASL compatible licenses, but those 
aren't needed for extraction)



-Hoss



RE: very slow add/commit time

2009-11-03 Thread Marc Des Garets
If you mean ramBufferSizeMB, I have it set to 512. The maxBufferedDocs
is commented out. If you mean queryResultMaxDocsCached, it is set to 200,
but is it used when indexing?

-Original Message-
From: Bruno [mailto:brun...@gmail.com] 
Sent: 03 November 2009 14:27
To: solr-user@lucene.apache.org
Subject: Re: very slow add/commit time

How many MB have you set of cache on your solrconfig.xml?

On Tue, Nov 3, 2009 at 12:24 PM, Marc Des Garets
wrote:

> Hi,
>
>
>
> I am experiencing a problem with an index of about 80 millions
documents
> (41Gb). I am trying to update documents in this index using Solrj.
>
>
>
> When I do:
>
> solrServer.add(docs);  //docs is a List that
contains
> 1000 SolrInputDocument (takes 36sec)
>
> solrServer.commit(false,false); //either never ends with a OutOfMemory
> error or takes forever
>
>
>
> I have -Xms4g -Xmx4g
>
>
>
> Any idea what could be the problem?
>
>
>
> Thanks for your help.
>
>




-- 
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv

TermVectorComponent : Required / Optional Parameters

2009-11-03 Thread Chantal Ackermann

Hi all!

Are these attributes required to make TermVectorComponent requests work?
 termPositions="true" termOffsets="true"

I have quite a lot of fields with termVectors="true" (for facetting), 
but I don't get any results when requesting:

.../solr/epg/select?q=*%3A*&version=2.2&start=0&rows=1
&indent=on&tv=true&tv.tf=true&tv.df=true&tv.positions&tv.offsets=true

(I don't have a special RequestHandler configured for it. Using 1.4.0RC)

Would it be possible to add the use case "TermVectorComponent" to that 
Wiki page?


http://wiki.apache.org/solr/FieldOptionsByUseCase

(And also add that info to the TermVectorComponent wiki page.)

Thanks!
Chantal


Re: Getting update/extract RequestHandler to work under Tomcat

2009-11-03 Thread Chris Hostetter

: The \contrib and \dist directories were copied directly below the
: "webapps\apache-solr-1.4-dev" unchanged from the example.

...that doesn't sound right, they shouldn't be copied into webapps at all.  
can you show a full directory structure...

: Im the catalina log I see all the "Adding specified lib dirs..." added
: without error:
: 
:   INFO: Adding specified lib dirs to ClassLoader
...
:   (...many more...)

...can you elaborate on "many more" ... specifically, do you ever see it say 
it's loading anything from "contrib/extraction" or 
"apache-solr-cell-1.4.jar" ?



-Hoss



RE: Getting update/extract RequestHandler to work under Tomcat

2009-11-03 Thread Glock, Thomas

Thanks -

Looked at it last night and I think the problem is that I need to
compile the ExtractingRequestHandler classes/jar.  

I see the source - but no classes or jar that seems to fit the bill.  

I've had problems getting ant to build from the nightly trunk.  I'm of
the opinion I simply need to get the latest source and perform an ant
build.  But this is the first time I've worked with ant, so I'm sure I
don't have things set up correctly.

If there is an existing jar of the ExtractingRequestHandler classes that
I might download - please point me to it.

I'll look at this today - thanks again - much appreciated.


-Original Message-
From: Grant Ingersoll [mailto:gsing...@apache.org] 
Sent: Tuesday, November 03, 2009 8:12 AM
To: solr-user@lucene.apache.org
Subject: Re: Getting update/extract RequestHandler to work under Tomcat

Try making it a non-Lazy loaded handler. Does that help?


On Nov 2, 2009, at 4:37 PM, Glock, Thomas wrote:

>
> Hoping someone might help with getting /update/extract RequestHandler 
> to work under Tomcat.
>
> Error 500 happens when trying to access 
> http://localhost:8080/apache-solr-1.4-dev/update/extract/  (see below)
>
> Note /update/extract DOES work correctly under the Jetty provided 
> example.
>
> I think I must have a directory path incorrectly specified but not 
> sure where.
>
> No errors in the Catalina log on startup - only this:
>
>   Nov 2, 2009 7:10:49 PM org.apache.solr.core.RequestHandlers
> initHandlersFromConfig
>   INFO: created /update/extract:
> org.apache.solr.handler.extraction.ExtractingRequestHandler
>
> Solrconfig.xml under tomcat is slightly changed from the example with 
> regards to <lib> elements:
>
> <lib regex="apache-solr-cell-\d.*\.jar" />  <lib regex="apache-solr-clustering-\d.*\.jar" />:
>
> The \contrib and \dist directories were copied directly below the 
> "webapps\apache-solr-1.4-dev" unchanged from the example.
>
> In the catalina log I see all the "Adding specified lib dirs..." added
> without error:
>
>   INFO: Adding specified lib dirs to ClassLoader
>   Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>   INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/asm-3.1.jar' to classloader
>   Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>   INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcmail-jdk14-136.jar' to classloader
>   Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
>   INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcprov-jdk14-136.jar' to classloader
>
>   (...many more...)
>
> Solr Home is mapped to:
>
>   INFO: SolrDispatchFilter.init()
>   Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader
> locateSolrHome
>   INFO: Using JNDI solr.home: .\webapps\apache-solr-1.4-dev\solr
>   Nov 2, 2009 7:10:47 PM
> org.apache.solr.core.CoreContainer$Initializer initialize
>   INFO: looking for solr.xml: C:\Program Files\Apache Software 
> Foundation\Tomcat 6.0\.\webapps\apache-solr-1.4-dev\solr\solr.xml
>   Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader
> 
>   INFO: Solr home set to '.\webapps\apache-solr-1.4-dev\solr\'
>
> 500 Error:
>
> HTTP Status 500 - lazy loading error
> org.apache.solr.common.SolrException: lazy loading error
>   at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
>   at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
>   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>   at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433)
>   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
>   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
>   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
>   at org.apa

Re: very slow add/commit time

2009-11-03 Thread Bruno
How many MB of cache have you set in your solrconfig.xml?

On Tue, Nov 3, 2009 at 12:24 PM, Marc Des Garets wrote:

> Hi,
>
>
>
> I am experiencing a problem with an index of about 80 million documents
> (41Gb). I am trying to update documents in this index using Solrj.
>
>
>
> When I do:
>
> solrServer.add(docs);  //docs is a List that contains
> 1000 SolrInputDocument (takes 36sec)
>
> solrServer.commit(false,false); //either never ends with a OutOfMemory
> error or takes forever
>
>
>
> I have -Xms4g -Xmx4g
>
>
>
> Any idea what could be the problem?
>
>
>
> Thanks for your help.
>
>




-- 
Bruno Morelli Vargas
Mail: brun...@gmail.com
Msn: brun...@hotmail.com
Icq: 165055101
Skype: morellibmv


very slow add/commit time

2009-11-03 Thread Marc Des Garets
Hi,

 

I am experiencing a problem with an index of about 80 million documents
(41Gb). I am trying to update documents in this index using Solrj.

 

When I do:

solrServer.add(docs);  // docs is a List<SolrInputDocument> that contains
1000 documents (takes 36sec)

solrServer.commit(false,false); //either never ends with a OutOfMemory
error or takes forever

 

I have -Xms4g -Xmx4g

 

Any idea what could be the problem?

 

Thanks for your help.

 

Re: tf*idf scoring

2009-11-03 Thread Markus Jelsma - Buyways B.V.


> >
> >
> > According to different algorithms, the tf for term c would be 3 / 1 =
> > 0.33 instead of 1 returned by Solr.
> 
> I don't follow.  The TF (term frequency) is the number of times the  
> term c occurs in that particular document, i.e. 1 time.


I see that above, and below, i made some typos. I wrote 3 / 1 = 0.33
instead of 1 / 3 = 0.33. Term c has a #occurrences of 1, which the other
algorithms normalize by dividing by the number of terms. So instead of
tf = #occurrences (1), other algorithms do tf = #occurrences / #terms
(0.33). 


> 
> > Also, the tf*idf value i get is 0.5
> > for term c and i get 0.333 for term a. It looks like tf*idf is  
> > quotient
> > of document frequency and term frequency.
> 
> Yes, indeed.  IDF == Inverse Document Frequency, in other words, 1/DF.


Indeed, but most algorithms i have seen on this topic calculate idf by
ln(#docs / df), this is also true for Lucene as i read
http://lucene.apache.org/java/2_9_0/api/core/org/apache/lucene/search/Similarity.html

idf(t)  =  1 + log(numDocs / (df + 1))
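A quick cross-check against Lucene itself (a minimal sketch, assuming
2.9's DefaultSimilarity and that idf(int docFreq, int numDocs) and
tf(float freq) are the right signatures):

import org.apache.lucene.search.DefaultSimilarity;

public class SimilarityCheck {
    public static void main(String[] args) {
        DefaultSimilarity sim = new DefaultSimilarity();
        float idf = sim.idf(3, 6); // 1 + ln(6 / (3 + 1)) ~= 1.405
        float tf  = sim.tf(1f);    // sqrt(1) = 1.0
        System.out.println("idf=" + idf + " tf=" + tf);
    }
}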


> 
> >
> > If i calculate tf*idf, for term c in the first document, according to
> > other algorithms it would be:
> >
> > tf = 3 / 1 = 0.333
> 
> 3/1 = 3, no?  I don't see where in your docs above you could even get  
> a 3 for the letter c.


Here's the other typo: i wrote again 3 / 1 = 0.33 which should've been
1 / 3 = 0.33, of course. The differences i see are:

tf (solr) = #occurences_of_term_T in document_D
tf (other) = #occurences_of_term_T in document_D / #terms_document_D

df (solr) = #occurences_of_term_T in all_documents
df (other) = #occurences_of_term_T in all_documents

idf (solr) = tf / df
idf (other) = ln(#documents / df)

tf*idf (solr) = tf / df
tf*idf (other) = tf * idf


> 
> > idf = ln(6 / 3) = 1.0986
> > tf*idf = 0.333 * 1.0986 = 0.3658
> >
> 
> I think the formulas you are looking at are doing operations to  
> normalize the values, whereas the Solr/Lucene stuff above is telling  
> you their raw values.  Note, Lucene/Solr does length normalization,  
> etc. too, it just isn't encoded into the TF or DF.  For more on  
> Lucene's scoring, see http://lucene.apache.org/java/2_9_0/scoring.html
> 


I see, but why not return the true values of Lucene? I did not
reconfigure Solr's schema to use another algorithm for similarity, and
the above Lucene similarity docs state that they use similar
calculations as i have, in DefaultSimilarity.



> --
> Grant Ingersoll
> http://www.lucidimagination.com/
> 
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
> using Solr/Lucene:
> http://www.lucidimagination.com/search
> 


Re: solrj query size limit?

2009-11-03 Thread Gregg Horan

That was it.  Didn't see that optional parameter - the POST works.

Thanks!
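For anyone searching the archives, the working version looks roughly like
this (a hedged sketch assuming SolrJ 1.4's SolrServer API; the URL and
account ids are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BigOrQuery {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery q = new SolrQuery("accountId:(a1 OR a2 OR a3)"); // ids are placeholders
        // POST sidesteps the container's GET URL length limit
        QueryResponse rsp = server.query(q, SolrRequest.METHOD.POST);
        System.out.println("found " + rsp.getResults().getNumFound());
    }
}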


On Nov 3, 2009, at 1:57 AM, Avlesh Singh wrote:

Did you hit the limit for maximum number of characters in a GET  
request?


Cheers
Avlesh

On Tue, Nov 3, 2009 at 9:36 AM, Gregg Horan   
wrote:


I'm constructing a query using solrj that has a fairly large number  
of 'OR'
clauses.  I'm just adding it as a big string to setQuery(), in the  
format

"accountId:(this OR that OR yada)".

This works all day long with 300 values.  When I push it up to  
350-400
values, I get a "Bad Request" SolrServerException.  It appears to  
just be a
client error - nothing reaching the server logs.  Very  
repeatable... dial

it
back down and it goes through again fine.

The total string length of the query (including a handful of other  
faceting
entries) is about 9500chars.   I do have the maxBooleanClauses  
jacked up to

2048.  Using javabin.  1.4-dev.

Are there any other options or settings I might be overlooking?

-Gregg





Re: tracking solr response time

2009-11-03 Thread bharath venkatesh
>I didn't see where you said what Solr version you were using.

below is the solr version info :-
Solr Specification Version: 1.2.2008.07.22.15.48.39
Solr Implementation Version: 1.3-dev
Lucene Specification Version: 2.3.1
Lucene Implementation Version: 2.3.1 629191 - buschmi - 2008-02-19 19:15:48

>this can happen with really big indexes that can't all fit in memory

one of our indexes is pretty big, about 16 GB; the other indexes (for
other applications) are small. Our servers have 32 GB RAM.

>There are some pretty big concurrency differences between 1.3 and 1.4 too (if 
>your tests involve many concurrent requests).

as I said, we observed the latency in our live (production) system; that is
when we started logging responses at the client to identify the problem.
So in our live (production) system there is considerable concurrency
during peak times.







On 11/3/09, Yonik Seeley  wrote:
> On Mon, Nov 2, 2009 at 2:21 PM, bharath venkatesh
>  wrote:
>> we observed many times there is huge mismatch between qtime and
>> time measured at the client for the response
>
> Long times to stream back the result to the client could be due to
>  - client not reading fast enough
>  - network congestion
>  - reading the stored fields takes a long time
> - this can happen with really big indexes that can't all fit in
> memory, and stored fields tend to not be cached well by the OS
> (essentially random access patterns over a huge area).  This ends up
> causing a disk seek per document being
> streamed back.
>  - locking contention for reading the index (under Solr 1.3, but not
> under 1.4 on non-windows platforms)
>
> I didn't see where you said what Solr version you were using.  There
> are some pretty big concurrency differences between 1.3 and 1.4 too
> (if your tests involve many concurrent requests).
>
> -Yonik
> http://www.lucidimagination.com
>


DIH : RegexTransformer with groupNames requires all groups to be not empty?

2009-11-03 Thread Chantal Ackermann

follow-up:


regex="([^\|]+)\|\d+,\d+,\d+,(.+)"

is the version I chose after I had the following problems with
regex="([^\|]+)\|\d+,\d+,\d+,(.*)"
(changed * into + for the second group):

The role field contained empty values even if I added a 
TrimFilterFactory with minimum length of 1. So, I changed the regular 
expression to find only non-empty values. Well, it does now - but if it 
cannot find a value for the second group it doesn't even add the value 
for the first group.


Any help on getting this solved is greatly appreciated.
It boils down to this question:

- How can I make the RegexTransformer add a value only if
it is non-empty, while at the same time avoiding that it only 
adds values when all of the groups match?


Maybe the configuration with groupNames is meant to work like that. If 
that is the case, it's probably worth adding this information to the 
Wiki. I will change back to using the sourceCol attribute as

https://issues.apache.org/jira/browse/SOLR-1498
should be fixed with this 1.4.0RC version, now.

Thanks!
Chantal

Chantal Ackermann schrieb:

Dear all,

my DIH config contains the following directive for the RegexTransformer:



(this is SOLR 1.4.0 RC downloaded yesterday from Grant's URL)

It expects input of the kind (version A):
Daniel Radcliffe|24897,1,1,Harry Potter

It should also work with (version B):
Daniel Radcliffe|24897,1,1,

In my index, however, I can only find documents that either contain
participant and role or neither. Of course, I didn't check all
documents. But for both fields, Luke shows the same number of documents:
Docs:  47015

(There are definitely datasets that contain participants without role.)

I'll check the code and try with a different configuration (using
sourceCol). But I thought I'd spread the news before the release is definite.

Thanks,
Chantal




DIH : RegexTransformer with groupNames requires all groups to be not empty?

2009-11-03 Thread Chantal Ackermann

Dear all,

my DIH config contains the following directive for the RegexTransformer:



(this is SOLR 1.4.0 RC downloaded yesterday from Grant's URL)

It expects input of the kind (version A):
Daniel Radcliffe|24897,1,1,Harry Potter

It should also work with (version B):
Daniel Radcliffe|24897,1,1,

In my index, however, I can only find documents that either contain
participant and role or neither. Of course, I didn't check all
documents. But for both fields, Luke shows the same number of documents:
Docs:  47015

(There are definitely datasets that contain participants without role.)

I'll check the code and try with a different configuration (using
sourceCol). But I thought I'd spread the news before the release is definite.

Thanks,
Chantal




Re: Problems downloading lucene 2.9.1

2009-11-03 Thread Grant Ingersoll
Yeah, that would be useful.  We could hook into the new Apache  
Repository stuff, but that will take some work.  Also, I think 1.4 is  
in all likelihood a special case in terms of dealing with the Lucene  
release.


-Grant

On Nov 3, 2009, at 12:45 AM, Ian Ibbotson wrote:


Heya Ryan...

For me the big problem with adding
http://people.apache.org/~mikemccand/staging-area/rc3_lucene2.9.1/maven/
to my build config is that the artifact names of the interim release are
the
same as the final objects will be.. thus once they are copied to a  
local
repo maven won't bother to go looking for more recent versions, even  
if you
blow away that temporary repo. Would it be possible to publish  
tagged rc-N

releases to a public and more permanent repository where people can
reference them and upgrade to the final release when it's available.

Just a thought, cheers for all your hard work.

Ian.

2009/11/2 Ryan McKinley 



On Nov 2, 2009, at 8:29 AM, Grant Ingersoll wrote:



On Nov 2, 2009, at 12:12 AM, Licinio Fernández Maurelo wrote:

Hi folks,


as we are using an snapshot dependecy to solr1.4, today we are  
getting
problems when maven try to download lucene 2.9.1 (there isn't a  
any 2.9.1

there).

Which repository can i use to download it?



They won't be there until 2.9.1 is officially released.  We are  
trying to
speed up the Solr release by piggybacking on the Lucene release,  
but this

little bit is the one downside.



Until then, you can add a repo to:

http://people.apache.org/~mikemccand/staging-area/rc3_lucene2.9.1/maven/






--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: tf*idf scoring

2009-11-03 Thread Grant Ingersoll

Inline below

On Nov 3, 2009, at 2:30 AM, Markus Jelsma - Buyways B.V. wrote:


Hello list,


I have a question about Lucene's calculation of tf*idf value. I first
noticed that Solr's tf does not compare to tf values based on
calculation elsewhere such as
http://odin.himinbi.org/idf_to_item:item/comparing_tf%3Aidf_to_item%
3Aitem_similarity.xhtml or http://en.wikipedia.org/wiki/Tf%E2%80%93idf

The tf values returned by Solr are always integers and not normalized
against the length of the corpus whilst the field in which it resides
does not have omitNorms="true".

Consider the following documents where the field subject is of the
standard text_ws type:


   
   <doc><field name="subject">a b c</field></doc>
   <doc><field name="subject">d e f</field></doc>
   <doc><field name="subject">x y z</field></doc>
   <doc><field name="subject">a d x</field></doc>
   <doc><field name="subject">a e z</field></doc>
   <doc><field name="subject">c f z</field></doc>
   


Now, Solr's TermVector results for the first document:


   <lst name="doc-0">
     <str name="uniqueKey">0</str>
     <lst name="subject">
       <lst name="a">
         <int name="tf">1</int>
         <lst name="positions"><int name="position">0</int></lst>
         <int name="df">3</int>
         <double name="tf-idf">0.33333334</double>
       </lst>
       <lst name="b">
         <int name="tf">1</int>
         <lst name="positions"><int name="position">1</int></lst>
         <int name="df">1</int>
         <double name="tf-idf">1.0</double>
       </lst>
       <lst name="c">
         <int name="tf">1</int>
         <lst name="positions"><int name="position">2</int></lst>
         <int name="df">2</int>
         <double name="tf-idf">0.5</double>
       </lst>
     </lst>
   </lst>



According to different algorithms, the tf for term c would be 3 / 1 =
0.33 instead of 1 returned by Solr.


I don't follow.  The TF (term frequency) is the number of times the  
term c occurs in that particular document, i.e. 1 time.



Also, the tf*idf value i get is 0.5
for term c and i get 0.333 for term a. It looks like tf*idf is  
quotient

of document frequency and term frequency.


Yes, indeed.  IDF == Inverse Document Frequency, in other words, 1/DF.



If i calculate tf*idf, for term c in the first document, according to
other algorithms it would be:

tf = 3 / 1 = 0.333


3/1 = 3, no?  I don't see where in your docs above you could even get  
a 3 for the letter c.



idf = ln(6 / 3) = 1.0986
tf*idf = 0.333 * 1.0986 = 0.3658



I think the formulas you are looking at are doing operations to  
normalize the values, whereas the Solr/Lucene stuff above is telling  
you their raw values.  Note, Lucene/Solr does length normalization,  
etc. too, it just isn't encoded into the TF or DF.  For more on  
Lucene's scoring, see http://lucene.apache.org/java/2_9_0/scoring.html


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: Getting update/extract RequestHandler to work under Tomcat

2009-11-03 Thread Grant Ingersoll

Try making it a non-Lazy loaded handler. Does that help?
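(i.e., presumably removing the startup="lazy" attribute from the
/update/extract <requestHandler> element in solrconfig.xml)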


On Nov 2, 2009, at 4:37 PM, Glock, Thomas wrote:



Hoping someone might help with getting /update/extract  
RequestHandler to

work under Tomcat.

Error 500 happens when trying to access
http://localhost:8080/apache-solr-1.4-dev/update/extract/  (see below)

Note /update/extract DOES work correctly under the Jetty provided
example.

I think I must have a directory path incorrectly specified but not  
sure

where.

No errors in the Catalina log on startup - only this:

Nov 2, 2009 7:10:49 PM org.apache.solr.core.RequestHandlers
initHandlersFromConfig
INFO: created /update/extract:
org.apache.solr.handler.extraction.ExtractingRequestHandler

Solrconfig.xml under tomcat is slightly changed from the example with
regards to <lib> elements:

<lib regex="apache-solr-cell-\d.*\.jar" />
<lib regex="apache-solr-clustering-\d.*\.jar" />:

The \contrib and \dist directories were copied directly below the
"webapps\apache-solr-1.4-dev" unchanged from the example.

In the catalina log I see all the "Adding specified lib dirs..." added
without error:

INFO: Adding specified lib dirs to ClassLoader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/asm-3.1.jar' to classloader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcmail-jdk14-136.jar' to classloader
Nov 2, 2009 7:31:20 PM org.apache.solr.core.SolrResourceLoader replaceClassLoader
INFO: Adding 'file:/C:/Program%20Files/Apache%20Software%20Foundation/Tomcat%206.0/webapps/apache-solr-1.4-dev/contrib/extraction/lib/bcprov-jdk14-136.jar' to classloader

(...many more...)

Solr Home is mapped to:

INFO: SolrDispatchFilter.init()
Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: Using JNDI solr.home: .\webapps\apache-solr-1.4-dev\solr
Nov 2, 2009 7:10:47 PM
org.apache.solr.core.CoreContainer$Initializer initialize
INFO: looking for solr.xml: C:\Program Files\Apache Software
Foundation\Tomcat 6.0\.\webapps\apache-solr-1.4-dev\solr\solr.xml
Nov 2, 2009 7:10:47 PM org.apache.solr.core.SolrResourceLoader

INFO: Solr home set to '.\webapps\apache-solr-1.4-dev\solr\'

500 Error:

HTTP Status 500 - lazy loading error
org.apache.solr.common.SolrException: lazy loading error
  at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:249)
  at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:231)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
  at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:433)
  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
  at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:859)
  at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:574)
  at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1527)
  at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.extraction.ExtractingRequestHandler'
  at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
  at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
  at org.apache.solr.core.SolrCore.createRequestHandler(SolrCore.java:449)
  at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.getWrappedHandler(RequestHandlers.java:240)
  ... 17 more
Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.extraction.ExtractingRequestHandler
  at java.net.URLClassLoader$1.run(Unknown Source)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(Unknown Source)
  at java.lang.ClassLoade

Re: Indexing multiple entities

2009-11-03 Thread Christian López Espínola
On Mon, Nov 2, 2009 at 11:07 AM, Chantal Ackermann
 wrote:
>> I'm using a code generator for my entities, and I cannot modify the
>> generation.
>> I need to work out another option :(
>
> shouldn't code generators help development and not make it more complex and
> difficult? oO

Yeah, they do. But I wasn't clever enough to solve this until today ;-)

> (sry off topic)
>
> chantal
>



-- 
Cheers,

Christian López Espínola 


Re: Indexing multiple entities

2009-11-03 Thread Christian López Espínola
On Sun, Nov 1, 2009 at 10:34 AM, Christian López Espínola
 wrote:
> On Sun, Nov 1, 2009 at 5:30 AM, Avlesh Singh  wrote:
>>>
>>> The use case on DocumentObjectBinder is that I could override
>>> toSolrInputDocument, and if field = ID, I could do: setField("id",
>>> obj.getClass().getName() + obj.getId()) or something like that.
>>>
>>
>> Unless I am missing something here, can't you write the getter of id field
>> in your solr bean as underneath?
>>
>> @Field
>> private String id;
>> public String getId(){
>>  return (this.getClass().getName() + this.id);
>> }
>
> I'm using a code generator for my entities, and I cannot modify the 
> generation.
> I need to work out another option :(

Finally, I've been able to modify my code generation scripts
without any side effects.
Thanks everyone for the suggestions.
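
For anyone who still needs the DocumentObjectBinder route discussed
above, a minimal sketch (assuming SolrJ 1.4's public
DocumentObjectBinder.toSolrInputDocument(Object); the PrefixingBinder
name and the explicit conversion before add() are illustrative, since
SolrServer does not obviously expose a binder setter):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.beans.DocumentObjectBinder;
    import org.apache.solr.common.SolrInputDocument;

    // Hypothetical binder that namespaces the id field by entity class,
    // so Book id=3 and Magazine id=3 no longer collide in the index.
    public class PrefixingBinder extends DocumentObjectBinder {
        @Override
        public SolrInputDocument toSolrInputDocument(Object bean) {
            SolrInputDocument doc = super.toSolrInputDocument(bean);
            // e.g. "Book3", "Magazine3"
            doc.setField("id", bean.getClass().getSimpleName()
                    + doc.getFieldValue("id"));
            return doc;
        }

        // Usage sketch: convert explicitly instead of server.addBean(bean).
        public static void index(SolrServer server, Object bean) throws Exception {
            server.add(new PrefixingBinder().toSolrInputDocument(bean));
        }
    }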

>> Cheers
>> Avlesh
>>
>> On Fri, Oct 30, 2009 at 1:33 PM, Christian López Espínola <
>> penyask...@gmail.com> wrote:
>>
>>> On Fri, Oct 30, 2009 at 2:04 AM, Avlesh Singh  wrote:
>>> >>
>>> >> One thing I thought about is if I can define my own
>>> >> DocumentObjectBinder, so I can concatenate my entity names with the
>>> >> IDs in the XML creation.
>>> >>
>>> >> Anyone knows if something like this can be done without modifying
>>> >> Solrj sources? Is there any injection or plugin mechanism for this?
>>> >>
>>> > More details on the use-case please.
>>>
>>> If I index a Book with ID=3, and then a Magazine with ID=3, I'll be
>>> really removing my Book3 and indexing Magazine3. I want both entities
>>> to be in the index.
>>>
>>> The use case on DocumentObjectBinder is that I could override
>>> toSolrInputDocument, and if field = ID, I could do: setField("id",
>>> obj.getClass().getName() + obj.getId()) or something like that.
>>>
>>> The goal is avoiding creating all the XMLs to be sent to Solr but
>>> having the possibility of modifying them in some way.
>>>
>>> Do you know how can I do that, or a better way of achieving the same
>>> results?
>>>
>>>
>>> > Cheers
>>> > Avlesh
>>> >
>>> > On Fri, Oct 30, 2009 at 2:16 AM, Christian López Espínola <
>>> > penyask...@gmail.com> wrote:
>>> >
>>> >> Hi Israel,
>>> >>
>>> >> Thanks for your suggestion,
>>> >>
>>> >> On Thu, Oct 29, 2009 at 9:37 PM, Israel Ekpo 
>>> wrote:
>>> >> > On Thu, Oct 29, 2009 at 3:31 PM, Christian López Espínola <
>>> >> > penyask...@gmail.com> wrote:
>>> >> >
>>> >> >> Hi, my name is Christian and I'm a newbie introducing to solr (and
>>> >> solrj).
>>> >> >>
>>> >> >> I'm working on a website where I want to index multiple entities,
>>> like
>>> >> >> Book or Magazine.
>>> >> >> The issue I'm facing is both of them have an attribute ID, which I
>>> >> >> want to use as the uniqueKey on my schema, so I cannot identify
>>> >> >> uniquely a document (because ID is saved in a database too, and it's
>>> >> >> autonumeric).
>>> >> >>
>>> >> >> I'm sure that this is a common pattern, but I don't find the way of
>>> >> solving
>>> >> >> it.
>>> >> >>
>>> >> >> How do you usually solve this? Thanks in advance.
>>> >> >>
>>> >> >>
>>> >> >> --
>>> >> >> Cheers,
>>> >> >>
>>> >> >> Christian López Espínola 
>>> >> >>
>>> >> >
>>> >> > Hi Christian,
>>> >> >
>>> >> > It looks like you are bringing in data to Solr from a database where
>>> >> there
>>> >> > are two separate tables.
>>> >> >
>>> >> > One for *Books* and another one for *Magazines*.
>>> >> >
>>> >> > If this is the case, you could define your uniqueKey element in Solr
>>> >> schema
>>> >> > to be a "string" instead of an integer then you can still load
>>> documents
>>> >> > from both the books and magazines database tables but your could
>>> prefix
>>> >> the
>>> >> > uniqueKey field with "B" for books and "M" for magazines
>>> >> >
>>> >> > Like so :
>>> >> >
>>> >> > <field name="id" type="string" indexed="true" stored="true"
>>> >> > required="true"/>
>>> >> >
>>> >> > <uniqueKey>id</uniqueKey>
>>> >> >
>>> >> > Then when loading the books or magazines into Solr you can create the
>>> >> > documents with id fields like this
>>> >> >
>>> >> >
>>> >> > <add>
>>> >> >   <doc>
>>> >> >     <field name="id">B14000</field>
>>> >> >   </doc>
>>> >> >   <doc>
>>> >> >     <field name="id">M14000</field>
>>> >> >   </doc>
>>> >> >   <doc>
>>> >> >     <field name="id">B14001</field>
>>> >> >   </doc>
>>> >> >   <doc>
>>> >> >     <field name="id">M14001</field>
>>> >> >   </doc>
>>> >> > </add>
>>> >> >
>>> >> > I hope this helps
>>> >>
>>> >> This was my first thought, but in practice there isn't Book and
>>> >> Magazine, but about 50 different entities, so I'm using the Field
>>> >> annotation of solrj for simplifying my code (it manages for me the XML
>>> >> creation, etc).
>>> >> One thing I thought about is if I can define my own
>>> >> DocumentObjectBinder, so I can concatenate my entity names with the
>>> >> IDs in the XML creation.
>>> >>
>>> >> Anyone knows if something like this can be done without modifying
>>> >> Solrj sources? Is there any injection or plugin mechanism for this?
>>> >>
>>> >> Thanks in advance.
>>> >>
>>> >>
>>> >> > --
>>> >> > "Good Enough" is not good enough.
>>> >> > To give anything less than your best is to sacrifice the gift.
>>> >> > Quality First. Measure Twice. Cut Once.
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >

Re: How to integrate Solr into my project

2009-11-03 Thread Caroline Tan
Thanks guy..
~caroLine


On Tue, Nov 3, 2009 at 7:32 PM, Israel Ekpo  wrote:

> 2009/11/3 Licinio Fernández Maurelo 
>
> > Hi Caroline,
> >
> > i think that you must take an overview tour ;-) , solrj is just a solr
> java
> > client ...
> >
> > Some clues:
> >
> >
> >   - Define your own index schema (it's just like a SQL DDL).
> >   - There are different ways to put docs in your index:
> >      - SolrJ (the Solr client for Java environments)
> >      - DIH (Data Import Handler): preferred when doing a huge data
> >        import from DBs; many source formats are supported.
> >   - Try to perform queries over your fancy-new index ;-). Learn
> >     about search syntax and faceting.
> >
> >
> >
> >
> >
> >
> > 2009/11/3 Caroline Tan 
> >
> > > Ya, it's a Java projecti just browse this site you suggested...
> > > http://wiki.apache.org/solr/Solrj
> > >
> > > Which means, i declared the dependancy to solr-solrj and solr-core
> jars,
> > > have those jars added to my project lib and by following the Solrj
> > > tutorial,
> > > i will be able to even index a DB table into Solr as well? thanks
> > >
> > > ~caroLine
> > >
> > >
> > > 2009/11/3 Noble Paul നോബിള്‍ नोब्ळ् 
> > >
> > > > is it a java project ?
> > > > did you see this page http://wiki.apache.org/solr/Solrj ?
> > > >
> > > > On Tue, Nov 3, 2009 at 2:25 PM, Caroline Tan  >
> > > > wrote:
> > > > > Hi,
> > > > > I wish to intergrate Solr into my current working project. I've
> > played
> > > > > around the Solr example and get it started in my tomcat. But the
> next
> > > > step
> > > > > is HOW do i integrate that into my working project? You see,
> Lucence
> > > > > provides API and tutorial on what class i need to instanstiate in
> > order
> > > > to
> > > > > index and search. But Solr seems to be pretty vague on this..as it
> is
> > a
> > > > > working solr search server. Can anybody help me by stating the
> steps
> > by
> > > > > steps, what classes that i should look into in order to assimiliate
> > > Solr
> > > > > into my project?
> > > > > Thanks.
> > > > >
> > > > > regards
> > > > > ~caroLine
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > -
> > > > Noble Paul | Principal Engineer| AOL | http://aol.com
> > > >
> > >
> >
> >
> >
> > --
> > Lici
> >
>
>
> I would also recommend buying the Solr 1.4 Enterprise Search Server.
>
> It will give you some tips
>
>
> http://www.amazon.com/Solr-1-4-Enterprise-Search-Server/dp/1847195881/ref=sr_1_1?ie=UTF8&s=books&qid=1257247932&sr=1-1
> --
> "Good Enough" is not good enough.
> To give anything less than your best is to sacrifice the gift.
> Quality First. Measure Twice. Cut Once.
>


Re: How to integrate Solr into my project

2009-11-03 Thread Israel Ekpo
2009/11/3 Licinio Fernández Maurelo 

> Hi Caroline,
>
> i think that you must take an overview tour ;-) , solrj is just a solr java
> client ...
>
> Some clues:
>
>
>   - Define your own index schema (it's just like a SQL DDL).
>   - There are different ways to put docs in your index:
>      - SolrJ (the Solr client for Java environments)
>      - DIH (Data Import Handler): preferred when doing a huge data
>        import from DBs; many source formats are supported.
>   - Try to perform queries over your fancy-new index ;-). Learn
>     about search syntax and faceting.
>
>
>
>
>
>
> 2009/11/3 Caroline Tan 
>
> > Ya, it's a Java projecti just browse this site you suggested...
> > http://wiki.apache.org/solr/Solrj
> >
> > Which means, i declared the dependancy to solr-solrj and solr-core jars,
> > have those jars added to my project lib and by following the Solrj
> > tutorial,
> > i will be able to even index a DB table into Solr as well? thanks
> >
> > ~caroLine
> >
> >
> > 2009/11/3 Noble Paul നോബിള്‍ नोब्ळ् 
> >
> > > is it a java project ?
> > > did you see this page http://wiki.apache.org/solr/Solrj ?
> > >
> > > On Tue, Nov 3, 2009 at 2:25 PM, Caroline Tan 
> > > wrote:
> > > > Hi,
> > > > I wish to intergrate Solr into my current working project. I've
> played
> > > > around the Solr example and get it started in my tomcat. But the next
> > > step
> > > > is HOW do i integrate that into my working project? You see, Lucence
> > > > provides API and tutorial on what class i need to instanstiate in
> order
> > > to
> > > > index and search. But Solr seems to be pretty vague on this..as it is
> a
> > > > working solr search server. Can anybody help me by stating the steps
> by
> > > > steps, what classes that i should look into in order to assimiliate
> > Solr
> > > > into my project?
> > > > Thanks.
> > > >
> > > > regards
> > > > ~caroLine
> > > >
> > >
> > >
> > >
> > > --
> > > -
> > > Noble Paul | Principal Engineer| AOL | http://aol.com
> > >
> >
>
>
>
> --
> Lici
>


I would also recommend buying the Solr 1.4 Enterprise Search Server.

It will give you some tips

http://www.amazon.com/Solr-1-4-Enterprise-Search-Server/dp/1847195881/ref=sr_1_1?ie=UTF8&s=books&qid=1257247932&sr=1-1
-- 
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: How to integrate Solr into my project

2009-11-03 Thread Licinio Fernández Maurelo
Hi Caroline,

I think you should take an overview tour ;-) solrj is just a Solr Java
client ...

Some clues:

   - Define your own index schema (it's just like a SQL DDL).
   - There are different ways to put docs in your index:
      - SolrJ (the Solr client for Java environments)
      - DIH (Data Import Handler): preferred when doing a huge data
        import from DBs; many source formats are supported.
   - Try to perform queries over your fancy-new index ;-). Learn about
     search syntax and faceting.
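
A minimal SolrJ sketch of the index-then-query round trip (Solr 1.4 API;
the server URL and the id/name fields are illustrative and must exist in
your schema.xml):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrInputDocument;

    public class SolrJQuickStart {
        public static void main(String[] args) throws Exception {
            // point at your running Solr webapp (URL is illustrative)
            SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr");

            // index one document
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");
            doc.addField("name", "my first document");
            server.add(doc);
            server.commit();

            // query it back
            QueryResponse rsp = server.query(new SolrQuery("name:first"));
            System.out.println(rsp.getResults().getNumFound() + " hit(s)");
        }
    }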






2009/11/3 Caroline Tan 

> Ya, it's a Java projecti just browse this site you suggested...
> http://wiki.apache.org/solr/Solrj
>
> Which means, i declared the dependancy to solr-solrj and solr-core jars,
> have those jars added to my project lib and by following the Solrj
> tutorial,
> i will be able to even index a DB table into Solr as well? thanks
>
> ~caroLine
>
>
> 2009/11/3 Noble Paul നോബിള്‍ नोब्ळ् 
>
> > is it a java project ?
> > did you see this page http://wiki.apache.org/solr/Solrj ?
> >
> > On Tue, Nov 3, 2009 at 2:25 PM, Caroline Tan 
> > wrote:
> > > Hi,
> > > I wish to intergrate Solr into my current working project. I've played
> > > around the Solr example and get it started in my tomcat. But the next
> > step
> > > is HOW do i integrate that into my working project? You see, Lucence
> > > provides API and tutorial on what class i need to instanstiate in order
> > to
> > > index and search. But Solr seems to be pretty vague on this..as it is a
> > > working solr search server. Can anybody help me by stating the steps by
> > > steps, what classes that i should look into in order to assimiliate
> Solr
> > > into my project?
> > > Thanks.
> > >
> > > regards
> > > ~caroLine
> > >
> >
> >
> >
> > --
> > -
> > Noble Paul | Principal Engineer| AOL | http://aol.com
> >
>



-- 
Lici


tf*idf scoring

2009-11-03 Thread Markus Jelsma - Buyways B.V.
Hello list,


I have a question about Lucene's calculation of the tf*idf value. I first
noticed that Solr's tf does not compare to tf values based on
calculations elsewhere, such as
http://odin.himinbi.org/idf_to_item:item/comparing_tf%3Aidf_to_item%3Aitem_similarity.xhtml
or http://en.wikipedia.org/wiki/Tf%E2%80%93idf

The tf values returned by Solr are always integers and not normalized
against length, even though the field in which they reside
does not have omitNorms="true".

Consider the following documents where the field subject is of the
standard text_ws type:



<add>
  <doc>
    <field name="subject">a b c</field>
  </doc>
  <doc>
    <field name="subject">d e f</field>
  </doc>
  <doc>
    <field name="subject">x y z</field>
  </doc>
  <doc>
    <field name="subject">a d x</field>
  </doc>
  <doc>
    <field name="subject">a e z</field>
  </doc>
  <doc>
    <field name="subject">c f z</field>
  </doc>
</add>



Now, Solr's TermVector results for the first document:


<lst name="termVectors">
  <lst name="doc-0">
    <str name="uniqueKey">0</str>
    <lst name="subject">
      <lst name="a">
        <int name="tf">1</int>
        <lst name="positions">
          <int name="position">0</int>
        </lst>
        <int name="df">3</int>
        <double name="tf-idf">0.33333334</double>
      </lst>
      <lst name="b">
        <int name="tf">1</int>
        <lst name="positions">
          <int name="position">1</int>
        </lst>
        <int name="df">1</int>
        <double name="tf-idf">1.0</double>
      </lst>
      <lst name="c">
        <int name="tf">1</int>
        <lst name="positions">
          <int name="position">2</int>
        </lst>
        <int name="df">2</int>
        <double name="tf-idf">0.5</double>
      </lst>
    </lst>
  </lst>
</lst>





According to other algorithms, the tf for term c would be 1 / 3 =
0.33 instead of the 1 returned by Solr. Also, the tf*idf value I get is 0.5
for term c and 0.333 for term a. It looks like the reported tf*idf is
just the quotient of term frequency and document frequency.

If I calculate tf*idf for term c in the first document, according to
other algorithms it would be:

tf = 1 / 3 = 0.333
idf = ln(6 / 2) = 1.0986
tf*idf = 0.333 * 1.0986 = 0.3658

Can someone explain the difference demonstrated, or tell me what I
am possibly doing wrong?



Cheers,

-  
Markus Jelsma  Buyways B.V.
Technisch ArchitectFriesestraatweg 215c
http://www.buyways.nl  9743 AD Groningen   


Alg. 050-853 6600  KvK  01074105
Tel. 050-853 6620  Fax. 050-3118124
Mob. 06-5025 8350  In: http://www.linkedin.com/in/markus17



RE: CPU utilization and query time high on Solr slave when snapshot install

2009-11-03 Thread biku...@sapient.com
Hi Walter,

When the issue occurred, we did try to turn autowarming off, but it did not
solve the problem. The only thing that worked was optimizing the slave index.
But what you say is logical and I will try it again.

The basic question I have is this: our Solr index is not huge by any means.
Secondly, I have read in the wiki etc. that optimize has an adverse impact on
performance and hence should be done only once a day. Then what is wrong in
our case that causes the performance problem (we serve just 4 req/sec)? Why
is optimize fixing the issue, contrary to common belief? And what will this
workaround cost us as the index size increases?

Regds,
Bipul 

-Original Message-
From: Walter Underwood [mailto:wun...@wunderwood.org] 
Sent: Monday, November 02, 2009 11:18 PM
To: solr-user@lucene.apache.org
Subject: Re: CPU utilization and query time high on Solr slave when snapshot 
install

If you are going to pull a new index every 10 minutes, try turning off  
cache autowarming.

Your caches are never more than 10 minutes old, so spending a minute  
warming each new cache is a waste of CPU. Autowarm submits queries to  
the new Searcher before putting it in service. This will create a  
burst of query load on the new Searcher, often keeping one CPU pretty  
busy for several seconds.

In solrconfig.xml, set autowarmCount to 0.
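
For example, a sketch of the stock cache entries in solrconfig.xml
(sizes illustrative):

    <filterCache class="solr.LRUCache"
                 size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache"
                      size="512" initialSize="512" autowarmCount="0"/>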

Also, if you want the slaves to always have an optimized index, create  
the snapshot only in post-optimize. If you create snapshots in both  
post-commit and post-optimize, you are creating a non-optimized index  
(post-commit), then replacing it with an optimized one a few minutes  
later. A slave might get a non-optimized index one time, then an  
optimized one the next.
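
A sketch of the snapshot hook this refers to, as in the example
solrconfig.xml (paths illustrative); keeping only the postOptimize
listener means slaves always pull an optimized index:

    <listener event="postOptimize" class="solr.RunExecutableListener">
      <str name="exe">snapshooter</str>
      <str name="dir">solr/bin</str>
      <bool name="wait">true</bool>
    </listener>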

wunder

On Nov 2, 2009, at 1:45 AM, biku...@sapient.com wrote:

> Hi Solr Gurus,
>
> We have solr in 1 master, 2 slave configuration. Snapshot is created  
> post commit, post optimization. We have autocommit after 50  
> documents or 5 minutes. Snapshot puller runs as a cron every 10  
> minutes. What we have observed is that whenever snapshot is  
> installed on the slave, we see solrj client used to query slave  
> solr, gets timedout and there is high CPU usage/load avg. on slave  
> server. If we stop snapshot puller, then slaves work with no issues.  
> The system has been running since 2 months and this issue has  
> started to occur only now  when load on website is increasing.
>
> Following are some details:
>
> Solr Details:
> apache-solr Version: 1.3.0
> Lucene - 2.4-dev
>
> Master/Slave configurations:
>
> Master:
> - for indexing data HTTPRequests are made on Solr server.
> - autocommit feature is enabled for 50 docs and 5 minutes
> - caching params are disable for this server
> - mergeFactor of 10 is set
> - we were running optimize script after every 2 hours, but now have  
> reduced the duration to twice a day but issue still persists
>
> Slave1/Slave2:
> - standard requestHandler is being used
> - default values of caching are set
> Machine Specifications:
>
> Master:
> - 4GB RAM
> - 1GB JVM Heap memory is allocated to Solr
>
> Slave1/Slave2:
> - 4GB RAM
> - 2GB JVM Heap memory is allocated to Solr
>
> Master and Slave1 (solr1)are on single box and Slave2(solr2) on  
> different box. We use HAProxy to load balance query requests between  
> 2 slaves. Master is only used for indexing.
> Please let us know if somebody has ever faced similar kind of issue  
> or has some insight into it as we guys are literally struck at the  
> moment with a very unstable production environment.
>
> As a workaround, we have started running optimize on master every 7  
> minutes. This seems to have reduced the severity of the problem but  
> still issue occurs every 2days now. please suggest what could be the  
> root cause of this.
>
> Thanks,
> Bipul
>
>
>
>



Re: Lucene FieldCache memory requirements

2009-11-03 Thread Michael McCandless
On Mon, Nov 2, 2009 at 9:27 PM, Fuad Efendi  wrote:
> I believe this is correct estimate:
>
>> C. [maxdoc] x [4 bytes ~ (int) Lucene Document ID]
>>
>>   same as
>> [String1_Document_Count + ... + String10_Document_Count + ...]
>> x [4 bytes per DocumentID]

That's right.

Except: as Mark said, you'll also need transient memory = pointer (4
or 8 bytes) * (1+maxdoc), while the FieldCache is being loaded.  After
it's done being loaded, this sizes down to the number of unique terms.

But, if Lucene did the basic int packing, which really we should do,
since you only have 10 unique values, with a naive 4 bits per doc
encoding, you'd only need 1/8th the memory usage.  We could do a bit
better by encoding more than one document at a time...
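
As a worked example (numbers illustrative): at maxdoc = 10 million, the
ord array costs 10,000,000 x 4 bytes = 40 MB, while a packed
4-bits-per-doc encoding would cost 10,000,000 x 0.5 bytes = 5 MB, one
eighth the size.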

Mike


Re: How to integrate Solr into my project

2009-11-03 Thread Caroline Tan
Ya, it's a Java project. I just browsed the site you suggested...
http://wiki.apache.org/solr/Solrj

Which means, if I declare the dependency on the solr-solrj and solr-core jars,
add those jars to my project lib, and follow the Solrj tutorial,
I will be able to index even a DB table into Solr as well? Thanks

~caroLine


2009/11/3 Noble Paul നോബിള്‍ नोब्ळ् 

> is it a java project ?
> did you see this page http://wiki.apache.org/solr/Solrj ?
>
> On Tue, Nov 3, 2009 at 2:25 PM, Caroline Tan 
> wrote:
> > Hi,
> > I wish to intergrate Solr into my current working project. I've played
> > around the Solr example and get it started in my tomcat. But the next
> step
> > is HOW do i integrate that into my working project? You see, Lucence
> > provides API and tutorial on what class i need to instanstiate in order
> to
> > index and search. But Solr seems to be pretty vague on this..as it is a
> > working solr search server. Can anybody help me by stating the steps by
> > steps, what classes that i should look into in order to assimiliate Solr
> > into my project?
> > Thanks.
> >
> > regards
> > ~caroLine
> >
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>


Re: How to integrate Solr into my project

2009-11-03 Thread Avlesh Singh
Take a look at this - http://wiki.apache.org/solr/Solrj

Cheers
Avlesh

On Tue, Nov 3, 2009 at 2:25 PM, Caroline Tan  wrote:

> Hi,
> I wish to intergrate Solr into my current working project. I've played
> around the Solr example and get it started in my tomcat. But the next step
> is HOW do i integrate that into my working project? You see, Lucence
> provides API and tutorial on what class i need to instanstiate in order to
> index and search. But Solr seems to be pretty vague on this..as it is a
> working solr search server. Can anybody help me by stating the steps by
> steps, what classes that i should look into in order to assimiliate Solr
> into my project?
> Thanks.
>
> regards
> ~caroLine
>


Re: How to integrate Solr into my project

2009-11-03 Thread Noble Paul നോബിള്‍ नोब्ळ्
is it a java project ?
did you see this page http://wiki.apache.org/solr/Solrj ?

On Tue, Nov 3, 2009 at 2:25 PM, Caroline Tan  wrote:
> Hi,
> I wish to intergrate Solr into my current working project. I've played
> around the Solr example and get it started in my tomcat. But the next step
> is HOW do i integrate that into my working project? You see, Lucence
> provides API and tutorial on what class i need to instanstiate in order to
> index and search. But Solr seems to be pretty vague on this..as it is a
> working solr search server. Can anybody help me by stating the steps by
> steps, what classes that i should look into in order to assimiliate Solr
> into my project?
> Thanks.
>
> regards
> ~caroLine
>



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: Dismax and Standard Queries together

2009-11-03 Thread Alexey Serba
Hi Ram,

You can add another field, "total" ( a catchall field ), and copy all other
fields into it ( using the copyField directive ):
http://wiki.apache.org/solr/SchemaXml#Copy_Fields

Then use this field in the DisMax qf parameter, for example
qf=business_name^2.0 category_name^1.0 sub_category_name^1.0 total^0.0
and
mm=100%

Thus it requires all search keywords to occur somewhere in the
document, while you control the relevance of returned results via the
boosts in the qf parameter.
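
A sketch of the schema.xml side of this (field type and attributes are
assumptions; "total" must be indexed but need not be stored):

    <field name="total" type="text" indexed="true" stored="false"
           multiValued="true"/>

    <copyField source="business_name" dest="total"/>
    <copyField source="category_name" dest="total"/>
    <copyField source="sub_category_name" dest="total"/>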

HTH,
Alex

On Tue, Nov 3, 2009 at 12:02 AM, ram_sj  wrote:
>
> Hi,
>
> I have three fields, business_name, category_name, sub_category_name in my
> solrconfig file.
>
> my query = "pet clinic"
>
> example sub_category_names: Veterinarians, Kennels, Veterinary Clinics
> Hospitals, Pet Grooming, Pet Stores, Clinics
>
> my ideal requirement is dismax searching on
>
> a. dismax over three or two fields
> b. followed by a Boolean match over any one of the field is acceptable.
>
> I played around with minimum match attributes, but doesn't seems to be
> helpful, I guess the dismax requires at-least two fields.
>
> The nest queries takes only one qf filed, so it doesn't help much either.
>
> Any suggestions will be helpful.
>
> Thanks
> Ram
> --
> View this message in context: 
> http://old.nabble.com/Dismax-and-Standard-Queries-together-tp26157830p26157830.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>


Re: weird behaviour while inserting records into solr

2009-11-03 Thread Rakhi Khatwani
Hi,
   I am inserting the records one by one: first I create a Solr input
document, add it to Solr, and perform a commit.
I loop this entire process a million times.
Regards,
Raakhi
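
A minimal sketch of the loop as described (SolrJ 1.4; the server URL and
field name are illustrative assumptions):

    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    for (int id = 0; id < 1000000; id++) {
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", String.valueOf(id));  // the uniqueKey field
        server.add(doc);
        server.commit();  // note: a commit per document is very expensive
    }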

On Wed, Oct 28, 2009 at 1:45 AM, Grant Ingersoll wrote:

>
> On Oct 26, 2009, at 1:14 AM, Rakhi Khatwani wrote:
>
>  Hi,
>>i was trying to insert one million records in solr (keeping the id from
>> 0 to 1000000). things were fine till it inserted (id = 523932). after
>> that
>> it started inserting it from 1 (i.e updating). i am not able to understand
>> this behaviour. any pointers??
>>
>
> That seems pretty random.  How are you inserting records?
>
>  Regards,
>> Raakhi
>>
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>


How to integrate Solr into my project

2009-11-03 Thread Caroline Tan
Hi,
I wish to integrate Solr into my current working project. I've played
around with the Solr example and got it started in my Tomcat. But the next step
is HOW do I integrate it into my working project? You see, Lucene
provides an API and tutorial on which classes I need to instantiate in order to
index and search. But Solr seems to be pretty vague on this, as it is a
working Solr search server. Can anybody help me by stating, step by
step, which classes I should look into in order to assimilate Solr
into my project?
Thanks.

regards
~caroLine


Re: Problems downloading lucene 2.9.1

2009-11-03 Thread Ian Ibbotson
Heya Ryan...

For me, the big problem with adding
http://people.apache.org/~mikemccand/staging-area/rc3_lucene2.9.1/maven/
to my build config is that the artifact names of the interim release are the
same as the final artifacts will be; thus, once they are copied to a local
repo, Maven won't bother to go looking for more recent versions, even if you
blow away that temporary repo. Would it be possible to publish tagged rc-N
releases to a public and more permanent repository where people can
reference them and upgrade to the final release when it's available?
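
For reference, a sketch of the corresponding pom.xml entry (the
repository id is illustrative):

    <repositories>
      <repository>
        <id>lucene-2.9.1-rc3-staging</id>
        <url>http://people.apache.org/~mikemccand/staging-area/rc3_lucene2.9.1/maven/</url>
      </repository>
    </repositories>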

Just a thought. Cheers for all your hard work.

Ian.

2009/11/2 Ryan McKinley 

>
> On Nov 2, 2009, at 8:29 AM, Grant Ingersoll wrote:
>
>
>> On Nov 2, 2009, at 12:12 AM, Licinio Fernández Maurelo wrote:
>>
>>  Hi folks,
>>>
>>> as we are using an snapshot dependecy to solr1.4, today we are getting
>>> problems when maven try to download lucene 2.9.1 (there isn't a any 2.9.1
>>> there).
>>>
>>> Which repository can i use to download it?
>>>
>>
>> They won't be there until 2.9.1 is officially released.  We are trying to
>> speed up the Solr release by piggybacking on the Lucene release, but this
>> little bit is the one downside.
>>
>
> Until then, you can add a repo to:
>
> http://people.apache.org/~mikemccand/staging-area/rc3_lucene2.9.1/maven/
>
>
>


Re: Match all terms in doc

2009-11-03 Thread Magnus Eklund


On Nov 1, 2009, at 1:20 AM, AHMET ARSLAN wrote:


Hi

How do I restrict hits to documents containing all words
(regardless of order) of a query in particular field?

Suppose I have two documents with a field called name in my
index:

doc1 => name: Pink
doc2 => name: Pink Floyd

When querying for "Pink" I want only doc1 and when querying
for "Pink Floyd" or "Floyd Pink" I want doc2.

Thanks

- Magnus



I would implement this kind of functionality by preprocessing
documents and queries to calculate the number of unique terms in each
document and query before sending them to Solr. I would add an extra
integer field to hold that number.


For example when indexing document

doc1 =>
name: Pink
numberOfuniqueTerms: 1

doc2 =>
name: Pink Floyd
numberOfuniqueTerms: 2

You would set the query parser's default operator to AND; that will
guarantee that all query terms appear in a returned document.
The numberOfuniqueTerms criterion will then guarantee that a returned
document does not contain any additional terms.


query: pink will be expanded as => name:Pink AND numberOfuniqueTerms:1
query: Pink Floyd will be expanded as  => name:(Pink AND Floyd) AND  
numberOfuniqueTerms:2



Your preprocessor program can use the Lucene API's TermVectors. Since
you are interested only in its size,

TermFreqVector nameTV =
    indexSearcher.getIndexReader().getTermFreqVector(docId, "name");
numberOfuniqueTerms = nameTV.size();

should give you that number.

But this requires pre-indexing a document in Lucene using the same
analyzer defined in schema.xml - just to get the number of unique terms
in it -

Obviously it is not the best solution. And you must use JAVA.


The second solution (without pre-processing and without
adding an integer field):



Since storing term vectors at index time allows you to access term
vectors at query time, there should be an easier way
[TermVectorComponent] to access a returned document's term vector
size, but I do not know how to query that size.


http://wiki.apache.org/solr/TermVectorComponent will give you the unique
terms in a particular field of a returned document, but you will
need to iterate over that list to check that it contains all query terms
and nothing else.
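
For example, a hedged request sketch (assuming the tvrh handler that the
1.4 example solrconfig wires to TermVectorComponent; the handler name,
port, and parameters are assumptions to verify against your config):

    http://localhost:8983/solr/select?q=name:(Pink+AND+Floyd)&qt=tvrh&tv=true&tv.fl=name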





Thank you very much for the reply. Sorry for the late answer, it took  
some time before I had a chance to try your suggestions.


I decided to try your second solution and it works very well!

- Magnus



Re: highlighting error using 1.4rc

2009-11-03 Thread Shalin Shekhar Mangar
Mark, do you have more details on what kind of queries will make this bug
show up?

On Tue, Nov 3, 2009 at 5:33 AM, Mark Miller  wrote:

> Sorry - it was a bug in the backport from trunk to 2.9.1 - didn't
> realize that code didn't get hit because we didn't pass a null field -
> else the tests would have caught it. Fix has been committed but I don't
> know whether it will make 2.9.1 or 1.4 because both have gotten the
> votes and time needed for release.
>
> Mark Miller wrote:
> > Umm - crap. This looks like a bug in a fix that just went in. My
> > fault on the review. I'll fix it tonight when I get home -
> > unfortunately, both Lucene and Solr are about to be released...
> >
> > - Mark
> >
> > http://www.lucidimagination.com (mobile)
> >
> > On Nov 2, 2009, at 5:17 PM, Jake Brownell  wrote:
> >
> >> Hi,
> >>
> >> I've tried installing the latest (3rd) RC for Solr 1.4 and Lucene
> >> 2.9.1. One of our integration tests, which runs against an embedded
> >> server, appears to be failing on highlighting. I've included the stack
> >> trace and the configuration from solrconf. I'd appreciate any
> >> insights. Please let me know what additional information would be
> >> useful.
> >>
> >>
> >> Caused by: org.apache.solr.client.solrj.SolrServerException:
> >> org.apache.solr.client.solrj.SolrServerException:
> >> java.lang.ClassCastException:
> >> org.apache.lucene.search.spans.SpanOrQuery cannot be cast to
> >> org.apache.lucene.search.spans.SpanNearQuery
> >>   at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:153)
> >>   at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
> >>   at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
> >>   at org.bookshare.search.solr.SolrSearchServerWrapper.query(SolrSearchServerWrapper.java:96)
> >>   ... 29 more
> >> Caused by: org.apache.solr.client.solrj.SolrServerException:
> >> java.lang.ClassCastException:
> >> org.apache.lucene.search.spans.SpanOrQuery cannot be cast to
> >> org.apache.lucene.search.spans.SpanNearQuery
> >>   at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:141)
> >>   ... 32 more
> >> Caused by: java.lang.ClassCastException:
> >> org.apache.lucene.search.spans.SpanOrQuery cannot be cast to
> >> org.apache.lucene.search.spans.SpanNearQuery
> >>   at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.collectSpanQueryFields(WeightedSpanTermExtractor.java:489)
> >>   at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.collectSpanQueryFields(WeightedSpanTermExtractor.java:484)
> >>   at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedSpanTerms(WeightedSpanTermExtractor.java:249)
> >>   at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:230)
> >>   at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:158)
> >>   at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:414)
> >>   at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:216)
> >>   at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:184)
> >>   at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:226)
> >>   at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
> >>   at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
> >>   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:203)
> >>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
> >>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
> >>   at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
> >>   ... 32 more
> >>
> >> I see in our solrconf the following for highlighting.
> >>
> >>  <highlighting>
> >>   <fragmenter name="gap"
> >>    class="org.apache.solr.highlight.GapFragmenter" default="true">
> >>    <lst name="defaults">
> >>     <int name="hl.fragsize">100</int>
> >>    </lst>
> >>   </fragmenter>
> >>
> >>   <fragmenter name="regex"
> >>    class="org.apache.solr.highlight.RegexFragmenter">
> >>    <lst name="defaults">
> >>     <int name="hl.fragsize">70</int>
> >>     <float name="hl.regex.slop">0.5</float>
> >>     <str name="hl.regex.pattern">[-\w ,/\n&quot;&#39;]{20,200}</str>
> >>    </lst>
> >>   </fragmenter>
> >>
> >>   <formatter name="html"
> >>    class="org.apache.solr.highlight.HtmlFormatter" default="true">
> >>    <lst name="defaults">
> >>     <str name="hl.simple.pre"><![CDATA[<em>]]></str>
> >>     <str name="hl.simple.post"><![CDATA[</em>]]></str>
> >>    </lst>
> >>   </formatter>
> >>  </highlighting>
> >>
> >>
> >>
> >> Thanks,
> >> Jake
>