Re: solr performance problem from 4.3.0 with sorting

2013-06-20 Thread Shane Perry
Ariel,

I just ran up against a similar issue when upgrading from 3.6.1 to 4.3.0.
 In my case, my solrconfig.xml for 4.3.0 (which was based on my 3.6.1 file)
did not provide a newSearcher or firstSearcher warming query.  After adding
a query to each listener, my query speeds drastically improved.  Check
your config file, and if you aren't defining a query (make sure it sorts
on the field in question), do so.
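
For illustration, a sketch of what such warming listeners might look like in solrconfig.xml. The field name level_4_id is taken from Ariel's query below; the warming query strings themselves are placeholders:

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <!-- sort on the same field the slow queries sort on -->
      <str name="sort">level_4_id asc</str>
    </lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">static firstSearcher warming in solrconfig.xml</str>
      <str name="sort">level_4_id asc</str>
    </lst>
  </arr>
</listener>
```

Warming queries like these populate the FieldCache entries used for sorting before real queries arrive, which is why the first sorted query against a cold searcher is otherwise so slow.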

Shane

On Thu, Jun 20, 2013 at 3:45 AM, Ariel Zerbib ariel.zer...@gmail.com wrote:

 Hi,

 We updated to version 4.3.0 from 4.2.1 and we have some performance
 problem with the sorting.


 A query that returns 1 hit has a query time of more than 100 ms (it can
 be more than 1 s), against less than 10 ms for the same query without
 the sort parameter:

 query with sorting option:
 q=level_4_id:531044&sort=level_4_id+asc
 response:
 <int name="QTime">1</int>
 <int name="QTime">106</int>


 query without sorting option: q=level_4_id:531024
 <int name="QTime">1</int>
 <result name="response" numFound="1" start="0">

 The field level_4_id is unique and defined as a long.

 In version 4.2.1, performance was identical.  Version 4.3.1 shows the
 same behavior as 4.3.0.

 Thanks,
 Ariel



Re: Sorting by field is slow

2013-06-17 Thread Shane Perry
Using 4.3.1-SNAPSHOT I have identified where the issue is occurring.  For a
query in the format (it returns one document, sorted by field4)

+(field0:UUID0) -field1:string0 +field2:string1 +field3:text0
+field4:text1


with the field types

<fieldType name="uuid" class="solr.UUIDField" indexed="true"/>

<fieldType name="string" class="solr.StrField" sortMissingFirst="true"
omitNorms="true"/>

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.ICUFoldingFilterFactory"/>
  </analyzer>
</fieldType>


in the method FieldCacheImpl$SortedDocValuesCache#createValue, the reader
reports 2640449 terms.  As a result, the loop on line 1198 is
executed 2640449 times and the inner loop is executed a total of 658310778
times.  My index contains 56180128 documents.

My configuration file sets the queries for the newSearcher and
firstSearcher listeners to the value

<lst>
   <str name="q">static firstSearcher warming in solrconfig.xml</str>
   <str name="sort">field4</str>
</lst>


which does not appear to affect the speed.  I'm not sure how replication
plays into the equation outside the fact that we are relatively aggressive
on the replication (every 60 seconds).  I fear I may be at the end of my
knowledge without really getting into the code so any help at this point
would be greatly appreciated.

Shane

Re: Sorting by field is slow

2013-06-17 Thread Shane Perry
Turns out it was a case of an oversight.  My warming queries weren't setting
the sort order and as a result didn't successfully complete.  After setting
the sort order, things appear to be responding quickly.
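
If the oversight was indeed the missing direction, the corrected warming entry would presumably look something like this (field4 is from the earlier message; the explicit "asc" is an assumption):

```xml
<lst>
  <str name="q">static firstSearcher warming in solrconfig.xml</str>
  <!-- Solr sort params require an explicit "asc" or "desc" direction -->
  <str name="sort">field4 asc</str>
</lst>
```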

Thanks for the help.


Re: Sorting by field is slow

2013-06-13 Thread Shane Perry
Erick,

We do have soft commits turned on.  Initially, autoCommit was set at 15000 and
autoSoftCommit at 1000.  We did up those to 120 and 60
respectively.  However, since the core in question is a slave, we don't
actually do writes to the core but rely on replication only to populate the
index.  In this case wouldn't autoCommit and autoSoftCommit essentially be
no-ops?  I thought I had pulled out all hard commits but a double check
shows one instance where it still occurs.
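
For context, the initial settings described above would correspond to a solrconfig.xml fragment along these lines (a sketch, not the actual config from the thread):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit every 15 seconds, as initially configured -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit every second -->
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

On a pure replication slave these settings are indeed effectively no-ops, since the slave receives already-committed segments; it is the post-replication commit that opens a new searcher there.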

Thanks for your time.

Shane

On Thu, Jun 13, 2013 at 5:19 AM, Erick Erickson erickerick...@gmail.com wrote:

 Shane:

 You've covered all the config stuff that I can think of. There's one
 other possibility. Do you have the soft commits turned on and are
 they very short? Although soft commits shouldn't invalidate any
 segment-level caches (but I'm not sure whether the sorting buffers
 are low-level or not).

 About the only other thing I can think of is that you're somehow
 doing hard commits from, say, the client but that's really
 stretching.

 All I can really say at this point is that this isn't a problem I've seen
 before, so it's _likely_ some innocent-seeming config has changed.
 I'm sure it'll be obvious once you find it <G>...

 Erick

Re: Sorting by field is slow

2013-06-13 Thread Shane Perry
I've dug through the code and have narrowed the delay down
to TopFieldCollector$OneComparatorNonScoringCollector.setNextReader() at
the point where the comparator's setNextReader() method is called (line 98
in the lucene_solr_4_3 branch).  That line is actually two method calls so
I'm not yet certain which path is the cause.  I'll continue to dig through
the code but am on thin ice so input would be great.

Shane


Sorting by field is slow

2013-06-12 Thread Shane Perry
In upgrading from Solr 3.6.1 to 4.3.0, our query response time has
increased exponentially.  After testing in 4.3.0 it appears the same query
(with 1 matching document) returns after 100 ms without sorting but takes 1
minute when sorting by a text field.  I've looked around but haven't yet
found a reason for the degradation.  Can someone give me some insight or
point me in the right direction for resolving this?  In most cases, I can
change my code to do client-side sorting but I do have a couple of
situations where pagination prevents client-side sorting.  Any help would
be greatly appreciated.

Thanks,

Shane


Re: Sorting by field is slow

2013-06-12 Thread Shane Perry
Thanks for the responses.

Setting first/newSearcher had no noticeable effect.  I'm sorting on a
stored/indexed field named 'text' whose fieldType is solr.TextField.
 Overall, the values of the field are unique.  The JVM is only using about
2G of the available 12G, so no OOM/GC issue (at least on the surface).  The
server in question is a slave with approximately 56 million documents.
 Additionally, sorting on a field of the same type but with significantly
less uniqueness results in quick response times.

The following is a sample of *debugQuery=true* for a query which returns 1
document:

<lst name="process">
  <double name="time">61458.0</double>
  <lst name="query">
    <double name="time">61452.0</double>
  </lst>
  <lst name="facet">
    <double name="time">0.0</double>
  </lst>
  <lst name="mlt">
    <double name="time">0.0</double>
  </lst>
  <lst name="highlight">
    <double name="time">0.0</double>
  </lst>
  <lst name="stats">
    <double name="time">0.0</double>
  </lst>
  <lst name="debug">
    <double name="time">6.0</double>
  </lst>
</lst>


-- Update --

Out of desperation, I turned off replication by commenting out the *<lst
name="slave">* element in the replication requestHandler block.  After
restarting tomcat I was surprised to find that the replication admin UI
still reported the core as replicating.  Search queries were still slow.  I
then disabled replication via the UI and the display updated to report the
core was no longer replicating.  Queries are now fast so it appears that
the sorting may be a red-herring.
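
For reference, a sketch of the kind of slave replication section being discussed; the master URL is illustrative, and the 60-second poll matches the interval mentioned in the thread:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- illustrative master URL -->
    <str name="masterUrl">http://master-host:8983/solr/core1/replication</str>
    <!-- aggressive 60-second polling (HH:MM:SS) -->
    <str name="pollInterval">00:01:00</str>
  </lst>
</requestHandler>
```

Note that commenting this out only stops future polling after a restart; disabling replication through the admin UI, as described above, takes effect immediately.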

It may be of note to also mention that the slow queries don't appear to
be getting cached.

Thanks again for the feedback.

On Wed, Jun 12, 2013 at 2:33 PM, Jack Krupansky j...@basetechnology.com wrote:

 Rerun the sorted query with debugQuery=true and look at the module
 timings.  See what stands out.

 Are you actually sorting on a text field, as opposed to a string field?

 Of course, it's always possible that you're hitting some odd OOM/GC
 condition as a result of Solr growing between releases.

 -- Jack Krupansky

 -Original Message- From: Shane Perry
 Sent: Wednesday, June 12, 2013 3:00 PM
 To: solr-user@lucene.apache.org
 Subject: Sorting by field is slow


 In upgrading from Solr 3.6.1 to 4.3.0, our query response time has
 increased exponentially.  After testing in 4.3.0 it appears the same query
 (with 1 matching document) returns after 100 ms without sorting but takes 1
 minute when sorting by a text field.  I've looked around but haven't yet
 found a reason for the degradation.  Can someone give me some insight or
 point me in the right direction for resolving this?  In most cases, I can
 change my code to do client-side sorting but I do have a couple of
 situations where pagination prevents client-side sorting.  Any help would
 be greatly appreciated.

 Thanks,

 Shane



Re: Sorting by field is slow

2013-06-12 Thread Shane Perry
Erick,

I agree, it doesn't make sense.  I manually merged the solrconfig.xml from
the distribution example with my 3.6 solrconfig.xml, pulling out what I
didn't need.  There is the possibility I removed something I shouldn't have
though I don't know what it would be.  Minus removing the dynamic fields, a
custom tokenizer class, and changing all my fields to be stored, the
schema.xml file should be the same as well.  I'm not currently in a
position to do so, but I'll double check those two files.  Finally, the
data was re-indexed when I moved to 4.3.

My statement about field values wasn't worded very well.  What I meant is
that the 'text' field has more unique terms than some of my other fields.

As for this being an edge case, I'm not sure why it would manifest itself
in 4.3 but not in 3.6 (short of me having a screwy configuration setting).
 If I get a chance, I'll see if I can duplicate the behavior with a small
document count in a sandboxed environment.

Shane

On Wed, Jun 12, 2013 at 5:14 PM, Erick Erickson erickerick...@gmail.com wrote:

 This doesn't make much sense, particularly the fact
 that you added first/new searchers. I'm assuming that
 these are sorting on the same field as your slow query.

 But sorting on a text field for which
 "Overall, the values of the field are unique"
 is a red-flag. Solr doesn't sort on fields that have
 more than one term, so you might as well use a
 string field and be done with it, it's possible you're
 hitting some edge case.

 Did you just copy your 3.6 schema and configs to
 4.3? Did you re-index?

 Best
 Erick



Inaccurate wiki documentation?

2013-05-20 Thread Shane Perry
I am in the process of setting up a core using Solr 4.3.  On the Core
Discovery wiki page
(http://wiki.apache.org/solr/Core%20Discovery%20(4.3%20and%20beyond))
it states:

As of SOLR-4196, there's a new way of defining cores. Essentially, it is no
longer necessary to define cores in solr.xml. In fact, solr.xml is no
longer necessary at all and will be obsoleted in Solr 5.x. As of Solr 4.3
the process is as follows:


   - If a solr.xml file is found in SOLR_HOME, then it is expected to be
     the old-style solr.xml that defines cores etc.
   - If there is no solr.xml but there is a solr.properties file, then
     exploration-based core enumeration is assumed.
   - If neither a solr.xml nor a solr.properties file is found, a
     default solr.xml file is assumed. NOTE: as of 5.0, this will not be true
     and an error will be thrown if no solr.properties file is found.

Using the 4.3 war available for download, I attempted to set up my core
using the solr.properties file (in anticipation of moving to 5.0).  When I
start the context, logging shows that the process is falling back to the
default solr.xml file (essentially the second bullet does not occur).
 After digging through the 4_3 branch it looks like solr.properties is not
yet part of the library.  Am I missing something (I'm able to get the
context started using a solr.xml file with <solr></solr> as the contents)?

I'm going with a basic solr.xml for now, but any insight would be
appreciated.
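
For reference, a minimal old-style solr.xml of the kind the first bullet describes might look like this (core name and instanceDir are illustrative):

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores">
    <!-- one entry per core; instanceDir is relative to SOLR_HOME -->
    <core name="core1" instanceDir="core1"/>
  </cores>
</solr>
```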

Thanks in advance.


Outstanding Jira issue

2013-05-08 Thread Shane Perry
I opened a Jira issue in Oct of 2011 which is still outstanding. I've
boosted the priority to Critical as each time I've upgraded Solr, I've had
to manually patch and build the jars.   There is a patch (for 3.6) attached
to the ticket. Is there someone with commit access who can take a look and
poke the fix through (preferably on 4.2 as well as 4.3)?  The ticket is
https://issues.apache.org/jira/browse/SOLR-2834.

Thanks in advance.

Shane


Re: Outstanding Jira issue

2013-05-08 Thread Shane Perry
Yeah, I realize my fix is more of a bandage.  While it wouldn't be a good
long-term solution, how about going the path of ignoring unrecognized types
and logging a warning message so the handler doesn't crash?  The Jira ticket
could then be left open (and hopefully assigned) to fix the actual problem.
 This would keep consumers from having to avoid the scenario or manually
patch the file to ignore the problem.

On Wed, May 8, 2013 at 11:49 AM, Shawn Heisey s...@elyograg.org wrote:

 On 5/8/2013 9:20 AM, Shane Perry wrote:

 I opened a Jira issue in Oct of 2011 which is still outstanding. I've
 boosted the priority to Critical as each time I've upgraded Solr, I've had
 to manually patch and build the jars.   There is a patch (for 3.6)
 attached
 to the ticket. Is there someone with commit access who can take a look and
 poke the fix through (preferably on 4.2 as well as 4.3)?  The ticket is
 https://issues.apache.org/jira/browse/SOLR-2834.


 Your patch just ignores the problem so the request doesn't crash, it
 doesn't fix it.  We need to fix whatever the problem is in
 HTMLStripCharFilter.

 I had hoped I could come up with a quick fix, but it's proving too
 difficult for me to unravel.  I can't even figure out how it works on good
 analysis components like WhitespaceTokenizer, so I definitely can't see
 what the problem is for HTMLStripCharFilter.

 Thanks,
 Shawn




ICUTokenizer ArrayIndexOutOfBounds

2012-10-17 Thread Shane Perry
Hi,

I've been playing around with using the ICUTokenizer from 4.0.0.
Using the code below, I was receiving an ArrayIndexOutOfBounds
exception on the call to tokenizer.incrementToken().  Looking at the
ICUTokenizer source, I can see why this is occurring (usableLength
defaults to -1).

ICUTokenizer tokenizer = new ICUTokenizer(myReader);
CharTermAttribute termAtt = tokenizer.getAttribute(CharTermAttribute.class);

while (tokenizer.incrementToken())
{
    System.out.println(termAtt.toString());
}

After poking around a little more, I found that I can just call
tokenizer.reset() (initializes usableLength to 0) right after
constructing the object
(org.apache.lucene.analysis.icu.segmentation.TestICUTokenizer does a
similar step in its super class).  I was wondering if someone could
explain why I need to call tokenizer.reset() prior to using the
tokenizer for the first time.

Thanks in advance,

Shane
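
The reason is Lucene's TokenStream contract: a stream's state is only
initialized by reset(), so consumers must call reset() before the first
incrementToken().  A toy model of that lifecycle (illustrative Java only,
not Lucene code):

```java
import java.util.Iterator;
import java.util.List;

// Toy model of the TokenStream lifecycle (illustrative only, not Lucene
// code): internal state is initialized only in reset(), so consuming the
// stream before reset() fails -- analogous to ICUTokenizer's usableLength
// defaulting to -1 until reset() sets it to 0.
class LifecycleDemo {
    private final List<String> tokens;
    private Iterator<String> cursor; // null until reset(), like usableLength = -1
    private String current;

    LifecycleDemo(List<String> tokens) {
        this.tokens = tokens;
    }

    void reset() {
        cursor = tokens.iterator(); // puts the stream into a consumable state
    }

    boolean incrementToken() {
        if (cursor == null) {
            throw new IllegalStateException("call reset() before incrementToken()");
        }
        if (!cursor.hasNext()) {
            return false;
        }
        current = cursor.next();
        return true;
    }

    String current() {
        return current;
    }

    public static void main(String[] args) {
        LifecycleDemo ts = new LifecycleDemo(List.of("hello", "world"));
        ts.reset(); // required first, exactly like tokenizer.reset() above
        while (ts.incrementToken()) {
            System.out.println(ts.current());
        }
    }
}
```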


ClassCastException when using FieldAnalysisRequest

2011-10-14 Thread Shane Perry
Hi,

Using Solr 3.4.0, I am trying to do a field analysis via the
FieldAnalysisRequest feature in solrj.  During the process() call, the
following ClassCastException is thrown:

java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List
       at org.apache.solr.client.solrj.response.AnalysisResponseBase.buildPhases(AnalysisResponseBase.java:69)
       at org.apache.solr.client.solrj.response.FieldAnalysisResponse.setResponse(FieldAnalysisResponse.java:66)
       at org.apache.solr.client.solrj.request.FieldAnalysisRequest.process(FieldAnalysisRequest.java:107)

My code is as follows:

FieldAnalysisRequest request = new FieldAnalysisRequest(myUri).
  addFieldName(field).
  setFieldValue(text).
  setQuery(text);

request.process(myServer);

Is there something I am doing wrong?  Any help would be appreciated.

Sincerely,

Shane


Re: ClassCastException when using FieldAnalysisRequest

2011-10-14 Thread Shane Perry
After looking at this more, it appears that
solr.HTMLStripCharFilterFactory does not return a list which
AnalysisResponseBase is expecting.  I have created a bug ticket
(https://issues.apache.org/jira/browse/SOLR-2834)

On Fri, Oct 14, 2011 at 8:28 AM, Shane Perry thry...@gmail.com wrote:
 Hi,

 Using Solr 3.4.0, I am trying to do a field analysis via the
 FieldAnalysisRequest feature in solrj.  During the process() call, the
 following ClassCastException is thrown:

 java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List
        at org.apache.solr.client.solrj.response.AnalysisResponseBase.buildPhases(AnalysisResponseBase.java:69)
        at org.apache.solr.client.solrj.response.FieldAnalysisResponse.setResponse(FieldAnalysisResponse.java:66)
        at org.apache.solr.client.solrj.request.FieldAnalysisRequest.process(FieldAnalysisRequest.java:107)

 My code is as follows:

 FieldAnalysisRequest request = new FieldAnalysisRequest(myUri).
  addFieldName(field).
  setFieldValue(text).
  setQuery(text);

 request.process(myServer);

 Is there something I am doing wrong?  Any help would be appreciated.

 Sincerely,

 Shane



Re: Omit hour-min-sec in search?

2011-03-03 Thread Shane Perry
Not sure if there is a means of doing explicitly what you ask, but you
could do a date range:

+mydate:[YYYY-MM-DDT00:00:00Z TO YYYY-MM-DDT23:59:59Z]
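
A sketch of building such a whole-day range programmatically (the class and
field names here are just for illustration):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Builds a Solr range query covering one whole day, so the stored
// time-of-day is effectively ignored at query time. Class and field
// names are illustrative, not from Solr.
class DayRangeQuery {
    static String dayRange(String field, LocalDate day) {
        String d = day.format(DateTimeFormatter.ISO_LOCAL_DATE);
        return "+" + field + ":[" + d + "T00:00:00Z TO " + d + "T23:59:59.999Z]";
    }

    public static void main(String[] args) {
        // e.g. match anything on 2008-02-26 regardless of its timestamp
        System.out.println(dayRange("lastupdateddate", LocalDate.of(2008, 2, 26)));
    }
}
```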

On Thu, Mar 3, 2011 at 9:14 AM, bbarani bbar...@gmail.com wrote:
 Hi,

 Is there a way to omit hour-min-sec in SOLR date field during search?

 I have indexed a field using TrieDateField and seems like it uses UTC
 format. The dates get stored as below,

 <lastupdateddate>2008-02-26T20:40:30.94Z</lastupdateddate>

 I want to do a search based on just YYYY-MM-DD and omit T20:40:30.94Z.  Not
 sure if it's feasible, just want to check if it's possible.

 Also, most of the data in our source doesn't have time information, hence we
 are very much interested in just storing the date without time; or, even if
 it's stored with some default timestamp, we want to search using just the
 date without the timestamp.

 Thanks,
 Barani






Writing on master while replicating to slave

2011-02-10 Thread Shane Perry
Hi,

When a slave is replicating from the master instance, it appears a
write lock is created. Will this lock cause issues with writing to the
master while the replication is occurring or does SOLR have some
queuing that occurs to prevent the actual write until the replication
is complete?  I've been looking around but can't seem to find anything
definitive.

My application's data is user centric and as a result the application
does a lot of updates and commits.  Additionally, we want to provide
near real-time searching and so replication would have to occur
aggressively.  Does anybody have any strategies for handling such an
application which they would be willing to share?

Thanks,

Shane


Re: rejected email

2011-02-10 Thread Shane Perry
I tried posting from gmail this morning and had it rejected.  When I
resent as plaintext, it went through.

On Thu, Feb 10, 2011 at 11:51 AM, Erick Erickson
erickerick...@gmail.com wrote:
 Anyone else having problems with the Solr users list suddenly deciding
 everything you send is spam? For the last couple of days I've had this
 happening from gmail, and as far as I know I haven't changed anything that
 would give my mails a different spam score which is being exceeded
 according to the bounced message...

 Thanks,
 Erick



Re: DIH - Closing ResultSet in JdbcDataSource

2011-01-12 Thread Shane Perry
I have found where a root entity has completed processing and added the
logic to clear the entity's cache at that point (didn't change any of the
logic for clearing all entity caches once the import has completed).  I have
also created an enhancement request found at
https://issues.apache.org/jira/browse/SOLR-2313.

On Tue, Jan 11, 2011 at 2:54 PM, Shane Perry thry...@gmail.com wrote:

 By placing some strategic debug messages, I have found that the JDBC
 connections are not being closed until all entity elements have been
 processed (in the entire config file).  A simplified example would be:

 <dataConfig>
   <dataSource name="ds1" driver="org.postgresql.Driver"
       url="jdbc:postgresql://localhost:5432/db1" user="..." password="..." />
   <dataSource name="ds2" driver="org.postgresql.Driver"
       url="jdbc:postgresql://localhost:5432/db2" user="..." password="..." />

   <document>
     <entity name="entity1" datasource="ds1" ...>
       ... field list ...
       <entity name="entity1a" datasource="ds1" ...>
         ... field list ...
       </entity>
     </entity>
     <entity name="entity2" datasource="ds2" ...>
       ... field list ...
       <entity name="entity2a" datasource="ds2" ...>
         ... field list ...
       </entity>
     </entity>
   </document>
 </dataConfig>

 The behavior is:

 JDBC connection opened for entity1 and entity1a - Applicable queries run
 and ResultSet objects processed
 All open ResultSet and Statement objects closed for entity1 and entity1a
 JDBC connection opened for entity2 and entity2a - Applicable queries run
 and ResultSet objects processed
 All open ResultSet and Statement objects closed for entity2 and entity2a
 All JDBC connections (none having been closed until this point) are closed.

 In my instance, I have some 95 unique entity elements (19 parents with 5
 children each), resulting in 95 open JDBC connections.  If I understand the
 process correctly, it should be safe to close the JDBC connection for a
 root entity (immediate children of document) and all descendant
 entity elements once the parent has been successfully completed.  I have
 been digging around the code, but due to my unfamiliarity with the code, I'm
 not sure where this would occur.

 Is this a valid solution?  It's looking like I should probably open a
 defect and I'm willing to do so along with submitting a patch, but need a
 little more direction on where the fix would best reside.

 Thanks,

 Shane



 On Mon, Jan 10, 2011 at 7:14 AM, Shane Perry thry...@gmail.com wrote:

 Gora,

 Thanks for the response.  After taking another look, you are correct about
 the hasnext() closing the ResultSet object (1.4.1 as well as 1.4.0).  I
 didn't recognize the case difference in the two function calls, so missed
 it.  I'll keep looking into the original issue and reply if I find a
 cause/solution.

 Shane


 On Sat, Jan 8, 2011 at 4:04 AM, Gora Mohanty g...@mimirtech.com wrote:

 On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry thry...@gmail.com wrote:
  Hi,
 
  I am in the process of migrating our system from Postgres 8.4 to Solr
  1.4.1.  Our system is fairly complex and as a result, I have had to
 define
  19 base entities in the data-config.xml definition file.  Each of these
  entities executes 5 queries.  When doing a full-import, as each entity
  completes, the server hosting Postgres shows 5 idle in transaction
 for the
  entity.
 
  In digging through the code, I found that the JdbcDataSource wraps the
  ResultSet object in a custom ResultSetIterator object, leaving the
 ResultSet
  open.  Walking through the code I can't find a close() call anywhere on
 the
  ResultSet.  I believe this results in the idle in transaction
 processes.
 [...]

 Have not examined the idle in transaction issue that you
 mention, but the ResultSet object in a ResultSetIterator is
 closed in the private hasnext() method, when there are no
 more results, or if there is an exception. hasnext() is called
 by the public hasNext() method that should be used in
 iterating over the results, so I see no issue there.

 Regards,
 Gora

 P.S. This is from Solr 1.4.0 code, but I would not think that
this part of the code would have changed.






Re: DIH - Closing ResultSet in JdbcDataSource

2011-01-11 Thread Shane Perry
By placing some strategic debug messages, I have found that the JDBC
connections are not being closed until all entity elements have been
processed (in the entire config file).  A simplified example would be:

<dataConfig>
  <dataSource name="ds1" driver="org.postgresql.Driver"
      url="jdbc:postgresql://localhost:5432/db1" user="..." password="..." />
  <dataSource name="ds2" driver="org.postgresql.Driver"
      url="jdbc:postgresql://localhost:5432/db2" user="..." password="..." />

  <document>
    <entity name="entity1" datasource="ds1" ...>
      ... field list ...
      <entity name="entity1a" datasource="ds1" ...>
        ... field list ...
      </entity>
    </entity>
    <entity name="entity2" datasource="ds2" ...>
      ... field list ...
      <entity name="entity2a" datasource="ds2" ...>
        ... field list ...
      </entity>
    </entity>
  </document>
</dataConfig>

The behavior is:

JDBC connection opened for entity1 and entity1a - Applicable queries run and
ResultSet objects processed
All open ResultSet and Statement objects closed for entity1 and entity1a
JDBC connection opened for entity2 and entity2a - Applicable queries run and
ResultSet objects processed
All open ResultSet and Statement objects closed for entity2 and entity2a
All JDBC connections (none having been closed until this point) are closed.

In my instance, I have some 95 unique entity elements (19 parents with 5
children each), resulting in 95 open JDBC connections.  If I understand the
process correctly, it should be safe to close the JDBC connection for a
root entity (immediate children of document) and all descendant
entity elements once the parent has been successfully completed.  I have
been digging around the code, but due to my unfamiliarity with the code, I'm
not sure where this would occur.

Is this a valid solution?  It's looking like I should probably open a defect
and I'm willing to do so along with submitting a patch, but need a little
more direction on where the fix would best reside.

Thanks,

Shane


On Mon, Jan 10, 2011 at 7:14 AM, Shane Perry thry...@gmail.com wrote:

 Gora,

 Thanks for the response.  After taking another look, you are correct about
 the hasnext() closing the ResultSet object (1.4.1 as well as 1.4.0).  I
 didn't recognize the case difference in the two function calls, so missed
 it.  I'll keep looking into the original issue and reply if I find a
 cause/solution.

 Shane


 On Sat, Jan 8, 2011 at 4:04 AM, Gora Mohanty g...@mimirtech.com wrote:

 On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry thry...@gmail.com wrote:
  Hi,
 
  I am in the process of migrating our system from Postgres 8.4 to Solr
  1.4.1.  Our system is fairly complex and as a result, I have had to
 define
  19 base entities in the data-config.xml definition file.  Each of these
  entities executes 5 queries.  When doing a full-import, as each entity
  completes, the server hosting Postgres shows 5 idle in transaction for
 the
  entity.
 
  In digging through the code, I found that the JdbcDataSource wraps the
  ResultSet object in a custom ResultSetIterator object, leaving the
 ResultSet
  open.  Walking through the code I can't find a close() call anywhere on
 the
  ResultSet.  I believe this results in the idle in transaction
 processes.
 [...]

 Have not examined the idle in transaction issue that you
 mention, but the ResultSet object in a ResultSetIterator is
 closed in the private hasnext() method, when there are no
 more results, or if there is an exception. hasnext() is called
 by the public hasNext() method that should be used in
 iterating over the results, so I see no issue there.

 Regards,
 Gora

 P.S. This is from Solr 1.4.0 code, but I would not think that
this part of the code would have changed.





Re: DIH - Closing ResultSet in JdbcDataSource

2011-01-10 Thread Shane Perry
Gora,

Thanks for the response.  After taking another look, you are correct about
the hasnext() closing the ResultSet object (1.4.1 as well as 1.4.0).  I
didn't recognize the case difference in the two function calls, so missed
it.  I'll keep looking into the original issue and reply if I find a
cause/solution.

Shane

On Sat, Jan 8, 2011 at 4:04 AM, Gora Mohanty g...@mimirtech.com wrote:

 On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry thry...@gmail.com wrote:
  Hi,
 
  I am in the process of migrating our system from Postgres 8.4 to Solr
  1.4.1.  Our system is fairly complex and as a result, I have had to
 define
  19 base entities in the data-config.xml definition file.  Each of these
  entities executes 5 queries.  When doing a full-import, as each entity
  completes, the server hosting Postgres shows 5 idle in transaction for
 the
  entity.
 
  In digging through the code, I found that the JdbcDataSource wraps the
  ResultSet object in a custom ResultSetIterator object, leaving the
 ResultSet
  open.  Walking through the code I can't find a close() call anywhere on
 the
  ResultSet.  I believe this results in the idle in transaction
 processes.
 [...]

 Have not examined the idle in transaction issue that you
 mention, but the ResultSet object in a ResultSetIterator is
 closed in the private hasnext() method, when there are no
 more results, or if there is an exception. hasnext() is called
 by the public hasNext() method that should be used in
 iterating over the results, so I see no issue there.

 Regards,
 Gora

 P.S. This is from Solr 1.4.0 code, but I would not think that
this part of the code would have changed.
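
For what it's worth, the close-on-exhaustion pattern Gora describes can be
sketched generically like this (illustrative names only, not Solr's actual
ResultSetIterator):

```java
import java.util.Iterator;
import java.util.List;

// Sketch of an iterator that closes its underlying resource (for DIH,
// the ResultSet) as soon as it runs out of rows. Names are illustrative,
// not Solr's.
class ClosingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final AutoCloseable resource;
    private boolean closed = false;

    ClosingIterator(Iterator<T> delegate, AutoCloseable resource) {
        this.delegate = delegate;
        this.resource = resource;
    }

    @Override
    public boolean hasNext() {
        if (closed) return false;
        if (delegate.hasNext()) return true;
        close(); // no more rows: release the resource eagerly
        return false;
    }

    @Override
    public T next() {
        return delegate.next();
    }

    private void close() {
        try {
            resource.close();
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            closed = true;
        }
    }

    public static void main(String[] args) {
        boolean[] wasClosed = {false};
        Iterator<String> rows = new ClosingIterator<>(
                List.of("row1", "row2").iterator(),
                () -> wasClosed[0] = true); // stand-in for ResultSet.close()
        while (rows.hasNext()) {
            System.out.println(rows.next());
        }
        System.out.println("resource closed: " + wasClosed[0]); // prints "resource closed: true"
    }
}
```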



DIH - Closing ResultSet in JdbcDataSource

2011-01-07 Thread Shane Perry
Hi,

I am in the process of migrating our system from Postgres 8.4 to Solr
1.4.1.  Our system is fairly complex and as a result, I have had to define
19 base entities in the data-config.xml definition file.  Each of these
entities executes 5 queries.  When doing a full-import, as each entity
completes, the server hosting Postgres shows 5 idle in transaction for the
entity.

In digging through the code, I found that the JdbcDataSource wraps the
ResultSet object in a custom ResultSetIterator object, leaving the ResultSet
open.  Walking through the code I can't find a close() call anywhere on the
ResultSet.  I believe this results in the idle in transaction processes.

Am I off base here?  I'm not sure what the overall implications are of the
idle in transaction processes, but is there a way I can get around the
issue without importing each entity manually?  Any feedback would be greatly
appreciated.

Thanks in advance,

Shane