Re: solr performance problem from 4.3.0 with sorting
Ariel,

I just ran up against a similar issue when upgrading from 3.6.1 to 4.3.0. In my case, my solrconfig.xml for 4.3.0 (which was based on my 3.6.1 file) did not provide a newSearcher or firstSearcher warming query. After adding a query to each listener, my query speeds drastically increased. Check your config file, and if you aren't defining a query (make sure it sorts on the field in question), do so.

Shane

On Thu, Jun 20, 2013 at 3:45 AM, Ariel Zerbib ariel.zer...@gmail.com wrote:

Hi,

We updated to version 4.3.0 from 4.2.1 and we have a performance problem with sorting. A query that returns 1 hit has a query time of more than 100ms (it can be more than 1s), against less than 10ms for the same query without the sort parameter.

Query with sorting option:

    q=level_4_id:531044&sort=level_4_id+asc

Response:

    <int name="QTime">1</int>
    <int name="QTime">106</int>

Query without sorting option:

    q=level_4_id:531024

    <int name="QTime">1</int>
    <result name="response" numFound="1" start="0">

The field level_4_id is unique and defined as a long. In version 4.2.1, performance was identical with and without the sort. Version 4.3.1 shows the same behavior as 4.3.0.

Thanks,
Ariel
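For anyone hitting the same thing, here is a minimal sketch of what sorted warming queries might look like in solrconfig.xml. The listener structure follows the stock example config; the field name level_4_id is taken from Ariel's query, and q=*:* is just a placeholder warming query.

```xml
<!-- Warm the FieldCache for the sort field when a new searcher is opened -->
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">level_4_id asc</str>
    </lst>
  </arr>
</listener>
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">level_4_id asc</str>
    </lst>
  </arr>
</listener>
```

Warming with the sort populates the sort cache for that field before user queries arrive, so the first sorted query doesn't pay the cache-build cost.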
Re: Sorting by field is slow
Using 4.3.1-SNAPSHOT, I have identified where the issue is occurring. For a query in the format (it returns one document, sorted by field4)

    +(field0:UUID0) -field1:string0 +field2:string1 +field3:text0 +field4:text1

with the field types

    <fieldType name="uuid" class="solr.UUIDField" indexed="true"/>
    <fieldType name="string" class="solr.StrField" sortMissingFirst="true" omitNorms="true"/>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.ICUFoldingFilterFactory"/>
      </analyzer>
    </fieldType>

in the method FieldCacheImpl$SortedDocValuesCache#createValue, the reader reports 2640449 terms. As a result, the loop on line 1198 is executed 2640449 times and the inner loop is executed a total of 658310778 times. My index contains 56180128 documents.

My configuration file sets the queries for the newSearcher and firstSearcher listeners to the value

    <lst>
      <str name="q">static firstSearcher warming in solrconfig.xml</str>
      <str name="sort">field4</str>
    </lst>

which does not appear to affect the speed. I'm not sure how replication plays into the equation, outside the fact that we are relatively aggressive on replication (every 60 seconds). I fear I may be at the end of my knowledge without really getting into the code, so any help at this point would be greatly appreciated.

Shane
Re: Sorting by field is slow
Turns out it was a case of an oversight. My warming queries weren't setting the sort order and, as a result, didn't successfully complete. After setting the sort order, things appear to be responding quickly. Thanks for the help.
Re: Sorting by field is slow
Erick,

We do have soft commits turned on. Initially, autoCommit was set at 15000 and autoSoftCommit at 1000. We did up those to 120 and 60 respectively. However, since the core in question is a slave, we don't actually do writes to the core but rely on replication alone to populate the index. In this case, wouldn't autoCommit and autoSoftCommit essentially be no-ops? I thought I had pulled out all hard commits, but a double check shows one instance where they still occur. Thanks for your time.

Shane

On Thu, Jun 13, 2013 at 5:19 AM, Erick Erickson erickerick...@gmail.com wrote:

Shane:

You've covered all the config stuff that I can think of. There's one other possibility: do you have soft commits turned on, and are they very short? Although soft commits shouldn't invalidate any segment-level caches (but I'm not sure whether the sorting buffers are low-level or not). About the only other thing I can think of is that you're somehow doing hard commits from, say, the client, but that's really stretching. All I can really say at this point is that this isn't a problem I've seen before, so it's _likely_ that some innocent-seeming config has changed. I'm sure it'll be obvious once you find it <G>...

Erick
Re: Sorting by field is slow
I've dug through the code and have narrowed the delay down to TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(), at the point where the comparator's setNextReader() method is called (line 98 in the lucene_solr_4_3 branch). That line is actually two method calls, so I'm not yet certain which path is the cause. I'll continue to dig through the code but am on thin ice, so input would be great.

Shane
Sorting by field is slow
In upgrading from Solr 3.6.1 to 4.3.0, our query response time has increased dramatically. After testing in 4.3.0, it appears the same query (with 1 matching document) returns after 100 ms without sorting but takes 1 minute when sorting by a text field. I've looked around but haven't yet found a reason for the degradation. Can someone give me some insight or point me in the right direction for resolving this? In most cases, I can change my code to do client-side sorting, but I do have a couple of situations where pagination prevents client-side sorting. Any help would be greatly appreciated.

Thanks,
Shane
Re: Sorting by field is slow
Thanks for the responses. Setting first/newSearcher had no noticeable effect. I'm sorting on a stored/indexed field named 'text' whose fieldType is solr.TextField. Overall, the values of the field are unique. The JVM is only using about 2G of the available 12G, so no OOM/GC issue (at least on the surface). The server in question is a slave with approximately 56 million documents. Additionally, sorting on a field of the same type but with significantly less uniqueness results in quick response times.

The following is a sample of debugQuery=true for a query which returns 1 document:

    <lst name="process">
      <double name="time">61458.0</double>
      <lst name="query">
        <double name="time">61452.0</double>
      </lst>
      <lst name="facet">
        <double name="time">0.0</double>
      </lst>
      <lst name="mlt">
        <double name="time">0.0</double>
      </lst>
      <lst name="highlight">
        <double name="time">0.0</double>
      </lst>
      <lst name="stats">
        <double name="time">0.0</double>
      </lst>
      <lst name="debug">
        <double name="time">6.0</double>
      </lst>
    </lst>

-- Update --

Out of desperation, I turned off replication by commenting out the <lst name="slave"> element in the replication requestHandler block. After restarting Tomcat, I was surprised to find that the replication admin UI still reported the core as replicating, and search queries were still slow. I then disabled replication via the UI, and the display updated to report the core was no longer replicating. Queries are now fast, so it appears that the sorting may be a red herring. It may also be worth mentioning that the slow queries don't appear to be getting cached. Thanks again for the feedback.

On Wed, Jun 12, 2013 at 2:33 PM, Jack Krupansky j...@basetechnology.com wrote:

Rerun the sorted query with debugQuery=true and look at the module timings. See what stands out. Are you actually sorting on a text field, as opposed to a string field? Of course, it's always possible that you're hitting some odd OOM/GC condition as a result of Solr growing between releases.
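For reference, the slave-side replication block being discussed looks roughly like the following. This is a sketch, and the master URL and poll interval shown are hypothetical; rather than commenting the block out, the enable flag can turn the slave off in place:

```xml
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- set to false to stop this core from polling the master -->
    <str name="enable">false</str>
    <str name="masterUrl">http://master.example.com:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

Polling can also be toggled at runtime through the handler's HTTP API (command=disablepoll / command=enablepoll on the slave's /replication handler), which matches the admin-UI behavior described above.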
-- Jack Krupansky

-----Original Message-----
From: Shane Perry
Sent: Wednesday, June 12, 2013 3:00 PM
To: solr-user@lucene.apache.org
Subject: Sorting by field is slow
Re: Sorting by field is slow
Erick,

I agree, it doesn't make sense. I manually merged the solrconfig.xml from the distribution example with my 3.6 solrconfig.xml, pulling out what I didn't need. There is the possibility I removed something I shouldn't have, though I don't know what it would be. Other than removing the dynamic fields and a custom tokenizer class, and changing all my fields to be stored, the schema.xml file should be the same as well. I'm not currently in a position to do so, but I'll double check those two files. Finally, the data was re-indexed when I moved to 4.3.

My statement about field values wasn't stated very well. What I meant is that the 'text' field has more unique terms than some of my other fields. As for this being an edge case, I'm not sure why it would manifest itself in 4.3 but not in 3.6 (short of me having a screwy configuration setting). If I get a chance, I'll see if I can duplicate the behavior with a small document count in a sandboxed environment.

Shane

On Wed, Jun 12, 2013 at 5:14 PM, Erick Erickson erickerick...@gmail.com wrote:

This doesn't make much sense, particularly the fact that you added first/new searchers. I'm assuming that these are sorting on the same field as your slow query. But sorting on a text field for which "Overall, the values of the field are unique" is a red flag. Solr doesn't sort on fields that have more than one term, so you might as well use a string field and be done with it; it's possible you're hitting some edge case. Did you just copy your 3.6 schema and configs to 4.3? Did you re-index?

Best,
Erick
Inaccurate wiki documentation?
I am in the process of setting up a core using Solr 4.3. On the Core Discovery (http://wiki.apache.org/solr/Core%20Discovery%20(4.3%20and%20beyond)) wiki page it states:

As of SOLR-4196, there's a new way of defining cores. Essentially, it is no longer necessary to define cores in solr.xml. In fact, solr.xml is no longer necessary at all and will be obsoleted in Solr 5.x. As of Solr 4.3 the process is as follows:

- If a solr.xml file is found in SOLR_HOME, then it is expected to be the old-style solr.xml that defines cores etc.
- If there is no solr.xml but there is a solr.properties file, then exploration-based core enumeration is assumed.
- If neither a solr.xml nor a solr.properties file is found, a default solr.xml file is assumed. NOTE: as of 5.0, this will not be true and an error will be thrown if no solr.properties file is found.

Using the 4.3 war available for download, I attempted to set up my core using the solr.properties file (in anticipation of moving to 5.0). When I start the context, logging shows that the process is falling back to the default solr.xml file (essentially the second bullet does not occur). After digging through the 4_3 branch, it looks like solr.properties is not yet part of the library. Am I missing something (I'm able to get the context started using a solr.xml file with <solr></solr> as the contents)? I'm going with a basic solr.xml for now, but any insight would be appreciated. Thanks in advance.
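For comparison, the old-style solr.xml that 4.3 falls back to expecting looks roughly like this; the core name and instance directory here are hypothetical:

```xml
<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="core1">
    <!-- one <core> entry per core; instanceDir is relative to SOLR_HOME -->
    <core name="core1" instanceDir="core1" />
  </cores>
</solr>
```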
Outstanding Jira issue
I opened a Jira issue in Oct of 2011 which is still outstanding. I've boosted the priority to Critical as each time I've upgraded Solr, I've had to manually patch and build the jars. There is a patch (for 3.6) attached to the ticket. Is there someone with commit access who can take a look and poke the fix through (preferably on 4.2 as well as 4.3)? The ticket is https://issues.apache.org/jira/browse/SOLR-2834. Thanks in advance. Shane
Re: Outstanding Jira issue
Yeah, I realize my fix is more of a bandage. While it wouldn't be a good long-term solution, how about going the path of ignoring unrecognized types and logging a warning message so the handler doesn't crash? The Jira ticket could then be left open (and hopefully assigned) to fix the actual problem. This would keep consumers from having to avoid the scenario or manually patch the file to ignore the problem.

On Wed, May 8, 2013 at 11:49 AM, Shawn Heisey s...@elyograg.org wrote:

On 5/8/2013 9:20 AM, Shane Perry wrote:

I opened a Jira issue in Oct of 2011 which is still outstanding. I've boosted the priority to Critical as each time I've upgraded Solr, I've had to manually patch and build the jars. There is a patch (for 3.6) attached to the ticket. Is there someone with commit access who can take a look and poke the fix through (preferably on 4.2 as well as 4.3)? The ticket is https://issues.apache.org/jira/browse/SOLR-2834.

Your patch just ignores the problem so the request doesn't crash; it doesn't fix it. We need to fix whatever the problem is in HTMLStripCharFilter. I had hoped I could come up with a quick fix, but it's proving too difficult for me to unravel. I can't even figure out how it works on good analysis components like WhitespaceTokenizer, so I definitely can't see what the problem is for HTMLStripCharFilter.

Thanks,
Shawn
ICUTokenizer ArrayIndexOutOfBounds
Hi,

I've been playing around with using the ICUTokenizer from 4.0.0. Using the code below, I was receiving an ArrayIndexOutOfBoundsException on the call to tokenizer.incrementToken(). Looking at the ICUTokenizer source, I can see why this is occurring (usableLength defaults to -1).

    ICUTokenizer tokenizer = new ICUTokenizer(myReader);
    CharTermAttribute termAtt = tokenizer.getAttribute(CharTermAttribute.class);
    while (tokenizer.incrementToken()) {
        System.out.println(termAtt.toString());
    }

After poking around a little more, I found that I can just call tokenizer.reset() (which initializes usableLength to 0) right after constructing the object (org.apache.lucene.analysis.icu.segmentation.TestICUTokenizer does a similar step in its superclass). I was wondering if someone could explain why I need to call tokenizer.reset() prior to using the tokenizer for the first time.

Thanks in advance,
Shane
ClassCastException when using FieldAnalysisRequest
Hi, Using Solr 3.4.0, I am trying to do a field analysis via the FieldAnalysisRequest feature in solrj. During the process() call, the following ClassCastException is thrown:

java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List
    at org.apache.solr.client.solrj.response.AnalysisResponseBase.buildPhases(AnalysisResponseBase.java:69)
    at org.apache.solr.client.solrj.response.FieldAnalysisResponse.setResponse(FieldAnalysisResponse.java:66)
    at org.apache.solr.client.solrj.request.FieldAnalysisRequest.process(FieldAnalysisRequest.java:107)

My code is as follows:

FieldAnalysisRequest request = new FieldAnalysisRequest(myUri)
    .addFieldName(field)
    .setFieldValue(text)
    .setQuery(text);
request.process(myServer);

Is there something I am doing wrong? Any help would be appreciated. Sincerely, Shane
Re: ClassCastException when using FieldAnalysisRequest
After looking at this more, it appears that solr.HTMLStripCharFilterFactory does not return the list which AnalysisResponseBase is expecting. I have created a bug ticket (https://issues.apache.org/jira/browse/SOLR-2834).

On Fri, Oct 14, 2011 at 8:28 AM, Shane Perry thry...@gmail.com wrote: Hi, Using Solr 3.4.0, I am trying to do a field analysis via the FieldAnalysisRequest feature in solrj. During the process() call, the following ClassCastException is thrown:

java.lang.ClassCastException: java.lang.String cannot be cast to java.util.List
    at org.apache.solr.client.solrj.response.AnalysisResponseBase.buildPhases(AnalysisResponseBase.java:69)
    at org.apache.solr.client.solrj.response.FieldAnalysisResponse.setResponse(FieldAnalysisResponse.java:66)
    at org.apache.solr.client.solrj.request.FieldAnalysisRequest.process(FieldAnalysisRequest.java:107)

My code is as follows:

FieldAnalysisRequest request = new FieldAnalysisRequest(myUri)
    .addFieldName(field)
    .setFieldValue(text)
    .setQuery(text);
request.process(myServer);

Is there something I am doing wrong? Any help would be appreciated. Sincerely, Shane
Re: Omit hour-min-sec in search?
Not sure if there is a means of doing explicitly what you ask, but you could do a date range: +mydate:[YYYY-MM-DDT00:00:00Z TO YYYY-MM-DDT23:59:59Z]

On Thu, Mar 3, 2011 at 9:14 AM, bbarani bbar...@gmail.com wrote: Hi, Is there a way to omit hour-min-sec in a SOLR date field during search? I have indexed a field using TrieDateField and it seems like it uses UTC format. The dates get stored as below: lastupdateddate: 2008-02-26T20:40:30.94Z I want to do a search based on just YYYY-MM-DD and omit T20:40:30.94Z. Not sure if it's feasible, just want to check if it's possible. Also, most of the data in our source doesn't have time information, hence we are very much interested in just storing the date without time, or even if it's stored with some default timestamp we want to search just using the date without the timestamp. Thanks, Barani -- View this message in context: http://lucene.472066.n3.nabble.com/Omit-hour-min-sec-in-search-tp2625840p2625840.html Sent from the Solr - User mailing list archive at Nabble.com.
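A small helper along these lines (hypothetical code, not part of Solr or solrj) that expands a plain calendar date into a whole-day UTC range in the ISO-8601 form Solr date fields expect; the upper bound is extended to milliseconds so the last second of the day is covered too:

```java
import java.time.LocalDate;

// Build a Solr range query covering one whole calendar day in UTC.
// The field name and the .999 upper bound are illustrative choices.
public class DayRangeQuery {
    public static String forDay(String field, LocalDate day) {
        // LocalDate.toString() yields the YYYY-MM-DD part; Solr expects
        // full ISO-8601 instants with a trailing "Z" (UTC).
        return field + ":[" + day + "T00:00:00Z TO " + day + "T23:59:59.999Z]";
    }
}
```

For example, DayRangeQuery.forDay("lastupdateddate", LocalDate.of(2008, 2, 26)) produces a query matching every timestamp stored on that date regardless of its time-of-day component.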
Writing on master while replicating to slave
Hi, When a slave is replicating from the master instance, it appears a write lock is created. Will this lock cause issues with writing to the master while the replication is occurring, or does Solr have some queuing that prevents the actual write until the replication is complete? I've been looking around but can't seem to find anything definitive.

My application's data is user-centric and as a result the application does a lot of updates and commits. Additionally, we want to provide near real-time searching, so replication would have to occur aggressively. Does anybody have any strategies for handling such an application which they would be willing to share? Thanks, Shane
Re: rejected email
I tried posting from gmail this morning and had it rejected. When I resent as plaintext, it went through. On Thu, Feb 10, 2011 at 11:51 AM, Erick Erickson erickerick...@gmail.com wrote: Anyone else having problems with the Solr users list suddenly deciding everything you send is spam? For the last couple of days I've had this happening from gmail, and as far as I know I haven't changed anything that would give my mails a different spam score which is being exceeded according to the bounced message... Thanks, Erick
Re: DIH - Closing ResultSet in JdbcDataSource
I have found where a root entity has completed processing and added the logic to clear the entity's cache at that point (I didn't change any of the logic for clearing all entity caches once the import has completed). I have also created an enhancement request at https://issues.apache.org/jira/browse/SOLR-2313.

On Tue, Jan 11, 2011 at 2:54 PM, Shane Perry thry...@gmail.com wrote: By placing some strategic debug messages, I have found that the JDBC connections are not being closed until all entity elements have been processed (in the entire config file). A simplified example would be:

<dataConfig>
  <dataSource name="ds1" driver="org.postgresql.Driver" url="jdbc:postgresql://localhost:5432/db1" user="..." password="..."/>
  <dataSource name="ds2" driver="org.postgresql.Driver" url="jdbc:postgresql://localhost:5432/db2" user="..." password="..."/>
  <document>
    <entity name="entity1" datasource="ds1" ...>
      ... field list ...
      <entity name="entity1a" datasource="ds1" ...>
        ... field list ...
      </entity>
    </entity>
    <entity name="entity2" datasource="ds2" ...>
      ... field list ...
      <entity name="entity2a" datasource="ds2" ...>
        ... field list ...
      </entity>
    </entity>
  </document>
</dataConfig>

The behavior is:
- JDBC connection opened for entity1 and entity1a - applicable queries run and ResultSet objects processed
- All open ResultSet and Statement objects closed for entity1 and entity1a
- JDBC connection opened for entity2 and entity2a - applicable queries run and ResultSet objects processed
- All open ResultSet and Statement objects closed for entity2 and entity2a
- All JDBC connections (none are closed up to this point) are closed

In my instance, I have some 95 unique entity elements (19 parents with 5 children each), resulting in 95 open JDBC connections. If I understand the process correctly, it should be safe to close the JDBC connection for a root entity (an immediate child of document) and all descendant entity elements once the parent has completed successfully.

I have been digging around the code, but due to my unfamiliarity with it, I'm not sure where this change would occur. Is this a valid solution? It's looking like I should probably open a defect, and I'm willing to do so along with submitting a patch, but I need a little more direction on where the fix would best reside. Thanks, Shane

On Mon, Jan 10, 2011 at 7:14 AM, Shane Perry thry...@gmail.com wrote: Gora, Thanks for the response. After taking another look, you are correct about the hasnext() closing the ResultSet object (1.4.1 as well as 1.4.0). I didn't recognize the case difference in the two function calls, so I missed it. I'll keep looking into the original issue and reply if I find a cause/solution. Shane

On Sat, Jan 8, 2011 at 4:04 AM, Gora Mohanty g...@mimirtech.com wrote: On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry thry...@gmail.com wrote: Hi, I am in the process of migrating our system from Postgres 8.4 to Solr 1.4.1. Our system is fairly complex and as a result, I have had to define 19 base entities in the data-config.xml definition file. Each of these entities executes 5 queries. When doing a full-import, as each entity completes, the server hosting Postgres shows 5 "idle in transaction" for the entity. In digging through the code, I found that the JdbcDataSource wraps the ResultSet object in a custom ResultSetIterator object, leaving the ResultSet open. Walking through the code I can't find a close() call anywhere on the ResultSet. I believe this results in the "idle in transaction" processes. [...]

Have not examined the "idle in transaction" issue that you mention, but the ResultSet object in a ResultSetIterator is closed in the private hasnext() method, when there are no more results, or if there is an exception. hasnext() is called by the public hasNext() method that should be used in iterating over the results, so I see no issue there. Regards, Gora

P.S. This is from Solr 1.4.0 code, but I would not think that this part of the code would have changed.
Re: DIH - Closing ResultSet in JdbcDataSource
By placing some strategic debug messages, I have found that the JDBC connections are not being closed until all entity elements have been processed (in the entire config file). A simplified example would be:

<dataConfig>
  <dataSource name="ds1" driver="org.postgresql.Driver" url="jdbc:postgresql://localhost:5432/db1" user="..." password="..."/>
  <dataSource name="ds2" driver="org.postgresql.Driver" url="jdbc:postgresql://localhost:5432/db2" user="..." password="..."/>
  <document>
    <entity name="entity1" datasource="ds1" ...>
      ... field list ...
      <entity name="entity1a" datasource="ds1" ...>
        ... field list ...
      </entity>
    </entity>
    <entity name="entity2" datasource="ds2" ...>
      ... field list ...
      <entity name="entity2a" datasource="ds2" ...>
        ... field list ...
      </entity>
    </entity>
  </document>
</dataConfig>

The behavior is:
- JDBC connection opened for entity1 and entity1a - applicable queries run and ResultSet objects processed
- All open ResultSet and Statement objects closed for entity1 and entity1a
- JDBC connection opened for entity2 and entity2a - applicable queries run and ResultSet objects processed
- All open ResultSet and Statement objects closed for entity2 and entity2a
- All JDBC connections (none are closed up to this point) are closed

In my instance, I have some 95 unique entity elements (19 parents with 5 children each), resulting in 95 open JDBC connections. If I understand the process correctly, it should be safe to close the JDBC connection for a root entity (an immediate child of document) and all descendant entity elements once the parent has been successfully completed.

I have been digging around the code, but due to my unfamiliarity with the code, I'm not sure where this would occur. Is this a valid solution? It's looking like I should probably open a defect and I'm willing to do so along with submitting a patch, but need a little more direction on where the fix would best reside. Thanks, Shane

On Mon, Jan 10, 2011 at 7:14 AM, Shane Perry thry...@gmail.com wrote: Gora, Thanks for the response. After taking another look, you are correct about the hasnext() closing the ResultSet object (1.4.1 as well as 1.4.0). I didn't recognize the case difference in the two function calls, so missed it. I'll keep looking into the original issue and reply if I find a cause/solution. Shane

On Sat, Jan 8, 2011 at 4:04 AM, Gora Mohanty g...@mimirtech.com wrote: On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry thry...@gmail.com wrote: Hi, I am in the process of migrating our system from Postgres 8.4 to Solr 1.4.1. Our system is fairly complex and as a result, I have had to define 19 base entities in the data-config.xml definition file. Each of these entities executes 5 queries. When doing a full-import, as each entity completes, the server hosting Postgres shows 5 "idle in transaction" for the entity. In digging through the code, I found that the JdbcDataSource wraps the ResultSet object in a custom ResultSetIterator object, leaving the ResultSet open. Walking through the code I can't find a close() call anywhere on the ResultSet. I believe this results in the "idle in transaction" processes. [...]

Have not examined the "idle in transaction" issue that you mention, but the ResultSet object in a ResultSetIterator is closed in the private hasnext() method, when there are no more results, or if there is an exception. hasnext() is called by the public hasNext() method that should be used in iterating over the results, so I see no issue there. Regards, Gora

P.S. This is from Solr 1.4.0 code, but I would not think that this part of the code would have changed.
Re: DIH - Closing ResultSet in JdbcDataSource
Gora, Thanks for the response. After taking another look, you are correct about the hasnext() closing the ResultSet object (in 1.4.1 as well as 1.4.0). I didn't recognize the case difference between the two function calls, so I missed it. I'll keep looking into the original issue and reply if I find a cause/solution. Shane

On Sat, Jan 8, 2011 at 4:04 AM, Gora Mohanty g...@mimirtech.com wrote: On Sat, Jan 8, 2011 at 1:10 AM, Shane Perry thry...@gmail.com wrote: Hi, I am in the process of migrating our system from Postgres 8.4 to Solr 1.4.1. Our system is fairly complex and as a result, I have had to define 19 base entities in the data-config.xml definition file. Each of these entities executes 5 queries. When doing a full-import, as each entity completes, the server hosting Postgres shows 5 "idle in transaction" for the entity. In digging through the code, I found that the JdbcDataSource wraps the ResultSet object in a custom ResultSetIterator object, leaving the ResultSet open. Walking through the code I can't find a close() call anywhere on the ResultSet. I believe this results in the "idle in transaction" processes. [...]

Have not examined the "idle in transaction" issue that you mention, but the ResultSet object in a ResultSetIterator is closed in the private hasnext() method, when there are no more results, or if there is an exception. hasnext() is called by the public hasNext() method that should be used in iterating over the results, so I see no issue there. Regards, Gora

P.S. This is from Solr 1.4.0 code, but I would not think that this part of the code would have changed.
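The pattern Gora describes can be sketched in a self-contained way (this is illustrative, not Solr's actual ResultSetIterator): an iterator that closes its underlying resource as soon as the elements run out, so the caller never has to close it explicitly.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Generic sketch of a "closing iterator": the wrapped resource (in DIH's
// case, the ResultSet) is closed inside hasNext() once iteration ends.
public class ClosingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final AutoCloseable resource;
    private boolean closed = false;

    public ClosingIterator(Iterator<T> delegate, AutoCloseable resource) {
        this.delegate = delegate;
        this.resource = resource;
    }

    @Override
    public boolean hasNext() {
        boolean more = delegate.hasNext();
        if (!more && !closed) {
            try {
                resource.close();   // analogous to closing the ResultSet
                closed = true;
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        return more;
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return delegate.next();
    }

    public boolean isClosed() { return closed; }
}
```

Note the limitation this thread ran into: cleanup only happens if the consumer drains the iterator (or an exception fires), and it says nothing about when the enclosing JDBC Connection is closed, which is why the connections can linger until the whole import finishes.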
DIH - Closing ResultSet in JdbcDataSource
Hi, I am in the process of migrating our system from Postgres 8.4 to Solr 1.4.1. Our system is fairly complex and as a result, I have had to define 19 base entities in the data-config.xml definition file. Each of these entities executes 5 queries. When doing a full-import, as each entity completes, the server hosting Postgres shows 5 "idle in transaction" connections for the entity.

In digging through the code, I found that the JdbcDataSource wraps the ResultSet object in a custom ResultSetIterator object, leaving the ResultSet open. Walking through the code, I can't find a close() call anywhere on the ResultSet. I believe this results in the "idle in transaction" processes. Am I off base here? I'm not sure what the overall implications of the "idle in transaction" processes are, but is there a way I can get around the issue without importing each entity manually? Any feedback would be greatly appreciated. Thanks in advance, Shane