Re: Solr admin interface freezes on Chrome
> Works fine on Firefox, and I
> haven't made any changes to our Solr instance (v8.1.1) in a while.

I had a co-worker with a similar issue. He had a pop-up blocker enabled in Chrome that was preventing some resource call (or something similar). When he switched to Firefox, everything worked without issue. Any chance something is showing in the developer tools console?
Solr standalone timeouts after upgrading to SOLR 7
Hello all,

We moved from Solr 6 to Solr 7 about 2 weeks ago. Once each week since then (including today) we have experienced query timeouts with corresponding GC events. CPU spiked up to 66%, which is not something we previously saw with Solr 6. From the Solr logs it looks like something has happened inside the JVM; Solr is reporting closed connections from Jetty.

Our data size is relatively small, but we do run 5 cores within the one Jetty instance. Their index sizes are anywhere between 200 MB and 2 GB.

Our memory consumption is relatively low:

  "free":"296.1 MB",
  "total":"569.6 MB",
  "max":"9.6 GB",
  "used":"273.5 MB (%2.8)",

We had a spike in traffic about 5 minutes prior to some longer GC events (similar situation last week).

Any help would be appreciated. Below is my current system info along with a GC log snippet and the corresponding Solr log error.

*System info:* AMZ2 linux, 8 core, 32 GB Mem
*Java:* 1.8.0_222-ea 25.222-b03
*Solr:* "solr-spec-version":"7.7.2"
*Start options:*
  "-Xms512m",
  "-Xmx10g",
  "-XX:NewRatio=3",
  "-XX:SurvivorRatio=4",
  "-XX:TargetSurvivorRatio=90",
  "-XX:MaxTenuringThreshold=8",
  "-XX:+UseConcMarkSweepGC",
  "-XX:ConcGCThreads=4",
  "-XX:ParallelGCThreads=4",
  "-XX:+CMSScavengeBeforeRemark",
  "-XX:PretenureSizeThreshold=64m",
  "-XX:+UseCMSInitiatingOccupancyOnly",
  "-XX:CMSInitiatingOccupancyFraction=50",
  "-XX:CMSMaxAbortablePrecleanTime=6000",
  "-XX:+CMSParallelRemarkEnabled",
  "-XX:+ParallelRefProcEnabled",
  "-XX:-OmitStackTraceInFastThrow",
  "-verbose:gc",
  "-XX:+PrintHeapAtGC",
  "-XX:+PrintGCDetails",
  "-XX:+PrintGCDateStamps",
  "-XX:+PrintGCTimeStamps",
  "-XX:+PrintTenuringDistribution",
  "-XX:+PrintGCApplicationStoppedTime",
  "-XX:+UseGCLogFileRotation",
  "-XX:NumberOfGCLogFiles=9",
  "-XX:GCLogFileSize=20M",
  "-Xss256k",
  "-Dsolr.log.muteconsole"

Here is an example from the GC log:

2019-10-02T16:03:15.888+: 265318.624: [Full GC (Allocation Failure) 2019-10-02T16:03:15.888+: 265318.624: [CMS2019-10-02T16:03:16.134+: 265318.870: [CMS-concurrent-mark: 1.773/1.783 secs] [Times: user=13.14 sys=0.00, real=1.78 secs]
 (concurrent mode failure): 7864319K->7864319K(7864320K), 9.5890129 secs] 10048895K->8863021K(10048896K), [Metaspace: 53159K->53159K(1097728K)], 9.5892061 secs] [Times: user=10.31 sys=0.00, real=9.59 secs]
Heap after GC invocations=296656 (full 546):
 par new generation   total 2184576K, used 998701K [0x00054000, 0x0005e000, 0x0005e000)
  eden space 1747712K,  57% used [0x00054000, 0x00057cf4b4f0, 0x0005aaac)
  from space 436864K,   0% used [0x0005aaac, 0x0005aaac, 0x0005c556)
  to   space 436864K,   0% used [0x0005c556, 0x0005c556, 0x0005e000)
 concurrent mark-sweep generation total 7864320K, used 7864319K [0x0005e000, 0x0007c000, 0x0007c000)
 Metaspace       used 53159K, capacity 54766K, committed 55148K, reserved 1097728K
  class space    used 5589K, capacity 5950K, committed 6000K, reserved 1048576K
}
2019-10-02T16:03:25.477+: 265328.214: Total time for which application threads were stopped: 9.5906157 seconds, Stopping threads took: 0.0001274 seconds

*With the following from the Solr log:*

[ x:core] o.a.s.s.HttpSolrCall Unable to write response, client closed connection or we are shutting down
org.eclipse.jetty.io.EofException: Closed
  at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:665) ~[jetty-server-9.4.14.v20181114.jar:9.4.14.v20181114]
  at org.apache.solr.servlet.ServletOutputStreamWrapper.write(ServletOutputStreamWrapper.java:126) ~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48]
  at org.apache.solr.response.QueryResponseWriterUtil$1.write(QueryResponseWriterUtil.java:54) ~[solr-core-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:48]
  at java.io.OutputStream.write(OutputStream.java:116) ~[?:1.8.0_222-ea]
  at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221) ~[?:1.8.0_222-ea]
  at sun.nio.cs.StreamEncoder.implWrite(StreamEncoder.java:282) ~[?:1.8.0_222-ea]
  at sun.nio.cs.StreamEncoder.write(StreamEncoder.java:125) ~[?:1.8.0_222-ea]
  at java.io.OutputStreamWriter.write(OutputStreamWriter.java:207) ~[?:1.8.0_222-ea]
  at org.apache.solr.common.util.FastWriter.flush(FastWriter.java:140) ~[solr-solrj-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:52]
  at org.apache.solr.common.util.FastWriter.flushBuffer(FastWriter.java:154) ~[solr-solrj-7.7.2.jar:7.7.2 d4c30fc2856154f2c1fefc589eb7cd070a415b94 - janhoy - 2019-05-28 23:37:52]
  at
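Since -XX:+PrintGCApplicationStoppedTime is already enabled, one way to line up the timeouts with GC activity is to scan the GC log for long stop-the-world pauses. The following is only a minimal sketch in Python: the function name, the 1-second threshold, and the second sample line are assumptions made for illustration, and only the first sample line comes from the log above.

```python
import re

# Matches the line written when -XX:+PrintGCApplicationStoppedTime is enabled.
STOPPED = re.compile(
    r"Total time for which application threads were stopped: "
    r"(\d+\.\d+) seconds"
)


def long_pauses(lines, threshold_seconds=1.0):
    """Return (line, seconds) pairs for pauses at or above the threshold."""
    hits = []
    for line in lines:
        match = STOPPED.search(line)
        if match:
            seconds = float(match.group(1))
            if seconds >= threshold_seconds:
                hits.append((line.strip(), seconds))
    return hits


sample = [
    # Taken from the GC log quoted in the message.
    "2019-10-02T16:03:25.477+: 265328.214: Total time for which application "
    "threads were stopped: 9.5906157 seconds, Stopping threads took: "
    "0.0001274 seconds",
    # Hypothetical short pause, included only to show the filtering.
    "Total time for which application threads were stopped: 0.0004210 "
    "seconds, Stopping threads took: 0.0000500 seconds",
]

pauses = long_pauses(sample)
for text, seconds in pauses:
    print(f"{seconds:.2f}s pause: {text[:40]}")
```

To run it against a real log, pass an open file object instead of the sample list; the timestamps of the reported pauses can then be compared with the times of the client timeouts.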
Re: Work-around for "indexed without position data"
Not sure if it helps beyond the steps to reproduce that I supplied above, but I also see that "Omit Term Frequencies & Positions" is still set on the field according to the LukeRequestHandler: ITS--OF-- On Mon, Jun 5, 2017 at 1:18 PM, Solr User <solr...@gmail.com> wrote: > Sorry for the delay. I was able to reproduce this easily with my setup, > but reproducing this on a Solr example proved challenging. Hopefully the > work that I did to find the situation in which this is produced will help > in resolving the problem. The driving factor for this appears to be how > updates are sent to Solr. When sending batches of updates with commits, > the problem is reproduced. If the commit is held until after all updates > are sent, then no problem is produced. This leads me to believe that this > issue has something to do with overlapping commits or index merges. This > was reproducible regardless of running classic or managed schema and > regardless of running Solr core or SolrCloud. > > There are not many steps to reproduce this, but you will need a way to > send these updates. I have included inline create.sh and create.pl > scripts to generate the data and send the updates. You can index a > lastModified field or something to convince yourself that everything has > been re-indexed. I left that out to keep the steps lean. Also, this test > is using commit statements from the client sending the updates for > simplicity even though it is not a good practice. My normal setup is using > Solrj with commitWithin to allow Solr to manage when the commits take > place, but the same error is produced either way. > > > *STEPS TO REPRODUCE* > >1. Install Solr 5.5.3 and change to that working directory >2. bin/solr -e techproducts >3. bin/solr stop [Why these next 3 steps? These are to start the >index completely new without the 32 example documents as opposed to a >delete query. The documents are not posted after the core is detected the >second time.] >4. 
rm -rf ./example/techproducts/solr/techproducts/data/ >5. bin/solr -e techproducts >6. ./create.sh >7. curl -X POST -H 'Content-type:application/json' --data-binary '{ >"replace-field":{ "name":"cat", "type":"text_en_splitting", "indexed":true, >"multiValued":true, "stored":true } }' http://localhost:8983/solr/ >techproducts/schema >8. http://localhost:8983/solr/techproducts/select?q=cat:% >22hard%20drive%22 [error] >9. ./create.sh >10. http://localhost:8983/solr/techproducts/select?q=cat:% >22hard%20drive%22 [error even though all documents have been >re-indexed] > > *create.sh* > #!/bin/bash > for i in {1..100}; do > echo "$i" > ./create.pl $i > ./create.xml$i > curl http://localhost:8983/solr/techproducts/update?commit=true -H > "Content-Type: text/xml" --data-binary @./create.xml$i > done > > *create.pl <http://create.pl>* > #!/usr/bin/perl > my $S = $ARGV[0]; > my $I = 100; > my $N = $S*$I + $I; > my $i; > print "\n"; > for($i=$S*$I; $i<$N; $i++) { >print "SP${i}cat > hard drive ${i}\n"; > } > print "\n"; > > > On Fri, May 26, 2017 at 2:14 AM, Rick Leir <rl...@leirtech.com> wrote: > >> Can you reproduce this error? What are the steps you take to reproduce >> it? ( simple is better). >> >> cheers -- Rick >> >> >> >> On 2017-05-25 05:46 PM, Solr User wrote: >> >>> This is in regards to changing a field type from string to >>> text_en_splitting, re-indexing all documents, even optimizing to give the >>> index a chance to merge segments and rewrite itself entirely, and then >>> getting this error when running a phrase query: >>> java.lang.IllegalStateException: field "blah" was indexed without >>> position >>> data; cannot run PhraseQuery >>> >>> I have encountered this issue before and have always done one of the >>> following as a work-around: >>> 1. Instead of changing the field type on an existing field just create a >>> new field and retire the old one. >>> 2. Delete the index directory and start from scratch. >>> >>> These work-arounds are not always ideal. 
Does anyone know what is >>> holding >>> onto that old field type definition? What thinks it is still a string? >>> Every document has been re-indexed and I am sure of this because I have a >>> time stamp indexed. Is there any other way to get this to work? >>> >>> For what it is worth, I am running this in SolrCloud mode but I remember >>> seeing this issue before SolrCloud was released as well. >>> >>> >> >
Re: Anonymous Read?
Thanks! The null role value did the trick. I tried this with the predefined permissions and it worked as well.

Thanks again!

On Tue, Jun 6, 2017 at 2:08 PM, Oakley, Craig (NIH/NLM/NCBI) [C] <craig.oak...@nih.gov> wrote:

> We usually end security.json with the permissions
>
>    {
>        "name":"open_select",
>        "path":"/select/*",
>        "role":null},
>    {
>        "name":"all-admin",
>        "collection":null,
>        "path":"/*",
>        "role":"allgen"},
>    {
>        "name":"all-core-handlers",
>        "path":"/*",
>        "role":"allgen"}]
>    } }
>
> ...and then assign the "allgen" role to all users
>
> This allows a select without a login & password, but requires a login &
> password for anything else (including the front page of the GUI)
>
> -----Original Message-----
> From: Solr User [mailto:solr...@gmail.com]
> Sent: Tuesday, June 06, 2017 2:27 PM
> To: solr-user@lucene.apache.org
> Subject: Anonymous Read?
>
> Is it possible to setup Solr security to allow anonymous query (/select
> etc.) but restricted access to other permissions as described in
> https://lucidworks.com/2015/08/17/securing-solr-basic-auth-permission-rules/
> ?
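For anyone who generates security.json from a script, the permission list quoted above could be assembled roughly as follows. This is a sketch under stated assumptions, not a complete configuration: the user name "solr" and the helper name are hypothetical, the authentication section is left out entirely, and the surrounding "authorization" wrapper is assumed to use the rule-based authorization plugin.

```python
import json


def build_authorization(users, protected_role="allgen"):
    """Build the authorization section with an anonymous /select rule first."""
    permissions = [
        # "role": None serializes to null, the value that allowed anonymous reads.
        {"name": "open_select", "path": "/select/*", "role": None},
        {"name": "all-admin", "collection": None, "path": "/*",
         "role": protected_role},
        {"name": "all-core-handlers", "path": "/*", "role": protected_role},
    ]
    return {
        "authorization": {
            "class": "solr.RuleBasedAuthorizationPlugin",
            "permissions": permissions,
            # Every listed user receives the protected role.
            "user-role": {user: protected_role for user in users},
        }
    }


config = build_authorization(["solr"])  # "solr" is a placeholder user name
print(json.dumps(config, indent=2))
```

The open_select rule is kept ahead of the catch-all "/*" rules, matching the order in the quoted fragment.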
Anonymous Read?
Is it possible to set up Solr security to allow anonymous queries (/select etc.) but restrict access for the other permissions described in https://lucidworks.com/2015/08/17/securing-solr-basic-auth-permission-rules/ ?
Re: Work-around for "indexed without position data"
Sorry for the delay. I was able to reproduce this easily with my setup, but reproducing this on a Solr example proved challenging. Hopefully the work that I did to find the situation in which this is produced will help in resolving the problem.

The driving factor for this appears to be how updates are sent to Solr. When sending batches of updates with commits, the problem is reproduced. If the commit is held until after all updates are sent, then no problem is produced. This leads me to believe that this issue has something to do with overlapping commits or index merges. This was reproducible regardless of running classic or managed schema and regardless of running Solr core or SolrCloud.

There are not many steps to reproduce this, but you will need a way to send these updates. I have included inline create.sh and create.pl scripts to generate the data and send the updates. You can index a lastModified field or something to convince yourself that everything has been re-indexed. I left that out to keep the steps lean. Also, this test is using commit statements from the client sending the updates for simplicity even though it is not a good practice. My normal setup is using Solrj with commitWithin to allow Solr to manage when the commits take place, but the same error is produced either way.

*STEPS TO REPRODUCE*

1. Install Solr 5.5.3 and change to that working directory
2. bin/solr -e techproducts
3. bin/solr stop
   [Why these next 3 steps? These are to start the index completely new without the 32 example documents as opposed to a delete query. The documents are not posted after the core is detected the second time.]
4. rm -rf ./example/techproducts/solr/techproducts/data/
5. bin/solr -e techproducts
6. ./create.sh
7. curl -X POST -H 'Content-type:application/json' --data-binary '{ "replace-field":{ "name":"cat", "type":"text_en_splitting", "indexed":true, "multiValued":true, "stored":true } }' http://localhost:8983/solr/techproducts/schema
8. http://localhost:8983/solr/techproducts/select?q=cat:%22hard%20drive%22 [error]
9. ./create.sh
10. http://localhost:8983/solr/techproducts/select?q=cat:%22hard%20drive%22 [error even though all documents have been re-indexed]

*create.sh*

  #!/bin/bash
  for i in {1..100}; do
    echo "$i"
    ./create.pl $i > ./create.xml$i
    curl http://localhost:8983/solr/techproducts/update?commit=true -H "Content-Type: text/xml" --data-binary @./create.xml$i
  done

*create.pl*

  #!/usr/bin/perl
  my $S = $ARGV[0];
  my $I = 100;
  my $N = $S*$I + $I;
  my $i;
  print "\n";
  for($i=$S*$I; $i<$N; $i++) {
    print "SP${i}cat hard drive ${i}\n";
  }
  print "\n";

On Fri, May 26, 2017 at 2:14 AM, Rick Leir <rl...@leirtech.com> wrote:

> Can you reproduce this error? What are the steps you take to reproduce it?
> ( simple is better).
>
> cheers -- Rick
>
> On 2017-05-25 05:46 PM, Solr User wrote:
>
>> This is in regards to changing a field type from string to
>> text_en_splitting, re-indexing all documents, even optimizing to give the
>> index a chance to merge segments and rewrite itself entirely, and then
>> getting this error when running a phrase query:
>> java.lang.IllegalStateException: field "blah" was indexed without
>> position
>> data; cannot run PhraseQuery
>>
>> I have encountered this issue before and have always done one of the
>> following as a work-around:
>> 1. Instead of changing the field type on an existing field just create a
>> new field and retire the old one.
>> 2. Delete the index directory and start from scratch.
>>
>> These work-arounds are not always ideal. Does anyone know what is holding
>> onto that old field type definition? What thinks it is still a string?
>> Every document has been re-indexed and I am sure of this because I have a
>> time stamp indexed. Is there any other way to get this to work?
>>
>> For what it is worth, I am running this in SolrCloud mode but I remember
>> seeing this issue before SolrCloud was released as well.
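The difference between the two update strategies described above (a commit with every batch versus a single commit after all batches) can be sketched as the sequence of update URLs a client would POST to. This is only an illustration in Python: the helper name and the batch count of 3 are assumptions, and nothing is actually sent to Solr.

```python
from urllib.parse import urlencode

# Update endpoint of the techproducts example used in the steps above.
UPDATE_URL = "http://localhost:8983/solr/techproducts/update"


def plan_requests(batch_count, commit_per_batch):
    """Return the update URLs a client would POST to, in order."""
    commit_url = f"{UPDATE_URL}?{urlencode({'commit': 'true'})}"
    urls = []
    for _ in range(batch_count):
        # Strategy A: every batch carries its own commit (as create.sh does).
        # Strategy B: batches are sent without a commit.
        urls.append(commit_url if commit_per_batch else UPDATE_URL)
    if not commit_per_batch:
        # Strategy B ends with one commit after all batches are sent.
        urls.append(commit_url)
    return urls


per_batch = plan_requests(3, commit_per_batch=True)
deferred = plan_requests(3, commit_per_batch=False)

print("commit per batch:")
for url in per_batch:
    print("  POST", url)
print("single commit at the end:")
for url in deferred:
    print("  POST", url)
```

According to the report, the first sequence reproduces the error and the second does not.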
Work-around for "indexed without position data"
This is regarding changing a field type from string to text_en_splitting, re-indexing all documents, even optimizing to give the index a chance to merge segments and rewrite itself entirely, and then getting this error when running a phrase query:

java.lang.IllegalStateException: field "blah" was indexed without position data; cannot run PhraseQuery

I have encountered this issue before and have always done one of the following as a work-around:

1. Instead of changing the field type on an existing field just create a new field and retire the old one.
2. Delete the index directory and start from scratch.

These work-arounds are not always ideal. Does anyone know what is holding onto that old field type definition? What thinks it is still a string? Every document has been re-indexed and I am sure of this because I have a time stamp indexed. Is there any other way to get this to work?

For what it is worth, I am running this in SolrCloud mode but I remember seeing this issue before SolrCloud was released as well.
Re: Faceting and Grouping Performance Degradation in Solr 5
I am pleased to report that we are in Production on Solr 5.5.3 with comparable performance to Solr 4.8.1 through leveraging facet.method=uif as well as https://issues.apache.org/jira/browse/SOLR-9176. Thanks to everyone who worked on these!

On Mon, Oct 3, 2016 at 3:55 PM, Solr User <solr...@gmail.com> wrote:

> Below is some further testing. This was done in an environment that had
> no other queries or updates during testing. We ran through several
> scenarios so I pasted this with HTML formatting below so you may view this
> as a table. Sorry if you have to pull this out into a different file for
> viewing, but I did not want the formatting to be messed up. The times are
> average times in milliseconds. Same test methodology as above except there
> was a 5 minute warmup and a 15 minute test.
>
> Note that both the segment and deletions were recorded from only 1 out of
> 2 of the shards so we cannot try to extrapolate a function between them and
> the outcome. In other words, just view them as "non-optimized" versus
> "optimized" and "has deletions" versus "no deletions". The only exceptions
> are the 0 deletes were true for both shards and the 1 segment and 8 segment
> cases were true for both shards. A few of the tests were repeated as well.
>
> The only conclusion that I could draw is that the number of segments and
> the number of deletes appear to greatly influence the response times, at
> least more than any difference in Solr version. There also appears to be
> some external contributor to variance, maybe network, etc.
>
> Thoughts?
>
> Date       Solr Version  Deleted Docs  Segment Count  facet.method=uif  Scenario #1  Scenario #2
> 9/29/2016  5.5.2         57873         34             YES               198          92
> 9/29/2016  5.5.2         57873         34             YES               210          88
> 9/29/2016  4.8.1         176958        18             N/A               145          59
> 9/30/2016  4.8.1         593694        27             N/A               186          62
> 9/30/2016  4.8.1         593694        27             N/A               190          58
> 9/30/2016  5.5.2         57873         34             YES               208          72
> 9/30/2016  5.5.2         57873         34             YES               209          70
> 9/30/2016  5.5.2         57873         34             NO                210          77
> 9/30/2016  5.5.2         57873         34             NO                206          74
> 9/30/2016  5.5.2         0             8              NO                109          68
> 9/30/2016  5.5.2         0             8              YES               142          73
> 9/30/2016  5.5.2         0             1              YES               73           63
> 9/30/2016  5.5.2         0             1              NO                70           61
> 10/3/2016  4.8.1         0             8              N/A               160          66
> 10/3/2016  4.8.1         0             8              N/A               109          54
> 10/3/2016  4.8.1         0             1              N/A               83           52
> 10/3/2016  4.8.1         0             1              N/A               85           51
>
> On Wed, Sep 28, 2016 at 4:44 PM, Solr User <solr...@gmail.com> wrote:
>
>> I plan to re-test this in a separate environment that I have more control
>> over and will share the results when I can.
>>
>> On Wed, Sep 28, 2016 at 3:37 PM, Solr User <solr...@gmail.com> wrote:
>>
>>> Certainly. And I would of course welcome anyone else to test this for
>>> themselves especially with facet.method=uif to see if that has indeed
>>> bridged the gap between Solr 4 and Solr 5. I would be very happy if my
>>> testing is invalid due to variance, problem in process, etc. One thing I
>>> was pondering is if I should force merge the index to a certain amount of
>>> segments because indexing yields a random number of segments and
>>> deletions. The only thing stopping me short of doing that were
>>> observations of longer Solr 4 times even with more deletions and similar
>>> number of segments.
>>>
>>> We use Soasta as our testing tool. Before testing, load is sent for
>>> 10-15 minutes to make sure any Solr caches have stabilized. Then the test
>>> is run for 30 minutes of steady volume with Scenario #1 tested at 15
>>> req/sec and Scenario #2 tested at 100 req/sec. Each request is different
>>> with input being pulled from data files. The requests are repeatable test
>>> to test.
>>>
>>> The numbers posted above are average response times as reported by
>>> Soasta. However, respective time differences are supported by Splunk which
>>> indexes the Solr logs and Dynatrace which is instrumented on one of the
>>> JVM's.
>>> >>> The versions are deployed to the same machines thereby overlaying the >>> previous installation. Going Solr 4 to Solr 5, full indexing is run with >>> the same input data. Being in SolrCloud mode, the full indexing comprises >>> of indexing all documents and then deleting any that were not touched. >>> Going Solr 5 back to Solr 4, the snapshot is restored since Solr 4 will not >>> load with a Solr 5 index. Testing Solr 4 after reverting yields the same >>> results as the previous Solr 4 test. >>> >>> >>> On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen <t...@statsbiblioteket.dk> >>>
Re: ClassNotFoundException with Custom ZkACLProvider
For those interested, I ended up bundling the customized ACL provider with the solr.war. I could not stomach looking at the stack trace in the logs.

On Mon, Nov 7, 2016 at 4:47 PM, Solr User <solr...@gmail.com> wrote:

> This is mostly just an FYI regarding future work on issues like SOLR-8792.
>
> I wanted admin update but world read on ZK since I do not have anything
> sensitive from a read perspective in the Solr data and did not want to
> force all SolrCloud clients to implement authentication just for read. So,
> I extended DefaultZkACLProvider and implemented a replacement for
> VMParamsAllAndReadonlyDigestZkACLProvider.
>
> My custom code is loaded from the sharedLib in solr.xml. However, there
> is a temporary ZK lookup to read solr.xml (and chroot) which is obviously
> done before loading sharedLib. Therefore, I am faced with a
> ClassNotFoundException. This has no negative effect on the ACL
> functionality, just the annoying stack trace in the logs. I do not want
> to package this custom code with the Solr code and do not want to package
> this along with Solr dependencies in the Jetty lib/ext.
>
> So, I am planning to live with the stack trace and just wanted to share
> this for any future work on the dynamic solr.xml and chroot lookups or in
> case I am missing some work-around.
>
> Thanks!
ClassNotFoundException with Custom ZkACLProvider
This is mostly just an FYI regarding future work on issues like SOLR-8792.

I wanted admin update but world read on ZK since I do not have anything sensitive from a read perspective in the Solr data and did not want to force all SolrCloud clients to implement authentication just for read. So, I extended DefaultZkACLProvider and implemented a replacement for VMParamsAllAndReadonlyDigestZkACLProvider.

My custom code is loaded from the sharedLib in solr.xml. However, there is a temporary ZK lookup to read solr.xml (and chroot) which is obviously done before loading sharedLib. Therefore, I am faced with a ClassNotFoundException. This has no negative effect on the ACL functionality, just the annoying stack trace in the logs. I do not want to package this custom code with the Solr code and do not want to package this along with Solr dependencies in the Jetty lib/ext.

So, I am planning to live with the stack trace and just wanted to share this for any future work on the dynamic solr.xml and chroot lookups or in case I am missing some work-around.

Thanks!
Re: Faceting and Grouping Performance Degradation in Solr 5
Below is some further testing. This was done in an environment that had no other queries or updates during testing. We ran through several scenarios so I pasted this with HTML formatting below so you may view this as a table. Sorry if you have to pull this out into a different file for viewing, but I did not want the formatting to be messed up. The times are average times in milliseconds. Same test methodology as above except there was a 5 minute warmup and a 15 minute test.

Note that both the segment and deletions were recorded from only 1 out of 2 of the shards so we cannot try to extrapolate a function between them and the outcome. In other words, just view them as "non-optimized" versus "optimized" and "has deletions" versus "no deletions". The only exceptions are the 0 deletes were true for both shards and the 1 segment and 8 segment cases were true for both shards. A few of the tests were repeated as well.

The only conclusion that I could draw is that the number of segments and the number of deletes appear to greatly influence the response times, at least more than any difference in Solr version. There also appears to be some external contributor to variance, maybe network, etc.

Thoughts?

Date       Solr Version  Deleted Docs  Segment Count  facet.method=uif  Scenario #1  Scenario #2
9/29/2016  5.5.2         57873         34             YES               198          92
9/29/2016  5.5.2         57873         34             YES               210          88
9/29/2016  4.8.1         176958        18             N/A               145          59
9/30/2016  4.8.1         593694        27             N/A               186          62
9/30/2016  4.8.1         593694        27             N/A               190          58
9/30/2016  5.5.2         57873         34             YES               208          72
9/30/2016  5.5.2         57873         34             YES               209          70
9/30/2016  5.5.2         57873         34             NO                210          77
9/30/2016  5.5.2         57873         34             NO                206          74
9/30/2016  5.5.2         0             8              NO                109          68
9/30/2016  5.5.2         0             8              YES               142          73
9/30/2016  5.5.2         0             1              YES               73           63
9/30/2016  5.5.2         0             1              NO                70           61
10/3/2016  4.8.1         0             8              N/A               160          66
10/3/2016  4.8.1         0             8              N/A               109          54
10/3/2016  4.8.1         0             1              N/A               83           52
10/3/2016  4.8.1         0             1              N/A               85           51

On Wed, Sep 28, 2016 at 4:44 PM, Solr User <solr...@gmail.com> wrote:

> I plan to re-test this in a separate environment that I have more control
> over and will share the results when I can.
> > On Wed, Sep 28, 2016 at 3:37 PM, Solr User <solr...@gmail.com> wrote: > >> Certainly. And I would of course welcome anyone else to test this for >> themselves especially with facet.method=uif to see if that has indeed >> bridged the gap between Solr 4 and Solr 5. I would be very happy if my >> testing is invalid due to variance, problem in process, etc. One thing I >> was pondering is if I should force merge the index to a certain amount of >> segments because indexing yields a random number of segments and >> deletions. The only thing stopping me short of doing that were >> observations of longer Solr 4 times even with more deletions and similar >> number of segments. >> >> We use Soasta as our testing tool. Before testing, load is sent for >> 10-15 minutes to make sure any Solr caches have stabilized. Then the test >> is run for 30 minutes of steady volume with Scenario #1 tested at 15 >> req/sec and Scenario #2 tested at 100 req/sec. Each request is different >> with input being pulled from data files. The requests are repeatable test >> to test. >> >> The numbers posted above are average response times as reported by >> Soasta. However, respective time differences are supported by Splunk which >> indexes the Solr logs and Dynatrace which is instrumented on one of the >> JVM's. >> >> The versions are deployed to the same machines thereby overlaying the >> previous installation. Going Solr 4 to Solr 5, full indexing is run with >> the same input data. Being in SolrCloud mode, the full indexing comprises >> of indexing all documents and then deleting any that were not touched. >> Going Solr 5 back to Solr 4, the snapshot is restored since Solr 4 will not >> load with a Solr 5 index. Testing Solr 4 after reverting yields the same >> results as the previous Solr 4 test. 
>> >> >> On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen <t...@statsbiblioteket.dk> >> wrote: >> >>> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote: >>> > Further testing indicates that any performance difference is not due >>> > to deletes. Both Solr 4.8.1 and Solr 5.5.2 benefited from removing >>> > deletes. >>> >>> Sanity check: Could you describe how you test? >>> >>> * How many queries do you issue for each test? >>> * Are each query a new one or do you re-use the same query? >>> * Do you discard the first X calls? >>> * Are the numbers averages, medians or something third? >>> * What do you do about disk cache? >>> * Are both Solr's on the same machine? >>> * Do they use the same index? >>> * Do you alternate between testing 4.8.1 and 5.5.2 first? >>> >>> - Toke Eskildsen, State and University Library, Denmark >>> >> >> >
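The conclusion drawn from the Oct 3 results table (segment count matters more than Solr version) can be checked with a short grouping script. This is a sketch in Python, assuming the Scenario #1 averages are read from the flattened table as 17 runs; the splitting of the run-together digits into individual values is itself an assumption.

```python
from collections import defaultdict

# (Solr version, segment count, Scenario #1 average in ms), one tuple per run.
# Values are as read from the flattened results table; the split is assumed.
RUNS = [
    ("5.5.2", 34, 198), ("5.5.2", 34, 210), ("4.8.1", 18, 145),
    ("4.8.1", 27, 186), ("4.8.1", 27, 190), ("5.5.2", 34, 208),
    ("5.5.2", 34, 209), ("5.5.2", 34, 210), ("5.5.2", 34, 206),
    ("5.5.2", 8, 109), ("5.5.2", 8, 142), ("5.5.2", 1, 73),
    ("5.5.2", 1, 70), ("4.8.1", 8, 160), ("4.8.1", 8, 109),
    ("4.8.1", 1, 83), ("4.8.1", 1, 85),
]


def mean_by_version_and_segments(runs):
    """Average the Scenario #1 time for each (version, segment count) pair."""
    grouped = defaultdict(list)
    for version, segments, millis in runs:
        grouped[(version, segments)].append(millis)
    return {key: sum(vals) / len(vals) for key, vals in grouped.items()}


means = mean_by_version_and_segments(RUNS)
for (version, segments), avg in sorted(means.items()):
    print(f"Solr {version}, {segments:>2} segments: {avg:.1f} ms")
```

Under that reading, the fully optimized runs average 71.5 ms on 5.5.2 and 84.0 ms on 4.8.1, while the 34-segment runs on 5.5.2 average a little over 200 ms, so the spread across segment counts is far larger than the spread between versions.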
Re: Faceting and Grouping Performance Degradation in Solr 5
I plan to re-test this in a separate environment that I have more control over and will share the results when I can. On Wed, Sep 28, 2016 at 3:37 PM, Solr User <solr...@gmail.com> wrote: > Certainly. And I would of course welcome anyone else to test this for > themselves especially with facet.method=uif to see if that has indeed > bridged the gap between Solr 4 and Solr 5. I would be very happy if my > testing is invalid due to variance, problem in process, etc. One thing I > was pondering is if I should force merge the index to a certain amount of > segments because indexing yields a random number of segments and > deletions. The only thing stopping me short of doing that were > observations of longer Solr 4 times even with more deletions and similar > number of segments. > > We use Soasta as our testing tool. Before testing, load is sent for 10-15 > minutes to make sure any Solr caches have stabilized. Then the test is run > for 30 minutes of steady volume with Scenario #1 tested at 15 req/sec and > Scenario #2 tested at 100 req/sec. Each request is different with input > being pulled from data files. The requests are repeatable test to test. > > The numbers posted above are average response times as reported by > Soasta. However, respective time differences are supported by Splunk which > indexes the Solr logs and Dynatrace which is instrumented on one of the > JVM's. > > The versions are deployed to the same machines thereby overlaying the > previous installation. Going Solr 4 to Solr 5, full indexing is run with > the same input data. Being in SolrCloud mode, the full indexing comprises > of indexing all documents and then deleting any that were not touched. > Going Solr 5 back to Solr 4, the snapshot is restored since Solr 4 will not > load with a Solr 5 index. Testing Solr 4 after reverting yields the same > results as the previous Solr 4 test. 
> > > On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen <t...@statsbiblioteket.dk> > wrote: > >> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote: >> > Further testing indicates that any performance difference is not due >> > to deletes. Both Solr 4.8.1 and Solr 5.5.2 benefited from removing >> > deletes. >> >> Sanity check: Could you describe how you test? >> >> * How many queries do you issue for each test? >> * Are each query a new one or do you re-use the same query? >> * Do you discard the first X calls? >> * Are the numbers averages, medians or something third? >> * What do you do about disk cache? >> * Are both Solr's on the same machine? >> * Do they use the same index? >> * Do you alternate between testing 4.8.1 and 5.5.2 first? >> >> - Toke Eskildsen, State and University Library, Denmark >> > >
Re: Faceting and Grouping Performance Degradation in Solr 5
Certainly. And I would of course welcome anyone else to test this for themselves, especially with facet.method=uif, to see if that has indeed bridged the gap between Solr 4 and Solr 5. I would be very happy if my testing turns out to be invalid due to variance, a problem in the process, etc. One thing I was pondering is whether I should force merge the index to a fixed number of segments, because indexing yields a random number of segments and deletions. The only thing that stopped me from doing that was the observation of longer Solr 4 times even with more deletions and a similar number of segments.

We use Soasta as our testing tool. Before testing, load is sent for 10-15 minutes to make sure any Solr caches have stabilized. Then the test is run for 30 minutes of steady volume, with Scenario #1 tested at 15 req/sec and Scenario #2 tested at 100 req/sec. Each request is different, with input being pulled from data files. The requests are repeatable from test to test.

The numbers posted above are average response times as reported by Soasta. However, the respective time differences are supported by Splunk, which indexes the Solr logs, and by Dynatrace, which is instrumented on one of the JVMs.

The versions are deployed to the same machines, overlaying the previous installation. Going from Solr 4 to Solr 5, full indexing is run with the same input data. Since we are in SolrCloud mode, full indexing consists of indexing all documents and then deleting any that were not touched. Going from Solr 5 back to Solr 4, the snapshot is restored, since Solr 4 will not load a Solr 5 index. Testing Solr 4 after reverting yields the same results as the previous Solr 4 test.

On Wed, Sep 28, 2016 at 4:02 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> On Tue, 2016-09-27 at 15:08 -0500, Solr User wrote:
> > Further testing indicates that any performance difference is not due
> > to deletes. Both Solr 4.8.1 and Solr 5.5.2 benefited from removing
> > deletes.
>
> Sanity check: Could you describe how you test?
>
> * How many queries do you issue for each test?
> * Are each query a new one or do you re-use the same query?
> * Do you discard the first X calls?
> * Are the numbers averages, medians or something third?
> * What do you do about disk cache?
> * Are both Solr's on the same machine?
> * Do they use the same index?
> * Do you alternate between testing 4.8.1 and 5.5.2 first?
>
> - Toke Eskildsen, State and University Library, Denmark
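To illustrate why the "averages or medians" and "discard the first X calls" questions matter, here is a minimal sketch. The response times are made up for the example and are not taken from the tests above; the class and method names are hypothetical. It drops a warm-up prefix and then compares the mean with the median of the remaining samples.

```java
import java.util.Arrays;

class LatencyStats {
    // Drop the first `warmup` samples (the "discard the first X calls" step).
    public static double[] discardWarmup(double[] samples, int warmup) {
        int from = Math.min(Math.max(warmup, 0), samples.length);
        return Arrays.copyOfRange(samples, from, samples.length);
    }

    // Arithmetic mean; sensitive to a few slow outliers.
    public static double mean(double[] xs) {
        if (xs.length == 0) throw new IllegalArgumentException("no samples");
        double sum = 0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Median; much less sensitive to outliers than the mean.
    public static double median(double[] xs) {
        if (xs.length == 0) throw new IllegalArgumentException("no samples");
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        int n = sorted.length;
        return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    public static void main(String[] args) {
        // Hypothetical response times in ms; the first two stand in for cold-cache calls.
        double[] raw = {900, 400, 40, 42, 38, 41, 160};
        double[] steady = discardWarmup(raw, 2);
        System.out.println("mean=" + mean(steady) + " ms, median=" + median(steady) + " ms");
    }
}
```

With one slow request among the steady-state samples, the mean moves noticeably while the median barely changes, so two test runs can look different on averages alone even when typical requests behave the same.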
Re: Faceting and Grouping Performance Degradation in Solr 5
Further testing indicates that any performance difference is not due to deletes. Both Solr 4.8.1 and Solr 5.5.2 benefited from removing deletes. The times appear to converge on an optimized index. Below are the details. Not sure what else to make of this at this point other than moving forward with an upgrade with an optimized index wherever possible.

Scenario #1: Using facet.method=uif with faceting on several multi-valued fields.

  4.8.1 (with deletes): 115 ms
  5.5.2 (with deletes): 155 ms
  4.8.1 (without deletes): 104 ms
  5.5.2 (without deletes): 125 ms
  4.8.1 (1 segment without deletes): 55 ms
  5.5.2 (1 segment without deletes): 44 ms

Scenario #2: Using facet.method=enum with faceting on several multi-valued fields. These fields are different than Scenario #1 and perform much better with enum, hence that method is used instead.

  4.8.1 (with deletes): 38 ms
  5.5.2 (with deletes): 49 ms
  4.8.1 (without deletes): 35 ms
  5.5.2 (without deletes): 42 ms
  4.8.1 (1 segment without deletes): 28 ms
  5.5.2 (1 segment without deletes): 34 ms

On Tue, Sep 27, 2016 at 3:45 AM, Alessandro Benedetti <abenede...@apache.org > wrote: > Hi ! > At the time we didn't investigate the deletion implication at all. > This can be interesting. > if you proceed with your investigations and discover what changed in the > deletion approach, I would be more than happy to help! > > Cheers > > On Mon, Sep 26, 2016 at 10:59 PM, Solr User <solr...@gmail.com> wrote: > > > Thanks again for your work on honoring the facet.method. I have an > > observation that I would like to share and get your feedback on if > > possible. > > > > I performance tested Solr 5.5.2 with various facet queries and the only > way > > I get comparable results to Solr 4.8.1 is when I expungeDeletes. Is it > > possible that Solr 5 is not as efficiently ignoring deletes as Solr 4? > > Here are the details. > > > > Scenario #1: Using facet.method=uif with faceting on several > multi-valued > > fields. 
> > 4.8.1 (with deletes): 115 ms > > 5.5.2 (with deletes): 155 ms > > 5.5.2 (without deletes): 125 ms > > 5.5.2 (1 segment without deletes): 44 ms > > > > Scenario #2: Using facet.method=enum with faceting on several > multi-valued > > fields. These fields are different than Scenario #1 and perform much > > better with enum hence that method is used instead. > > 4.8.1 (with deletes): 38 ms > > 5.5.2 (with deletes): 49 ms > > 5.5.2 (without deletes): 42 ms > > 5.5.2 (1 segment without deletes): 34 ms > > > > > > > > On Tue, May 31, 2016 at 11:57 AM, Alessandro Benedetti < > > abenede...@apache.org> wrote: > > > > > Interesting developments : > > > > > > https://issues.apache.org/jira/browse/SOLR-9176 > > > > > > I think we found why term Enum seems slower in recent Solr ! > > > In our case it is likely to be related to the commit I mention in the > > Jira. > > > Have a check Joel ! > > > > > > On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti < > > > abenede...@apache.org> wrote: > > > > > > > I am investigating this scenario right now. > > > > I can confirm that the enum slowness is in Solr 6.0 as well. > > > > And I agree with Joel, it seems to be un-related with the famous > > faceting > > > > regression :( > > > > > > > > Furthermore with the legacy facet approach, if you set docValues for > > the > > > > field you are not going to be able to try the enum approach anymore. > > > > > > > > org/apache/solr/request/SimpleFacets.java:448 > > > > > > > > if (method == FacetMethod.ENUM && sf.hasDocValues()) { > > > > // only fc can handle docvalues types > > > > method = FacetMethod.FC; > > > > } > > > > > > > > > > > > I got really horrible regressions simply using term enum in both > Solr 4 > > > > and Solr 6. > > > > > > > > And even the most optimized fcs approach with docValues and > > > > facet.threads=nCore does not perform as the simple enum in Solr 4 . > > > > > > > > i.e. > > > > > > > > For some sample queries I have 40 ms vs 160 ms and similar... 
> > > > I think we should open an issue if we can confirm it is not related > > with > > > > the other. > > > > A lot of people will continue using the legacy approach for a > while... > > > > > > > > On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein <joels...@gmail.com > > &g
Re: Faceting and Grouping Performance Degradation in Solr 5
Thanks again for your work on honoring the facet.method. I have an observation that I would like to share and get your feedback on if possible.

I performance tested Solr 5.5.2 with various facet queries and the only way I get comparable results to Solr 4.8.1 is when I expungeDeletes. Is it possible that Solr 5 is not as efficiently ignoring deletes as Solr 4? Here are the details.

Scenario #1: Using facet.method=uif with faceting on several multi-valued fields.

  4.8.1 (with deletes): 115 ms
  5.5.2 (with deletes): 155 ms
  5.5.2 (without deletes): 125 ms
  5.5.2 (1 segment without deletes): 44 ms

Scenario #2: Using facet.method=enum with faceting on several multi-valued fields. These fields are different than Scenario #1 and perform much better with enum, hence that method is used instead.

  4.8.1 (with deletes): 38 ms
  5.5.2 (with deletes): 49 ms
  5.5.2 (without deletes): 42 ms
  5.5.2 (1 segment without deletes): 34 ms

On Tue, May 31, 2016 at 11:57 AM, Alessandro Benedetti < abenede...@apache.org> wrote: > Interesting developments : > > https://issues.apache.org/jira/browse/SOLR-9176 > > I think we found why term Enum seems slower in recent Solr ! > In our case it is likely to be related to the commit I mention in the Jira. > Have a check Joel ! > > On Wed, May 25, 2016 at 12:30 PM, Alessandro Benedetti < > abenede...@apache.org> wrote: > > > I am investigating this scenario right now. > > I can confirm that the enum slowness is in Solr 6.0 as well. > > And I agree with Joel, it seems to be un-related with the famous faceting > > regression :( > > > > Furthermore with the legacy facet approach, if you set docValues for the > > field you are not going to be able to try the enum approach anymore. 
> > > > org/apache/solr/request/SimpleFacets.java:448 > > > > if (method == FacetMethod.ENUM && sf.hasDocValues()) { > > // only fc can handle docvalues types > > method = FacetMethod.FC; > > } > > > > > > I got really horrible regressions simply using term enum in both Solr 4 > > and Solr 6. > > > > And even the most optimized fcs approach with docValues and > > facet.threads=nCore does not perform as the simple enum in Solr 4 . > > > > i.e. > > > > For some sample queries I have 40 ms vs 160 ms and similar... > > I think we should open an issue if we can confirm it is not related with > > the other. > > A lot of people will continue using the legacy approach for a while... > > > > On Wed, May 18, 2016 at 10:42 PM, Joel Bernstein <joels...@gmail.com> > > wrote: > > > >> The enum slowness is interesting. It would appear on the surface to not > be > >> related to the FieldCache issue. I don't think the main emphasis of the > >> JSON facet API has been the enum approach. You may find using the JSON > >> facet API and eliminating the use of enum meets your performance needs. > >> > >> With the CollapsingQParserPlugin top_fc is definitely faster during > >> queries. The tradeoff is slower warming times and increased memory usage > >> if > >> the collapse fields are used in faceting, as faceting will load the > field > >> into a different cache. > >> > >> Joel Bernstein > >> http://joelsolr.blogspot.com/ > >> > >> On Wed, May 18, 2016 at 5:28 PM, Solr User <solr...@gmail.com> wrote: > >> > >> > Joel, > >> > > >> > Thank you for taking the time to respond to my question. I tried the > >> JSON > >> > Facet API for one query that uses facet.method=enum (since this one > has > >> a > >> > ton of unique values and performed better with enum) but this was way > >> > slower than even the slower Solr 5 times. I did not try the new API > >> with > >> > the non-enum queries though so I will give that a go. 
It looks like > >> Solr > >> > 5.5.1 also has a facet.method=uif which will be interesting to try. > >> > > >> > If these do not prove helpful, it looks like I will need to wait for > >> > SOLR-8096 to be resolved before upgrading. > >> > > >> > Thanks also for your comment on top_fc for the CollapsingQParser. I > use > >> > collapse/expand for some queries but traditional grouping for others > >> due to > >> > performance. It will be interesting to see if those grouping queries > >> > perform better now using CollapsingQParser with top_fc. > >> > > >> > On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein <joels...@gmail.com> > >> > wrot
Re: Indexing a (File attached to a document)
Hi, sorry for the delay. I am using MapReduceIndexerTool to index data from HDFS, with morphlines as the ETL tool. The data paths are specified as XPath expressions in the morphline file. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-a-File-attached-to-a-document-tp4276334p4278730.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Faceting and Grouping Performance Degradation in Solr 5
Joel, Thank you for taking the time to respond to my question. I tried the JSON Facet API for one query that uses facet.method=enum (since this one has a ton of unique values and performed better with enum) but this was way slower than even the slower Solr 5 times. I did not try the new API with the non-enum queries though so I will give that a go. It looks like Solr 5.5.1 also has a facet.method=uif which will be interesting to try. If these do not prove helpful, it looks like I will need to wait for SOLR-8096 to be resolved before upgrading. Thanks also for your comment on top_fc for the CollapsingQParser. I use collapse/expand for some queries but traditional grouping for others due to performance. It will be interesting to see if those grouping queries perform better now using CollapsingQParser with top_fc. On Wed, May 18, 2016 at 11:39 AM, Joel Bernstein <joels...@gmail.com> wrote: > Yes, SOLR-8096 is the issue here. > > I don't believe indexing with docValues is going to help too much with > this. The enum slowness may not be related, but I'm not positive about > that. > > The major slowdowns are likely due to the removal of the top level > FieldCache from general use and the removal of the FieldValuesCache which > was used for multi-value field faceting. > > The JSON facet API covers all the functionality in the traditional > faceting, and it has been developed to be very performant. > > You may also want to see if Collapse/Expand can meet your applications > needs rather Grouping. It allows you to specify using a top level > FieldCache if performance is a blocker without it. > > > > > Joel Bernstein > http://joelsolr.blogspot.com/ > > On Wed, May 18, 2016 at 10:42 AM, Solr User <solr...@gmail.com> wrote: > > > Does anyone know the answer to this? 
> > > > On Wed, May 4, 2016 at 2:19 PM, Solr User <solr...@gmail.com> wrote: > > > > > I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1 but > > had > > > to abort due to average response times degraded from a baseline volume > > > performance test. The affected queries involved faceting (both enum > > method > > > and default) and grouping. There is a critical bug > > > https://issues.apache.org/jira/browse/SOLR-8096 currently open which I > > > gather is the cause of the slower response times. One concern I have > is > > > that discussions around the issue offer the suggestion of indexing with > > > docValues which alleviated the problem in at least that one reported > > case. > > > However, indexing with docValues did not improve the performance in my > > case. > > > > > > Can someone please confirm or correct my understanding that this issue > > has > > > no path forward at this time and specifically that it is already known > > that > > > docValues does not necessarily solve this? > > > > > > Thanks in advance! > > > > > > > > > > > >
Re: Faceting and Grouping Performance Degradation in Solr 5
Does anyone know the answer to this? On Wed, May 4, 2016 at 2:19 PM, Solr User <solr...@gmail.com> wrote: > I recently was attempting to upgrade from Solr 4.8.1 to Solr 5.4.1 but had > to abort due to average response times degraded from a baseline volume > performance test. The affected queries involved faceting (both enum method > and default) and grouping. There is a critical bug > https://issues.apache.org/jira/browse/SOLR-8096 currently open which I > gather is the cause of the slower response times. One concern I have is > that discussions around the issue offer the suggestion of indexing with > docValues which alleviated the problem in at least that one reported case. > However, indexing with docValues did not improve the performance in my case. > > Can someone please confirm or correct my understanding that this issue has > no path forward at this time and specifically that it is already known that > docValues does not necessarily solve this? > > Thanks in advance! > > >
Indexing a (File attached to a document)
Hi, if I index a document that has a file attached to it in Solr, can I also see the contents of that attached file when querying that particular document? Please help me with this. Thanks & Regards, Vidya Nadella -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-a-File-attached-to-a-document-tp4276334.html
Faceting and Grouping Performance Degradation in Solr 5
I recently attempted to upgrade from Solr 4.8.1 to Solr 5.4.1 but had to abort because average response times degraded relative to the baseline in a volume performance test. The affected queries involved faceting (both the enum method and the default) and grouping. There is a critical bug, https://issues.apache.org/jira/browse/SOLR-8096, currently open which I gather is the cause of the slower response times. One concern I have is that discussions around the issue suggest indexing with docValues, which alleviated the problem in at least that one reported case. However, indexing with docValues did not improve the performance in my case.

Can someone please confirm or correct my understanding that this issue has no path forward at this time, and specifically that it is already known that docValues does not necessarily solve this?

Thanks in advance!
Re: Solr suggester throws error on core reload.
I want to use AnalyzingInfixLookupFactory for my auto-suggestions. Any idea when this issue will be fixed? Is there any workaround for it? - Nutch Solr User The ultimate search engine would basically understand everything in the world, and it would always give you the right thing. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-suggester-throws-error-on-core-reload-tp4220725p4222902.html
Re: Solr suggester throws error on core reload.
Hi Erick, sorry for the confusion. Next time I will be more careful when posting questions to the forum. We are using AnalyzingInfixLookupFactory for auto-suggestions, and it currently has an open issue with core reload (https://issues.apache.org/jira/browse/SOLR-6246). My question was about the resolution of that issue. - Nutch Solr User -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-suggester-throws-error-on-core-reload-tp4220725p4223098.html
Re: multiple but identical suggestions in autocomplete
You will need to call this service from the UI in the same way you currently call the suggester component (for example, on every key-press event in the text box), passing the required parameters. The service internally forms a Solr suggester query and sends it to Solr. From the returned response it keeps only the unique suggestions among the top N and returns them to the UI. - Nutch Solr User -- View this message in context: http://lucene.472066.n3.nabble.com/multiple-but-identical-suggestions-in-autocomplete-tp4220055p4220953.html
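The de-duplication step of such a service could look roughly like the sketch below. This is only an illustration, not the actual service: the class and method names are hypothetical, the terms are assumed to have already been extracted from the suggester response, and duplicates are detected by exact string match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SuggestionDeduper {
    // Keep the first occurrence of each term, preserving the order Solr returned,
    // and stop once `limit` unique terms have been collected.
    public static List<String> uniqueTopN(List<String> terms, int limit) {
        Set<String> seen = new LinkedHashSet<>();
        for (String term : terms) {
            if (seen.size() >= limit) break;
            seen.add(term); // no-op if the term is already present
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Terms as they might come back from the suggester, duplicates included.
        List<String> fromSolr = Arrays.asList(
            "Treatment Refusal", "Withholding Treatment",
            "Treatment Refusal", "Withholding Treatment");
        System.out.println(uniqueTopN(fromSolr, 10));
    }
}
```

Because duplicates shrink the list, the service may need to ask Solr for more than N suggestions (a larger suggest.count) in order to still have N unique ones to return.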
Solr suggester throws error on core reload.
I am using AnalyzingInfixSuggester for the auto-suggest feature, but whenever I try to reload the Solr core the following error is thrown:

org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@E:\SSearch\SolrServer\solr-5.2.1\server\solr\ssearch\data\main-suggest\write.lock

After a restart everything works fine. What could be the reason for this? - Nutch Solr User -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-suggester-throws-error-on-core-reload-tp4220725.html
Re: multiple but identical suggestions in autocomplete
Maybe you are using DocumentDictionaryFactory, because HighFrequencyDictionaryFactory will never return duplicate terms. We had the same problem with DocumentDictionaryFactory + AnalyzingInfixSuggester. We created a service between the UI and Solr which groups duplicate suggestions and returns a list containing only unique suggestions to the UI. - Nutch Solr User -- View this message in context: http://lucene.472066.n3.nabble.com/multiple-but-identical-suggestions-in-autocomplete-tp4220055p4220727.html
Re: Solr suggester throws error on core reload.
I found an existing issue here: https://issues.apache.org/jira/browse/SOLR-6246. It lists the fix version as 5.2, but the resolution is still "Unresolved". - Nutch Solr User -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-suggester-throws-error-on-core-reload-tp4220725p4220730.html
Suggester always highlights suggestions even if we pass highlight=false
I am still experiencing the https://issues.apache.org/jira/browse/SOLR-6648 issue with Solr 5.2.1. Even if I send highlight=false, Solr returns highlighted suggestions. Any idea why this is happening?

URL:

http://solrhost:solrport/mycorename/suggest?suggest.dictionary=altSuggester&suggest.dictionary=mainSuggester&wt=json&suggest.q=treatm&suggest.count=20&highlight=false

Response:

{
  "responseHeader": { "status": 0, "QTime": 6 },
  "suggest": {
    "mainSuggester": {
      "treatm": {
        "numFound": 20,
        "suggestions": [
          { "term": "*Treatm*ent Refusal", "weight": 0, "payload": "" },
          { "term": "Withholding *Treatm*ent", "weight": 0, "payload": "" },
          { "term": "*Treatm*ent Refusal", "weight": 0, "payload": "" },
          { "term": "Withholding *Treatm*ent", "weight": 0, "payload": "" }
        ]
      }
    },
    "altSuggester": {
      "treatm": {
        "numFound": 2,
        "suggestions": [
          { "term": "*treatm*ent", "weight": 197, "payload": "" },
          { "term": "*treatm*ents", "weight": 5, "payload": "" }
        ]
      }
    }
  }
}

My configuration:

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mainSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">keyphrases</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="indexPath">main-suggest</str>
    <str name="buildOnStartup">true</str>
  </lst>
  <lst name="suggester">
    <str name="name">altSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">HighFrequencyDictionaryFactory</str>
    <str name="field">text</str>
    <str name="suggestAnalyzerFieldType">text_general</str>
    <str name="indexPath">alt-suggest</str>
    <str name="allTermsRequired">false</str>
    <str name="buildOnStartup">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">mainSuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

- Nutch Solr User -- View this message in context: http://lucene.472066.n3.nabble.com/Suggester-always-highlights-suggestions-even-if-we-pass-highlight-false-tp4219846.html
Query token access in solr function queries
How can I access each query token separately in a function query? I want to pass each token to the ttf function to get the total term frequency for that token. Currently I only have access to the main query through the $q parameter. Do I have to write some code that tokenizes the original query and adds the tokens as additional parameters (say t1, t2, t3) before sending the query to Solr, or is there another way to do this using existing Solr functions? One more question: if I have to write my own function for this, how should I return these tokens? - Nutch Solr User -- View this message in context: http://lucene.472066.n3.nabble.com/Query-token-access-in-solr-function-queries-tp4219695.html
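The client-side approach described above (tokenize first, then pass t1, t2, ... as extra request parameters) might be sketched as follows. This is an assumption-laden illustration: the class and method names are hypothetical, tokens are split on whitespace only (a field's analyzer may tokenize differently), and it assumes the $t1-style parameter references are accepted as the term argument of ttf, which should be verified against the Solr version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class TtfParams {
    // Split the raw query on whitespace and name the tokens t1, t2, ...
    public static Map<String, String> tokenParams(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        int i = 1;
        for (String token : query.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                params.put("t" + i++, token);
            }
        }
        return params;
    }

    // Build a function string such as sum(ttf(text,$t1),ttf(text,$t2)).
    // Only meaningful when there is at least one token.
    public static String ttfSum(String field, Map<String, String> params) {
        StringJoiner joiner = new StringJoiner(",", "sum(", ")");
        for (String name : params.keySet()) {
            joiner.add("ttf(" + field + ",$" + name + ")");
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = tokenParams("heart attack treatment");
        System.out.println(params);                 // extra request parameters
        System.out.println(ttfSum("text", params)); // function query referencing them
    }
}
```

The map entries would be sent as additional request parameters next to q, and the generated string used wherever a function query is allowed, for example in a pseudo-field of fl.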
Re: Tips for faster indexing
I can confirm this behavior. I see it when sending JSON docs in batches: it never happens when sending documents one by one, but it happens sporadically with batches, as if Solr/Jetty drops a couple of documents out of the batch.

Regards

On 21 Jul 2015, at 21:38, Vineeth Dasaraju vineeth.ii...@gmail.com wrote:

Hi,

Thank you, Erick, for your inputs. I tried creating batches of 1000 objects and indexing them to Solr. The performance is way better than before, but I find that the number of indexed documents shown in the dashboard is lower than the number of documents that I actually indexed through SolrJ. My code is as follows:

private static String SOLR_SERVER_URL = "http://localhost:8983/solr/newcore";
private static String JSON_FILE_PATH = "/home/vineeth/week1_fixed.json";
private static JSONParser parser = new JSONParser();
private static SolrClient solr = new HttpSolrClient(SOLR_SERVER_URL);

public static void main(String[] args) throws IOException, SolrServerException, ParseException {
    File file = new File(JSON_FILE_PATH);
    Scanner scn = new Scanner(file, "UTF-8");
    JSONObject object;
    int i = 0;
    Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();
    // One JSON object per input line; each becomes one parent document.
    while (scn.hasNext()) {
        object = (JSONObject) parser.parse(scn.nextLine());
        SolrInputDocument doc = indexJSON(object);
        batch.add(doc);
        // Sends the current batch whenever i is a multiple of 1000 (including i == 0).
        if (i % 1000 == 0) {
            System.out.println("Indexed " + (i + 1) + " objects.");
            solr.add(batch);
            batch = new ArrayList<SolrInputDocument>();
        }
        i++;
    }
    solr.add(batch);
    solr.commit();
    System.out.println("Indexed " + (i + 1) + " objects.");
}

public static SolrInputDocument indexJSON(JSONObject jsonOBJ) throws ParseException, IOException, SolrServerException {
    Collection<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();

    // Parent document for the event itself.
    SolrInputDocument mainEvent = new SolrInputDocument();
    mainEvent.addField("id", generateID());
    mainEvent.addField("RawEventMessage", jsonOBJ.get("RawEventMessage"));
    mainEvent.addField("EventUid", jsonOBJ.get("EventUid"));
    mainEvent.addField("EventCollector", jsonOBJ.get("EventCollector"));
    mainEvent.addField("EventMessageType", jsonOBJ.get("EventMessageType"));
    mainEvent.addField("TimeOfEvent", jsonOBJ.get("TimeOfEvent"));
    mainEvent.addField("TimeOfEventUTC", jsonOBJ.get("TimeOfEventUTC"));

    // Child document for the user.
    Object obj = parser.parse(jsonOBJ.get("User").toString());
    JSONObject userObj = (JSONObject) obj;
    SolrInputDocument childUserEvent = new SolrInputDocument();
    childUserEvent.addField("id", generateID());
    childUserEvent.addField("User", userObj.get("User"));

    // Child document for the event description.
    obj = parser.parse(jsonOBJ.get("EventDescription").toString());
    JSONObject eventdescriptionObj = (JSONObject) obj;
    SolrInputDocument childEventDescEvent = new SolrInputDocument();
    childEventDescEvent.addField("id", generateID());
    childEventDescEvent.addField("EventApplicationName", eventdescriptionObj.get("EventApplicationName"));
    childEventDescEvent.addField("Query", eventdescriptionObj.get("Query"));

    // Nested children: one document per domain, each with its columns.
    obj = JSONValue.parse(eventdescriptionObj.get("Information").toString());
    JSONArray informationArray = (JSONArray) obj;
    for (int i = 0; i < informationArray.size(); i++) {
        JSONObject domain = (JSONObject) informationArray.get(i);
        SolrInputDocument domainDoc = new SolrInputDocument();
        domainDoc.addField("id", generateID());
        domainDoc.addField("domainName", domain.get("domainName"));
        String s = domain.get("columns").toString();
        obj = JSONValue.parse(s);
        JSONArray ColumnsArray = (JSONArray) obj;
        SolrInputDocument columnsDoc = new SolrInputDocument();
        columnsDoc.addField("id", generateID());
        for (int j = 0; j < ColumnsArray.size(); j++) {
            JSONObject ColumnsObj = (JSONObject) ColumnsArray.get(j);
            SolrInputDocument columnDoc = new SolrInputDocument();
            columnDoc.addField("id", generateID());
            columnDoc.addField("movieName", ColumnsObj.get("movieName"));
            columnsDoc.addChildDocument(columnDoc);
        }
        domainDoc.addChildDocument(columnsDoc);
        childEventDescEvent.addChildDocument(domainDoc);
    }
    mainEvent.addChildDocument(childEventDescEvent);
    mainEvent.addChildDocument(childUserEvent);
    return mainEvent;
}

I would be grateful if you could let me know what I am missing.

On Sun, Jul 19, 2015 at 2:16 PM, Erick Erickson erickerick...@gmail.com wrote:

First thing is it looks like you're only sending one document at a time, perhaps with child objects. This is not optimal at all. I usually batch my docs up in groups of 1,000, and there is anecdotal evidence that there may (depending on the docs) be some gains above that number. Gotta balance the batch size off against how big the docs are of course. Assuming that you really are calling this method for one doc (and
Re: Basic auth
I followed this guide: http://learnsubjects.drupalgardens.com/content/how-place-http-authentication-solr

But something is wrong. Can anyone help, or point me to a guide on how to set up HTTP basic auth?

Regards

On 19 Jul 2015, at 01:10, solr.user.1...@gmail.com wrote:
> SOLR-4470 is about: Support for basic auth in internal Solr requests.
> What is wrong with the internal requests? Can someone help simplify? Would it ever be possible to run with basic auth? What workarounds are there?
>
> Regards
Basic auth
SOLR-4470 is about: Support for basic auth in internal Solr requests. What is wrong with the internal requests? Can someone help simplify? Would it ever be possible to run with basic auth? What workarounds are there? Regards
Re: Programmatically find out if node is overseer
Hi Anshum,

What do you mean by "ideally, there shouldn't be a point where you have multiple active Overseers in a single cluster"? How can multiple Overseers happen, and what are the consequences?

Regards

On 17 Jul 2015, at 19:37, Anshum Gupta ans...@anshumgupta.net wrote:
> ideally, there shouldn't be a point where you have multiple active Overseers in a single cluster
Re: Setup cloud collection
Thanks Shawn, but I don't want to build something in front of SolrCloud just to help Solr assign the leader role so that indexing load is distributed. Instead of requiring this manual step (rebalancing leaders), maybe one host should not take the leader role for multiple shards of the same collection when the number of live nodes equals the number of shards. But since you say it will happen over time, maybe I'll continue indexing and see whether the leaders get rebalanced.

Regards

On 16 Jul 2015, at 14:57, Shawn Heisey apa...@elyograg.org wrote:
> On 7/16/2015 5:51 AM, SolrUser2015 wrote:
> > Hi, I'm new to Solr! So I downloaded version 5.2 and modified the solr file so it allows me to create a 5 node cluster: 5 shards and replication factor 3. Now I see that one node is marked as leader for 3 shards. So my question is, how can 1 node serve requests for 3 shards, wouldn't that be uneven distribution of load?
>
> SolrCloud will distribute individual queries to different replicas, so over time the entire cloud will be used. The leader role shouldn't affect queries, that role is mostly there for indexing and fault handling.
>
> If you are really concerned about this, you can assign preferred leaders and then ask Solr to reshuffle them. I have never used this functionality. Here's the documentation on it:
> https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-RebalanceLeaders
>
> Thanks, Shawn
Re: Setup cloud collection
Thank you, very good explanation. Regards

On 16 Jul 2015, at 17:12, Shawn Heisey apa...@elyograg.org wrote:
> On 7/16/2015 7:47 AM, solr.user.1...@gmail.com wrote:
>> Thanks Shawn, but don't want to build something in front of Solr cloud to help Solr assign leader role to distribute load of indexing. Instead of doing this manual step (rebalance leaders) maybe one host should not take the leader role of multiple shards for same collection if the number of live nodes are equal to number of shards. But assuming that when you say it will happen over time, Maybe I'll continue indexing and see that leaders will be rebalanced soon.
>
> Unless you have a fairly major event (like Solr restarting or an operation taking longer than zkClientTimeout) your leaders will never change. It's a semi-permanent role. When a qualifying event happens, SolrCloud does an election process to determine the leader, but elections do not happen unless you force them with a REBALANCELEADERS action or one of several errors occurs.
>
> You don't have to build anything in front of Solr. You simply have to assign a preferred leader for each shard, an action that can be done with an HTTP call in a browser. I don't think we have anything in the admin UI to assign preferred leaders ... I will look into it and open an issue if necessary.
>
> The thing that I'm saying will happen over time is that all replicas will be used for queries. If you send a thousand queries, you'll find that they will be divided fairly evenly among all replicas. The fact that you have one node as leader for three of your shards is not very much of a big deal, but if you really want to change it, you can do so with the preferred leader feature.
>
> Thanks, Shawn
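As a rough illustration of the "HTTP call in a browser" Shawn mentions, the sketch below only builds the two Collections API URLs (ADDREPLICAPROP with the preferredLeader property, then REBALANCELEADERS). The base URL, collection, shard and replica names are placeholders and not taken from this thread; check the Collections API documentation for your Solr version before relying on the exact parameters.

```python
from urllib.parse import urlencode

# Assumption: base URL of any node in the cluster; adjust host and port.
SOLR_BASE = "http://localhost:8983/solr"


def preferred_leader_url(collection, shard, replica):
    """URL that marks one replica as the preferred leader of its shard."""
    params = {
        "action": "ADDREPLICAPROP",
        "collection": collection,
        "shard": shard,
        "replica": replica,
        "property": "preferredLeader",
        "property.value": "true",
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


def rebalance_leaders_url(collection):
    """URL that asks Solr to move leadership to the preferred leaders."""
    params = {"action": "REBALANCELEADERS", "collection": collection}
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical names; replace with the values shown in your cluster state.
    print(preferred_leader_url("mycollection", "shard1", "core_node1"))
    print(rebalance_leaders_url("mycollection"))
```

Each printed URL can be pasted into a browser or sent with any HTTP client; one ADDREPLICAPROP call is needed per shard before the single REBALANCELEADERS call.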
Per field mm parameter
How do I specify a per-field mm parameter in an edismax query?

- Nutch Solr User
The ultimate search engine would basically understand everything in the world, and it would always give you the right thing.
--
View this message in context: http://lucene.472066.n3.nabble.com/Per-field-mm-parameter-tp4208325.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sorting on multivalues field in Solr
Thanks Alex, that was really useful.

- Nutch Solr User
Re: Solr 4.10.2 Found core but I get No cores available in dashboard page
Interesting. Unfortunately it is time to take a break, so I will have to deal with this in the new year. Merry Christmas, and thanks for all the time and effort you guys put into answering all of our questions. It is much appreciated.
what does this write.lock does not exist mean??
I looked for messages about the following error but don't see anything on Nabble. Does anyone know what this error means and how to correct it?

SEVERE: java.lang.IllegalArgumentException: /var/apache/my-solr-slave/solr/coreA/data/index/write.lock does not exist

I also occasionally see error messages about specific index files, such as this:

SEVERE: null:java.lang.IllegalArgumentException: /var/apache/my_solr-slave/solr/coreA/data/index/_md39_1.del does not exist

I am using Solr 4.0.0 with Java 1.7.0_11-b21 and Tomcat 7.0.34, running on a 12GB CentOS box; we have a master/slave setup with multiple slave searchers per indexer. Any thoughts on this would be appreciated.
Re: Solr 4.10.2 Found core but I get No cores available in dashboard page
I did find out the cause of my problems. Turns out the problem wasn't due to the solrconfig.xml file; it was in the schema.xml file. I spent a fair bit of time making my solrconfig closer to the default solrconfig.xml in the Solr download; when that didn't get rid of the error, I went back to the only other file we had that was different. Turns out the line that was causing the problem was the middle line in this location_rpt fieldtype definition:

<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
    spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
    geo="true" distErrPct="0.025" maxDistErr="0.09" units="degrees" />

The spatialContextFactory line caused the core to not load, even though no error/warning messages were shown. I missed that extra line somehow; mea culpa. Anyhow, I really appreciate the responses/help I got on this issue. Many thanks.
Re: Solr 4.10.2 Found core but I get No cores available in dashboard page
My apologies for the lack of clarity. Our internal name for the project to upgrade Solr from 4.0 to 4.10.2 is helios, and so we named our test folder heliosearch. I was not even aware of the GitHub project Heliosearch, and nothing we are doing is related to it.

To simplify things for this post, we set things up so that we have one Solr instance but two cores; coreX contains the collection1 files/folders as per the downloaded Solr 4.10.2 package, while coreA uses the same collection1 files/folders but with schema.xml and solrconfig.xml changes to meet our needs.

So file and folder name-wise, here is what we did:
1. C:\SOLR\solr-4.10.2.zip\solr-4.10.2\example renamed to C:\SOLR\helios-4.10.2\Master
2. renamed example\solr\collection1 to example\solr\coreX; no files modified here
3. copied example\solr\coreX to example\solr\coreA
4. modified the coreA schema to match our current production schema; ie our field names, etc
5. modified the coreA solrconfig.xml to meet our needs (see below)

Here are the solrconfig.xml changes we made to coreA:
1. <directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}">
2. <mergeFactor>4</mergeFactor>
3. <reopenReaders>false</reopenReaders>
4. <infoStream>false</infoStream>
5. commented out autoCommit section
6. commented out autoSoftCommit section
7. commented out the <cache name="perSegFilter"... section
8. <maxWarmingSearchers>4</maxWarmingSearchers>
9. <requestParsers enableRemoteStreaming="true" multipartUploadLimitInKB="2048000" />
10. <requestHandler name="/select" class="solr.SearchHandler"> contains <arr name="last-components"><str>geocluster</str></arr>
11. commented out these sections:
    <requestHandler name="/browse" class="solr.SearchHandler">
    <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
    <searchComponent name="suggest" class="solr.SuggestComponent">
    <searchComponent name="tvComponent" class="solr.TermVectorComponent"/>
    <requestHandler name="/tvrh" class="solr.SearchHandler" startup="lazy">
    <searchComponent name="clustering" ...
    <requestHandler name="/clustering"...
    <searchComponent name="elevator" class="solr.QueryElevationComponent">
    <requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
    <queryResponseWriter name="xslt" class="solr.XSLTResponseWriter">

Here are the schema.xml changes we made to our copy of the downloaded Solr 4.10.2 package (aside from replacing the example fields provided in the downloaded Solr 4.10.2):
1. <schema name="Helios" version="1.5">
2. removed the example fields provided in the downloaded solr 4.10.2
3. deleted various types we don't use in our current schemas
4. added fieldtypes that are in our current solr 4.0 instances
5. added various fieldtypes that are in our current solr 4.0 instances
6. re-added the text field as apparently required: <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>

Also note that we are using Java 1.7.0_67 and jetty-8.1.10.v20130312. All in all, I don't see anything that we have done that would keep the cores from being discovered. Hope that helps.
Re: Solr 4.10.2 Found core but I get No cores available in dashboard page
Small correction: coreX (the one with the unmodified schema.xml and solrconfig.xml) IS seen by Solr and appears on the Solr admin page, but coreA (which has our modified schema and solrconfig) is found by Solr but is not shown in the Solr admin page:

1494 [main] INFO org.apache.solr.core.CoresLocator - Looking for core definitions underneath C:\SOLR\helios-4.10.2\Master\solr
1502 [main] INFO org.apache.solr.core.CoresLocator - Found core coreA in C:\SOLR\helios-4.10.2\Master\solr\coreA\
1502 [main] INFO org.apache.solr.core.CoresLocator - Found core coreX in C:\SOLR\helios-4.10.2\Master\solr\coreX\
1503 [main] INFO org.apache.solr.core.CoresLocator - Found 2 core definitions
Re: Solr 4.10.2 Found core but I get No cores available in dashboard page
Yes, I have triple-checked the schema and solrconfig XML; various tools have indicated the XML is valid. There are no missing types or dupes, and I have not disabled the admin handler.

As mentioned in my most recent response, I can see the coreX core (the renamed and unmodified collection1 core from the downloaded package) and query it with no issues, but coreA (which has our specific schema and solrconfig changes) is not showing in the admin interface and cannot be queried (I get a 404). Both cores are located in the same solr folder.

Appreciate the suggestions; it looks like I will need to gradually move my schema and core changes towards the collection1 content and see where things start working. That will take a while... sigh. Will let you know what I find out.
Re: Solr 4.10.2 Found core but I get No cores available in dashboard page
Chris, will get the schema and solrconfig ready for uploading.
Solr 4.10.2 Found core but I get No cores available in dashboard page
org.apache.solr.servlet.SolrDispatchFilter - user.dir=C:\SOLR\helios-4.10.2\Instance\Master
1864 [main] INFO org.apache.solr.servlet.SolrDispatchFilter - SolrDispatchFilter.init() done
1885 [main] INFO org.eclipse.jetty.server.AbstractConnector - Started SocketConnector@0.0.0.0:8086
9895 [qtp618640318-19] INFO org.apache.solr.servlet.SolrDispatchFilter - [admin] webapp=null path=/admin/cores params={indexInfo=false&_=1418236560709&wt=json} status=0 QTime=17
9931 [qtp618640318-19] INFO org.apache.solr.servlet.SolrDispatchFilter - [admin] webapp=null path=/admin/info/system params={_=1418236560885&wt=json} status=0 QTime=2
Re: Solr 4.10.2 Found core but I get No cores available in dashboard page
Definitely puzzling. I am running this on my local box (ie using http://localhost:8086/solr) and it is the only running instance of any Solr.
Re: Solr 4.10.2 Found core but I get No cores available in dashboard page
The log tab shows "No Events available", and there are no errors at all in the CMD console. My test version hasn't got any logging changes beyond what is already in the default Solr 4.10.2 package. Some kind of warning or error message would have been helpful...
confused about how to set a solr query timeout when using tomcat
I inherited a set of old 1.4x Solrs running under tomcat6/java6. While I will eventually upgrade them to a more recent solr/tomcat/java, I am unable to do so in the near term.

One of my priority fixes, though, is to implement some sort of timeout for Solr queries that exceed 1000ms (or so); ie if the query takes longer than that, I want to abort that query (returning nothing or an error or whatever) so that Solr can process other queries. While we have optimized our queries for an average 50ms response time, we do occasionally see some that can run between 10 and 100 seconds.

I know that this version of Solr itself doesn't have a built-in timeout mechanism, which leaves me with figuring out what to do (it seems to me that I have to figure out how to get Tomcat to time out the queries somehow). Note that I DID google until my fingers hurt and have not been able to find clear (at least not clear to me) instructions on how to do so.

Details:
1. The setup uses the DataImportHandler to update Solr, and updates occur often and can be quite large; we use batchSize=1 and autoCommit=true, with doc size being around 1400 to 1600 bytes. I don't want the timeout to kill the imports, of course.
2. I tried adding a timeout param to the Tomcat configuration but it doesn't work:
   <Connector port="8086" protocol="HTTP/1.1" connectionTimeout="2" protocol="HTTP/1.1" timeout="1" />

Any thoughts? Can anyone point me in the right direction on how to implement this? Any help appreciated. Thx in advance.
Re: confused about how to set a solr query timeout when using tomcat
- millions of documents per shard, with a number of shards
- ~40gb index folder size
- 12gb of heap on a 16gb machine (this old Solr doesn't use O/S mem space like 4.x does)
- servers are hosted internally, and are powerful

Understood. As mentioned, we tuned the bulk of our queries to run very quickly (50ms or less), but we do occasionally see queries (ie internal ones for statistics/tests) that can be excessively long running. Basically, we want to be able to enforce how long those long-running queries are allowed to run.
RE: confused about how to set a solr query timeout when using tomcat
Yes, my understanding and concern as well was that Solr continues to run the query on the server even after the connection is broken. I was hoping I had overlooked or missed something in the Solr or Tomcat documentation that might do the job. It is unfortunate. If anyone else can think of something, let me know.
how do I stop queries from being logged in two different log files in Tomcat
Hi all. We have a number of Solr 1.4x and Solr 4.x installations running on Tomcat. We are trying to standardize the content of our log files so that we can automate log analysis; we don't want to use log4j at this time.

In our Solr 1.4x installations, the following conf\logging.properties file is correctly logging queries only to our localhost_access_log.xxx.txt files, and Tomcat-type messages to our catalina.xxx.log files.

However, in our Solr 4.x installations, we are seeing Solr queries being logged in both our localhost_access_log.xxx.txt files and our catalina.xxx.log files. We don't want the Solr queries logged in the catalina.xxx.log files, since it more than doubles the amount of logging being done and doubles the disk space requirement (which can be huge).

Is there a way to configure logging, without using log4j (for now), to only log Solr queries to the localhost_access_log.xxx.txt files? I have looked at various Tomcat logging info and don't see how to do it. Any help appreciated.

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
handlers = 1catalina.org.apache.juli.FileHandler, 2localhost.org.apache.juli.FileHandler, 3manager.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler

.handlers = 1catalina.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler

# Handler specific properties.
# Describes specific configuration info for Handlers.

1catalina.org.apache.juli.FileHandler.level = FINE
1catalina.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
1catalina.org.apache.juli.FileHandler.prefix = catalina.

2localhost.org.apache.juli.FileHandler.level = FINE
2localhost.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
2localhost.org.apache.juli.FileHandler.prefix = localhost.

3manager.org.apache.juli.FileHandler.level = FINE
3manager.org.apache.juli.FileHandler.directory = ${catalina.base}/logs
3manager.org.apache.juli.FileHandler.prefix = manager.

java.util.logging.ConsoleHandler.level = WARNING
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter

# Facility specific properties.
# Provides extra control for each logger.

org.apache.catalina.core.ContainerBase.[Catalina].[localhost].level = INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].handlers = 2localhost.org.apache.juli.FileHandler

org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].level = INFO
org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/manager].handlers = 3manager.org.apache.juli.FileHandler

# For example, set the org.apache.catalina.util.LifecycleBase logger to log
# each component that extends LifecycleBase changing state:
#org.apache.catalina.util.LifecycleBase.level = FINE
Re: how do I stop queries from being logged in two different log files in Tomcat
Awesome, Mike. That does exactly what I want. Many thanks.
Re: how do I get search for fort st john to match ft saint john
Thanks guys. Unfortunately the Solr that contains this schema/data is in a legacy system that requires the fields to not be changed. We will, hopefully in the near future, be able to look at redesigning the schema. Alternatively, I could look at boning up on Java (which I haven't used in a long time) and see if I can write a subword synonym plugin of some sort to perform this type of synonyming. Thanks anyhow.
Re: how do I get search for fort st john to match ft saint john
Hi Eric. No, that doesn't fix the problem either (I have tested this previously and did so again just now).

Since the PatternTokenizerFactory is not tokenizing on whitespace (by design, since I want the user to search by phrase), the phrase "marina former fort ord" (for example) does not get turned into four tokens (marina, former, fort and ord), and so the SynonymFilterFactory does not create synonyms for them (by design).

The original question remains: is there a tokenizer/plugin that will allow me to synonym words in an unbroken phrase?

Note: the reason I don't want to tokenize the data by whitespace is that it would cause way too many results to get returned if I, for example, search on "new" or "st" ... However, I still want to be able to include "fort saint john" in the results if the user searches for "ft st john" or "fort st john" or ...
Re: how do I get search for fort st john to match ft saint john
Hi Eric. Sorry, been away. The city_index_synonyms.txt file is pretty small, as it contains just these two lines:

saint,st,ste
fort,ft

There is nothing at all in the city_query_synonyms.txt file, and it isn't used either. My understanding is that Solr would create the appropriate synonym entries in the index and so treat fort and ft as equal.

If you have a simple one-line schema (that uses the type definition from my original email) and index "fort saint john", does it work for you? i.e. does it return results if you search for "ft st john" and "ft saint john" and "fort st john"? My Solr 4.6.1 instance doesn't. I am wondering if synonyms just don't work for all/some words in a phrase.
Re: how do I get search for fort st john to match ft saint john
Yes, and I can see that (as expected), per the field type, the value:
1. is lowercased
2. is stripped of non-alpha characters
3. has multiple consecutive whitespace removed
4. is trimmed
5. goes through the SynonymFilterFactory, where:
   a. the indexed value of "Marina/Former Fort Ord" is "marina former fort ord"
   b. the search value of "Marina/Former Ft Ord" is "marina former ft ord"

This I already knew. My question wasn't why they don't match; it is: how do I get a search for "fort st john" to match "ft saint john"? ie is there a way to index/search that would allow the search to match?

The SynonymFilterFactory during indexing does not create a matching term for "marina former ft ord", which I think it would do if the indexed value was a word instead of a phrase (ie "fort" vs "Marina/Former Fort Ord"). (Note that my terms/understanding of how this works may be incorrect, hence my request for assistance/understanding.)
how do I get search for fort st john to match ft saint john
I have been using Solr for a while but started running across situations where synonyms are required. The example I have is a group of city names that look like "Fort Saint John" (a city), in a text field. Users may want to search for "Ft St John" or "Fort St John" or "Ft Saint John", however.

My attempted solution was to create a type that uses SynonymFilterFactory and a text file of city-based synonyms like this:

saint,st,ste
fort,ft

This doesn't work, however, and I am not sure I understand why. Any help appreciated. Thx.

p.s. I am using Solr 4.6.1 and here is the field type definition from the solrconfig.xml:

<fieldtype name="geo_search_area_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\^\-,|]" group="-1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^\w\s]" replacement=" " replace="all" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="[\s]{2,}" replacement=" " replace="all" />
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="city_index_synonyms.txt" ignoreCase="true" expand="true" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[\^\-,|]" group="-1" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^\w\s]" replacement=" " replace="all" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="[\s]{2,}" replacement=" " replace="all" />
    <filter class="solr.TrimFilterFactory"/>
  </analyzer>
</fieldtype>
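To make the behaviour discussed in this thread easier to see, here is a small Python simulation of the index analyzer chain. It is not Solr code: it only mimics the tokenizer and filters with regular expressions, and it assumes both PatternReplaceFilterFactory steps replace with a single space (the replacement values are not legible in the archived message). The point it illustrates is that the synonym step is applied per token, and the tokenizer emits the whole phrase as one token.

```python
import re

# Mirrors the tokenizer pattern from the field type; group=-1 means "split on it".
SPLIT_PATTERN = re.compile(r"[\^\-,|]")

# Contents of city_index_synonyms.txt as quoted in the thread.
SYNONYM_GROUPS = [{"saint", "st", "ste"}, {"fort", "ft"}]


def analyze(value):
    """Approximate the tokenizer plus the lowercase/replace/trim filters."""
    tokens = []
    for piece in SPLIT_PATTERN.split(value):
        token = piece.lower()
        token = re.sub(r"[^\w\s]", " ", token)  # assumed replacement: one space
        token = re.sub(r"\s{2,}", " ", token)   # collapse runs of whitespace
        token = token.strip()
        if token:
            tokens.append(token)
    return tokens


def expand_synonyms(token):
    """Synonym lookup on a whole token, as a token filter would do it."""
    for group in SYNONYM_GROUPS:
        if token in group:
            return sorted(group)
    return [token]


if __name__ == "__main__":
    for text in ("Marina/Former Fort Ord", "Fort"):
        for token in analyze(text):
            print(repr(token), "->", expand_synonyms(token))
```

In this simulation the single-word value expands to both "fort" and "ft", while "marina former fort ord" stays one token and matches no synonym entry, which is the situation described above.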
does shards.tolerant deal with this scenario?
Hi all. I have some questions re shards.tolerant=true and timeAllowed=xxx.

I have seen situations where shards.tolerant=true works; if one of the shards specified in a query is dead, shards.tolerant seems to work and I get results from the non-dead shards. However, if one of the shards goes down during the execution of a query, I have to wait for the primary searcher (the Solr sending the request to the shards) to time out, which can last minutes. ie shards.tolerant doesn't seem to work.

Question 1: is timeAllowed shard-aware? ie in a sharded query, does this param get used by all the shards specified, or does it only get used by the primary searcher?

Question 2: since shards.tolerant=true is not helping when a shard goes down during query execution, is there any other way to deal with this? If timeAllowed is shard-aware, I would think that I could use timeAllowed and the primary searcher would then wait xxx milliseconds and return with whatever the other shards had sent back. Is that correct?

Thanks in advance.
Are there any Java versions we should avoid with Solr
We are currently using Oracle Java 1.7.0_11 23.6-b04 JDK with our Solr 4.6.1 setup. I was looking at upgrading to a more recent version but am wondering: are there any versions to avoid? The reason I ask is that I see some versions that have GC issues, but I am not sure how/if Solr is affected by them.

- 7u40 has a bug with "New minimum young generation size is not properly checked by the JVM", and with "Irregular crash or corrupt term vectors in the Lucene libraries"
- 7u51 has a bug with "Memory leak when GCNotifier uses create_from_platform_dependent_str()"
is it possible to consolidate filterquery cache strings
Let's say I have a largish set of data (120M docs) and that I am partitioning my data by groups of states (using the state codes). Someone suggested that I could use the following format in my solrconfig.xml when defining the filter queries:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">State:AL</str>
      <str name="fq">State:AK</str>
      ...
      <str name="fq">State:WY</str>
    </lst>
  </arr>
</listener>

Would that work, and if so, how would I know that the cache is being hit? Or do I need to use the following traditional syntax instead:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst> <str name="q">*:*</str> <str name="fq">State:AL</str> </lst>
    <lst> <str name="q">*:*</str> <str name="fq">State:AK</str> </lst>
    ...
    <lst> <str name="q">*:*</str> <str name="fq">State:WY</str> </lst>
  </arr>
</listener>

Any help appreciated.
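If the one-query-per-state form is used, writing fifty nearly identical entries by hand is error-prone. The following is only a sketch that generates that listener block with Python's standard library; the field name State comes from the message above, while the function name and the three sample codes are made up for illustration. Whether the cache is actually being hit is usually easiest to judge from the filterCache statistics (lookups and hits) on the admin stats page.

```python
import xml.etree.ElementTree as ET


def warming_listener(state_codes, field="State"):
    """Build a newSearcher listener with one warming query per state code."""
    listener = ET.Element(
        "listener", {"event": "newSearcher", "class": "solr.QuerySenderListener"}
    )
    queries = ET.SubElement(listener, "arr", {"name": "queries"})
    for code in state_codes:
        entry = ET.SubElement(queries, "lst")
        q = ET.SubElement(entry, "str", {"name": "q"})
        q.text = "*:*"
        fq = ET.SubElement(entry, "str", {"name": "fq"})
        fq.text = f"{field}:{code}"
    return listener


# Sample codes only; pass the full list of state codes in practice.
xml_text = ET.tostring(warming_listener(["AL", "AK", "WY"]), encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
```

The output is a single unindented line of XML that can be pasted into solrconfig.xml and reformatted as needed.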
Re: is it possible to consolidate filterquery cache strings
Note: by partitioning I mean that I have sharded the 120M docs into 9 Solr partitions (each on a separate server).
Re: is it possible to consolidate filterquery cache strings
Wouldn't breaking the FQs out by state be faster for warming up the fq caches?
Re: Solr3.4 on tomcat 7.0.23 - hung with error threw exception java.lang.IllegalStateException: Cannot call sendError() after the response has been committed
Were you able to resolve this issue, and if so, how? I am encountering the same issue in a couple of Solr versions (including 4.0 and 4.5).
what is difference between 4.1 and 5.x
Just curious as to what the difference is between 4.1 and 5.0, i.e. is 4.1 a maintenance branch for what is currently 4.0, or are they very different designs/architectures?
spatial searches and geo-json data
Hi all. I have a large amount of spatial data in GeoJSON format that I get from MSSQL Server. I want to be able to index that data and am trying to figure out how to convert it into WKT format, since Solr only accepts WKT. Is anyone aware of any Solr module, T-SQL code or C# code that would help me with the conversion?
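For simple geometries the conversion is mostly string formatting, so as a rough sketch (in Python rather than the T-SQL or C# asked about, and not a Solr module) the mapping from a GeoJSON geometry object to WKT could look like this. It only handles Point, LineString and Polygon, ignores any third coordinate, and does no validation; the function names are invented for this example.

```python
def _coord_list(coords):
    """Format a list of [x, y] positions as a parenthesised WKT coordinate list."""
    return "(" + ", ".join(f"{x} {y}" for x, y, *_ in coords) + ")"


def geojson_to_wkt(geometry):
    """Convert a GeoJSON geometry dict (Point, LineString, Polygon) to WKT."""
    kind = geometry["type"]
    coords = geometry["coordinates"]
    if kind == "Point":
        x, y, *_ = coords
        return f"POINT ({x} {y})"
    if kind == "LineString":
        return "LINESTRING " + _coord_list(coords)
    if kind == "Polygon":
        # A polygon is a list of rings: the outer ring first, then any holes.
        return "POLYGON (" + ", ".join(_coord_list(ring) for ring in coords) + ")"
    raise ValueError(f"unsupported geometry type: {kind}")


if __name__ == "__main__":
    square = {
        "type": "Polygon",
        "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]],
    }
    print(geojson_to_wkt(square))
```

Both formats put longitude (x) before latitude (y), so the coordinate order carries over unchanged.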
Re: is there a way to prevent abusing rows parameter
Thanks guys. This is a problem with the front end not validating requests. I was hoping there might be a simple config value I could enter/change, rather than going through the long process of migrating a proper fix all the way up to our production servers. Looks like not, but thx.
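Since the fix ends up in the front end, the validation itself can be very small. Here is a minimal sketch of clamping the rows value before a request is forwarded to Solr; the limit of 1000, the default of 10 and the function name are arbitrary choices for illustration, not values from this thread.

```python
MAX_ROWS = 1000    # hypothetical upper bound; pick what your servers can handle
DEFAULT_ROWS = 10  # used when the value is missing or not a number


def sanitize_rows(raw_value, max_rows=MAX_ROWS, default=DEFAULT_ROWS):
    """Return a safe integer for the rows parameter."""
    try:
        rows = int(raw_value)
    except (TypeError, ValueError):
        return default
    # Clamp into the range 0..max_rows.
    return max(0, min(rows, max_rows))


if __name__ == "__main__":
    for raw in ("25", "999999999", "-5", "abc", None):
        print(repr(raw), "->", sanitize_rows(raw))
```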
upgrading from 4.0 to 4.1 causes CorruptIndexException: checksum mismatch in segments file
Hi all. I have been working on moving us from 4.0 to a newer build of 4.1. I am seeing a "CorruptIndexException: checksum mismatch in segments file" error when I try to use the existing index files. I did see something in the build log for #119 re LUCENE-4446 that mentions "flip file formats to point to 4.1 format".

Do I just need to reindex, or is this some other issue (ie do I need to configure something differently)? Or should I move back a few builds?

Note, we are currently using:
solr-spec 4.0.0.2012.04.05.15.05.52
solr-impl 4.0-SNAPSHOT 1310094M - - 2012-04-05 15:05:52
lucene-spec 4.0-SNAPSHOT
lucene-impl 4.0-SNAPSHOT 1309921 - - 2012-04-05 10:25:27

and are considering moving to:
solr-spec 4.1.0.2012.11.03.18.08.42
solr-impl 4.1-2012-11-03_18-05-49 1405392 - hudson - 2012-11-03 18:08:42
lucene-spec 4.1-2012-11-03_18-05-49
lucene-impl 4.1-2012-11-03_18-05-49 1405392 - hudson - 2012-11-03 18:06:50
(aka apache-solr-4.1-2012-11-03_18-05-49)
is there a way to prevent abusing rows parameter
Silly question: is there any configuration value I can set to prevent someone from entering a bad value for the rows parameter? I.e. to prevent something like rows=1 from crashing my servers? The server I am looking at is Solr v3.6. -- View this message in context: http://lucene.472066.n3.nabble.com/is-there-a-way-to-prevent-abusing-rows-parameter-tp4021467.html Sent from the Solr - User mailing list archive at Nabble.com.
Intersects spatial query returns polygons it shouldn't
, -93.22617724980643 45.29791971794424, -93.23408017640227 45.298023690859175, -93.2343080073169 45.288444186545625, -93.23432525195352 45.287995322205425, -93.23469515647318 45.269279712377234, -93.23475627635968 45.266203358381446, -93.23560542207227 45.26619551047824, -93.23899176558338 45.26613779367068, -93.24250527367546 45.26608234822973, -93.243445378056 45.26606503829342, -93.24512861083372 45.2660344570852, -93.24588057830995 45.26602026067889, -93.24713274287363 45.26599455787498, -93.25036838013868 45.26592734514467, -93.25172461510564 45.265900698298395, -93.25236738024864 45.265888260809106, -93.25481754173921 45.26583307838667, -93.25571357952906 45.265819559899164, -93.2594981489083 45.26575415212897, -93.26098138766197 45.265754375486374, -93.26155216698102 45.26565612540643, -93.26170097145753 45.26562288963898, -93.26208574477789 45.26553876835043, -93.26245875524685 45.265434673708015, -93.26277275191426 45.265316250819595, -93.26311663127117 45.26517251314189, -93.26346212923646 45.26500240317637, -93.26393572774133 45.26477558787491, -93.2651820516718 45.26406759657772, -93.26518110226205 45.26337226279194, -93.26515218908767 45.26311636791454, -93.26518703008779 45.262871689663605, -93.2652064900752 45.26265582104258, -93.2652110298225 45.26215614194132, -93.26522443086994 45.26112430402238, -93.26522989950563 45.260703199933474, -93.26524872191168 45.25930812973533, -93.26525187087448 45.258897852775995, -93.26525857049303 45.258025812056765, -93.26527734826267 45.256675072153314, -93.26528081766433 45.25612813038996, -93.265287399575 45.25512698071874, -93.26530031054412 45.253711671615115, -93.26531490547187 45.25273002640574, -93.26532214123614 45.252243491267, -93.26533817105908 45.25062180123498, -93.26535413994274 45.24906421173263, -93.26536141910549 45.24841165046578, -93.26536638602661 45.24796649509243, -93.26537318826473 45.24735637067748, -93.26539798003012 45.24589779189643, -93.265404909549 45.24454674190931, -93.2654060939449 
45.24296904311022, -93.26540624905046 45.24276127146885, -93.26540843815205 45.2420263885843, -93.26541275006169 45.240577352345994, -93.2654375717671 45.238843301612725, -93.26544518264211 45.237906888690105, -93.26544940933664 45.23738688110566, -93.26546966016808 45.236093591927926, -93.2654781584622 45.235359229961944, -93.26548338867605 45.23490715107922, -93.26553582901259 45.23354268990693, -93.26554071996831 45.23330119833777, -93.26555987026248 45.2323552839169, -93.26557251955711 45.23173040973764, -93.26556626032777 45.22975235185782, -93.26556606661761 45.229367333607186, -93.26556579189545 45.228823722705066, -93.26562882232702 45.226872206176665, -93.26571073971922 45.224335971082276, -93.26574560622672 45.2219321787, -93.26574836877063 45.22173093256304, -93.26577033227747 45.22021043432355, -93.26578588443306 45.21913391123174, -93.26580662128347 45.21769799745153, -93.26580983179628 45.217475736026664, -93.26581322607608 45.217240685631346, -93.26590715360736 45.210737684073244, -93.26591966090616 45.209871711997586, -93.2659016992406 45.20722015227932, -93.26587484243684 45.203254836571126, -93.26585637174348 45.20052765082941, -93.26585684827346 45.19841676076085, -93.26587786763154 45.19732741144391, -93.2658624676632 45.1970879109074, -93.2659274100303 45.194004979577755, -93.26595017983325 45.191531890895845, -93.26595423366354 45.19092534610275, -93.26593099287571 45.190637988686554, -93.2659274057232 45.18986823069059, -93.26592485308495 45.18931973506328))' -- View this message in context: http://lucene.472066.n3.nabble.com/Intersects-spatial-query-returns-polygons-it-shouldn-t-tp4008646.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: question(s) re lucene spatial toolkit aka LSP aka spatial4j
Thanks David. No worries about the delay; am always happy and appreciative when someone responds. I don't understand what you mean by All center points get cached into memory upon first use in a score in question 2 about the Java OOM errors I am seeing. The Solr instance I have setup for testing has around 200k docs, with one WKT field per doc (indexed and stored and set to multivalue). I did a count of the number of points that get indexed in Solr (computed in MS SQL by counting the number of points (using STNumPoints) for each geometry (using STNumGeometries) in the WKT data I am indexing), and I have around 35M points total. If only the center points for 190K docs get cached, wouldn't that easily fit in 7GB of heap? Even if Solr was caching 35M points, that still doesn't sound like 7GB worth of data. -- View this message in context: http://lucene.472066.n3.nabble.com/question-s-re-lucene-spatial-toolkit-aka-LSP-aka-spatial4j-tp3997757p4000268.html Sent from the Solr - User mailing list archive at Nabble.com.
question(s) re lucene spatial toolkit aka LSP aka spatial4j
Hopefully someone is using the Lucene spatial toolkit (aka LSP, aka spatial4j) and can answer these questions. We are using this spatial tool for doing searches. Overall, it seems to work very well. However, finding documentation is difficult. I have a couple of questions: 1. I have a geohash field in my Solr schema that contains indexed geographic polygon data. I want to find all docs where that polygon intersects a given lat/long. I was experimenting with returning distance in the result set and with sorting by distance, and found that the following query works. However, I don't know what distance means in the query, i.e. is it the distance from the point to the polygon centroid, to the closest outer edge of the polygon, a useless random value, etc. Does anyone know? http://solrserver:solrport/solr/core0/select?q=*:*&fq={!v=$geoq%20cache=false}&geoq=wkt_search:%22Intersects(Circle(-97.057%2047.924%20d=0.01))%22&sort=query($geoq)+asc&fl=catchment_wkt1_trimmed,school_name,latitude,longitude,dist:query($geoq,-1),loc_city,loc_state 2. Some of the polygons, being geographic representations, are very big (i.e. state/province polygons). When Solr starts processing a spatial query (like the one above), I can see (INFO: Building Cache [xx]) that it fills in some sort of memory cache (org.apache.lucene.spatial.strategy.util.ShapeFieldCache) of the indexed polygon data. We are encountering Java OOM issues when this occurs (even when we boosted the memory to 7GB). I know that some of the polygons can have more than 2300 points, but heavy trimming isn't really an option due to level-of-detail issues. Can we control this caching, or the indexing of the polygons, in any way to reduce the memory requirements? -- View this message in context: http://lucene.472066.n3.nabble.com/question-s-re-lucene-spatial-toolkit-aka-LSP-aka-spatial4j-tp3997757.html Sent from the Solr - User mailing list archive at Nabble.com.
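A request like the one in question 1 is easier to get right when the parameters are assembled and escaped programmatically. The sketch below uses only the Python standard library; the base URL and the shortened fl list are assumptions for illustration, while the parameter names (q, fq, geoq, sort, fl) come from the query in the message.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_spatial_query(base_url, lon, lat, d=0.01):
    """Build a select URL that filters and sorts by one shared spatial sub-query."""
    geoq = f'wkt_search:"Intersects(Circle({lon} {lat} d={d}))"'
    params = [
        ("q", "*:*"),
        ("fq", "{!v=$geoq cache=false}"),  # the filter dereferences $geoq
        ("geoq", geoq),                    # the spatial clause is defined once
        ("sort", "query($geoq) asc"),      # sort by the score of that clause
        ("fl", "school_name,latitude,longitude,dist:query($geoq,-1)"),
    ]
    return base_url + "/select?" + urlencode(params)


# Hypothetical host and core name.
url = build_spatial_query("http://localhost:8983/solr/core0", -97.057, 47.924)
print(url)

# Decoding the query string gives back the original parameter values.
print(parse_qs(urlsplit(url).query)["geoq"][0])
```

Defining the spatial clause once as geoq and referring to it from fq, sort and fl keeps the three uses in sync, which is what the original URL does by hand.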
Re: Using Customized sorting in Solr
Hi, Any suggestions? Am I trying to do too much with Solr? Is there any other search engine that should be used here? I am looking into the Solr codebase and planning to modify QueryComponent. Would this be the right approach? Regards, Shivam On Fri, Apr 27, 2012 at 10:48 AM, solr user solr.user...@gmail.com wrote: Jan, Thanks for the response. I thought of using it, but it will be suboptimal in the scenario I have. I guess I have to explain the scenario better, so let me try again: 1. I have importance-based buckets in the system. This is implemented using a variable named bucket_count with integer values 0, 1, 2, 3, and I have to show results in order of bucket_count, i.e. results from the 0th bucket at the top, then results from the 1st bucket, and so on. That is done by an ascending sort on this variable. 2. Now *within these buckets* I need to ensure that the 1st listing of every advertiser comes at the top, then the 2nd listing from every advertiser, and so on. If I go with grouping on advertiserId and use group.offset, then I probably also need to do additive filtering on bucket_count. To explain it better, the pseudo-algorithm would be: 1. query Solr with group.offset 0 and bucket count 0; 2. if step 1 returns more than zero results, increase the group offset and repeat step 1; 3. else increase the bucket count, reset the group offset to zero, and start again from step 1. With this logic, in the worst case I need to query Solr (number of importance buckets) * (max number of listings by an advertiser) times, which could be a very high number of Solr queries for a single user query. Please suggest whether I can do this in a more optimal way. I am also open to making modifications in the Solr/Lucene code if needed. Regards, BC Rathore On Fri, Apr 27, 2012 at 4:09 AM, Jan Høydahl jan@cominvent.com wrote: Hi, How about trying grouping with paging? First you do group=true&group.field=advertiserId&group.limit=1&group.offset=0&group.main=true&sort=something&group.sort=how-much-paid desc That gives you one listing per advertiser, sorted the way you like. Then to grab the next batch of ads, you go group.offset=1 etc. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 26. apr. 2012, at 08:10, solr user wrote: Hi, We are planning to move the search of one of our listing-based portals from a Sphinx search server to a Solr/Lucene search server, but we are facing a challenge in porting the customized sorting used in our portal. We only have the last 60 days of data live. The algorithm is as follows: 1. Put all listings into 54 buckets (date buckets for 60 days), i.e. buckets of 7 days, 1 day, 1 day, ... 2. For each date bucket we make 2 buckets (paid / free). 3. For each paid / free bucket, cycle the advertisers on a uniqueness basis, i.e. inside a bucket the ordering should be the 1st listing of each advertiser, then the 2nd listing of each advertiser, and so on. In other words, within a *sub-bucket* the second listing of an advertiser will be displayed only after the first listing of every advertiser has been displayed. To take care of points 1 and 2 we have created a field named bucket_index at indexing time and get the results sorted by this index, but we are not able to find a way to create a sort field at index time, or think of a sort function, for point 3. Please suggest if there is a way to do so in Solr. Tia, BC Rathore
Using Customized sorting in Solr
Hi, We are planning to move the search of one of our listing-based portals from a Sphinx search server to a Solr/Lucene search server, but we are facing a challenge in porting the customized sorting used in our portal. We only have the last 60 days of data live. The algorithm is as follows: 1. Put all listings into 54 buckets (date buckets for 60 days), i.e. buckets of 7 days, 1 day, 1 day, ... 2. For each date bucket we make 2 buckets (paid / free). 3. For each paid / free bucket, cycle the advertisers on a uniqueness basis, i.e. inside a bucket the ordering should be the 1st listing of each advertiser, then the 2nd listing of each advertiser, and so on. In other words, within a *sub-bucket* the second listing of an advertiser will be displayed only after the first listing of every advertiser has been displayed. To take care of points 1 and 2 we have created a field named bucket_index at indexing time and get the results sorted by this index, but we are not able to find a way to create a sort field at index time, or think of a sort function, for point 3. Please suggest if there is a way to do so in Solr. Tia, BC Rathore
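To make the ordering in point 3 concrete, here is a small Python sketch. It is only an illustration of the desired result, not a Solr feature: it gives every listing a per-advertiser sequence number inside its bucket and then sorts by bucket and sequence number. The field name advertiser_rank is hypothetical; bucket_index and advertiserId are the names used in the thread.

```python
from collections import defaultdict


def assign_advertiser_rank(listings):
    """Set 'advertiser_rank' to 0 for an advertiser's first listing in a bucket, 1 for the second, ..."""
    seen = defaultdict(int)
    for listing in listings:  # input order decides which listing counts as "first"
        key = (listing["bucket_index"], listing["advertiserId"])
        listing["advertiser_rank"] = seen[key]
        seen[key] += 1
    return listings


def display_order(listings):
    """Order by bucket, then by rank; sorted() is stable, so ties keep input order."""
    ranked = assign_advertiser_rank(listings)
    return sorted(ranked, key=lambda item: (item["bucket_index"], item["advertiser_rank"]))


# Made-up sample data.
listings = [
    {"id": "a1", "bucket_index": 0, "advertiserId": "A"},
    {"id": "a2", "bucket_index": 0, "advertiserId": "A"},
    {"id": "b1", "bucket_index": 0, "advertiserId": "B"},
    {"id": "a3", "bucket_index": 1, "advertiserId": "A"},
]
print([item["id"] for item in display_order(listings)])
```

If such a rank were stored as an index-time field, the sort would reduce to two ascending sort keys, but the rank would have to be recomputed whenever an advertiser's listings in a bucket change, so treat this as a way to reason about the ordering rather than a finished design.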
Re: Limiting term frequency in a document to a specific term
With the Solr search relevancy functions, I get a ParseException: unknown function ttf in FunctionQuery. The request is http://localhost:8983/solr/select/?fl=score,documentPageId&defType=func&q=ttf(contents,amplifiers) where contents is a field name and amplifiers is text in that field. Just curious why I get a parse exception for the above syntax. On Monday, January 23, 2012, Ahmet Arslan iori...@yahoo.com wrote: Below is an example query to search for the term frequency in a document, but it is returning the frequency for all the terms. http://localhost:8983/solr/select/?fl=documentPageId&q=documentPageId:49667.3&qt=tvrh&tv.tf=true&tv.fl=contents I would like to be able to limit the query to just one term that I know occurs in the document. I don't fully follow but http://wiki.apache.org/solr/FunctionQuery#tf may be what you want?
Re: Getting a word count frequency out of a page field
Thanks for the article. I am indexing each page of a document as if it were a document. I think the answer is to configure SOLR for use of the TermVector Component: http://wiki.apache.org/solr/TermVectorComponent I have not tried it yet, but someone told me on StackExchange forum to try this one. -Melanie On Sun, Jan 22, 2012 at 8:56 PM, Erick Erickson erickerick...@gmail.comwrote: Here's Hoss' XY problem writeup: http://people.apache.org/~hossman/#xyproblem but this doesn't appear to be that. There's no way out of the box that I know of to do what you want. It starts with the fact that Solr has no clue what a page is in the first place. Or a paragraph. Or a sentence. So you're really on your own here Solr only knows about *documents*. If each document is a page, you can do some stuff with term frequencies etc. But for a larger document you'll be getting into some pretty low-level analysis of the data to accomplish this. Sorry I can't be more help. Erick On Sun, Jan 22, 2012 at 5:35 PM, solr user mvidaat...@gmail.com wrote: See comments inline below. On Sun, Jan 22, 2012 at 8:27 PM, Erick Erickson erickerick...@gmail.com wrote: Faceting won't work at all. Its function is to return the count of the *documents* that a value occurs in, so that's no good for your use case. I don't know how to issue a proper SOLR query that returns a word count for a paragraph of text such as the term amplifier for a field. For some reason it only returns. This is really unclear. Are you asking for the word counts of a paragraph that contains amplifier? The number of times amplifier appears in a paragraph? In a document? I'm looking for the number of times the word or term appears in a paragraph that I'm indexing as the field name contents. I'm storing and indexing the field name contents that contains multiple occurrences of the term/word. However, when I query for that term it only reports that the word/term appeared only once in the field name contents. 
Limiting term frequency in a document to a specific term
What is the proper query URL to limit the term frequency to just one term in a document? Below is an example query to search for the term frequency in a document, but it is returning the frequency for all the terms. http://localhost:8983/solr/select/?fl=documentPageId&q=documentPageId:49667.3&qt=tvrh&tv.tf=true&tv.fl=contents I would like to be able to limit the query to just one term that I know occurs in the document. The documentation for Term Frequency said to specify the following: f.fieldName.tv.tf - Turns on Term Frequency for the fieldName specified. This is in the wiki documentation: http://wiki.apache.org/solr/TermVectorComponent I tried various combinations of the above for the term amplifier in the URL but I could not get it to work. I would appreciate the appropriate syntax for a specific term amplifier.
Re: Getting a word count frequency out of a page field
See comments inline below. On Sun, Jan 22, 2012 at 8:27 PM, Erick Erickson erickerick...@gmail.comwrote: Faceting won't work at all. Its function is to return the count of the *documents* that a value occurs in, so that's no good for your use case. I don't know how to issue a proper SOLR query that returns a word count for a paragraph of text such as the term amplifier for a field. For some reason it only returns. This is really unclear. Are you asking for the word counts of a paragraph that contains amplifier? The number of times amplifier appears in a paragraph? In a document? I'm looking for the number of times the word or term appears in a paragraph that I'm indexing as the field name contents. I'm storing and indexing the field name contents that contains multiple occurrences of the term/word. However, when I query for that term it only reports that the word/term appeared only once in the field name contents. And why do you want this information anyway? It might be an XY problem. I want to be able to search for word frequency for a page in a document that has many pages. So I can report to the user that the term/word occurred on page 1 10 times. The user can click on the result and go right the the page where the word/term appeared most frequently. What do you mean an XY problem? Best Erick On Fri, Jan 20, 2012 at 1:06 PM, solr user mvidaat...@gmail.com wrote: SOLR reports the term occurrence for terms over all the documents. I am having trouble making a query that returns the term occurrence in a specific page field called, documentPageId. I don't know how to issue a proper SOLR query that returns a word count for a paragraph of text such as the term amplifier for a field. For some reason it only returns. The things I've tried only return a count for 1 occurrence of the term even though I see the term in the paragraph more than just once. 
Getting a word count frequency out of a page field
SOLR reports the term occurrence for terms over all the documents. I am having trouble making a query that returns the term occurrence in a specific page field called, documentPageId. I don't know how to issue a proper SOLR query that returns a word count for a paragraph of text such as the term amplifier for a field. For some reason it only returns. The things I've tried only return a count for 1 occurrence of the term even though I see the term in the paragraph more than just once. I've tried faceting on the field, contents http://localhost:8983/solr/select?indent=on&q=*:*&wt=standard&facet=on&facet.field=documentPageId&facet.query=amplifier&facet.sort=lex&facet.missing=on&facet.method=count <lst name="facet_counts"> <lst name="facet_queries"> <int name="amplifier">21</int> </lst> <lst name="facet_fields"> <lst name="documentPageId"> <int name="49667.1">1</int> <int name="49667.10">1</int> <int name="49667.11">1</int> <int name="49667.12">1</int> <int name="49667.13">1</int> <int name="49667.14">1</int> <int name="49667.15">1</int> <int name="49667.16">1</int> <int name="49667.17">1</int> <int name="49667.18">1</int> <int name="49667.19">1</int> <int name="49667.2">1</int> <int name="49667.20">1</int> <int name="49667.21">1</int> <int name="49667.3">1</int> <int name="49667.4">1</int> <int name="49667.5">1</int> <int name="49667.6">1</int> <int name="49667.7">1</int> <int name="49667.8">1</int> <int name="49667.9">1</int> <int name="49670.1">1</int> <int name="49670.2">1</int> <int name="49670.3">1</int> <int name="49670.4">1</int> <int name="49677.1">1</int> <int name="49677.2">1</int> <int name="49677.3">1</int> <int>0</int> </lst> </lst> <lst name="facet_dates"/> <lst name="facet_ranges"/> </lst> </response> In schema.xml: <field name="contents" type="bucketFirstLetter" stored="true" indexed="true" /> <field name="documentPageId" type="string" indexed="true" stored="true" multiValued="false"/> In solrconfig.xml: <str name="facet.field">filewrapper</str> <str name="facet.field">caseNumber</str> <str name="facet.field">pageNumber</str> <str name="facet.field">documentId</str> <str name="facet.field">contents</str> <str name="facet.query">documentId</str> <str name="facet.query">caseNumber</str> <str name="facet.query">pageNumber</str> <str name="facet.field">documentPageId</str> <str name="facet.query">contents</str> Thanks in advance,
Re: Terms Component - solr-1.4.0
Hi All, Please help me in implementing TermsComponent in my current Solr solution. Regards, Solr User On Tue, May 17, 2011 at 4:12 PM, Solr User solr...@gmail.com wrote: Hi All, I am using Solr 1.4.0 and dismax as the request handler. I have the following in my solrconfig.xml in the dismax request handler tag: <arr name="last-components"> <str>spellcheck</str> </arr> The above tags help to find terms if there are spelling issues. I tried configuring the terms component and had no luck. May I know how to configure the terms component with dismax? Or do I need to call the terms component directly to get auto suggestions? Thank you so much in advance. Regards, Solr User
Terms Component - solr-1.4.0
Hi All, I am using Solr 1.4.0 and dismax as the request handler. I have the following in my solrconfig.xml in the dismax request handler tag: <arr name="last-components"> <str>spellcheck</str> </arr> The above tags help to find terms if there are spelling issues. I tried configuring the terms component and had no luck. May I know how to configure the terms component with dismax? Or do I need to call the terms component directly to get auto suggestions? Thank you so much in advance. Regards, Solr User
Out of memory while creating indexes
Hi All, I am trying to create indexes out of a 400MB XML file using the following command, and I am running into an out of memory exception. $JAVA_HOME/bin/java -Xms768m -Xmx1024m -Durl=http://$SOLR_HOST:$SOLR_PORT/solr/customercarecore/update -jar $SOLRBASEDIR/dataconvertor/common/lib/post.jar $SOLRBASEDIR/dataconvertor/customercare/xml/CustomerData.xml I am planning to bump up the memory and try again. Has anyone run into a similar issue? Any input on resolving the out of memory exception would be very helpful. I was able to create indexes with a small file but not with the large file. I am not using SolrJ. Thanks, Solr User
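Since small files index fine, one thing that may be worth trying besides a larger heap is posting the data in smaller pieces. The Python sketch below is an assumption-laden illustration: it assumes the file is a standard Solr update document of the form <add><doc>...</doc>...</add>, and the batch size is arbitrary. It streams the file and yields smaller <add> documents that could each be written out and posted separately.

```python
import io
import xml.etree.ElementTree as ET


def split_add_file(source, docs_per_batch=1000):
    """Yield '<add>...</add>' strings holding at most docs_per_batch <doc> elements each."""
    batch = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "doc":
            continue  # wait until a whole <doc> has been read
        batch.append(ET.tostring(elem, encoding="unicode"))
        elem.clear()  # drop the doc's fields; only an emptied element stays in the tree
        if len(batch) == docs_per_batch:
            yield "<add>" + "".join(batch) + "</add>"
            batch = []
    if batch:
        yield "<add>" + "".join(batch) + "</add>"


# Tiny in-memory stand-in for the large file.
sample = io.BytesIO(
    b"<add>"
    b"<doc><field name='id'>1</field></doc>"
    b"<doc><field name='id'>2</field></doc>"
    b"<doc><field name='id'>3</field></doc>"
    b"</add>"
)
batches = list(split_add_file(sample, docs_per_batch=2))
print(len(batches))
```

Whether this helps depends on where the memory is actually being exhausted, which the message does not say, so it is a diagnostic step rather than a guaranteed fix.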
Re: what would cause large numbers of executeWithRetry INFO messages?
sorry, never did find a solution to that. if you do happen to figure it out, pls post a reply to this thread. thanks -- View this message in context: http://lucene.472066.n3.nabble.com/what-would-cause-large-numbers-of-executeWithRetry-INFO-messages-tp1453417p2281087.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to get all the search results?
Hi, I tried *:* using dismax and I get no results. Is there a way that I can get all the search results using dismax? Thanks, Murali On Mon, Dec 6, 2010 at 11:17 AM, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: Hello, shouldn't that query syntax be *:* ? Regards, -- Savvas. On 6 December 2010 16:10, Solr User solr...@gmail.com wrote: Hi, First off thanks to the group for guiding me to move from default search handler to dismax. I have a question related to getting all the search results. In the past with the default search handler I was getting all the search results (8000) if I pass q=* as search string but with dismax I was getting only 16 results instead of 8000 results. How to get all the search results using dismax? Do I need to configure anything to make * (asterisk) work? Thanks, Solr User
Re: How to get all the search results?
Hi Shawn, Yes you did. I tried it and it did not work, so I asked the same question again. Now I understand; I tried it directly in the Solr admin and got all the search results. I will implement the same on the website. Thank you so much Shawn. On Mon, Dec 13, 2010 at 5:16 PM, Shawn Heisey s...@elyograg.org wrote: On 12/13/2010 9:59 AM, Solr User wrote: Hi, I tried *:* using dismax and I get no results. Is there a way that I can get all the search results using dismax? For dismax, use q= or simply leave the q parameter off the URL entirely. It appears that you need to have q.alt set to *:* for this to work. It would be a good idea to include this in your handler definition: <str name="q.alt">*:*</str> Two people (myself and Peter Karich) gave this answer on this thread last week, within 15 minutes of the time your original question was posted. Here's the entire thread on nabble: http://lucene.472066.n3.nabble.com/How-to-get-all-the-search-results-td2028233.html Shawn
How to get all the search results?
Hi, First off thanks to the group for guiding me to move from default search handler to dismax. I have a question related to getting all the search results. In the past with the default search handler I was getting all the search results (8000) if I pass q=* as search string but with dismax I was getting only 16 results instead of 8000 results. How to get all the search results using dismax? Do I need to configure anything to make * (asterisk) work? Thanks, Solr User
Re: Dismax - Boosting
Hi Ahmet, In the past we used /spell, and if there was no match we used to get a list of suggestions and then make another call with the first suggestion to get search results. After that we showed the user both the suggestions for the spelling mistake and the results of the first suggestion. I think the plugin at the URL you provided will help with that. Is there a way to get the spelling suggestions and the data for the first suggestion directly from Solr at the same time? For example: if the search keyword is mooon (typed by mistake instead of moon) then we need all suggestions, like: Did you mean: moon, mo, mooing, moonen, soon, mood, moose, moore, spoon, moons? and also the search results for the first suggestion, moon. Thanks, Solr User On Fri, Nov 19, 2010 at 6:41 PM, Ahmet Arslan iori...@yahoo.com wrote: The below is my previous configuration which used to work correctly. <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <str name="queryAnalyzerFieldType">textSpell</str> <lst name="spellchecker"> <str name="name">default</str> <str name="field">searchFields</str> <str name="spellcheckIndexDir">/solr/qa/tradedata/spellchecker</str> <str name="buildOnCommit">true</str> </lst> </searchComponent> We used to search only in one field, which is searchFields, but with implementing dismax we are searching in different fields like title^9.0 subtitle^3.0 author^2.0 desc shortdesc imprint category isbn13 isbn10 format series season bisacsub award. Do we need to modify the above configuration to include all the above fields? Please give me an example. Searching and spell checking are independent. For example you can search on 10 fields, and create suggestions from 2 fields. The spell checker accepts one field in its configuration, so you need to populate this field with copyField from the fields that you want to use for spell checking. The type of this field should be textSpell in your case. You can use the above config. In the past we used to query twice: first to get the suggestions, and then using the first suggestion to show the data. Is there a way that we can do it in one step? Are you talking about queries that return 0 numFound? Re-executing the search, like described here: http://sematext.com/products/dym-researcher/index.html Not out-of-the-box.
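The two-step flow discussed in this thread (search, and if nothing is found, search again with the first spelling suggestion) can be sketched on the client side as follows. This is Python for illustration only; the search callable and its return shape are assumptions standing in for whatever code actually talks to Solr.

```python
def search_with_fallback(search, query):
    """Run a search; on zero hits, retry once with the first spelling suggestion.

    `search(query)` is assumed to return (num_found, docs, suggestions).
    """
    num_found, docs, suggestions = search(query)
    if num_found > 0 or not suggestions:
        return {"query": query, "docs": docs, "suggestions": suggestions}
    corrected = suggestions[0]  # second round trip, using the top suggestion
    _found, corrected_docs, _ignored = search(corrected)
    return {"query": corrected, "docs": corrected_docs, "suggestions": suggestions}


def fake_search(query):
    """Stand-in for a real Solr call, with made-up data."""
    index = {"moon": ["doc1", "doc2"]}
    hits = index.get(query, [])
    suggestions = [] if hits else ["moon", "mo", "mooing"]
    return len(hits), hits, suggestions


result = search_with_fallback(fake_search, "mooon")
print(result["query"], result["docs"], result["suggestions"])
```

This still costs two requests for a misspelled query, which matches the "not out-of-the-box" answer above; the sketch only keeps the retry logic in one place.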
Special Characters
Hi, I am searching for j.r.r. tolkien and getting results back, but if I search for jrr I am not getting any results. I am also not getting any results if I search for jrr tolkien. I am using AND as the default operator. The search should work for both j.r.r. tolkien and jrr tolkien. What configuration changes do I need to make so that special characters like hyphen (-) and period (.) are ignored while indexing? Or are there any other suggestions? Thanks, Solr User
Re: Special Characters
Hi Erick,

I use Solr version 1.4.0, and below is the relevant fieldType from my schema.xml:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
         add enablePositionIncrements=true in both the index and query
         analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

It creates three tokens (j, r, r), so a search for j r r tolkien works fine, but jrr tolkien does not. I will read about PatternReplaceCharFilterFactory and try it. Please let me know if I need to do anything differently.

Thanks,
Solr User

On Mon, Nov 22, 2010 at 8:19 AM, Erick Erickson erickerick...@gmail.com wrote:
> What version of Solr are you using? You can think about PatternReplaceCharFilterFactory if you're using the right version of Solr.
>
> But you have other problems than that. Let's claim you get the periods removed. Do you tokenize three tokens or one? I.e. jrr or j r r? In the latter case your search still won't match.
>
> Best
> Erick
>
> On Mon, Nov 22, 2010 at 7:45 AM, Solr User solr...@gmail.com wrote:
> > Hi,
> > I am searching for j.r.r. tolkien and getting results back, but if I search for jrr I am not getting any results. I also get no results if I search for jrr tolkien. I am using AND as the default operator.
> > The search should work for both j.r.r. tolkien and jrr tolkien.
> > What configuration changes do I need to make so that special characters like hyphen (-) and period (.) are ignored while indexing? Or are there any other suggestions?
> > Thanks,
> > Solr User
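Erick's point about one token versus three can be illustrated outside Solr. The sketch below is plain Python and does not reproduce Solr's analysis chain; the two helper functions are hypothetical stand-ins for "delete the periods before tokenizing" and "treat the periods as delimiters". It only shows why, with AND as the default operator, the query jrr tolkien can match in the first case and not in the second.

```python
import re

def strip_periods(text):
    """Delete periods before tokenizing, so 'j.r.r.' becomes the single token 'jrr'."""
    return re.sub(r"\.", "", text.lower()).split()

def split_on_periods(text):
    """Treat periods as delimiters, so 'j.r.r.' becomes the three tokens 'j', 'r', 'r'."""
    return [tok for tok in re.split(r"[.\s]+", text.lower()) if tok]

indexed_one_token = strip_periods("J.R.R. Tolkien")
indexed_three_tokens = split_on_periods("J.R.R. Tolkien")
query_tokens = strip_periods("jrr tolkien")

# AND semantics: every query token must be present among the indexed tokens.
matches_one = all(tok in indexed_one_token for tok in query_tokens)
matches_three = all(tok in indexed_three_tokens for tok in query_tokens)

print(indexed_one_token, matches_one)
print(indexed_three_tokens, matches_three)
```

Whatever filter is chosen in the real schema, the same idea applies: the index-time and query-time analyzers have to agree on whether the initials end up as one token or three.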
Facet - Range Query issue
Hi,

I am having an issue with querying and faceting. This was working fine earlier:

/spell/?q=(sun) AND (pubyear:[1991 TO 2011])&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear&facet.field=format&facet.field=series&facet.field=season&facet.field=imprint&facet.field=category&facet.field=award&facet.field=age&facet.field=reading&facet.field=grade&facet.field=price&spellcheck=true&debugQuery=on

After switching to the dismax handler with a new schema, the query below does not work:

/select/?q=(sun) AND (pubyear:[1991 TO 2011])&rows=9&facet=true&facet.limit=-1&facet.mincount=1&facet.field=author&facet.field=pubyear_facet&facet.field=format_facet&facet.field=series_facet&facet.field=season_facet&facet.field=imprint_facet&facet.field=category_facet&facet.field=award_facet&facet.field=age_facet&facet.field=reading_facet&facet.field=grade_facet&facet.field=price_facet&spellcheck=true&debugQuery=on

<lst name="debug">
<str name="rawquerystring">(sun) AND (pubyear:[1991 TO 2011])</str>
<str name="querystring">(sun) AND (pubyear:[1991 TO 2011])</str>
<str name="parsedquery">+((+DisjunctionMaxQuery((series:sun | desc:sun | bisacsub:sun | award:sun | format:sun | shortdesc:sun | pubyear:sun | author:sun^2.0 | category:sun | title:sun^9.0 | isbn10:sun | season:sun | imprint:sun | subtitle:sun^3.0 | isbn13:sun)) +DisjunctionMaxQuery((series:pubyear 1991 | desc:pubyear 1991 | bisacsub:pubyear 1991 | award:pubyear 1991 | format:pubyear 1991 | shortdesc:pubyear 1991 | pubyear:pubyear 1991 | author:pubyear 1991^2.0 | category:pubyear 1991 | title:pubyear 1991^9.0 | isbn10:pubyear 1991 | season:pubyear 1991 | imprint:pubyear 1991 | subtitle:pubyear 1991^3.0 | isbn13:pubyear 1991)) DisjunctionMaxQuery((series:2011 | desc:2011 | bisacsub:2011 | award:2011 | format:2011 | shortdesc:2011 | pubyear:2011 | author:2011^2.0 | category:2011 | title:2011^9.0 | isbn10:2011 | season:2011 | imprint:2011 | subtitle:2011^3.0 | isbn13:2011)))~1) ()</str>
<str name="parsedquery_toString">+((+(series:sun | desc:sun | bisacsub:sun | award:sun | format:sun | shortdesc:sun | pubyear:sun | author:sun^2.0 | category:sun | title:sun^9.0 | isbn10:sun | season:sun | imprint:sun | subtitle:sun^3.0 | isbn13:sun) +(series:pubyear 1991 | desc:pubyear 1991 | bisacsub:pubyear 1991 | award:pubyear 1991 | format:pubyear 1991 | shortdesc:pubyear 1991 | pubyear:pubyear 1991 | author:pubyear 1991^2.0 | category:pubyear 1991 | title:pubyear 1991^9.0 | isbn10:pubyear 1991 | season:pubyear 1991 | imprint:pubyear 1991 | subtitle:pubyear 1991^3.0 | isbn13:pubyear 1991) (series:2011 | desc:2011 | bisacsub:2011 | award:2011 | format:2011 | shortdesc:2011 | pubyear:2011 | author:2011^2.0 | category:2011 | title:2011^9.0 | isbn10:2011 | season:2011 | imprint:2011 | subtitle:2011^3.0 | isbn13:2011))~1) ()</str>
<lst name="explain"/>
<str name="QParser">DisMaxQParser</str>

Basically we are trying to pass the query string along with the facet fields and the range. Is there a syntax issue? Please help; this is urgent as I am stuck.

Thanks,
Solr user
Re: Facet - Range Query issue
Eric, I solved the issue by adding fq parameter in the query. Thank you so much for your reply. Thanks, Murali

On Mon, Nov 22, 2010 at 1:51 PM, Erick Erickson erickerick...@gmail.com wrote:
> Well, without seeing the changes you made to the schema, it's hard to tell much. Also, could you define not work? What, exactly, fails to do what you expect?
>
> But the first question I have is did you reindex after changing your schema?. And have you checked your index to verify that there values in the fields you changed?
>
> Best
> Erick
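The debug output in the original post shows the dismax parser treating pubyear:[1991 TO 2011] as plain search terms, which is why moving the range into an fq (filter query) parameter resolved it. A minimal sketch of how such a request could be assembled is below. The exact parameters Murali ended up using are not given in the thread, so the parameter set here is an assumption built from the field names in the earlier messages; note also that a range on a text-typed pubyear field compares terms lexicographically, not numerically.

```python
from urllib.parse import urlencode, parse_qs

# Assumed request: free-text terms stay in q, the range restriction moves to fq.
params = [
    ("q", "sun"),
    ("defType", "dismax"),
    ("fq", "pubyear:[1991 TO 2011]"),
    ("rows", "9"),
    ("facet", "true"),
    ("facet.limit", "-1"),
    ("facet.mincount", "1"),
    ("facet.field", "pubyear_facet"),
    ("facet.field", "format_facet"),
    ("debugQuery", "on"),
]

# urlencode escapes spaces, brackets and colons, and joins the pairs with '&'.
query_string = urlencode(params)
print("/select/?" + query_string)
```

Building the query string from a list of pairs keeps repeated parameters such as facet.field intact and avoids hand-escaping the range expression.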
Re: Dismax - Boosting
Hi Ahmet,

Below is my previous configuration, which used to work correctly:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">searchFields</str>
    <str name="spellcheckIndexDir">/solr/qa/tradedata/spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

We used to search in only one field, searchFields, but with dismax we are searching in several fields: title^9.0 subtitle^3.0 author^2.0 desc shortdesc imprint category isbn13 isbn10 format series season bisacsub award.

Do we need to modify the above configuration to include all of these fields? Please give me an example.

In the past we used to query twice: first to get the suggestions, and then again using the first suggestion to show the data. Is there a way to do it in one step?

Thanks,
Murali

On Wed, Nov 17, 2010 at 7:00 PM, Ahmet Arslan iori...@yahoo.com wrote:
> > 2. How to use spell checker request handler along with dismax?
>
> Just append this at the end of dismax request handler definition:
>
>   <arr name="last-components">
>     <str>spellcheck</str>
>   </arr>
> </requestHandler>
Re: Dismax - Boosting
Ahmet, I modified the schema as follows: (Added more fields for faceting) field name=title type=text indexed=true stored=true omitNorms=true / field name=author type=text indexed=true stored=true multiValued=true omitNorms=true / field name=authortype type=text indexed=true stored=true multiValued=true omitNorms=true / field name=isbn13 type=text indexed=true stored=true / field name=isbn10 type=text indexed=true stored=true / field name=material type=text indexed=true stored=true / field name=pubdate type=text indexed=true stored=true / field name=pubyear type=text indexed=true stored=true / field name=reldate type=text indexed=false stored=true / field name=format type=text indexed=true stored=true / field name=pages type=text indexed=false stored=true / field name=desc type=text indexed=true stored=true / field name=series type=text indexed=true stored=true / field name=season type=text indexed=true stored=true / field name=imprint type=text indexed=true stored=true / field name=bisacsub type=text indexed=true stored=true multiValued=true omitNorms=true / field name=bisacstatus type=text indexed=false stored=true / field name=category type=text indexed=true stored=true multiValued=true omitNorms=true / field name=award type=text indexed=true stored=true multiValued=true omitNorms=true / field name=age type=text indexed=true stored=true / field name=reading type=text indexed=true stored=true / field name=grade type=text indexed=true stored=true / field name=path type=text indexed=false stored=true / field name=shortdesc type=text indexed=true stored=true / field name=subtitle type=text indexed=true stored=true omitNorms=true/ field name=price type=float indexed=true stored=true/ field name=author_facet type=string indexed=true stored=true omitNorms=true/ field name=pubyear_facet type=string indexed=true stored=true multiValued=true omitNorms=true/ field name=format_facet type=string indexed=true stored=true omitNorms=true/ field name=series_facet type=string 
indexed=true stored=true omitNorms=true/ field name=season_facet type=string indexed=true stored=true omitNorms=true/ field name=imprint_facet type=string indexed=true stored=true omitNorms=true/ field name=category_facet type=string indexed=true stored=true multiValued=true omitNorms=true/ field name=award_facet type=string indexed=true stored=true multiValued=true omitNorms=true/ field name=age_facet type=string indexed=true stored=true omitNorms=true/ field name=reading_facet type=string indexed=true stored=true omitNorms=true/ field name=grade_facet type=string indexed=true stored=true omitNorms=true/ field name=price_facet type=string indexed=true stored=true omitNorms=true/ Also added Copy Fields as below: copyField source=author dest=author_facet/ copyField source=pubyear dest=pubyear_facet/ copyField source=format dest=format_facet/ copyField source=series dest=series_facet/ copyField source=season dest=season_facet/ copyField source=imprint dest=imprint_facet/ copyField source=category dest=category_facet/ copyField source=award dest=award_facet/ copyField source=age dest=age_facet/ copyField source=reading dest=reading_facet/ copyField source=grade dest=grade_facet/ copyField source=price dest=price_facet/ With the above changes I am not getting any facet data as a result. Why is that the facet data not returning and what mistake I did with the schema? Thanks, Solr User On Wed, Nov 17, 2010 at 6:42 PM, Ahmet Arslan iori...@yahoo.com wrote: Wow you facet on many fields : author,pubyear,format,series,season,imprint,category,award,age,reading,grade,price The fields you facet on should be untokenized type: string, int, tint date etc. The fields you want full text search, e.g. the ones you specify in qf, pf parameter should be text type. (title subtitle authordesc shortdesc imprint category isbn13 isbn10 format series season bisacsub award) If you have common fields, for example category, you need two copy of that. one string one text. 
So that you can both full-text search and facet on. Use copy field for this. copyField source=category dest=category_string/ Example document: category: electronic devices query electronic will return it, and facets on category_string will be displayed as : electronic devices (1) not : electronic (1) devices (1) --- On Wed, 11/17/10, Solr User solr...@gmail.com wrote: From: Solr User solr...@gmail.com Subject: Re: Dismax - Boosting To: solr-user@lucene.apache.org Date: Wednesday, November 17, 2010, 11:31 PM Ahmet, Thanks for the reply and it was very helpful. The query that I used before changing to dismax was: /solr/tradecore/spell/?q=curiouswt=jsonrows=9facet=truefacet.limit=-1facet.mincount=1facet.field=authorfacet.field=pubyearfacet.field=formatfacet.field=seriesfacet.field=seasonfacet.field=imprintfacet.field=categoryfacet.field=awardfacet.field=agefacet.field=readingfacet.field
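Ahmet's "electronic devices" example can be mimicked in a few lines of Python. This is only an illustration of why facet counts differ between a tokenized (text) field and an untokenized (string) copy; it does not call Solr, and the whitespace split is a simplified stand-in for a real analyzer.

```python
from collections import Counter

docs = [{"category": "electronic devices"}]

# Faceting on an untokenized (string) copy counts the whole stored value once.
string_facet = Counter(doc["category"] for doc in docs)

# Faceting on a tokenized (text) field counts each indexed term separately.
text_facet = Counter(term for doc in docs for term in doc["category"].split())

print(dict(string_facet))
print(dict(text_facet))
```

This is the "lot of extra data" effect described earlier in the thread: faceting on the text fields returns one bucket per term instead of one bucket per original value.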
Re: Dismax - Boosting
Ahmet, Thanks for the reply and it was very helpful. The query that I used before changing to dismax was: /solr/tradecore/spell/?q=curiouswt=jsonrows=9facet=truefacet.limit=-1facet.mincount=1facet.field=authorfacet.field=pubyearfacet.field=formatfacet.field=seriesfacet.field=seasonfacet.field=imprintfacet.field=categoryfacet.field=awardfacet.field=agefacet.field=readingfacet.field=gradefacet.field=pricespellcheck=true The above query use to return all the data related to facets, data and also any suggestions related to spelling mistakes properly. The configuration after modifying using dismax is as below: Schema.xml: field name=title type=text indexed=true stored=true omitNorms=true / field name=author type=text indexed=true stored=true multiValued=true omitNorms=true / field name=authortype type=text indexed=true stored=true multiValued=true omitNorms=true / field name=isbn13 type=text indexed=true stored=true / field name=isbn10 type=text indexed=true stored=true / field name=material type=text indexed=true stored=true / field name=pubdate type=text indexed=true stored=true / field name=pubyear type=text indexed=true stored=true / field name=reldate type=text indexed=false stored=true / field name=format type=text indexed=true stored=true / field name=pages type=text indexed=false stored=true / field name=desc type=text indexed=true stored=true / field name=series type=text indexed=true stored=true / field name=season type=text indexed=true stored=true / field name=imprint type=text indexed=true stored=true / field name=bisacsub type=text indexed=true stored=true multiValued=true omitNorms=true / field name=bisacstatus type=text indexed=false stored=true / field name=category type=text indexed=true stored=true multiValued=true omitNorms=true / field name=award type=text indexed=true stored=true multiValued=true omitNorms=true / field name=age type=text indexed=true stored=true / field name=reading type=text indexed=true stored=true / field name=grade type=text 
indexed=true stored=true / field name=path type=text indexed=false stored=true / field name=shortdesc type=text indexed=true stored=true / field name=subtitle type=text indexed=true stored=true omitNorms=true/ field name=price type=float indexed=true stored=true/ SolrConfig.xml: requestHandler name=dismax class=solr.SearchHandler default=true lst name=defaults str name=defTypedismax/str str name=echoParamsexplicit/str !-- float name=tie0.01/float -- str name=qf title^9.0 subtitle^3.0 author^1.0 desc shortdesc imprint category isbn13 isbn10 format series season bisacsub award /str !-- str name=pf text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9 /str str name=bf popularity^0.5 recip(price,1,1000,1000)^0.3 /str -- str name=fl * /str !-- str name=mm 2lt;-1 5lt;-2 6lt;90% /str int name=ps100/int str name=q.alt*:*/str -- !-- example highlighter config, enable per-query with hl=true -- !-- str name=hl.fltext features name/str -- !-- for this field, we want no fragmenting, just highlighting -- !-- str name=f.name.hl.fragsize0/str -- !-- instructs Solr to return the field itself if no query terms are found -- !-- str name=f.name.hl.alternateFieldname/str str name=f.text.hl.fragmenterregex/str -- !-- defined below -- /lst /requestHandler The query that I used after changing to dismax is: solr/tradecore/select/?q=curiouswt=jsonrows=9facet=truefacet.limit=-1facet.mincount=1facet.field=authorfacet.field=pubyearfacet.field=formatfacet.field=seriesfacet.field=seasonfacet.field=imprintfacet.field=categoryfacet.field=awardfacet.field=agefacet.field=readingfacet.field=gradefacet.field=pricespellcheck=true The following are the issues that I am having after modifying to dismax: 1. Facets data is not coming correctly. Lot of extra data is coming. Why and how to fix it? 2. How to use spell checker request handler along with dismax? Thanks, Murali On Mon, Nov 15, 2010 at 5:38 PM, Ahmet Arslan iori...@yahoo.com wrote: 1. 
Do we need to change the above DisMax handler configuration as per our requirements? Or leave it as it is? What changes?

Yes, you need to edit it. At least the field names. Does your schema have a field named sku?

2. Do we need to make DisMax the default request handler? Do I need to add the attribute default=true to the tag?

If you are going to always use it, why not? Change it by adding default=true. By doing so you don't need to add the qt parameter to every request. But don't forget to delete the other default=true. There can be only one default=true :)

3. I read in the documentation that the default search handler and DisMax are the same, except that to use DisMaxQueryParser you add defType=dismax to the query string. Is there anything else we need to do?

Above dismax
Dismax - Boosting
Hi, Currently we are using StandardRequestHandler and the configuration in SolrConfig.xml is as below: requestHandler name=standard class=solr.SearchHandler default=true !-- default values for query parameters -- lst name=defaults str name=echoParamsexplicit/str !-- int name=rows10/int str name=fl*/str str name=version2.1/str -- /lst /requestHandler We would like to switch to DisMax request handler and the configuration in SolrConfig.xml is: requestHandler name=dismax class=solr.SearchHandler lst name=defaults str name=defTypedismax/str str name=echoParamsexplicit/str float name=tie0.01/float str name=qf text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 /str str name=pf text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9 /str str name=bf popularity^0.5 recip(price,1,1000,1000)^0.3 /str str name=fl id,name,price,score /str str name=mm 2lt;-1 5lt;-2 6lt;90% /str int name=ps100/int str name=q.alt*:*/str !-- example highlighter config, enable per-query with hl=true -- str name=hl.fltext features name/str !-- for this field, we want no fragmenting, just highlighting -- str name=f.name.hl.fragsize0/str !-- instructs Solr to return the field itself if no query terms are found -- str name=f.name.hl.alternateFieldname/str str name=f.text.hl.fragmenterregex/str !-- defined below -- /lst /requestHandler Questions: 1. Do we need to change the above DisMax handler configuration as per our requirements? Or Leave it as it is? What changes? 2. Do we need make DisMax as a default request handler? Do I need to add attribute default=true to the tag? 3. I read in the documentation that Default Search Handler and DisMax are the same except that to use DisMaxQueryParser add defType=dismax in the query string. Is there anything else do we need to do? We are basically moving on to dismax handler and trying to understand what changes we need to make to SolrConfig.xml. I understood what changes need to be made to schema.xml in a different thread on this forum. 
Thanks, Solr User
Re: WELCOME to solr-user@lucene.apache.org
Ahmet, Thanks for the reply. select/?q=built+to+lastdefType=dismaxqf=searchFields^0.2+title^20debugQuery=on For some reason if I use title field in my query I don't get any results. I am copying all searchable fields into searchFields field. So I am able to search only in the searchFields field not in any other fields. I request you all to clarify if anything wrong with my schema.xml. The schema.xml is at the bottom of this email. I am not able to get the boosting working on the title field. Please help me here too. Thanks, Solr User On Thu, Nov 11, 2010 at 5:11 PM, Ahmet Arslan iori...@yahoo.com wrote: There are several mistakes in your approach: copyField just copies data. Index time boost is not copied. There is no such boosting syntax. /select?q=Eachtitle^9fl=score You are searching on your default field. This is not your cause of your problem but omitNorms=true disables index time boosts. http://wiki.apache.org/solr/DisMaxQParserPlugin can satisfy your need. --- On Thu, 11/11/10, Solr User solr...@gmail.com wrote: From: Solr User solr...@gmail.com Subject: Re: WELCOME to solr-user@lucene.apache.org To: solr-user@lucene.apache.org Date: Thursday, November 11, 2010, 11:54 PM Eric, Thank you so much for the reply and apologize for not providing all the details. 
The following are the field definitons in my schema.xml: field name=title type=string indexed=true stored=true omitNorms=false / field name=author type=string indexed=true stored=true multiValued=true omitNorms=true / field name=authortype type=string indexed=true stored=true multiValued=true omitNorms=true / field name=isbn13 type=string indexed=true stored=true / field name=isbn10 type=string indexed=true stored=true / field name=material type=string indexed=true stored=true / field name=pubdate type=string indexed=true stored=true / field name=pubyear type=string indexed=true stored=true / field name=reldate type=string indexed=false stored=true / field name=format type=string indexed=true stored=true / field name=pages type=string indexed=false stored=true / field name=desc type=string indexed=true stored=true / field name=series type=string indexed=true stored=true / field name=season type=string indexed=true stored=true / field name=imprint type=string indexed=true stored=true / field name=bisacsub type=string indexed=true stored=true multiValued=true omitNorms=true / field name=bisacstatus type=string indexed=false stored=true / field name=category type=string indexed=true stored=true multiValued=true omitNorms=true / field name=award type=string indexed=true stored=true multiValued=true omitNorms=true / field name=age type=string indexed=true stored=true / field name=reading type=string indexed=true stored=true / field name=grade type=string indexed=true stored=true / field name=path type=string indexed=false stored=true / field name=shortdesc type=string indexed=true stored=true / field name=subtitle type=string indexed=true stored=true omitNorms=true/ field name=price type=float indexed=true stored=true/ field name=searchFields type=textSpell indexed=true stored=true multiValued=true omitNorms=true/ Copy Fields: copyField source=title dest=searchFields/ copyField source=author dest=searchFields/ copyField source=isbn13 dest=searchFields/ copyField 
source=isbn10 dest=searchFields/ copyField source=format dest=searchFields/ copyField source=series dest=searchFields/ copyField source=season dest=searchFields/ copyField source=imprint dest=searchFields/ copyField source=bisacsub dest=searchFields/ copyField source=category dest=searchFields/ copyField source=award dest=searchFields/ copyField source=shortdesc dest=searchFields/ copyField source=desc dest=searchFields/ copyField source=subtitle dest=searchFields/ defaultSearchFieldsearchFields/defaultSearchField Before creating the indexes I feed XML file to the Solr job to create index files. I added Boost attribute to the title field before creating indexes and an example is below: ?xml version=1.0 encoding=UTF-8 standalone=no?adddocfield name=material1785440/fieldfield boost=10.0 name=titleEach Little Bird That Sings/fieldfield name=price16.0/fieldfield name=isbn100152051139/fieldfield name=isbn139780152051136/fieldfield name=formatHardcover/fieldfield name=pubdate2005-03-01/fieldfield name=pubyear2005/fieldfield name=reldate2005-02-22/fieldfield name=pages272/fieldfield name=bisacstatusActive/fieldfield name=seasonSpring 2005/fieldfield name=imprintChildren's/fieldfield name=age8.0-12.0/fieldfield name=grade3-6/fieldfield name=authorMarla Frazee/fieldfield name=authortypeJacket Illustrator/fieldfield name=authorDeborah Wiles/fieldfield
Re: WELCOME to solr-user@lucene.apache.org
Ahmet,

In the production system we are using /spell/?q=built+to+last so that we can check the spelling. We are not using /select?q=built+to+last. Can I use dismax with /spell?

I understood from your reply that I need to change my schema.xml and modify the field types. Do I still need to use the searchFields field, and what do I need to specify in the defaultSearchField tag? searchFields is one of the field names that we provided.

Thanks,
Solr User

On Fri, Nov 12, 2010 at 10:26 AM, Ahmet Arslan iori...@yahoo.com wrote:
> > select/?q=built+to+last&defType=dismax&qf=searchFields^0.2+title^20&debugQuery=on
> >
> > For some reason if I use title field in my query I don't get any results. I am copying all searchable fields into searchFields field. So I am able to search only in the searchFields field not in any other fields. I request you all to clarify if anything wrong with my schema.xml. The schema.xml is at the bottom of this email. I am not able to get the boosting working on the title field. Please help me here too.
>
> Change the type of your title field. It is string now. Make it solr.TextField.
>
> Actually you don't need a catch-all copy field with dismax. Just change their types from string to text and append them to the qf= parameter.
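Ahmet's suggestion of listing the individual fields in qf, instead of relying on a catch-all field, could look roughly like the request below. This is a sketch under assumptions: the field list and boosts are taken from values mentioned elsewhere in the thread, and it presumes those fields have already been changed to a tokenized text type and reindexed.

```python
from urllib.parse import urlencode, parse_qs

# Assumed dismax request: per-field boosts live in qf, not in q.
params = {
    "q": "built to last",
    "defType": "dismax",
    "qf": "title^9.0 subtitle^3.0 author^2.0 desc",
    "debugQuery": "on",
}

# urlencode turns spaces into '+' and escapes the '^' boost marker as %5E.
query_string = urlencode(params)
print("/select/?" + query_string)
```

With debugQuery=on, the parsedquery section of the response shows whether each qf field and boost was picked up as intended.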
Re: WELCOME to solr-user@lucene.apache.org
Hi, I have a question about boosting. I have the following fields in my schema.xml: 1. title 2. description 3. ISBN etc I want to boost the field title. I tried index time boosting but it did not work. I also tried Query time boosting but with no luck. Can someone help me on how to implement boosting on a specific field like title? Thanks, Solr User
Boosting
Hi, I have a question about boosting. I have the following fields in my schema.xml: 1. title 2. description 3. ISBN etc I want to boost the field title. I tried index time boosting but it did not work. I also tried Query time boosting but with no luck. Can someone help me on how to implement boosting on a specific field like title? Thanks, Solr User
Re: WELCOME to solr-user@lucene.apache.org
learns about life's surprises in this funny, poignant, and very Southern coming-of-age story.</field>
</doc>
<doc>
<field name="material">1195443</field>
<field boost="10.0" name="title">Baby Bear's Chairs</field>
<field name="price">16.0</field>
<field name="isbn10">0152051147</field>
<field name="isbn13">9780152051143</field>
<field name="format">Hardcover</field>
<field name="pubdate">2005-09-01</field>
<field name="pubyear">2005</field>
<field name="reldate">2005-08-01</field>
<field name="pages">40</field>
<field name="bisacstatus">Active</field>
<field name="season">Fall 2005</field>
<field name="imprint">Children's</field>
<field name="age">2.0-5.0</field>
<field name="grade">P-K</field>
<field name="author">Jane Yolen</field>
<field name="authortype">Author</field>
<field name="author">Melissa Sweet</field>
<field name="authortype">Illustrator</field>
<field name="bisacsub">Bedtime &amp; Dreams</field>
<field name="bisacsub">Animals/Bears</field>
<field name="bisacsub">Family/General (see also headings under Social Issues)</field>
<field name="bisacsub">Social Issues/Emotions &amp; Feelings</field>
<field name="bisacsub">Family/Parents</field>
<field name="category">Animals/Bears</field>
<field name="category">Bedtime Books</field>
<field name="category">Family Relationships/Parent-Child</field>
<field name="path">/assets/product/0152051147.gif</field>
<field name="desc">&lt;div&gt;Baby Bear is the littlest bear in his family, and sometimes that's not so easy. Mama and Papa Bear get to stay up late in their great big chairs. Big brother gets to play fun games in his middle-sized chair. And Baby Bear only seems to cause trouble in his own tiny chair. But at the end of the day, he finds the one&lt;i&gt; &lt;/i&gt;perfect chair that's comfier and cozier than all the rest.&lt;br&gt; &lt;br&gt;Bestselling author Jane Yolen and popular illustrator Melissa Sweet have come together to create a lyrical bedtime tale about a baby bear trying to find his place in a family. With a playful rhyming text and adorable, fun illustrations, here is a book for parents and their own baby bears to treasure.&lt;br&gt;&lt;/div&gt;</field>
<field name="shortdesc">In this sweet, bedtime story, Baby Bear discovers that Papa's lap is the best chair of all!</field>
</doc>
</add>

I am trying to boost the title field so that a document whose title actually matches the query comes back as the first item in the search results. Adding the boost attribute to the title field (index-time boosting) did not change the search results. I also tried query-time boosting, as shown below, but with no luck:

/select?q=Each+Little+Bird+That+Sings&title^9&fl=score

Any help fixing this issue would be much appreciated.

Thanks, Solr User

On Thu, Nov 11, 2010 at 10:32 AM, Solr User solr...@gmail.com wrote:
> Hi, I have a question about boosting. I have the following fields in my schema.xml: 1. title 2. description 3. ISBN etc. I want to boost the title field. I tried index-time boosting, but it did not work. I also tried query-time boosting, with no luck. Can someone help me implement boosting on a specific field like title? Thanks, Solr User
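For reference, one common way to apply a query-time boost to a single field is the dismax query parser's qf parameter. The following is only a minimal sketch of building such a request URL. The base URL, the helper name build_title_boost_url, and the choice of "desc" as the second searched field are assumptions for illustration, not details confirmed in this thread.

```python
from urllib.parse import urlencode, parse_qs, urlsplit


def build_title_boost_url(base_url, user_query, title_boost=9):
    """Build a Solr select URL that weights title matches more heavily.

    Hypothetical helper: the field list in qf is an assumption and
    should be adjusted to the fields actually defined in schema.xml.
    """
    params = {
        "q": user_query,                     # the raw user query
        "defType": "dismax",                 # use the dismax query parser
        "qf": f"title^{title_boost} desc",   # search title (boosted) and desc
        "fl": "*,score",                     # return all fields plus the score
    }
    # urlencode escapes spaces, '^', '*' and ',' for use in a query string.
    return f"{base_url}/select?{urlencode(params)}"


# Assumed base URL for a local Solr instance.
url = build_title_boost_url("http://localhost:8983/solr",
                            "Each Little Bird That Sings")
print(url)
```

Requesting the score through fl makes it easier to see whether the boost actually changes the ranking between two runs with different title_boost values.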
what would cause large numbers of executeWithRetry INFO messages?
I see a large number (~1000) of the following executeWithRetry messages in my Apache Catalina log files every day (see the log snippet below). They seem to appear at random intervals. Since they are not flagged as errors or warnings, I have been ignoring them for now. However, I have started to wonder whether the INFO level is a red herring and there is an actual problem somewhere. Does anyone know what would cause this type of message? Are they normal? My Google searches for Solr have not turned up anything containing this message.

Details:
1. My CPU usage seems fine, as does my heap; we have plenty of CPU capacity and heap space.
2. The log is from a searcher, but I know that the intervals do not correspond to replication (every 15 min on the hour).
3. The INFO lines appear in all searcher logs (we have a number of searchers).
4. The data is around 10m records per searcher and occupies around 14 GB.
5. I am not noticing any problems performing queries against Solr (so no trace info to give you); performance and queries seem fine.

Log snippet:
Sep 10, 2010 2:17:59 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 10, 2010 2:18:20 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server xxx.admin.inf failed to respond
Sep 10, 2010 2:18:20 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: Retrying request
Sep 10, 2010 2:18:20 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.

Any info appreciated. Thanks.
--
View this message in context: http://lucene.472066.n3.nabble.com/what-would-cause-large-numbers-of-executeWithRetry-INFO-messages-tp1453417p1453417.html
Sent from the Solr - User mailing list archive at Nabble.com.
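To check whether the retries really are unrelated to the 15-minute replication schedule, it may help to bucket them by minute of the hour. The following is a rough sketch, assuming the two-line java.util.logging layout shown in the snippet above (a timestamp header line followed by the INFO message line) and English month abbreviations. The function name retry_minutes and the SAMPLE text are illustrative only.

```python
import re
from collections import Counter
from datetime import datetime

# Header line: "<timestamp> <logger class> <method>", as in the snippet above.
HEADER = re.compile(
    r"^(\w{3} \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M) (\S+) (\S+)$"
)


def retry_minutes(log_text):
    """Count NoHttpResponseException retries by minute of the hour."""
    counts = Counter()
    lines = log_text.splitlines()
    # Pair every line with the line that follows it; only header lines match.
    for header, message in zip(lines, lines[1:]):
        match = HEADER.match(header.strip())
        if not match:
            continue
        stamp, _logger, method = match.groups()
        if method == "executeWithRetry" and "NoHttpResponseException" in message:
            # %b assumes an English locale for the month abbreviation.
            when = datetime.strptime(stamp, "%b %d, %Y %I:%M:%S %p")
            counts[when.minute] += 1
    return counts


SAMPLE = """\
Sep 10, 2010 2:17:59 AM org.apache.solr.handler.SnapPuller fetchLatestIndex
INFO: Slave in sync with master.
Sep 10, 2010 2:18:20 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: I/O exception (org.apache.commons.httpclient.NoHttpResponseException) caught when processing request: The server xxx.admin.inf failed to respond
Sep 10, 2010 2:18:20 AM org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: Retrying request
"""

print(retry_minutes(SAMPLE))
```

If the counts cluster around minutes 0, 15, 30, and 45, the retries would line up with replication polls after all; a flat spread would point elsewhere, for example at idle connections being closed by the remote server.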