Re: Need Help in Patching OPENNLP
Thanks much!! Explorer -- Internet Explorer :) Sorry for the miscommunication. Yeah, let me check it once again. Appreciate all the help :) krn

--
View this message in context: http://lucene.472066.n3.nabble.com/Need-Help-in-Patching-OPENNLP-tp4052362p4053094.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: MoreLikeThis - Odd results - what am I doing wrong?
OK - so I have my Solr instance running on AWS. Any suggestions on how to safely share the link? Right now, the whole Solr instance is totally open.

Gagandeep singh gagan.g...@gmail.com wrote: Say debugQuery=true&mlt=true and see the scores for the MLT query, not a sample query. You can use Amazon EC2 to bring up your Solr; you should be able to get a micro instance for the free trial.

On Mon, Apr 1, 2013 at 5:10 AM, dc tech dctech1...@gmail.com wrote: I did try the raw query against the *simi* field, and those seem to return results in the expected order. For instance, Acura MDX has (large, SUV, 4WD, Luxury) in the simi field. Running a query with those words against the simi field returns the expected models (X5, Audi Q5, etc.), and the subsequent documents have decreasing relevance. So the basic query mechanism seems to be fine; the issue just seems to be with the MoreLikeThis component and handler. I can post the index on a public Solr instance - any suggestions? (or for hosting)

On Sun, Mar 31, 2013 at 1:54 PM, Gagandeep singh gagan.g...@gmail.com wrote: If you can bring up your Solr setup on a public machine, then I'm sure a lot of debugging can be done. Without that, I think what you should look at is the tf-idf scores of terms like "camry". Usually idf is the deciding factor in which results show at the top (tf should be 1 for your data). Enable debugQuery=true and look at the explain section to see how the score is calculated. You should try giving different boosts to class, type, drive, and size to control the results.

On Sun, Mar 31, 2013 at 8:52 PM, dc tech dctech1...@gmail.com wrote: I am running some experiments with MoreLikeThis, and the results seem rather odd - I am doing something wrong but just cannot figure out what. Basically, the similarity results are decent - but not great.

*Issue 1 = Quality* Toyota Camry: finds Altima (good), but the next one is Camry Hybrid, whereas it should have found Accord.
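Gagandeep's point about idf being the deciding factor can be made concrete. Lucene 4.x's default similarity computes idf(t) = 1 + ln(numDocs / (docFreq + 1)), so a rare make/model term swamps a shared attribute term in the MLT score. A small sketch (the doc-frequency numbers are made up for illustration):

```python
import math

def idf(num_docs: int, doc_freq: int) -> float:
    """Lucene 4.x DefaultSimilarity idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

num_docs = 181  # rows in the sample car data set
# A model term like "camry" appears in far fewer docs than an attribute
# like "suv", so its idf (and hence its MLT clause boost) is much larger.
print(idf(num_docs, 2))   # rare term, e.g. "camry"
print(idf(num_docs, 60))  # common term, e.g. "suv"
```

This is why boosting the attribute fields (mlt.qf) is needed to keep a rare but irrelevant term like "hybrid" from dominating the similarity query.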
I have normalized the data into a simi field which has only the attributes that I care about. Without the simi field, I could not get mlt.qf boosts to work well enough to return results.

*Issue 2* Some fields do not work at all. For instance, text+simi (in mlt.fl) works, whereas just simi does not - some weirdness that I am just not understanding. Would be grateful for your guidance!

Here is the setup:

*1. SOLR Version*
solr-spec 4.2.0.2013.03.06.22.32.13, solr-impl 4.2.0 1453694 - rmuir - 2013-03-06 22:32:13, lucene-spec 4.2.0, lucene-impl 4.2.0 1453694 - rmuir - 2013-03-06 22:25:29

*2. Machine Information*
Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM (1.6.0_23 19.0-b09), Windows 7 Home 64-bit with 4 GB RAM

*3. Sample Data*
I created this 'dummy' data of cars - the idea being that it would be sufficient and simple to generate similarity and understand how it would work. There are 181 rows in the data set (I have attached it for reference in CSV format). [image: Inline image 1]

*4. SCHEMA*

*Field Definitions*

<field name="id" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
<field name="make" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
<field name="model" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
<field name="class" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
<field name="type" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
<field name="drive" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
<field name="comment" type="text_general" indexed="true" stored="true" termVectors="true" multiValued="true"/>
<field name="size" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>

*Copy Fields*

<copyField source="make" dest="make_en"/> <!-- Search -->
<copyField source="model" dest="model_en"/> <!-- Search -->
<copyField source="class" dest="class_en"/> <!-- Search -->
<copyField source="type" dest="type_en"/> <!-- Search -->
<copyField source="drive" dest="drive_en"/> <!-- Search -->
<copyField source="comment" dest="comment_en"/> <!-- Search -->
<copyField source="size" dest="size_en"/> <!-- Search -->
<copyField source="id" dest="text"/> <!-- Glob -->
<copyField source="make" dest="text"/> <!-- Glob -->
<copyField source="model" dest="text"/> <!-- Glob -->
<copyField source="class" dest="text"/> <!-- Glob -->
<copyField source="type" dest="text"/> <!-- Glob -->
<copyField source="drive" dest="text"/> <!-- Glob -->
<copyField source="comment" dest="text"/> <!-- Glob -->
<copyField source="size" dest="text"/> <!-- Glob -->
<copyField source="class" dest="simi_en"/> <!-- similarity -->
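One guess worth checking for "just simi does not work": MoreLikeThis defaults to mlt.mintf=2 (minimum term frequency), and terms in a single-valued string field typically occur once per document, so the default can silently discard every term from that field. A sketch of a debug request built with Python's urlencode (the host, core name, seed query, and boost values here are hypothetical):

```python
from urllib.parse import urlencode

# Hypothetical host, core name, and boosts; adjust to the actual setup.
params = {
    "q": "id:camry",               # seed document
    "mlt": "true",
    "mlt.fl": "simi,text",         # fields MLT mines for interesting terms
    "mlt.qf": "simi^5 text^1",     # hypothetical per-field boosts
    "mlt.mintf": "1",              # string fields have tf=1; default of 2 drops them
    "mlt.mindf": "1",              # default of 5 can drop rare terms in a 181-doc index
    "debugQuery": "true",          # shows the generated MLT query and score explain
}
url = "http://localhost:8983/solr/cars/select?" + urlencode(params)
print(url)
```

With debugQuery=true, the parsed MLT query in the debug section shows exactly which terms survived the mintf/mindf filters and what boosts they received.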
java.lang.OutOfMemoryError: Map failed
Hi

Recently Solr crashed. I've found this in the error log. My commit settings look like this:

<autoCommit>
  <maxTime>1</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>2000</maxTime>
</autoSoftCommit>

The machine has 10GB of memory. Tomcat is running with -Xms2048m -Xmx6144m.

Versions: Solr 4.2, Tomcat 7.0.33, Java 1.7

Anybody any idea? Thx!

Arkadi

SEVERE: auto commit error...: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1415)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1527)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:562)
    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: Map failed
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849)
    at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:283)
    at org.apache.lucene.store.MMapDirectory$MMapIndexInput.<init>(MMapDirectory.java:228)
    at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:195)
    at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:96)
    at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:113)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:147)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
    at org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:121)
    at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:269)
    at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:2961)
    at org.apache.lucene.index.IndexWriter.maybeApplyDeletes(IndexWriter.java:2952)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:368)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:270)
    at org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:255)
    at org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:249)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1353)
    ... 11 more
Caused by: java.lang.OutOfMemoryError: Map failed
    at sun.nio.ch.FileChannelImpl.map0(Native Method)
    at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:846)
    ... 28 more

SEVERE: auto commit error...: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit
    at org.apache.lucene.index.IndexWriter.prepareCommitInternal(IndexWriter.java:2661)
    at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2827)
    at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2807)
    at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:541)
    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Out of memory on some faceting queries
On some queries I get out of memory errors:

{"error":{"msg":"java.lang.OutOfMemoryError: Java heap space","trace":"
java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:462)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:290)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:365)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:679)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:273)
    at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)
    at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:669)
    at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:325)
    at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:423)
    at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:205)
    at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:78)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:448)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:269)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:365)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
AW: java.lang.OutOfMemoryError: Map failed
Hi Arkadi,

this error usually indicates that virtual memory is not sufficient (should be unlimited). Please see http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/69168

Regards,
André

From: Arkadi Colson [ark...@smartbit.be]
Sent: Tuesday, 2 April 2013 10:24
To: solr-user@lucene.apache.org
Subject: java.lang.OutOfMemoryError: Map failed

Hi

Recently Solr crashed. I've found this in the error log. My commit settings look like this:

<autoCommit>
  <maxTime>1</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>2000</maxTime>
</autoSoftCommit>

The machine has 10GB of memory. Tomcat is running with -Xms2048m -Xmx6144m. Versions: Solr 4.2, Tomcat 7.0.33, Java 1.7. Anybody any idea? Thx!

Arkadi

SEVERE: auto commit error...: org.apache.solr.common.SolrException: Error opening new searcher [...]
Caused by: java.io.IOException: Map failed [...]
Caused by: java.lang.OutOfMemoryError: Map failed [...]
SEVERE: auto commit error...: java.lang.IllegalStateException: this writer hit an OutOfMemoryError; cannot commit [...]
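André's diagnosis (the mmap fails when the process's virtual-memory limit is too low) can be sketched as a quick in-process check. This is an illustrative Python sketch using the Unix-only resource module, not anything from the thread; it must be run as the same user that runs Solr for the result to be meaningful:

```python
import resource

# RLIMIT_AS is the address-space (virtual memory) limit that MMapDirectory's
# mmap calls run into; RLIM_INFINITY corresponds to "ulimit -v unlimited".
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
print("virtual memory limit:",
      "unlimited" if soft == resource.RLIM_INFINITY else soft)
```

With a 64-bit JVM and MMapDirectory, the virtual size of the process can legitimately exceed physical RAM by the size of the index, which is why the limit needs to be unlimited rather than merely large.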
Re: AW: java.lang.OutOfMemoryError: Map failed
Hmmm, I checked it and it seems to be ok:

root@solr01-dcg:~# ulimit -v
unlimited

Any other tips or do you need more debug info?

BR

On 04/02/2013 11:15 AM, André Widhani wrote: Hi Arkadi, this error usually indicates that virtual memory is not sufficient (should be unlimited). Please see http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/69168 Regards, André

[...]
AW: AW: java.lang.OutOfMemoryError: Map failed
The output is from the root user. Are you running Solr as root? If not, please try again using the operating system user that runs Solr.

André

From: Arkadi Colson [ark...@smartbit.be]
Sent: Tuesday, 2 April 2013 11:26
To: solr-user@lucene.apache.org
Cc: André Widhani
Subject: Re: AW: java.lang.OutOfMemoryError: Map failed

Hmmm, I checked it and it seems to be ok:

root@solr01-dcg:~# ulimit -v
unlimited

Any other tips or do you need more debug info?

BR

[...]
Re: AW: AW: java.lang.OutOfMemoryError: Map failed
It is running as root:

root@solr01-dcg:~# ps aux | grep tom
root 1809 10.2 67.5 49460420 6931232 ? Sl Mar28 706:29 /usr/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -server -Xms2048m -Xmx6144m -XX:PermSize=64m -XX:MaxPermSize=128m -XX:+UseG1GC -verbose:gc -Xloggc:/solr/tomcat-logs/gc.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Duser.timezone=UTC -Dfile.encoding=UTF8 -Dsolr.solr.home=/opt/solr/ -Dport=8983 -Dcollection.configName=smsc -DzkClientTimeout=2 -DzkHost=solr01-dcg.intnet.smartbit.be:2181,solr01-gs.intnet.smartbit.be:2181,solr02-dcg.intnet.smartbit.be:2181,solr02-gs.intnet.smartbit.be:2181,solr03-dcg.intnet.smartbit.be:2181,solr03-gs.intnet.smartbit.be:2181 -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port= -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Djava.endorsed.dirs=/usr/local/tomcat/endorsed -classpath /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat -Djava.io.tmpdir=/usr/local/tomcat/temp org.apache.catalina.startup.Bootstrap start

Arkadi

On 04/02/2013 11:29 AM, André Widhani wrote: The output is from the root user. Are you running Solr as root? If not, please try again using the operating system user that runs Solr. André

[...]
Re: Out of memory on some faceting queries
On Tue, 2013-04-02 at 11:09 +0200, Dotan Cohen wrote: On some queries I get out of memory errors: {error:{msg:java.lang.OutOfMemoryError: Java heap [...] org.apache.lucene.index.DocTermOrds.uninvert(DocTermOrds.java:273)\n\tat org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)\n\tat [...] Yep, your OOM is due to faceting. How many documents does your index have, how many fields do you facet on, and approximately how many unique values do your facet fields have? I notice that this only occurs on queries that run facets. I start Solr with the following command: sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar /opt/solr-4.1.0/example/start.jar You are not specifying any maximum heap size (-Xmx), which you should do in order to avoid unpleasant surprises. Facets and sorting are often memory hungry, but your system seems to have 13GB free RAM, so the easy solution attempt would be to increase the heap until Solr serves the facets without OOM. - Toke Eskildsen, State and University Library, Denmark
Re: Out of memory on some faceting queries
On Tue, Apr 2, 2013 at 12:59 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: How many documents does your index have, how many fields do you facet on, and approximately how many unique values do your facet fields have? 8971763 documents, growing at a rate of about 500 per minute. We actually expect that to be ~5 per minute once we get out of testing. Most documents are less than a KiB in the 'text' field, and they have a few other fields which store short strings, dates, or ints. You can think of these documents like tweets: short general-purpose text messages. I notice that this only occurs on queries that run facets. I start Solr with the following command: sudo nohup java -XX:NewRatio=1 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -Dsolr.solr.home=/mnt/SolrFiles100/solr -jar /opt/solr-4.1.0/example/start.jar You are not specifying any maximum heap size (-Xmx), which you should do in order to avoid unpleasant surprises. Facets and sorting are often memory hungry, but your system seems to have 13GB free RAM, so the easy solution attempt would be to increase the heap until Solr serves the facets without OOM. Thanks, I will start with -Xmx8g and test. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: AW: AW: java.lang.OutOfMemoryError: Map failed
I have seen the exact same on Ubuntu Server 12.04. It helped adding some swap space, but I do not understand why this is necessary, since OS ought to just use the actual memory mapped files if there is not room in (virtual) memory, swapping pages in and out on demand. Note that I saw this for memory mapped files opened for read+write - not in the exact same context as you see it where MMapDirectory is trying to map memory mapped files. If you find a solution/explanation, please post it here. I really want to know more about why FileChannel.map can cause OOM. I do not think the OOM is a real OOM indicating no more space on java heap, but is more an exception saying that OS has no more memory (in some interpretation of that). Regards, Per Steffensen On 4/2/13 11:32 AM, Arkadi Colson wrote: It is running as root: root@solr01-dcg:~# ps aux | grep tom root 1809 10.2 67.5 49460420 6931232 ?Sl Mar28 706:29 /usr/bin/java -Djava.util.logging.config.file=/usr/local/tomcat/conf/logging.properties -server -Xms2048m -Xmx6144m -XX:PermSize=64m -XX:MaxPermSize=128m -XX:+UseG1GC -verbose:gc -Xloggc:/solr/tomcat-logs/gc.log -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Duser.timezone=UTC -Dfile.encoding=UTF8 -Dsolr.solr.home=/opt/solr/ -Dport=8983 -Dcollection.configName=smsc -DzkClientTimeout=2 -DzkHost=solr01-dcg.intnet.smartbit.be:2181,solr01-gs.intnet.smartbit.be:2181,solr02-dcg.intnet.smartbit.be:2181,solr02-gs.intnet.smartbit.be:2181,solr03-dcg.intnet.smartbit.be:2181,solr03-gs.intnet.smartbit.be:2181 -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port= -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Djava.endorsed.dirs=/usr/local/tomcat/endorsed -classpath /usr/local/tomcat/bin/bootstrap.jar:/usr/local/tomcat/bin/tomcat-juli.jar -Dcatalina.base=/usr/local/tomcat -Dcatalina.home=/usr/local/tomcat -Djava.io.tmpdir=/usr/local/tomcat/temp 
org.apache.catalina.startup.Bootstrap start Arkadi On 04/02/2013 11:29 AM, André Widhani wrote: The output is from the root user. Are you running Solr as root? If not, please try again using the operating system user that runs Solr. André From: Arkadi Colson [ark...@smartbit.be] Sent: Tuesday, 2 April 2013 11:26 To: solr-user@lucene.apache.org Cc: André Widhani Subject: Re: AW: java.lang.OutOfMemoryError: Map failed Hmmm, I checked it and it seems to be ok: root@solr01-dcg:~# ulimit -v unlimited Any other tips or do you need more debug info? BR On 04/02/2013 11:15 AM, André Widhani wrote: Hi Arkadi, this error usually indicates that virtual memory is not sufficient (should be unlimited). Please see http://comments.gmane.org/gmane.comp.jakarta.lucene.solr.user/69168 Regards, André From: Arkadi Colson [ark...@smartbit.be] Sent: Tuesday, 2 April 2013 10:24 To: solr-user@lucene.apache.org Subject: java.lang.OutOfMemoryError: Map failed Hi Recently Solr crashed. I've found this in the error log. My commit settings are looking like this: <autoCommit><maxTime>1</maxTime><openSearcher>false</openSearcher></autoCommit> <autoSoftCommit><maxTime>2000</maxTime></autoSoftCommit> The machine has 10GB of memory. Tomcat is running with -Xms2048m -Xmx6144m Versions: Solr: 4.2 Tomcat: 7.0.33 Java: 1.7 Anybody any idea? Thx!
Arkadi SEVERE: auto commit error...:org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1415) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1527) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:562) at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:178) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:292) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Caused by: java.io.IOException: Map failed at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:849) at org.apache.lucene.store.MMapDirectory.map(MMapDirectory.java:283) at
Re: Out of memory on some faceting queries
On Tue, 2013-04-02 at 12:16 +0200, Dotan Cohen wrote: 8971763 documents, growing at a rate of about 500 per minute. We actually expect that to be ~5 per minute once we get out of testing. 9M documents in a heavily updated index with faceting. Maybe you are committing faster than the faceting can be prepared? https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F Regards, Toke Eskildsen
Collection name via Collections API (Solr 4.x)
Hello, I'm using the Solr Collections API to create a collection. http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=test2&numShards=1&replicationFactor=2&collection.configName=default I'm expecting the new collection to be named test2; what I get instead is test2_shard1_replica2. I don't want to tie my index name to any current settings. Is there any way to set the collection name precisely? Thank you, Lukasz -- View this message in context: http://lucene.472066.n3.nabble.com/Collection-name-via-Collections-API-Solr-4-x-tp4053155.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Collection name via Collections API (Solr 4.x)
The Collections API is a wrapper over the Core API. If you don't want the API to define the name for you, use the Core API instead; there you can define the collection name and the shard id. curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=corename&collection=collection1&shard=XX' -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, April 2, 2013 at 1:01 PM, Lukasz Kujawa wrote: Hello, I'm using the Solr Collections API to create a collection. http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=test2&numShards=1&replicationFactor=2&collection.configName=default I'm expecting the new collection to be named test2; what I get instead is test2_shard1_replica2. I don't want to tie my index name to any current settings. Is there any way to set the collection name precisely? Thank you, Lukasz -- View this message in context: http://lucene.472066.n3.nabble.com/Collection-name-via-Collections-API-Solr-4-x-tp4053155.html Sent from the Solr - User mailing list archive at Nabble.com (http://Nabble.com).
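As a side note, mail archives tend to eat the `&` separators in these URLs; a small sketch of building the Core Admin CREATE call with the parameters spelled out explicitly (host, core name, and shard id below are placeholders, not values from this thread):

```python
from urllib.parse import urlencode

# Hypothetical host and names; adjust for your own deployment.
base = "http://localhost:8983/solr/admin/cores"
params = {
    "action": "CREATE",
    "name": "corename",           # name of the new core
    "collection": "collection1",  # logical collection it belongs to
    "shard": "shard1",            # shard id within the collection
}
url = base + "?" + urlencode(params)
print(url)
```

Building the query string with `urlencode` also takes care of escaping, which matters once configuration names contain dots or other reserved characters.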
Re: Collection name via Collections API (Solr 4.x)
Also, I am assuming that the collection name in this case should be 'test2'. The replica names would be along the lines of what you've mentioned. Is that not the case? On Tue, Apr 2, 2013 at 5:31 PM, Lukasz Kujawa luk...@php.net wrote: Hello, I'm using the Solr Collections API to create a collection. http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=test2&numShards=1&replicationFactor=2&collection.configName=default I'm expecting the new collection to be named test2; what I get instead is test2_shard1_replica2. I don't want to tie my index name to any current settings. Is there any way to set the collection name precisely? Thank you, Lukasz -- View this message in context: http://lucene.472066.n3.nabble.com/Collection-name-via-Collections-API-Solr-4-x-tp4053155.html Sent from the Solr - User mailing list archive at Nabble.com. -- Anshum Gupta http://www.anshumgupta.net
Query using function query result
Hi, I want to query documents which match a certain dynamic criterion. For example: how do I get all documents where sub(field1,field2) > 0? I tried _val_:sub(field1,field2) and used fq:[_val_:[0 TO *], but it doesn't work. ./Zahoor
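For what it's worth, the usual way to express a condition over a function in Solr is the `frange` query parser rather than `_val_` range syntax; a sketch of building such a filter query for the fields from the question (parameter spellings should be checked against your Solr version's docs):

```python
from urllib.parse import urlencode

# {!frange} keeps documents whose function value falls within [l, u];
# incl=false makes the lower bound exclusive, i.e. sub(field1,field2) > 0.
params = {
    "q": "*:*",
    "fq": "{!frange l=0 incl=false}sub(field1,field2)",
}
query_string = urlencode(params)
print(query_string)
```

The resulting query string can be appended to the usual `/select?` endpoint; `urlencode` handles the `{`, `!`, and `(` characters that would otherwise need manual escaping.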
Re: Collection name via Collections API (Solr 4.x)
In this link you can see what is what: http://wiki.apache.org/solr/SolrCloud#Glossary A collection represents a single logical index; a SolrCore (AKA core) encapsulates a single physical index. One or more cores make up a logical shard, and shards make up a collection. You can have a collection with the same name as a SolrCore if you want. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, April 2, 2013 at 1:53 PM, Anshum Gupta wrote: Also, I am assuming that the collection name in this case should be 'test2'. The replica names would be along the lines of what you've mentioned. Is that not the case? On Tue, Apr 2, 2013 at 5:31 PM, Lukasz Kujawa luk...@php.net (mailto:luk...@php.net) wrote: Hello, I'm using the Solr Collections API to create a collection. http://127.0.0.1:8983/solr/admin/collections?action=CREATE&name=test2&numShards=1&replicationFactor=2&collection.configName=default I'm expecting the new collection to be named test2; what I get instead is test2_shard1_replica2. I don't want to tie my index name to any current settings. Is there any way to set the collection name precisely? Thank you, Lukasz -- View this message in context: http://lucene.472066.n3.nabble.com/Collection-name-via-Collections-API-Solr-4-x-tp4053155.html Sent from the Solr - User mailing list archive at Nabble.com (http://Nabble.com). -- Anshum Gupta http://www.anshumgupta.net
Re: Top 10 Terms in Index (by date)
Oh, I see, essentially you want to get the sum of the term frequencies for every term in a subset of documents (instead of the document frequency as the FacetComponent would give you). I don't know of an easy/out of the box solution for this. I know the TermVectorComponent will give you the tf for every term in a document, but I'm not sure if you can filter or sort on it. Maybe you can do something like: https://issues.apache.org/jira/browse/LUCENE-2393 or what's suggested here: http://search-lucene.com/m/of5Fn1PUOHU/ but I have never used something like that. Tomás On Mon, Apr 1, 2013 at 9:58 PM, Andy Pickler andy.pick...@gmail.com wrote: I need total number of occurrences across all documents for each term. Imagine this... Post #1: I think, therefore I am like you Reply #1: You think too much Reply #2 I think that I think much as you Each of those documents are put into 'content'. Pretending I don't have stop words, the top term query (not considering dateCreated in this example) would result in something like... think: 4 I: 4 you: 3 much: 2 ... Thus, just a number of documents approach doesn't work, because if a word occurs more than one time in a document it needs to be counted that many times. That seemed to rule out faceting like you mentioned as well as the TermsComponent (which as I understand also only counts documents). Thanks, Andy Pickler On Mon, Apr 1, 2013 at 4:31 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: So you have one document per user comment? Why not use faceting plus filtering on the dateCreated field? That would count number of documents for each term (so, in your case, if a term is used twice in one comment it would only count once). Is that what you are looking for? Tomás On Mon, Apr 1, 2013 at 6:32 PM, Andy Pickler andy.pick...@gmail.com wrote: Our company has an application that is Facebook-like for usage by enterprise customers. We'd like to do a report of top 10 terms entered by users over (some time period). 
With that in mind I'm using the DataImportHandler to put all the relevant data from our database into a Solr 'content' field: <field name="content" type="text_general" indexed="true" stored="false" multiValued="false" required="true" termVectors="true"/> Along with the content is the 'dateCreated' for that content: <field name="dateCreated" type="tdate" indexed="true" stored="false" multiValued="false" required="true"/> I'm struggling with the TermVectorComponent documentation to understand how I can put together a query that answers the 'report' mentioned above. For each document I need each term counted however many times it is entered (content of "I think what I think" would report 'think' as used twice). Does anyone have any insight as to whether I'm headed in the right direction, and then what my query would be? Thanks, Andy Pickler
Re: Out of memory on some faceting queries
On Tue, Apr 2, 2013 at 2:41 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: 9M documents in a heavily updated index with faceting. Maybe you are committing faster than the faceting can be prepared? https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F Thank you Toke, this is exactly on my list of things to learn about Solr. We do get the error mentioned and we cannot reduce the amount of commits. Also, I do believe that we have the necessary server resources (16 GiB RAM). I have increased maxWarmingSearchers to 4, let's see how this goes. Thank you. -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Slaves always replicate entire index Index versions
I moved from Solr 4.1 to Solr 4.2 on one of the slave servers. Earlier my index directory had index.<timestamp>, but now it has only an index folder with no timestamp. Is this a bug? The size of the index is the same as on the master. It shows replication running on the dashboard with both master and slave versions. What happened to the timestamp in the index directory? index.<timestamp> -- earlier with 4.1 index -- this is the new folder Please reply ASAP. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Slaves-always-replicate-entire-index-Index-versions-tp4041256p4053179.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Top 10 Terms in Index (by date)
A key problem with those approaches as well as Lucene's HighFreqTerms class ( http://lucene.apache.org/core/4_2_0/misc/org/apache/lucene/misc/HighFreqTerms.html) is that none of them seem to have the ability to combine with a date range query...which is key in my scenario. I'm kinda thinking that what I'm asking to do just isn't supported by Lucene or Solr, and that I'll have to pursue another avenue. If anyone has any other suggestions, I'm all ears. I'm starting to wonder if I need to have some nightly batch job that executes against my database and builds up that day's top terms in a table or something. Thanks, Andy Pickler On Tue, Apr 2, 2013 at 7:16 AM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: Oh, I see, essentially you want to get the sum of the term frequencies for every term in a subset of documents (instead of the document frequency as the FacetComponent would give you). I don't know of an easy/out of the box solution for this. I know the TermVectorComponent will give you the tf for every term in a document, but I'm not sure if you can filter or sort on it. Maybe you can do something like: https://issues.apache.org/jira/browse/LUCENE-2393 or what's suggested here: http://search-lucene.com/m/of5Fn1PUOHU/ but I have never used something like that. Tomás On Mon, Apr 1, 2013 at 9:58 PM, Andy Pickler andy.pick...@gmail.com wrote: I need total number of occurrences across all documents for each term. Imagine this... Post #1: I think, therefore I am like you Reply #1: You think too much Reply #2 I think that I think much as you Each of those documents are put into 'content'. Pretending I don't have stop words, the top term query (not considering dateCreated in this example) would result in something like... think: 4 I: 4 you: 3 much: 2 ... Thus, just a number of documents approach doesn't work, because if a word occurs more than one time in a document it needs to be counted that many times. 
That seemed to rule out faceting like you mentioned as well as the TermsComponent (which as I understand also only counts documents). Thanks, Andy Pickler On Mon, Apr 1, 2013 at 4:31 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: So you have one document per user comment? Why not use faceting plus filtering on the dateCreated field? That would count the number of documents for each term (so, in your case, if a term is used twice in one comment it would only count once). Is that what you are looking for? Tomás On Mon, Apr 1, 2013 at 6:32 PM, Andy Pickler andy.pick...@gmail.com wrote: Our company has an application that is Facebook-like for usage by enterprise customers. We'd like to do a report of the top 10 terms entered by users over (some time period). With that in mind I'm using the DataImportHandler to put all the relevant data from our database into a Solr 'content' field: <field name="content" type="text_general" indexed="true" stored="false" multiValued="false" required="true" termVectors="true"/> Along with the content is the 'dateCreated' for that content: <field name="dateCreated" type="tdate" indexed="true" stored="false" multiValued="false" required="true"/> I'm struggling with the TermVectorComponent documentation to understand how I can put together a query that answers the 'report' mentioned above. For each document I need each term counted however many times it is entered (content of "I think what I think" would report 'think' as used twice). Does anyone have any insight as to whether I'm headed in the right direction, and then what my query would be? Thanks, Andy Pickler
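One client-side option for the sum-of-tf report discussed above: filter the query on the dateCreated range, request term vectors, and sum the per-document tf values in the client. A sketch of just the aggregation step, using the thread's own three example documents as hand-written sample data (the dict shape here is illustrative, not the exact TermVectorComponent wire format):

```python
from collections import Counter

# Per-document term frequencies, as one might extract them from a
# TermVectorComponent response for documents matching a dateCreated filter.
doc_term_freqs = [
    {"i": 2, "think": 1, "therefore": 1, "am": 1, "like": 1, "you": 1},  # Post #1
    {"you": 1, "think": 1, "too": 1, "much": 1},                          # Reply #1
    {"i": 2, "think": 2, "that": 1, "much": 1, "as": 1, "you": 1},        # Reply #2
]

# Sum tf across the document subset, so a term used twice in one
# document counts twice (unlike document-frequency faceting).
totals = Counter()
for tf in doc_term_freqs:
    totals.update(tf)

top_terms = totals.most_common(4)
print(top_terms)  # matches the thread's expected think/I: 4, you: 3, much: 2
```

This trades index-side support for a second pass in the client, which may be acceptable for a nightly report but not for interactive queries over large result sets.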
performance on concurrent search request
In this thread about performance on concurrent search requests, Otis said: http://lucene.472066.n3.nabble.com/how-to-improve-concurrent-request-performance-and-stress-testing-td496411.html /Imagine this type of code: synchronized (someGlobalObject) { // search } What happens when 100 threads hit this spot? The first one to get there gets in and runs the search, and 99 of them wait. What happens if that // search also involves expensive operations, lots of IO, warming up, cache population, etc.? Those 99 threads will have to wait a while :) That's why it is recommended to warm up the searcher ahead of time before exposing it to real requests. However, even if you warm things up, that sync block will remain there, and at some point this will become a bottleneck. What that point is depends on the hardware, index size, query complexity and rate, even the JVM. Otis / I'm wondering if this synchronized block is still an issue in Solr 4.x? Is it because of how Solr deals with the index searcher, or is it because of how it is implemented in Lucene? -- View this message in context: http://lucene.472066.n3.nabble.com/performance-on-concurrent-search-request-tp4053182.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Flow Chart of Solr
(13/04/02 21:45), Furkan KAMACI wrote: Is there any documentation, something like a flow chart of Solr? i.e. documents come into Solr (maybe indicating which classes get the documents) and go through the parsing process (stemming etc.), and then inverted indexes are built, and so on? There is an interesting ticket: Architecture Diagrams needed for Lucene, Solr and Nutch https://issues.apache.org/jira/browse/LUCENE-2412 koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
Re: Out of memory on some faceting queries
On Tue, 2013-04-02 at 15:55 +0200, Dotan Cohen wrote: [Toke: maxWarmingSearchers limit exceeded?] Thank you Toke, this is exactly on my list of things to learn about Solr. We do get the error mentioned and we cannot reduce the amount of commits. Also, I do believe that we have the necessary server resources (16 GiB RAM). Memory does not help you if you commit too frequently. If you commit every X seconds and warming takes X+Y seconds, then you will run out of memory at some point. I have increased maxWarmingSearchers to 4, let's see how this goes. If you still get the error with 4 concurrent searchers, you will have to either speed up warmup time or commit less frequently. You should be able to reduce facet startup time by switching to segment-based faceting (at the cost of worse search-time performance) or maybe by using DocValues. Some of the current threads on the solr-user list are about these topics. How often do you commit, and how many unique values do your facet fields have? Regards, Toke Eskildsen
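For reference on the DocValues suggestion above: enabling it is a schema.xml change plus a full reindex. A minimal sketch of such a field definition (the field name and type here are hypothetical, not taken from this thread):

```xml
<!-- Hypothetical facet field: docValues="true" stores a column-oriented
     per-document value structure at index time, so faceting no longer has
     to un-invert the field into heap memory on first use. -->
<field name="category" type="string" indexed="true" stored="false" docValues="true"/>
```

Documents indexed before this change will not have the DocValues structure, so the whole index needs rebuilding for the facet to benefit.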
Re: Out of memory on some faceting queries
On Tue, Apr 2, 2013 at 5:33 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: On Tue, 2013-04-02 at 15:55 +0200, Dotan Cohen wrote: [Toke: maxWarmingSearchers limit exceeded?] Thank you Toke, this is exactly on my list of things to learn about Solr. We do get the error mentioned and we cannot reduce the amount of commits. Also, I do believe that we have the necessary server resources (16 GiB RAM). Memory does not help you if you commit too frequently. If you commit every X seconds and warming takes X+Y seconds, then you will run out of memory at some point. I have increased maxWarmingSearchers to 4, let's see how this goes. If you still get the error with 4 concurrent searchers, you will have to either speed up warmup time or commit less frequently. You should be able to reduce facet startup time by switching to segment-based faceting (at the cost of worse search-time performance) or maybe by using DocValues. Some of the current threads on the solr-user list are about these topics. How often do you commit, and how many unique values do your facet fields have? Regards, Toke Eskildsen -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Flow Chart of Solr
On 04/02/2013 04:20 PM, Koji Sekiguchi wrote: (13/04/02 21:45), Furkan KAMACI wrote: Is there any documentation, something like a flow chart of Solr? i.e. documents come into Solr (maybe indicating which classes get the documents) and go through the parsing process (stemming etc.), and then inverted indexes are built, and so on? There is an interesting ticket: Architecture Diagrams needed for Lucene, Solr and Nutch https://issues.apache.org/jira/browse/LUCENE-2412 koji I like this one, it is a bit more detailed: http://www.cominvent.com/2011/04/04/solr-architecture-diagram/ -- André Bois-Crettez Search technology, Kelkoo http://www.kelkoo.com/ Kelkoo SAS Société par Actions Simplifiée Au capital de € 4.168.964,30 Siège social : 8, rue du Sentier 75002 Paris 425 093 069 RCS Paris Ce message et les pièces jointes sont confidentiels et établis à l'attention exclusive de leurs destinataires. Si vous n'êtes pas le destinataire de ce message, merci de le détruire et d'en avertir l'expéditeur.
Re: Flow Chart of Solr
Actually, maybe the most important core thing is the Analysis part in the last diagram, but there is nothing about it (i.e. stemming, lemmatizing, etc.) in any of them. 2013/4/2 Andre Bois-Crettez andre.b...@kelkoo.com On 04/02/2013 04:20 PM, Koji Sekiguchi wrote: (13/04/02 21:45), Furkan KAMACI wrote: Is there any documentation, something like a flow chart of Solr? i.e. documents come into Solr (maybe indicating which classes get the documents) and go through the parsing process (stemming etc.), and then inverted indexes are built, and so on? There is an interesting ticket: Architecture Diagrams needed for Lucene, Solr and Nutch https://issues.apache.org/jira/browse/LUCENE-2412 koji I like this one, it is a bit more detailed: http://www.cominvent.com/2011/04/04/solr-architecture-diagram/ -- André Bois-Crettez Search technology, Kelkoo http://www.kelkoo.com/
Re: Slaves always replicate entire index Index versions
The index.<timestamp> folder is indeed gone, but it seems to work. Maybe just a structural change... Kind regards Arkadi Colson Smartbit bvba • Hoogstraat 13 • 3670 Meeuwen T +32 11 64 08 80 • F +32 11 64 08 81 On 04/02/2013 04:08 PM, yayati wrote: I moved from Solr 4.1 to Solr 4.2 on one of the slave servers. Earlier my index directory had index.<timestamp>, but now it has only an index folder with no timestamp. Is this a bug? The size of the index is the same as on the master. It shows replication running on the dashboard with both master and slave versions. What happened to the timestamp in the index directory? index.<timestamp> -- earlier with 4.1 index -- this is the new folder Please reply ASAP. Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Slaves-always-replicate-entire-index-Index-versions-tp4041256p4053179.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Out of memory on some faceting queries
On Tue, Apr 2, 2013 at 5:33 PM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Memory does not help you if you commit too frequently. If you commit each X seconds and warming takes X+Y seconds, then you will run out of memory at some point. How might I time the warming? I've been googling warming since your earlier message but there does not seem to be any really good documentation on the subject. If there is anything that you feel I should be reading I would appreciate a link or a keyword to search on. I've read the Solr wiki on caching and performance, but other than that I don't see the issue addressed. I have increased maxWarmingSearchers to 4, let's see how this goes. If you still get the error with 4 concurrent searchers, you will have to either speed up warmup time or commit less frequently. You should be able to reduce facet startup time by switching to segment based faceting (at the cost of worse search-time performance) or maybe by using DocValues. Some of the current threads on the solr-user list is about these topics. How often do you commit and how many unique values does your facet fields have? Batches of 20-50 results are added to solr a few times a minute, and a commit is done after each batch since I'm calling Solr as such: http://127.0.0.1:8983/solr/core/update/json?commit=true Should I remove commit=true and run a cron job to commit once per minute? -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Out of memory on some faceting queries
How often do you commit and how many unique values does your facet fields have? Most of the time I facet on one field that has about twenty unique values. However, once per day I would like to facet on the text field, which is a free-text field usually around 1 KiB (about 100 words), in order to determine what the top keywords / topics are. That query would take up to 200 seconds to run, but it does not have to return the results in real-time (the output goes to another process, not to a waiting user). -- Dotan Cohen http://gibberish.co.il http://what-is-what.com
Re: Flow Chart of Solr
For beginners it is complicated to understand the complexity of Solr/Lucene. I'm trying to develop a custom search component, and it's too hard to keep in mind the flow, inheritance, and interaction between classes. I think that there is a gap between developer doc and user doc, or maybe I don't search enough T_T. The Javadoc is not always clear. The fact that I'm a beginner in the Solr world doesn't help. Either way, this thread was very helpful; I found some very good resources here :) Regards -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, April 2, 2013 at 3:51 PM, Furkan KAMACI wrote: Actually, maybe the most important core thing is the Analysis part in the last diagram, but there is nothing about it (i.e. stemming, lemmatizing, etc.) in any of them. 2013/4/2 Andre Bois-Crettez andre.b...@kelkoo.com (mailto:andre.b...@kelkoo.com) On 04/02/2013 04:20 PM, Koji Sekiguchi wrote: (13/04/02 21:45), Furkan KAMACI wrote: Is there any documentation, something like a flow chart of Solr? i.e. documents come into Solr (maybe indicating which classes get the documents) and go through the parsing process (stemming etc.), and then inverted indexes are built, and so on? There is an interesting ticket: Architecture Diagrams needed for Lucene, Solr and Nutch https://issues.apache.org/jira/browse/LUCENE-2412 koji I like this one, it is a bit more detailed: http://www.cominvent.com/2011/04/04/solr-architecture-diagram/ -- André Bois-Crettez Search technology, Kelkoo http://www.kelkoo.com/
Re: Flow Chart of Solr
You are right about mentioning developer doc and user doc. Users separate about it. Some of them uses Solr for indexing and monitoring via admin face and that is quietly enough for them however some people wants to modify it so it would be nice if there had been some documentation for developer side too. 2013/4/2 Yago Riveiro yago.rive...@gmail.com For beginners is complicate understand the complexity of solr / lucene, I'm trying devel a custom search component and it's too hard keep in mind the flow, inheritance and iteration between classes. I think that there is a gap between software doc and user doc, or maybe I don't search enough T_T. Java doc not always is clear always. The fact that I'm beginner in solr world don't help. Either way, this thread was very helpful, I found some very good resources here :) Cumprimentos -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, April 2, 2013 at 3:51 PM, Furkan KAMACI wrote: Actually maybe one the most important core thing is that Analysis part at last diagram but there is nothing about it i.e. stamming, lemmitazing etc. at any of them. 2013/4/2 Andre Bois-Crettez andre.b...@kelkoo.com (mailto: andre.b...@kelkoo.com) On 04/02/2013 04:20 PM, Koji Sekiguchi wrote: (13/04/02 21:45), Furkan KAMACI wrote: Is there any documentation something like flow chart of Solr. i.e. Documents comes into Solr(maybe indicating which classes get documents) and goes to parsing process (i.e. stemming processes etc.) and then reverse indexes are get so on so forth? 
Re: [ANNOUNCE] Solr wiki editing change
Please add RyanErnst to the contributors group. Thanks! On Mon, Apr 1, 2013 at 7:04 PM, Steve Rowe sar...@gmail.com wrote: On Apr 1, 2013, at 9:40 PM, Vaillancourt, Tim tvaillanco...@ea.com wrote: I would also like to contribute to SolrCloud's wiki where possible. Please add myself (TimVaillancourt) when you have a chance. Added to solr wiki ContributorsGroup.
Re: [ANNOUNCE] Solr wiki editing change
On Apr 2, 2013, at 11:23 AM, Ryan Ernst r...@iernst.net wrote: Please add RyanErnst to the contributors group. Thanks! Added to solr wiki ContributorsGroup.
Re: Out of memory on some faceting queries
On 04/02/2013 05:04 PM, Dotan Cohen wrote: How might I time the warming? I've been googling warming since your earlier message but there does not seem to be any really good documentation on the subject. If there is anything that you feel I should be reading I would appreciate a link or a keyword to search on. I've read the Solr wiki on caching and performance, but other than that I don't see the issue addressed. warmupTime is available on the admin page for each type of cache (in milliseconds): http://solr-box:8983/solr/#/core1/plugins/cache Or if you are only interested in the total: http://solr-box:8983/solr/core1/admin/mbeans?stats=true&key=searcher Batches of 20-50 results are added to Solr a few times a minute, and a commit is done after each batch since I'm calling Solr as such: http://127.0.0.1:8983/solr/core/update/json?commit=true Should I remove commit=true and run a cron job to commit once per minute? Even better, it sounds like a job for CommitWithin: http://wiki.apache.org/solr/CommitWithin André
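For readers following along, a minimal sketch of what the CommitWithin change looks like for the update URL; the 60-second window and the core name are illustrative assumptions, not values from this thread:

```java
public class CommitWithinUrl {
    public static void main(String[] args) {
        String base = "http://127.0.0.1:8983/solr/core/update/json";
        int commitWithinMs = 60000; // illustrative 60-second window
        // Instead of commit=true on every batch, ask Solr to commit within a
        // time window; Solr coalesces many such requests into far fewer commits.
        String url = base + "?commitWithin=" + commitWithinMs;
        System.out.println(url);
        // prints: http://127.0.0.1:8983/solr/core/update/json?commitWithin=60000
    }
}
```

With SolrJ, the equivalent (assuming the 4.x API) is the add overload that takes a commitWithin argument, e.g. server.add(doc, 60000), with no explicit commit() call afterwards.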
Re: [ANNOUNCE] Solr wiki editing change
Hi; Please add FurkanKAMACI to the group. Thanks; Furkan KAMACI 2013/4/2 Steve Rowe sar...@gmail.com On Apr 2, 2013, at 11:23 AM, Ryan Ernst r...@iernst.net wrote: Please add RyanErnst to the contributors group. Thanks! Added to solr wiki ContributorsGroup.
Job: Apache solr (Recruiting)
We have openings for Middleware architects (Apache Solr). *Locations:* Mountain View, California; New York City, NY; Houston, Texas. Mail me your resumes to jess...@kudukisgroup.com. We can discuss more over the phone. Thanks, Jessica
Re: [ANNOUNCE] Solr wiki editing change
On Apr 2, 2013, at 11:28 AM, Furkan KAMACI furkankam...@gmail.com wrote: Please add FurkanKAMACI to the group. Added to solr wiki ContributorsGroup.
Solrj 4.2 - CloudSolrServer aliases are not loaded
Hello, I am using the new collection alias feature, and it seems the CloudSolrServer class (solrj 4.2.0) does not allow using it, either for update or select. When I'm requesting the CloudSolrServer with a collection alias name, I get the error: org.apache.solr.common.SolrException: Collection not found: aliasedCollection The collection alias cannot be found because, in the CloudSolrServer#getCollectionList (line 319) method, the alias variable is always empty. When I'm requesting the CloudSolrServer, the connect method is called and it calls the ZkStateReader#createClusterStateWatchersAndUpdate method. In the ZkStateReader#createClusterStateWatchersAndUpdate method, the aliases are not loaded. At line 295, the data from /clusterstate.json are loaded: ClusterState clusterState = ClusterState.load(zkClient, liveNodeSet); this.clusterState = clusterState; Shouldn't we have the same data loading from /aliases.json, in order to fill the aliases field? At line 299, a Watcher for aliases is created but does not seem to be used. As a workaround to avoid the error, I have to force the aliases loading at my application start and when the aliases are updated: CloudSolrServer solrServer = new CloudSolrServer("localhost:2181"); solrServer.setDefaultCollection("aliasedCollection"); solrServer.connect(); solrServer.getZkStateReader().updateAliases(); Is there a better way to use collection aliases with solrj? Elodie Sannier
Re: Collection name via Collections API (Solr 4.x)
If I use the admin API instead of the Collections API, then as I understand it the new core will only be available on that server. If I query a different Solr server, I will get an error. If I use the Collections API and I query a server which doesn't physically hold the data, I will still get results. Creating cores manually across all Solr servers doesn't feel like the right way to go. -- View this message in context: http://lucene.472066.n3.nabble.com/Collection-name-via-Collections-API-Solr-4-x-tp4053155p4053230.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr URL uses non-standard format with pound sign
The Solr URL in Solr 4.2 for my localhost installation looks like this: http://localhost:8883/solr/#/development_shard1_replica1 This URL, when constructed dynamically in Ruby, will not validate with the Ruby URI::HTTP class because of the # sign in the path. This is a non-standard URL as per RFC 1738. Here is the error message: #<URI::InvalidComponentError: bad component(expected absolute path component): /solr/#/development_shard1_replica1> Is there another way to access the Solr URL without using the # sign? Thanks, Dennis Haller
Re: Solr URL uses non-standard format with pound sign
: The Solr URL in Solr 4.2 for my localhost installation looks like this: : http://localhost:8883/solr/#/development_shard1_replica1 : : This URL when constructed dynamically in Ruby will not validate with the : Ruby URI::HTTP class because of the # sign in the path. This is a : non-standard URL as per RFC 1738. 1) RFC 1738 is antiquated. Among other things, RFC 3986 is much more relevant and clarifies that # is a fragment identifier. 2) The URL you are referring to is a *UI* view, and the fragment (/development_shard1_replica1) is dealt with entirely by your web browser via javascript. 3) For dealing with Solr's HTTP APIs programmatically, the type of base URL you want will either be http://localhost:8883/solr/ or http://localhost:8883/solr/development_shard1_replica1 depending on whether your client code is expecting a base URL for the entire server (to query multiple SolrCores), or a base URL for a single SolrCore. -Hoss
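Hoss's point about the fragment can be checked with any RFC 3986-aware parser; here is a small sketch using java.net.URI (the host, port, and core name are just the ones from this thread):

```java
import java.net.URI;

public class SolrUrlFragment {
    public static void main(String[] args) {
        // The admin UI address: everything after '#' is a client-side fragment,
        // never sent to the server as part of the request path.
        URI ui = URI.create("http://localhost:8883/solr/#/development_shard1_replica1");
        System.out.println(ui.getPath());     // prints: /solr/
        System.out.println(ui.getFragment()); // prints: /development_shard1_replica1
        // The base URL an HTTP client should actually use for that core:
        URI api = URI.create("http://localhost:8883/solr/development_shard1_replica1");
        System.out.println(api.getPath());    // prints: /solr/development_shard1_replica1
    }
}
```

In Ruby the same split applies: parse the URL without the fragment (or strip it first), since URI::HTTP rejects # in the path component.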
Re: Flow Chart of Solr
I think there is a gap in the support for one's path of learning Solr. I'll try to describe it based on my own experience. Hopefully, it is helpful. At first, there is a 'Solr is a black box' stage, where the person may not know Java and is just using out-of-the-box components. The Wiki is reasonably helpful there and there are other resources (blogs, etc). At this point, Lucene is a black box within the black box and is something that is safely ignored. At the second stage, one hits the period where he/she understands what is going on in their basic scenario and is trying to get into more advanced cases. This could be putting together a complex analyzer chain, trying to use Update Request Processors, optimizing slow/OOM imports or doing complex queries. Suddenly, they are pointed directly at Javadocs and have to figure out their way around Java-based instructions. A Java programmer can bridge that gap and get over the curve, but I suspect others get lost very quickly and get stuck even when they don't need to be good programmers. An example in my mind would be something like RegexReplaceProcessor. One has to climb up and down the inheritance chain of the Javadoc to figure out what can be done and what the parameters are. And the parameter syntax is Java regular expressions rather than something used in copyField, so they need to jump over and figure that out. So, it is fairly hard to envisage those pieces and how they can combine together. Similarly, some of the stuff is described in Jira requests, but also in a way that requires a programmer's mind-set to parse it out. I think a lot of people drop out at this stage and fall back to the 'black-box' view of Solr. Most of the questions I see on Stack Overflow are conceptual troubles at this stage. And then, those who get to the third stage jump to the advanced level where one can just read the source code to figure out what is going on.
I found www.grepcode.com to be useful (though it is quite slow now and is a bit behind for Solr). Somewhere around here, one also starts to realize the fuzzy relation between the Lucene and Solr code, and it becomes somewhat clearer what Solr's benefits actually are (as opposed to bare Lucene's). This also generates its own frustration and confusion of course, because suddenly one starts to wish for Lucene features that Solr does not use (e.g. split/sync analyzer chains, some alternative facet implementation features, etc). And finally (at the end of the beginning), you become a contributor and become very familiar with subversion/ant/etc. Though, I suspect, the contributors become more specialized and actually understand less about other parts of the system (e.g. is anyone still fully understanding DIH?). I am not blaming anyone with this story for the lack of support. I think Solr is - in many ways - better documented than many other open source projects. And the new manual being contributed to replace the Wiki will (soon?) make this even better. And, of course, this mailing list is indescribably awesome. I am just trying to provide a fresh view of what I went through and where I see people getting stuck. I think a bit more effort in documenting that second stage would bring more people to the community. I am trying to do my share through Wiki updates, questions here, Jira issues, my upcoming book and some other little things. I see others do the same. Perhaps, the diagram is something that we should explicitly try to do. Though, I think it would be more fun to do it as a Scrollorama Inception Explained style (http://www.inception-explained.com/). :-) Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working.
(Anonymous - via GTD book) On Tue, Apr 2, 2013 at 11:22 AM, Furkan KAMACI furkankam...@gmail.com wrote:
Re: Collection name via Collections API (Solr 4.x)
Solr 4.2 implements a feature to proxy requests if the core does not exist on the requested node: https://issues.apache.org/jira/browse/SOLR-4210 There is currently a bug in this mechanism: https://issues.apache.org/jira/browse/SOLR-4584 Without the proxy feature, whether you create the cores manually or automatically, you can only query the collection on nodes that have at least 1 replica of the collection. If you have a Solr cluster with 4 nodes and the collection only has 2 shards without replicas, then you can only query the collection on 50% of the cluster (assuming that the proxy request mechanism doesn't work properly). When I said to create the collection manually, I meant you need to create manually all the shards that form the collection, and the replicas on the other nodes of the cluster. It takes work, but if you want to have some control you need to pay the price. If it is possible to manage the name of a shard with the Collections API, the documentation doesn't say how. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, April 2, 2013 at 5:15 PM, Lukasz Kujawa wrote:
Re: Collection name via Collections API (Solr 4.x)
Thank you for your answers Yriveiro. I'm trying to use Solr for a big SaaS platform. The reason why I want everything dynamic is that each user will get their own Solr collection. It looks like there are still many issues with the distributed computing. I hope 4.3 will arrive soon ;-) Anyway, once again thank you for your time. -- View this message in context: http://lucene.472066.n3.nabble.com/Collection-name-via-Collections-API-Solr-4-x-tp4053155p4053245.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Flow Chart of Solr
Alexandre, You describe the normal path when a beginner tries to use source code that they don't understand: black box, reading code, hacking, ok now I know 10% of the project, with luck :p. First of all, the Solr community is fantastic and always helps when I need it. IMHO the devel documentation is dispersed over a lot of sources: blogs, the wiki, the LucidWorks wiki (I know that this wiki was donated to Apache and is in the process of being presented to the world as part of the project). The curve for doing fun things with Solr at the source level is steep; I see a lot of webinars teaching how to deploy and use Solr, but not how to develop a ResponseWriter or a SearchComponent. Unfortunately I don't have the knowledge to contribute properly; in the future … we will see. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, April 2, 2013 at 5:24 PM, Alexandre Rafalovitch wrote:
Re: Collection name via Collections API (Solr 4.x)
I use Solr for a similar purpose; I understand that you want to have control over how the sharding is done :) Regards. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Tuesday, April 2, 2013 at 5:54 PM, Lukasz Kujawa wrote:
Re: Solrj 4.2 - CloudSolrServer aliases are not loaded
Answers inline: On Apr 2, 2013, at 11:45 AM, Elodie Sannier elodie.sann...@kelkoo.fr wrote: Hello, I am using the new collection alias feature, and it seems CloudSolrServer class (solrj 4.2.0) does not allow to use it, either for update or select. When I'm requesting the CloudSolrServer with a collection alias name, I have the error: org.apache.solr.common.SolrException: Collection not found: aliasedCollection The collection alias cannot be found because, in CloudSolrServer#getCollectionList (line 319) method, the alias variable is always empty. When I'm requesting the CloudSolrServer, the connect method is called and it calls the ZkStateReader#createClusterStateWatchersAndUpdate method. In the ZkStateReader#createClusterStateWatchersAndUpdate method, the aliases are not loaded. line 295, the data from /clusterstate.json are loaded : ClusterState clusterState = ClusterState.load(zkClient, liveNodeSet); this.clusterState = clusterState; Should we have the same data loading from /aliases.json, in order to fill the aliases field ? line 299, a Watcher for aliases is created but does not seem used. The Watcher is used. It updates the Aliases if they changed - there is some lag time though. There is some work that tries to avoid the lag in the update being a problem, but I'm guessing somehow it's not covering your case. It wouldn't hurt to add the updateAliases call automatically on ZkStateReader init. If the watcher was indeed not being used, that would not solve things though - the client still needs to be able to detect alias additions and changes. Your best bet is to file a JIRA issue so we can work on a test that mimics what you are seeing. 
- Mark As a workaround to avoid the error, I have to force the aliases loading at my application start and when the aliases are updated: CloudSolrServer solrServer = new CloudSolrServer(localhost:2181); solrServer.setDefaultCollection(aliasedCollection); solrServer.connect(); solrServer.getZkStateReader().updateAliases(); Is there a better way to use collection aliases with solrj ? Elodie Sannier
A request handler that manipulates the index
I am thinking about trying to structure a problem as a Solr plugin. The nature of the plugin is that it would need to read and write the lucene index to do its work. It could not be cleanly split into URP 'over here' and a Search Component 'over there'. Are there invariants of Solr that would preclude this, like assumptions in the implementation of the cache?
Re: Solrj 4.2 - CloudSolrServer aliases are not loaded
I think the current tests probably build the CloudSolrServer before creating the aliases - sounds like we need some that create the CloudSolrServer after. - Mark On Apr 2, 2013, at 1:31 PM, Mark Miller markrmil...@gmail.com wrote:
Re: Flow Chart of Solr
Yago, My point - perhaps lost in too much text - was that Solr is presented - and can function - as a black box. Which makes it different from more traditional open-source projects. So, stage 2 happens exactly when the non-programmers have to cross the boundary from the black box into a code-first approach, and the hand-off is not particularly smooth. Or even when - say - a PHP or .NET programmer tries to get beyond the basic operations of their client library and has to understand the server-side aspects of Solr. Regards, Alex. On Tue, Apr 2, 2013 at 1:19 PM, Yago Riveiro yago.rive...@gmail.com wrote: Alexandre, You describe the normal path when a beginner tries to use source code that they don't understand: black box, reading code, hacking, ok now I know 10% of the project, with luck :p. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
WADL for REST service?
Hello, does a WADL exist for the REST service of Solr? Ciao Peter Schütt
Re: Solrj 4.2 - CloudSolrServer aliases are not loaded
I've created https://issues.apache.org/jira/browse/SOLR-4664 - Mark On Apr 2, 2013, at 2:07 PM, Mark Miller markrmil...@gmail.com wrote: I think the current tests probably build the cloudsolrserver before creating the aliases - sounds like we need to do some creating the cloudsolrserver after.
RE: Confusion over Solr highlight hl.q parameter
Thanks Koji, this helped with some of our problems, but it is still not perfect. This query, for example, returns no highlighting: ?q=id:abc123&hl.q=text_it_IT:l'assieme&hl.fl=text_it_IT&hl=true&defType=edismax But this one does (when it is, in effect, the same query): ?q=text_it_IT:l'assieme&hl=true&defType=edismax&hl.fl=text_it_IT I've tried many combinations but can't seem to get the right one to work. Is this possibly a bug? -Original Message- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Saturday, March 16, 2013 6:14 PM To: solr-user@lucene.apache.org Subject: Re: Confusion over Solr highlight hl.q parameter (13/03/16 4:08), Van Tassell, Kristian wrote: Hello everyone, If I search for a term "baz" and tell it to highlight it, it highlights just fine. If, however, I search for "foo bar" using the q parameter, which appears in that same document/same field, and use the hl.q parameter to search and highlight "baz", I get no highlighting results for "baz". ?q=パーツにおける機能強化&qf=text_ja_JP&defType=edismax&hl=true&hl.simple.pre=<em>&hl.simple.post=</em>&hl.fl=text_ja_JP The above highlights the query term just fine. ?q=1234&hl.q=パーツにおける機能強化&qf=id&defType=edismax&hl=true&hl.simple.pre=<em>&hl.simple.post=</em>&hl.fl=text_ja_JP This one returns zero highlighting hits. I'm just guessing, but does the Solr highlighter try to highlight パーツにおける機能強化 in your default search field? Can you try hl.q=text_ja_JP:パーツにおける機能強化 . koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
Re: Flow Chart of Solr
I'll take myself as an example. I started researching Solr just a few weeks ago. I have learned Solr and its related projects. My next step is writing down the main steps of Solr. We have separated the learning curve of Solr into two main categories. The first is for those using it as an out-of-the-box component. The second is the developer side. Actually, the developer side branches into two ways. The first is the general steps, i.e. a document comes into Solr (e.g. crawled data from Nutch), which analysis processes will be done (stemming, lemmatizing etc.), and what happens after parsing, step by step. When a search query happens, what happens step by step, at which step scores are calculated, and so on. The second is more code-specific, i.e. which handlers take in the data that will be indexed (no need to explain every handler at this step), which are the analyzer and tokenizer classes and what is the flow between them, how response handlers work and what they are. Explaining the cloud side is another piece of work. Some of these explanations are currently present in the wiki (but some of them are in very deep places in the wiki and it is not easy to find the parent topic; maybe starting the wiki from a top page and branching out all other topics from it as much as possible would be better). If we could show the big picture, and beside it the smaller pictures within it, it would be great (if you know the main parts it will be easy to go deep into the code, i.e. you don't need to explain every handler; if you show the way to the developer, he/she can debug and find what they need). Taking myself as an example: having to write down the steps of Solr in some detail even after reading many wiki pages and a book about it, I see that it is not easy even to write down the big picture of the developer side. 2013/4/2 Alexandre Rafalovitch arafa...@gmail.com Yago, My point - perhaps lost in too much text - was that Solr is presented - and can function - as a black-box.
Solr 4.2 Cloud Replication Replica has higher version than Master?
I am currently looking at moving our Solr cluster to 4.2 and noticed a strange issue while testing today. Specifically, the replica has a higher version than the master, which is causing the index to not replicate. Because of this the replica has fewer documents than the master. What could cause this, and how can I resolve it short of taking down the index and scp'ing the right version in? MASTER: Last Modified: about an hour ago Num Docs: 164880 Max Doc: 164880 Deleted Docs: 0 Version: 2387 Segment Count: 23 REPLICA: Last Modified: about an hour ago Num Docs: 164773 Max Doc: 164773 Deleted Docs: 0 Version: 3001 Segment Count: 30 In the replica's log it says this: INFO: Creating new http client, config:maxConnectionsPerHost=20&maxConnections=1&connTimeout=3&socketTimeout=3&retry=false Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr START replicas=[http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100 Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/ Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Our versions are newer. ourLowThreshold=1431233788792274944 otherHigh=1431233789440294912 Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr DONE. sync succeeded This again seems to indicate that it thinks it has a newer version of the index, so it aborts. This happened while having 10 threads indexing 10,000 items, writing to a 6-shard (1 replica each) cluster. Any thoughts on this or what I should look for would be appreciated.
Re: Solr 4.2 Cloud Replication Replica has higher version than Master?
I don't think the versions you are thinking of apply here. Peersync does not look at that - it looks at version numbers for updates in the transaction log - it compares the last 100 of them on leader and replica. What it's saying is that the replica seems to have versions that the leader does not. Have you scanned the logs for any interesting exceptions? Did the leader change during the heavy indexing? Did any zk session timeouts occur? - Mark On Apr 2, 2013, at 4:52 PM, Jamie Johnson jej2...@gmail.com wrote: I am currently looking at moving our Solr cluster to 4.2 and noticed a strange issue while testing today. Specifically the replica has a higher version than the master which is causing the index to not replicate. Because of this the replica has fewer documents than the master. What could cause this and how can I resolve it short of taking down the index and scping the right version in? MASTER: Last Modified:about an hour ago Num Docs:164880 Max Doc:164880 Deleted Docs:0 Version:2387 Segment Count:23 REPLICA: Last Modified: about an hour ago Num Docs:164773 Max Doc:164773 Deleted Docs:0 Version:3001 Segment Count:30 in the replicas log it says this: INFO: Creating new http client, config:maxConnectionsPerHost=20maxConnections=1connTimeout=3socketTimeout=3retry=false Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solrSTART replicas=[ http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100 Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/ Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Our versions are newer. 
ourLowThreshold=1431233788792274944 otherHigh=1431233789440294912 Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solrDONE. sync succeeded which again seems to point that it thinks it has a newer version of the index so it aborts. This happened while having 10 threads indexing 10,000 items writing to a 6 shard (1 replica each) cluster. Any thoughts on this or what I should look for would be appreciated.
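Mark's description of PeerSync (comparing the last ~100 update versions from each node's transaction log) can be sketched loosely in Python. This is only an illustration of the idea, not Solr's actual algorithm; the function and parameter names below are made up.

```python
# Loose sketch of the PeerSync idea: each node keeps the versions of its
# most recent updates in a transaction log; a sync candidate asks a peer
# for its recent versions and looks for any it is missing. Names and
# logic here are illustrative only -- not Solr's real implementation.

def missing_versions(our_versions, peer_versions, n_updates=100):
    """Versions among the peer's most recent n_updates that we lack."""
    ours = set(our_versions)
    recent_peer = sorted(peer_versions, reverse=True)[:n_updates]
    return [v for v in recent_peer if v not in ours]

# If the replica's log contains everything the leader reports (and more),
# nothing is missing, and it concludes its own versions are newer.
print(missing_versions([101, 102, 103, 104], [101, 102, 103]))  # []
print(missing_versions([101, 102], [101, 102, 103]))            # [103]
```

The puzzle in this thread is exactly the first case: the replica believes it is ahead, so sync "succeeds" without pulling anything, even though it has fewer documents.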
Re: Add fuzzy to edismax specs?
Note that the pf field already parses this syntax as of 4.0, but there it is used as a phrase-slop value. You could probably use the same parsing code for qf. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 29 Mar 2013, at 18:33, Walter Underwood wun...@wunderwood.org wrote: I've implemented this for the second time, so it is probably time to contribute it. I find it really useful. I've extended the query spec parser for edismax to also accept a tilde and to generate a FuzzyQuery. I used this at Netflix (on 1.3 with dismax), and re-implemented it for 3.3 here at Chegg. We've had it in production for nearly a year. I'll need to re-port this as part of our move to 4.x. Here is what the spec looks like. This expands to a fuzzy search on title with a similarity of 0.75, and so on. str name=qftitle~0.75^4 long_title^4 title_stem^2 author~0.75/str I'm not 100% sure I understand the spec parser in edismax, so I'd like some review when this is ready. I'd probably only do it for edismax. See: https://issues.apache.org/jira/browse/SOLR-629 wunder -- Walter Underwood wun...@wunderwood.org Search Guy, Chegg.com
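The proposed qf syntax ("field~similarity^boost") is simple enough to sketch a parser for. The snippet below is a hypothetical illustration of that grammar only, not the actual edismax spec parser:

```python
import re

# Hypothetical parser for the proposed qf syntax "field~sim^boost",
# where both ~sim and ^boost are optional. Illustrative only; the real
# spec parsing lives inside Solr's extended dismax query parser.
SPEC = re.compile(r"^(?P<field>[^~^\s]+)(?:~(?P<sim>[\d.]+))?(?:\^(?P<boost>[\d.]+))?$")

def parse_qf(spec):
    fields = []
    for token in spec.split():
        m = SPEC.match(token)
        sim = float(m.group("sim")) if m.group("sim") else None
        boost = float(m.group("boost")) if m.group("boost") else 1.0
        fields.append((m.group("field"), sim, boost))
    return fields

# The example from the post:
print(parse_qf("title~0.75^4 long_title^4 title_stem^2 author~0.75"))
```

Run against the example spec, this yields title with similarity 0.75 and boost 4, long_title with boost 4 and no fuzziness, and so on.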
Re: Solr Phonetic Search Highlight issue in search results
If you want to highlight, you need to turn on highlighting for the actual field you search, and that field needs to be stored, i.e. hl.fl=ContentSearchPhonetic -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 1 Apr 2013, at 14:16, Erick Erickson erickerick...@gmail.com wrote: Good question, you're causing me to think... about code I know very little about G. So rather than spouting off, I tried it and... it works fine for me, either with or without the fast vector highlighter on, admittedly, a very simple test. So I think I'd try peeling off all the extra stuff you've put into your configs (sorry, I don't have time right now to try to reproduce) and get the very simple case working, then build the rest back up and see where the problem begins. Sorry for the misdirection! Erick On Mon, Apr 1, 2013 at 1:07 AM, Soumyanayan Kar soumyanayan@rebaca.com wrote: Hi Erick, Thanks for the reply. But help me understand this: if Solr is able to isolate the two documents which contain the term fact, being the phonetic equivalent of the search term fakt, then why would it be unable to highlight the terms based on the same logic it uses to search the documents? Also, it is correctly highlighting the results in other searches which are also approximate searches and not exact ones, e.g. fuzzy or synonym search. In those cases the highlights in the search results are also far from the actual search term, but they still get correctly highlighted. Maybe I am getting it completely wrong, but it looks like there is something wrong with my implementation. Thanks Regards, Soumya. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 27 March 2013 06:07 AM To: solr-user@lucene.apache.org Subject: Re: Solr Phonetic Search Highlight issue in search results How would you expect it to highlight successfully?
The term is fakt, there's nothing built in (and, indeed couldn't be) to un-phoneticize it into fact and apply that to the Content field. The whole point of phonetic processing is to do a lossy translation from the word into some variant, losing precision all the way. So this behavior is unsurprising... Best Erick On Tue, Mar 26, 2013 at 7:28 AM, Soumyanayan Kar soumyanayan@rebaca.com wrote: When we are issuing a query with Phonetic Search, it is returning the correct documents but not returning the highlights. When we use Stemming or Synonym searches we are getting the proper highlights. For example, when we execute a phonetic query for the term fakt(ContentSearchPhonetic:fakt) in the Solr Admin interface, it returns two documents containing the term fact(phonetic token equivalent), but the list of highlights is empty as shown in the response below. response lst name=responseHeader int name=status0/int int name=QTime16/int lst name=params str name=qContentSearchPhonetic:fakt/str str name=wtxml/str /lst /lst result name=response numFound=2 start=0 doc long name=DocId1/long str name=DocTitleDoc 1/str str name=ContentAnyway, this game was excellent and was well worth the time. The graphics are truly amazing and the sound track was pretty pleasant also. The preacher was in fact a thief./str long name=_version_1430480998833848320/long /doc doc long name=DocId2/long str name=DocTitleDoc 2/str str name=Contentstunning. 
The preacher was in fact an excellent thief who had stolen the original manuscript of Hamlet from an exhibit on the Riviera, where he also acquired his remarkable and tan./str long name=_version_1430480998841188352/long /doc /result lst name=highlighting lst name=1/ lst name=2/ /lst /response Relevant section of Solr schema: field name=DocId type=long indexed=true stored=true required=true/ field name=DocTitle type=string indexed=false stored=true required=true/ field name=Content type=text_general indexed=false stored=true required=true/ field name=ContentSearch type=text_general indexed=true stored=false multiValued=true/ field name=ContentSearchStemming type=text_stem indexed=true stored=false multiValued=true/ field name=ContentSearchPhonetic type=text_phonetic indexed=true stored=false multiValued=true/ field name=ContentSearchSynonym type=text_synonym indexed=true stored=false multiValued=true/ uniqueKeyDocId/uniqueKey copyField source=Content dest=ContentSearch/ copyField source=Content dest=ContentSearchStemming/ copyField source=Content dest=ContentSearchPhonetic/ copyField source=Content dest=ContentSearchSynonym/
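The lossy-translation point in this thread is easy to demonstrate. Solr supports several phonetic encoders; the minimal Soundex sketch below (illustrative, not Solr's code) shows why "fakt" matches documents containing "fact": both collapse to the same code, and the original spelling cannot be recovered from it for highlighting.

```python
def soundex(word):
    """Minimal American Soundex, for illustration only. Solr's phonetic
    filters support several encoders; this is not Solr's implementation."""
    codes = {c: str(d) for d, letters in
             enumerate(["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1)
             for c in letters}
    word = word.lower()
    first, digits = word[0].upper(), []
    prev = codes.get(word[0])
    for c in word[1:]:
        d = codes.get(c)
        if d is not None and d != prev:
            digits.append(d)
        if c not in "hw":  # h and w do not separate duplicate codes
            prev = d
    return (first + "".join(digits) + "000")[:4]

print(soundex("fact"), soundex("fakt"))  # F230 F230 -- identical, lossy
```

Since only the encoded form ("F230") is indexed, the highlighter has no way to map it back to the literal text "fact" in the stored Content field.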
Re: Solr 4.2 Cloud Replication Replica has higher version than Master?
Looking at the master it looks like at some point there were shards that went down. I am seeing things like what is below. NFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 12) Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process INFO: Updating live nodes... (9) Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess INFO: Running the leader process. Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader INFO: Checking if I should try and be the leader. Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader INFO: My last published State was Active, it's okay to be the leader. Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess INFO: I may be the new leader - try and sync On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller markrmil...@gmail.com wrote: I don't think the versions you are thinking of apply here. Peersync does not look at that - it looks at version numbers for updates in the transaction log - it compares the last 100 of them on leader and replica. What it's saying is that the replica seems to have versions that the leader does not. Have you scanned the logs for any interesting exceptions? Did the leader change during the heavy indexing? Did any zk session timeouts occur? - Mark On Apr 2, 2013, at 4:52 PM, Jamie Johnson jej2...@gmail.com wrote: I am currently looking at moving our Solr cluster to 4.2 and noticed a strange issue while testing today. Specifically the replica has a higher version than the master which is causing the index to not replicate. Because of this the replica has fewer documents than the master. What could cause this and how can I resolve it short of taking down the index and scping the right version in? 
MASTER: Last Modified:about an hour ago Num Docs:164880 Max Doc:164880 Deleted Docs:0 Version:2387 Segment Count:23 REPLICA: Last Modified: about an hour ago Num Docs:164773 Max Doc:164773 Deleted Docs:0 Version:3001 Segment Count:30 in the replicas log it says this: INFO: Creating new http client, config:maxConnectionsPerHost=20maxConnections=1connTimeout=3socketTimeout=3retry=false Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solrSTART replicas=[ http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100 Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/ Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Our versions are newer. ourLowThreshold=1431233788792274944 otherHigh=1431233789440294912 Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solrDONE. sync succeeded which again seems to point that it thinks it has a newer version of the index so it aborts. This happened while having 10 threads indexing 10,000 items writing to a 6 shard (1 replica each) cluster. Any thoughts on this or what I should look for would be appreciated.
Re: Solr 4.2 Cloud Replication Replica has higher version than Master?
here is another one that looks interesting Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the leader, but locally we don't think so at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293) at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246) at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343) On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson jej2...@gmail.com wrote: Looking at the master it looks like at some point there were shards that went down. I am seeing things like what is below. NFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 12) Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process INFO: Updating live nodes... (9) Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess INFO: Running the leader process. 
Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader INFO: Checking if I should try and be the leader. Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader INFO: My last published State was Active, it's okay to be the leader. Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess INFO: I may be the new leader - try and sync On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller markrmil...@gmail.com wrote: I don't think the versions you are thinking of apply here. Peersync does not look at that - it looks at version numbers for updates in the transaction log - it compares the last 100 of them on leader and replica. What it's saying is that the replica seems to have versions that the leader does not. Have you scanned the logs for any interesting exceptions? Did the leader change during the heavy indexing? Did any zk session timeouts occur? - Mark On Apr 2, 2013, at 4:52 PM, Jamie Johnson jej2...@gmail.com wrote: I am currently looking at moving our Solr cluster to 4.2 and noticed a strange issue while testing today. Specifically the replica has a higher version than the master which is causing the index to not replicate. Because of this the replica has fewer documents than the master. What could cause this and how can I resolve it short of taking down the index and scping the right version in? 
MASTER: Last Modified:about an hour ago Num Docs:164880 Max Doc:164880 Deleted Docs:0 Version:2387 Segment Count:23 REPLICA: Last Modified: about an hour ago Num Docs:164773 Max Doc:164773 Deleted Docs:0 Version:3001 Segment Count:30 in the replicas log it says this: INFO: Creating new http client, config:maxConnectionsPerHost=20maxConnections=1connTimeout=3socketTimeout=3retry=false Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solrSTART replicas=[ http://10.38.33.16:7575/solr/dsc-shard5-core1/] nUpdates=100 Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Received 100 versions from 10.38.33.16:7575/solr/dsc-shard5-core1/ Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync handleVersions INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solr Our versions are newer. ourLowThreshold=1431233788792274944 otherHigh=1431233789440294912 Apr 2, 2013 8:15:06 PM org.apache.solr.update.PeerSync sync INFO: PeerSync: core=dsc-shard5-core2 url=http://10.38.33.17:7577/solrDONE. sync succeeded which again seems to point that it thinks it has a newer version of the index so it aborts. This happened while having 10 threads indexing 10,000 items writing to a 6 shard (1 replica each) cluster. Any thoughts on this or what I should
Lengthy description is converted to hash symbols
Hi, I have a field that is defined to be of type text_en. Occasionally, I notice that lengthy strings are converted to hash symbols. Here is a snippet of my field type: fieldType name=text_en class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType field name=description type=text_en indexed=true stored=true required=false / Here is an example of the field's value: str name=description###/str Any ideas why this might be happening? -- View this message in context: http://lucene.472066.n3.nabble.com/Lengthy-description-is-converted-to-hash-symbols-tp4053338.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 4.2 Cloud Replication Replica has higher version than Master?
sorry for spamming here shard5-core2 is the instance we're having issues with... Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log SEVERE: shard update error StdNode: http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException: Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok status:503, message:Service Unavailable at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332) at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson jej2...@gmail.com wrote: here is another one that looks interesting Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the leader, but locally we don't think so at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293) at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at 
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246) at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343) On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson jej2...@gmail.com wrote: Looking at the master it looks like at some point there were shards that went down. I am seeing things like what is below. NFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 12) Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process INFO: Updating live nodes... (9) Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess INFO: Running the leader process. Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader INFO: Checking if I should try and be the leader. Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader INFO: My last published State was Active, it's okay to be the leader. Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess INFO: I may be the new leader - try and sync On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller markrmil...@gmail.comwrote: I don't think the versions you are thinking of apply here. Peersync does not look at that - it looks at version numbers for updates in the transaction log - it compares the last 100 of them on leader and replica. 
What it's saying is that the replica seems to have versions that the leader does not. Have you scanned the logs for any interesting exceptions? Did the leader change during the heavy indexing? Did any zk session timeouts occur? - Mark On Apr 2, 2013, at 4:52 PM, Jamie Johnson jej2...@gmail.com wrote: I am currently looking at moving our Solr cluster to 4.2 and noticed a strange issue while testing today. Specifically the replica has a higher version than the master which is causing the index to not replicate. Because of this the replica has fewer documents than the master. What could cause this and how can I resolve it short of taking down the index and scping the right version in? MASTER: Last Modified:about an hour ago Num Docs:164880 Max
Re: Lengthy description is converted to hash symbols
Can you enter the text on the Solr Admin UI Analysis page? Then you can tell at which stage the issue occurs. StandardTokenizer has a default token length limit of 255. You can override it with the maxTokenLength attribute: tokenizer class=solr.StandardTokenizerFactory maxTokenLength=1024 / See: https://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizerFactory.html But the # sounds like a bug. -- Jack Krupansky -Original Message- From: Danny Watari Sent: Tuesday, April 02, 2013 5:45 PM To: solr-user@lucene.apache.org Subject: Lengthy description is converted to hash symbols Hi, I have a field that is defined to be of type text_en. Occasionally, I notice that lengthy strings are converted to hash symbols. Here is a snippet of my field type: fieldType name=text_en class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType field name=description type=text_en indexed=true stored=true required=false / Here is an example of the field's value: str name=description###/str Any ideas why this might be happening? -- View this message in context: http://lucene.472066.n3.nabble.com/Lengthy-description-is-converted-to-hash-symbols-tp4053338.html Sent from the Solr - User mailing list archive at Nabble.com.
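The token-length point can be illustrated with a toy tokenizer. Whether an over-long token is split or dropped depends on the tokenizer and its version, so this Python sketch (not Lucene code) simply drops tokens over the cap to show what a 255-character limit does to an unbroken run of characters:

```python
# Toy illustration of a token-length cap like StandardTokenizer's
# maxTokenLength (default 255). Real Lucene behavior for over-long
# tokens varies by version; this sketch just drops them.

def tokens_within_limit(text, max_token_length=255):
    """Whitespace-split stand-in for a tokenizer, dropping over-long tokens."""
    return [t for t in text.split() if len(t) <= max_token_length]

long_run = "#" * 300  # a single unbroken 300-char "token"
print(tokens_within_limit("short tokens survive " + long_run))
print(len(tokens_within_limit(long_run, max_token_length=1024)))  # cap raised
```

As Hoss notes later in the thread, though, stored values are returned verbatim regardless of analysis, so a cap like this would only affect what is searchable, not what the ### string looks like in the response.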
Re: Lengthy description is converted to hash symbols
: Here is an example of the field's value: : str : name=description###/str where are you getting that str ... / from? if that's what you see when you do a search for a document, then it has nothing to do with your fieldType or analyzer -- the strings returned from searches are the stored values, which are not modified by the analyzer at all. What does your indexing code/process look like? Do you have any custom UpdateProcessors? details, details, details. -Hoss
Re: Solr 4.2 Cloud Replication Replica has higher version than Master?
Sorry I didn't ask the obvious question. Is there anything else that I should be looking for here and is this a bug? I'd be happy to troll through the logs further if more information is needed, just let me know. Also what is the most appropriate mechanism to fix this. Is it required to kill the index that is out of sync and let solr resync things? On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson jej2...@gmail.com wrote: sorry for spamming here shard5-core2 is the instance we're having issues with... Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log SEVERE: shard update error StdNode: http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException: Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok status:503, message:Service Unavailable at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332) at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson jej2...@gmail.com wrote: here is another one that looks interesting Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the leader, but locally we don't think so at 
org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293) at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246) at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343) On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson jej2...@gmail.com wrote: Looking at the master it looks like at some point there were shards that went down. I am seeing things like what is below. NFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 12) Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process INFO: Updating live nodes... (9) Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess INFO: Running the leader process. Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader INFO: Checking if I should try and be the leader. Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader INFO: My last published State was Active, it's okay to be the leader. 
Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess INFO: I may be the new leader - try and sync On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller markrmil...@gmail.comwrote: I don't think the versions you are thinking of apply here. Peersync does not look at that - it looks at version numbers for updates in the transaction log - it compares the last 100 of them on leader and replica. What it's saying is that the replica seems to have versions that the leader does not. Have you scanned the logs for any interesting exceptions? Did the leader change during the heavy indexing? Did any zk session timeouts occur? - Mark On Apr 2, 2013, at 4:52 PM, Jamie Johnson jej2...@gmail.com wrote: I am currently looking
Re: Solr 4.2 Cloud Replication Replica has higher version than Master?
It would appear it's a bug given what you have said. Any other exceptions would be useful. Might be best to start tracking in a JIRA issue as well. To fix, I'd bring the behind node down and back again. Unfortunately, I'm pressed for time, but we really need to get to the bottom of this and fix it, or determine if it's fixed in 4.2.1 (spreading to mirrors now). - Mark On Apr 2, 2013, at 7:21 PM, Jamie Johnson jej2...@gmail.com wrote: Sorry I didn't ask the obvious question. Is there anything else that I should be looking for here and is this a bug? I'd be happy to troll through the logs further if more information is needed, just let me know. Also what is the most appropriate mechanism to fix this. Is it required to kill the index that is out of sync and let solr resync things? On Tue, Apr 2, 2013 at 5:45 PM, Jamie Johnson jej2...@gmail.com wrote: sorry for spamming here shard5-core2 is the instance we're having issues with... Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log SEVERE: shard update error StdNode: http://10.38.33.17:7577/solr/dsc-shard5-core2/:org.apache.solr.common.SolrException: Server at http://10.38.33.17:7577/solr/dsc-shard5-core2 returned non ok status:503, message:Service Unavailable at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:373) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181) at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332) at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) On Tue, Apr 2, 2013 at 5:43 PM, Jamie Johnson jej2...@gmail.com wrote: here is another one that looks interesting Apr 2, 2013 7:27:14 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: ClusterState says we are the leader, but locally we don't think so at org.apache.solr.update.processor.DistributedUpdateProcessor.doDefensiveChecks(DistributedUpdateProcessor.java:293) at org.apache.solr.update.processor.DistributedUpdateProcessor.setupRequest(DistributedUpdateProcessor.java:228) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:339) at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100) at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246) at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173) at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343) On Tue, Apr 2, 2013 at 5:41 PM, Jamie Johnson jej2...@gmail.com wrote: Looking at the master it looks like at some point there were shards that went down. I am seeing things like what is below. NFO: A cluster state change: WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes, has occurred - updating... (live nodes size: 12) Apr 2, 2013 8:12:52 PM org.apache.solr.common.cloud.ZkStateReader$3 process INFO: Updating live nodes... 
(9) Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess INFO: Running the leader process. Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader INFO: Checking if I should try and be the leader. Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext shouldIBeLeader INFO: My last published State was Active, it's okay to be the leader. Apr 2, 2013 8:12:52 PM org.apache.solr.cloud.ShardLeaderElectionContext runLeaderProcess INFO: I may be the new leader - try and sync On Tue, Apr 2, 2013 at 5:09 PM, Mark Miller markrmil...@gmail.comwrote: I don't think the versions you are thinking of apply here. Peersync does not look at that - it looks at
RequestHandler.. Conditional components
In our use case, for certain query terms we want to redirect query processing to an external system; for the rest of the keywords we want to continue with the query component, facets, etc. Is it possible, based on some condition, to skip some components in a request handler? -- View this message in context: http://lucene.472066.n3.nabble.com/RequestHandler-Conditional-components-tp4053381.html Sent from the Solr - User mailing list archive at Nabble.com.
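For reference, one common pattern (not discussed in this thread; handler names here are purely illustrative) is to define separate request handlers with different component lists in solrconfig.xml and have the client pick the handler based on the condition:

```xml
<!-- Full pipeline: query plus facets etc. -->
<requestHandler name="/select-full" class="solr.SearchHandler">
  <arr name="components">
    <str>query</str>
    <str>facet</str>
    <str>highlight</str>
  </arr>
</requestHandler>

<!-- Minimal pipeline: query component only -->
<requestHandler name="/select-minimal" class="solr.SearchHandler">
  <arr name="components">
    <str>query</str>
  </arr>
</requestHandler>
```

The routing decision (which query terms go to the external system) then lives in the client rather than inside a single handler.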
RE: MoreLikeThis - Odd results - what am I doing wrong?
Isn't this an AWS security groups question? You should probably post this question on the AWS forums, but for the moment, here's the basic reading material - go set up your EC2 security groups and lock down your systems. http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html If you just want to password protect Solr here are the instructions: http://wiki.apache.org/solr/SolrSecurity But I most certainly would not leave it open to the world even with a password (note that the basic password authentication sends passwords in clear text if you're not using HTTPS; best lock the thing down behind a firewall). Dave -----Original Message----- From: DC tech [mailto:dctech1...@gmail.com] Sent: Tuesday, April 02, 2013 1:02 PM To: solr-user@lucene.apache.org Subject: Re: MoreLikeThis - Odd results - what am I doing wrong? OK - so I have my SOLR instance running on AWS. Any suggestions on how to safely share the link? Right now, the whole SOLR instance is totally open. Gagandeep singh gagan.g...@gmail.com wrote: say debugQuery=true&mlt=true and see the scores for the MLT query, not a sample query. You can use Amazon EC2 to bring up your Solr; you should be able to get a micro instance for free trial. On Mon, Apr 1, 2013 at 5:10 AM, dc tech dctech1...@gmail.com wrote: I did try the raw query against the *simi* field and those seem to return results in the order expected. For instance, Acura MDX has (large, SUV, 4WD Luxury) in the simi field. Running a query with those words against the simi field returns the expected models (X5, Audi Q5, etc.) and then the subsequent documents have decreasing relevance. So the basic query mechanism seems to be fine. The issue just seems to be with the MoreLikeThis component and handler. I can post the index on a public SOLR instance - any suggestions? 
(or for hosting) On Sun, Mar 31, 2013 at 1:54 PM, Gagandeep singh gagan.g...@gmail.com wrote: If you can bring up your Solr setup on a public machine then I'm sure a lot of debugging can be done. Without that, I think what you should look at is the tf-idf scores of the terms like camry etc. Usually idf is the deciding factor in which results show at the top (tf should be 1 for your data). Enable debugQuery=true and look at the explain section to see how the score is getting calculated. You should try giving different boosts to class, type, drive, size to control the results. On Sun, Mar 31, 2013 at 8:52 PM, dc tech dctech1...@gmail.com wrote: I am running some experiments on more like this and the results seem rather odd - I am doing something wrong but just cannot figure out what. Basically, the similarity results are decent - but not great. *Issue 1 = Quality* Toyota Camry: finds Altima (good) but then the next one is Camry Hybrid whereas it should have found Accord. I have normalized the data into a simi field which has only the attributes that I care about. Without the simi field, I could not get mlt.qf boosts to work well enough to return results. *Issue 2* Some fields do not work at all. For instance, text+simi (in mlt.fl) works whereas just simi does not. So some weirdness that I am just not understanding. Would be grateful for your guidance! Here is the setup: *1. SOLR Version* solr-spec 4.2.0.2013.03.06.22.32.13 solr-impl 4.2.0 1453694 rmuir - 2013-03-06 22:32:13 lucene-spec 4.2.0 lucene-impl 4.2.0 1453694 - rmuir - 2013-03-06 22:25:29 *2. Machine Information* Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM (1.6.0_23 19.0-b09) Windows 7 Home 64 Bit with 4 GB RAM *3. Sample Data* I created this 'dummy' data of cars - the idea being that these would be sufficient and simple to generate similarity and understand how it would work. There are 181 rows in the data set (I have attached it for reference in CSV format). *4. SCHEMA*

*Field Definitions*

<field name="id" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
<field name="make" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
<field name="model" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
<field name="class" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
<field name="type" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
<field name="drive" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>
<field name="comment" type="text_general" indexed="true" stored="true" termVectors="true" multiValued="true"/>
<field name="size" type="string" indexed="true" stored="true" termVectors="true" multiValued="false"/>

*Copy Fields*

<copyField source="make" dest="make_en"/> <!-- Search -->
<copyField source="model" dest="model_en"/> <!-- Search -->
copyField
Re: Confusion over Solr highlight hl.q parameter
(13/04/03 5:27), Van Tassell, Kristian wrote: Thanks Koji, this helped with some of our problems, but it is still not perfect. This query, for example, returns no highlighting: ?q=id:abc123&hl.q=text_it_IT:l'assieme&hl.fl=text_it_IT&hl=true&defType=edismax But this one does (when it is, in effect, the same query): ?q=text_it_IT:l'assieme&hl=true&defType=edismax&hl.fl=text_it_IT I've tried many combinations but can't seem to get the right one to work. Is this possibly a bug? As hl.q doesn't honor the defType parameter but does honor localParams, can you try putting {!edismax} in the hl.q parameter? Koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html
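To make the localParams suggestion concrete: the {!edismax} prefix goes inside the hl.q value and must be URL-encoded when sent over HTTP. A small Python sketch (parameter values taken from the messages above) shows what the encoded request string looks like:

```python
from urllib.parse import urlencode

# Build the highlight request with {!edismax} as local params inside hl.q,
# so the hl.q value is parsed by edismax even though hl.q ignores defType.
params = {
    "q": "id:abc123",
    "hl": "true",
    "hl.fl": "text_it_IT",
    "hl.q": "{!edismax}text_it_IT:l'assieme",
}
query_string = urlencode(params)
print(query_string)
```

Note that the braces and bang are percent-encoded (`%7B%21edismax%7D`) in the final URL; pasting the raw `{!edismax}` into a browser usually works too, since browsers encode it for you.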
Re: Solr 4.2 Cloud Replication Replica has higher version than Master?
I brought the bad one down and back up and it did nothing. I can clear the index and try 4.2.1. I will save off the logs and see if there is anything else odd. On Apr 2, 2013 9:13 PM, Mark Miller markrmil...@gmail.com wrote: [...]
Re: Solr 4.2 Cloud Replication Replica has higher version than Master?
Mark, is there a particular JIRA issue that you think may address this? I read through it quickly but didn't see one that jumped out. On Apr 2, 2013 10:07 PM, Jamie Johnson jej2...@gmail.com wrote: [...]
Re: WADL for REST service?
Hi Peter, I'm afraid we don't have anything that formal... almost empty: http://search-lucene.com/?q=wadl&fc_project=Solr Otis -- Solr ElasticSearch Support http://sematext.com/ On Tue, Apr 2, 2013 at 6:38 AM, Peter Schütt newsgro...@pstt.de wrote: Hallo, does a WADL exist for the REST service of SOLR? Ciao Peter Schütt
Solr scores remain the same for exact match and nearly exact match
Below is my query:

http://localhost:8983/solr/select/?q=subject:session management in php&fq=category:[*%20TO%20*]&fl=category,score,subject

The result is like below:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">983</int>
    <lst name="params">
      <str name="fq">category:[* TO *]</str>
      <str name="q">subject:session management in php</str>
      <str name="fl">category,score,subject</str>
    </lst>
  </lst>
  <result name="response" maxScore="0.8770298" start="0" numFound="2">
    <doc>
      <float name="score">0.8770298</float>
      <str name="category">Annapurnap</str>
      <str name="subject">session management in asp.net</str>
    </doc>
    <doc>
      <float name="score">0.8770298</float>
      <str name="category">Annapurnap</str>
      <str name="subject">session management in PHP</str>
    </doc>
  </result>
</response>

The question is how come both have the same score when one is an exact match and the other isn't. This is the schema:

<field name="subject" type="text_en_splitting" indexed="true" stored="true"/>
<field name="category" type="text_general" indexed="true" stored="true"/>

-- View this message in context: http://lucene.472066.n3.nabble.com/solre-scores-remains-same-for-exact-match-and-nearly-exact-match-tp4053406.html Sent from the Solr - User mailing list archive at Nabble.com.
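One thing a debugQuery=true explain would likely show here: under classic TF-IDF scoring, two documents that match the same query terms with the same term frequencies, document frequencies, and field length get identical scores, so the unmatched trailing token (PHP vs. asp.net) contributes nothing. A toy model of that tie (not Lucene's actual implementation; the assumption that only the shared terms match, e.g. because the unqualified words were searched against a different default field, is illustrative):

```python
import math

# Two subjects of equal token length; the query terms below match both equally.
docs = {
    "d1": "session management in asp.net".split(),
    "d2": "session management in php".split(),
}
# Assume only these terms end up matching both documents.
query = ["session", "management", "in"]

def idf(term: str) -> float:
    """Classic idf: grows as fewer documents contain the term."""
    df = sum(term in toks for toks in docs.values())
    return 1.0 + math.log(len(docs) / (1.0 + df))

def score(doc_id: str) -> float:
    """Simplified tf-idf with a 1/sqrt(length) field norm, as in classic Lucene."""
    toks = docs[doc_id]
    norm = 1.0 / math.sqrt(len(toks))
    return sum(math.sqrt(toks.count(t)) * idf(t) ** 2 * norm
               for t in query if t in toks)

# Identical term statistics and field lengths -> identical scores.
print(score("d1") == score("d2"))  # True
```

If "php" really were matching the subject field of the second document, its score would be higher; the tie itself is a hint about how the query was actually parsed.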
RE: Solr Phonetic Search Highlight issue in search results
Thanks a lot Erick for trying this out. Will wait for a reply from your end. Thanks Regards, Soumya. -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 01 April 2013 05:46 PM To: solr-user@lucene.apache.org Subject: Re: Solr Phonetic Search Highlight issue in search results Good question, you're causing me to think... about code I know very little about <G>. So rather than spouting off, I tried it and... it works fine for me, either with or without using fast vector highlighter on, admittedly, a very simple test. So I think I'd try peeling off all the extra stuff you've put into your configs (sorry, I don't have time right now to try to reproduce) and get the very simple case working, then build the rest back up and see where the problem begins. Sorry for the mis-direction! Erick On Mon, Apr 1, 2013 at 1:07 AM, Soumyanayan Kar soumyanayan@rebaca.com wrote: Hi Erick, Thanks for the reply. But help me understand this: if Solr is able to isolate the two documents which contain the term fact, being the phonetic equivalent of the search term fakt, then why would it be unable to highlight the terms based on the same logic it uses to search the documents? Also, it is correctly highlighting the results in other searches which are also approximate searches and not exact ones, e.g. Fuzzy or Synonym search. In these cases also the highlights in the search results are far from the actual search term but still they are getting correctly highlighted. Maybe I am getting it completely wrong but it looks like there is something wrong with my implementation. Thanks Regards, Soumya. -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 27 March 2013 06:07 AM To: solr-user@lucene.apache.org Subject: Re: Solr Phonetic Search Highlight issue in search results How would you expect it to highlight successfully? 
The term is fakt, there's nothing built in (and, indeed, couldn't be) to un-phoneticize it into fact and apply that to the Content field. The whole point of phonetic processing is to do a lossy translation from the word into some variant, losing precision all the way. So this behavior is unsurprising... Best Erick On Tue, Mar 26, 2013 at 7:28 AM, Soumyanayan Kar soumyanayan@rebaca.com wrote: When we are issuing a query with Phonetic Search, it is returning the correct documents but not returning the highlights. When we use Stemming or Synonym searches we are getting the proper highlights. For example, when we execute a phonetic query for the term fakt (ContentSearchPhonetic:fakt) in the Solr Admin interface, it returns two documents containing the term fact (phonetic token equivalent), but the list of highlights is empty as shown in the response below.

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">16</int>
    <lst name="params">
      <str name="q">ContentSearchPhonetic:fakt</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <long name="DocId">1</long>
      <str name="DocTitle">Doc 1</str>
      <str name="Content">Anyway, this game was excellent and was well worth the time. The graphics are truly amazing and the sound track was pretty pleasant also. The preacher was in fact a thief.</str>
      <long name="_version_">1430480998833848320</long>
    </doc>
    <doc>
      <long name="DocId">2</long>
      <str name="DocTitle">Doc 2</str>
      <str name="Content">stunning. The preacher was in fact an excellent thief who had stolen the original manuscript of Hamlet from an exhibit on the Riviera, where he also acquired his remarkable and tan.</str>
      <long name="_version_">1430480998841188352</long>
    </doc>
  </result>
  <lst name="highlighting">
    <lst name="1"/>
    <lst name="2"/>
  </lst>
</response>

Relevant section of Solr schema:

<field name="DocId" type="long" indexed="true" stored="true" required="true"/>
<field name="DocTitle" type="string" indexed="false" stored="true" required="true"/>
<field name="Content" type="text_general" indexed="false" stored="true" required="true"/>
<field name="ContentSearch" type="text_general" indexed="true" stored="false" multiValued="true"/>
<field name="ContentSearchStemming" type="text_stem" indexed="true" stored="false" multiValued="true"/>
<field name="ContentSearchPhonetic" type="text_phonetic" indexed="true" stored="false" multiValued="true"/>
<field name="ContentSearchSynonym" type="text_synonym" indexed="true" stored="false" multiValued="true"/>
<uniqueKey>DocId</uniqueKey>
<copyField source="Content" dest="ContentSearch"/>
<copyField source="Content" dest="ContentSearchStemming"/>
<copyField source="Content" dest="ContentSearchPhonetic"/>
<copyField source="Content" dest="ContentSearchSynonym"/>
<fieldType name="text_stem" class="solr.TextField"> <analyzer
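Erick's point about lossy translation can be illustrated with a toy Soundex encoder (Solr's phonetic filters typically use DoubleMetaphone or similar, but the principle is the same): distinct spellings collapse to one code at index time, so the original surface form cannot be recovered from the indexed token for highlighting.

```python
def soundex(word: str) -> str:
    """Toy, simplified Soundex: collapses similar-sounding words to a 4-char code."""
    table = {}
    for group, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")]:
        for ch in group:
            table[ch] = digit
    word = word.lower()
    code = [word[0].upper()]          # keep the first letter
    prev = table.get(word[0], "")
    for ch in word[1:]:
        digit = table.get(ch, "")     # vowels and untabled letters map to ""
        if digit and digit != prev:   # drop adjacent duplicates
            code.append(digit)
        prev = digit
    return ("".join(code) + "000")[:4]

# 'fakt' and 'fact' index to the same token, which is why the phonetic
# field matches -- and why the encoded token can't be mapped back to 'fact'.
print(soundex("fakt"), soundex("fact"))  # F230 F230
```

Because both spellings become F230 in the index, the highlighter has no way to map the query token back to the stored text unless term offsets were preserved at index time.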