Fwd: Solr 3.4 Grouping group.main=true results in java.lang.NoClassDefFound
I use Drupal as the front end to the Solr search engine. After updating and rebuilding my index, everything worked as before. Then I enabled group=true and group.field=site; Solr delivers the expected search results, but in Drupal nothing appears, just an empty search page. I found out that grouping changes the result set names. No problem: for this case Solr offers the group.main=true parameter. So I added it, and now I get this 500 error:

HTTP Status 500 - org/apache/commons/lang/ArrayUtils
java.lang.NoClassDefFoundError: org/apache/commons/lang/ArrayUtils
	at org.apache.solr.search.Grouping$Command.createSimpleResponse(Grouping.java:573)
	at org.apache.solr.search.Grouping$CommandField.finish(Grouping.java:675)
	at org.apache.solr.search.Grouping.execute(Grouping.java:339)
	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:240)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
	at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
	at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
	at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
	at java.lang.Thread.run(Thread.java:662)

I found out that Solr cannot find the class ArrayUtils. I tried a lot of things to get this working: setting the JAVA_HOME and CLASSPATH variables, and even changing the JRE, without any success. What really puzzles me is that all my other programs still run, and even Solr in normal mode works and is accessible; only the group.main=true function fails. So my question is: what is necessary to get this working? Any help is appreciated. Thx, Frank
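The stack trace shows Grouping.createSimpleResponse failing to load org.apache.commons.lang.ArrayUtils, which points to a missing commons-lang JAR on the webapp classpath rather than a JAVA_HOME or JRE problem. Two hedged ways to fix it (the paths below are examples, not your actual layout): copy commons-lang-2.x.jar into the Solr webapp's WEB-INF/lib directory, or declare a lib directive in solrconfig.xml:

```xml
<!-- solrconfig.xml: make Solr load extra JARs from a directory.
     The dir value is a placeholder; point it at wherever
     commons-lang-2.x.jar actually lives on your server. -->
<lib dir="/path/to/extra/jars/" regex="commons-lang-.*\.jar" />
```

After adding the JAR, restart Tomcat so the webapp classloader picks it up.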
Upgrading from 3.1 to 3.4
I have been using Solr 3.1 and am planning to update to Solr 3.4. What steps should be followed, or is there anything that needs to be taken care of specifically for the upgrade? Regards, Rohit
Re: getting answers starting with a requested string first
Thanks a lot for your advice. What really matters to me is that answers with NAME_ANALYZED=Tour Eiffel appear first. Then, if Tour Eiffel Tower By Helicopter appears before or after Hotel la tour Eiffel doesn't really matter. Since I send fq=NAME_ANALYZED:tour eiffel, I am sure NAME_ANALYZED will at least contain those two words. So I figured out that if I sort answers by this field length, I'll get those called Tour eiffel first. But I'll check the QParser anyway since it seems to be an interesting one. Best regards, Elisabeth 2011/9/28 Chris Hostetter hossman_luc...@fucit.org : 1) giving NAME_ANALYZED a type where omitNorms=false: I thought this would : give answers with shorter NAME_ANALYZED field a higher score. I've tested : that solution, but it's not working. I guess this is because there is no : score for fq parameter (all my answers have same score) both of those statements are correct. omitNorms=false will cause length normalization to apply, so with the default similarity, shorter field values will generally score higher, but norms are very coarse, so it won't be very precise; and fq queries filter the results, but do not affect the score. : 2) sorting my answers by length desc, and I guess in this case I would need : to store the length of NAME_ANALYZED field to avoid having to compute it on : the fly. at this point, this is the only solution I can think of. that will also be a good way to sort on the length of the field, and will give you a lot of precise control. but sorting on length isn't what you asked about... : and I have different answers like : : Restaurant la tour Eiffel : Hotel la tour Eiffel : Tour Eiffel ... : Is there a way to get answers with NAME_ANALYZED beginning with tour : Eiffel first? 
If you want to score documents higher because the match appears at the beginning of the field value, that is a different problem than scoring documents higher because they are shorter. For example: Tour Eiffel Tower By Helicopter is longer than Hotel la tour Eiffel, so which one do you want to come first? If you want documents to score higher when the match appears early in the field value, you can either index a marker token at the beginning of the field (e.g. S_T_A_R_T Tour Eiffel) and then run all queries on that field as phrase queries including that token (shorter matches score higher in phrase queries), or you can look into the surround QParser that was recently committed to trunk. The surround parser has special syntax for generating span queries, which support a SpanFirst query that scores documents higher the closer the match is to the beginning of the field value. -Hoss
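As a concrete sketch of the marker-token approach described above (the field name name_prefixed and the sentinel token are made up for illustration): index a sentinel at the start of the field value, then query it as a sloppy phrase that includes the sentinel. Matches needing less slop (i.e. values that begin with the query terms) score higher:

```text
At index time (hypothetical field name_prefixed):
  name_prefixed = "S_T_A_R_T Tour Eiffel"
  name_prefixed = "S_T_A_R_T Hotel la tour Eiffel"

At query time, a sloppy phrase query including the sentinel:
  q=name_prefixed:"S_T_A_R_T tour eiffel"~10
```

With Lucene's default phrase scoring, "Tour Eiffel" matches at slop 0 while "Hotel la tour Eiffel" needs slop 2, so values beginning with the query terms rank first.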
Re: Search for empty string in 1.4.1 vs 3.4
Thank you for the reply, Chris. Please find below a sample query which returns results in Solr 1.4.1 even though id has no value: http://localhost/solr/online/select/?q=%28%20state%20%29^1.8%20AND%20%20%28%20%28id:%22%22%29%29%20AND%20%20%28%20%28content_type_s:%22Video%22%29^1.5%20%29 (decoded: q=( state )^1.8 AND ((id:"")) AND ((content_type_s:"Video")^1.5)). PS: id is multiValued=false. Any field with multiValued=false works like this (returning results on a search for "") in Solr 1.4.1. But at the same time, q=id: is not returning any results in Solr 1.4.1. The problem only happens when there is an AND clause with id:
strange performance issue with many shards on one server
Hi, I am experiencing a strange issue doing some load tests. Our setup:
- 2 servers, each with 24 CPU cores and 130 GB of RAM
- 10 shards per server (needed for response times), running in a single Tomcat instance
- each query hits all 20 shards (distributed search)
- each shard holds about 1.5 million documents (small shards are needed due to rather complex queries)
- all caches are warmed / high cache hit rates (99%), etc.
Now for some reason we cannot seem to fully utilize all CPU power (no disk IO): past a certain point, increasing concurrent users doesn't increase CPU load, decreases throughput, and increases the response times of the individual queries. Also, 1-2% of the queries take significantly longer: the average is somewhere around 100 ms, while 1-2% take 1.5 s or longer. Any ideas are greatly appreciated :) Fred.
Distributed search has problems with some field names
Hello all, I'm experimenting with the Distributed Search bits in the nightly builds and I'm facing a problem. I have in my schema.xml some dynamic fields defined like this:

<dynamicField name="$*" type="double" indexed="true" stored="true"/>
<dynamicField name="@*" type="string" indexed="true" stored="true" multiValued="true"/>
<dynamicField name="*" type="string" indexed="true" stored="true"/>

When hitting a single shard the following query works fine:
http://solr/select?q=*:*&fl=ts,$distinct_boxes
But when I add the distrib=true parameter I get a NullPointerException:

java.lang.NullPointerException
	at org.apache.solr.handler.component.QueryComponent.returnFields(QueryComponent.java:1025)
	at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:725)
	at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:700)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:292)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1451)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

The $ in $distinct_boxes appears to be the culprit somehow; the query
http://solr/select?q=*:*&fl=ts,distinct_boxes&distrib=true
works without errors, but of course doesn't retrieve the field I want. Funnily enough, when requesting the uniqueKey field there are no errors:
http://solr/select?q=*:*&fl=tid,ts,$distinct_boxes&distrib=true
but somehow the data from the field $distinct_boxes doesn't appear in the output. Is there some workaround? Using fl=* returns all the data from the fields that start with $, but it severely increases the size of the response. -- Luis Neves
Re: strange performance issue with many shards on one server
Frederik Kraus, on 28/09/2011 12:58, wrote: Hi, I am experiencing a strange issue doing some load tests. Our setup: Just because I've listened to JUG mates talking about this at the last meeting: could it be that your CPUs are spending their time getting things from RAM into the CPU cache? Maybe, say, 10% of CPU power is spent on the bus. federico
Re: strange performance issue with many shards on one server
Hi Fred, analyze the queries which take longer. We observe our queries and see problems with q-time for queries which are complex, with phrases, or which contain numbers or special characters. In case you don't know it: http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance Regards Vadim 2011/9/28 Frederik Kraus frederik.kr...@gmail.com Hi, I am experiencing a strange issue doing some load tests. Our setup: - 2 server with each 24 cpu cores, 130GB of RAM - 10 shards per server (needed for response times) running in a single tomcat instance - each query queries all 20 shards (distributed search) - each shard holds about 1.5 mio documents (small shards are needed due to rather complex queries) - all caches are warmed / high cache hit rates (99%) etc. Now for some reason we cannot seem to fully utilize all CPU power (no disk IO), ie. increasing concurrent users doesn't increase CPU-Load at a point, decreases throughput and increases the response times of the individual queries. Also 1-2% of the queries take significantly longer: avg somewhere at 100ms while 1-2% take 1.5s or longer. Any ideas are greatly appreciated :) Fred.
Re: strange performance issue with many shards on one server
Hi Vadim, the thing is that those exact same queries that take longer during a load test perform just fine when executed at a slower request rate, and the slow ones are also random, i.e. there is no pattern in the bad/slow queries. My first thought was some kind of contention and/or connection starvation in the internal shard communication? Fred. On Wednesday, 28 September 2011 at 13:18, Vadim Kisselmann wrote: Hi Fred, analyze the queries which take longer. We observe our queries and see the problems with q-time with queries which are complex, with phrase queries or queries which contains numbers or special characters. if you don't know it: http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance Regards Vadim 2011/9/28 Frederik Kraus frederik.kr...@gmail.com (mailto:frederik.kr...@gmail.com) Hi, I am experiencing a strange issue doing some load tests. Our setup: - 2 server with each 24 cpu cores, 130GB of RAM - 10 shards per server (needed for response times) running in a single tomcat instance - each query queries all 20 shards (distributed search) - each shard holds about 1.5 mio documents (small shards are needed due to rather complex queries) - all caches are warmed / high cache hit rates (99%) etc. Now for some reason we cannot seem to fully utilize all CPU power (no disk IO), ie. increasing concurrent users doesn't increase CPU-Load at a point, decreases throughput and increases the response times of the individual queries. Also 1-2% of the queries take significantly longer: avg somewhere at 100ms while 1-2% take 1.5s or longer. Any ideas are greatly appreciated :) Fred.
Re: strange performance issue with many shards on one server
Hi Fred, OK, it's strange behavior with the same queries. A few more questions:
- which Solr version?
- are you indexing during your load test (because of index rebuilds)?
- do you replicate your index?
Regards Vadim 2011/9/28 Frederik Kraus frederik.kr...@gmail.com Hi Vladim, the thing is, that those exact same queries, that take longer during a load test, perform just fine when executed at a slower request rate and are also random, i.e. there is no pattern in bad/slow queries. My first thought was some kind of contention and/or connection starvation for the internal shard communication? Fred. On Wednesday, 28 September 2011 at 13:18, Vadim Kisselmann wrote: Hi Fred, analyze the queries which take longer. We observe our queries and see the problems with q-time with queries which are complex, with phrase queries or queries which contains numbers or special characters. if you don't know it: http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance Regards Vadim 2011/9/28 Frederik Kraus frederik.kr...@gmail.com (mailto: frederik.kr...@gmail.com) Hi, I am experiencing a strange issue doing some load tests. Our setup: - 2 server with each 24 cpu cores, 130GB of RAM - 10 shards per server (needed for response times) running in a single tomcat instance - each query queries all 20 shards (distributed search) - each shard holds about 1.5 mio documents (small shards are needed due to rather complex queries) - all caches are warmed / high cache hit rates (99%) etc. Now for some reason we cannot seem to fully utilize all CPU power (no disk IO), ie. increasing concurrent users doesn't increase CPU-Load at a point, decreases throughput and increases the response times of the individual queries. Also 1-2% of the queries take significantly longer: avg somewhere at 100ms while 1-2% take 1.5s or longer. Any ideas are greatly appreciated :) Fred.
Re: strange performance issue with many shards on one server
On Wednesday, 28 September 2011 at 13:41, Vadim Kisselmann wrote: Hi Fred, ok, it's a strange behavior with same queries. Another questions:
- which solr version?
3.3 (might the NIOFSDirectory from 3.4 help?)
- do you indexing during your load test? (because of index rebuilt)
Nope.
- do you replicate your index?
Nope.
Regards Vadim 2011/9/28 Frederik Kraus frederik.kr...@gmail.com (mailto:frederik.kr...@gmail.com) Hi Vladim, the thing is, that those exact same queries, that take longer during a load test, perform just fine when executed at a slower request rate and are also random, i.e. there is no pattern in bad/slow queries. My first thought was some kind of contention and/or connection starvation for the internal shard communication? Fred. On Wednesday, 28 September 2011 at 13:18, Vadim Kisselmann wrote: Hi Fred, analyze the queries which take longer. We observe our queries and see the problems with q-time with queries which are complex, with phrase queries or queries which contains numbers or special characters. if you don't know it: http://www.hathitrust.org/blogs/large-scale-search/tuning-search-performance Regards Vadim 2011/9/28 Frederik Kraus frederik.kr...@gmail.com (mailto:frederik.kr...@gmail.com) Hi, I am experiencing a strange issue doing some load tests. Our setup: - 2 server with each 24 cpu cores, 130GB of RAM - 10 shards per server (needed for response times) running in a single tomcat instance - each query queries all 20 shards (distributed search) - each shard holds about 1.5 mio documents (small shards are needed due to rather complex queries) - all caches are warmed / high cache hit rates (99%) etc. Now for some reason we cannot seem to fully utilize all CPU power (no disk IO), ie. increasing concurrent users doesn't increase CPU-Load at a point, decreases throughput and increases the response times of the individual queries. Also 1-2% of the queries take significantly longer: avg somewhere at 100ms while 1-2% take 1.5s or longer. Any ideas are greatly appreciated :) Fred.
Still too many files after running solr optimization
Hi, I am using Solr 3.3. I noticed that after indexing about 700,000 records and running optimization at the end, I still have about 91 files in my index directory. I thought that optimization was supposed to reduce the number of files. My settings are the defaults that came with Solr (mergeFactor, etc.). Any ideas what I could be doing wrong?
Re: Still too many files after running solr optimization
Try optimizing twice. The second run will be quick and will delete a lot of files. On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com wrote: Hi, I am using solr 3.3. I noticed that after indexing about 700, 000 records and running optimization at the end, i still have about 91 files in my index directory. I thought that optimization was supposed to reduce the number of files. My settings are the default that came with Solr (mergefactor, etc) Any ideas what i could be doing wrong?
Re: strange performance issue with many shards on one server
I just had a look at the thread dump, pasting 3 examples here:

'pool-31-thread-8233' Id=11626, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.ms user time=20.ms
	at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982)
	at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643)
	at org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179)
	at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423)
	at org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430)
	at org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422)
	at org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892)
	at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198)
	at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
	at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181)
	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486)
	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
	at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
	at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)

'pool-31-thread-8232' Id=11625, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.ms user time=20.ms
	at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447)
	at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
	at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
	at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
	at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)

and

'http-8080-381' Id=6859, WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2607b720, total cpu time=990.ms user time=920.ms
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
	at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:164)
	at org.apache.solr.handler.component.HttpCommComponent.takeCompletedOrError(SearchHandler.java:469)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:271)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
	at
Re: Still too many files after running solr optimization
Why should optimization reduce the number of files? That happens only when you index docs with the same unique key. Do you have differences between numDocs and maxDocs after optimize? If yes: what is your optimize command? Regards Vadim 2011/9/28 Manish Bafna manish.bafna...@gmail.com Try to do optimize twice. The 2nd one will be quick and will delete lot of files. On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com wrote: Hi, I am using solr 3.3. I noticed that after indexing about 700, 000 records and running optimization at the end, i still have about 91 files in my index directory. I thought that optimization was supposed to reduce the number of files. My settings are the default that came with Solr (mergefactor, etc) Any ideas what i could be doing wrong?
Re: Still too many files after running solr optimization
numDocs and maxDocs are the same size. I was worried because when I used to use only Lucene for the same indexing, there were many files before optimization, but after optimization I always ended up with just 3 files in my index folder. Just want to find out if this is OK. Thanks On Wed, Sep 28, 2011 at 1:23 PM, Vadim Kisselmann v.kisselm...@googlemail.com wrote: why should the optimization reduce the number of files? It happens only when you indexing docs with same unique key. Have you differences in numDocs und maxDocs after optimize? If yes: how is your optimize command ? Regards Vadim 2011/9/28 Manish Bafna manish.bafna...@gmail.com Try to do optimize twice. The 2nd one will be quick and will delete lot of files. On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com wrote: Hi, I am using solr 3.3. I noticed that after indexing about 700, 000 records and running optimization at the end, i still have about 91 files in my index directory. I thought that optimization was supposed to reduce the number of files. My settings are the default that came with Solr (mergefactor, etc) Any ideas what i could be doing wrong?
Re: Still too many files after running solr optimization
If numDocs and maxDocs have the same number of docs, nothing will be deleted on optimize; you only rebuild your index. Regards Vadim 2011/9/28 Kissue Kissue kissue...@gmail.com numDocs and maxDocs are same size. I was worried because when i used to use only Lucene for the same indexing, before optimization there are many files but after optimization i always end up with just 3 files in my index filder. Just want to find out if this was ok. Thanks On Wed, Sep 28, 2011 at 1:23 PM, Vadim Kisselmann v.kisselm...@googlemail.com wrote: why should the optimization reduce the number of files? It happens only when you indexing docs with same unique key. Have you differences in numDocs und maxDocs after optimize? If yes: how is your optimize command ? Regards Vadim 2011/9/28 Manish Bafna manish.bafna...@gmail.com Try to do optimize twice. The 2nd one will be quick and will delete lot of files. On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com wrote: Hi, I am using solr 3.3. I noticed that after indexing about 700, 000 records and running optimization at the end, i still have about 91 files in my index directory. I thought that optimization was supposed to reduce the number of files. My settings are the default that came with Solr (mergefactor, etc) Any ideas what i could be doing wrong?
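For reference, the optimize being discussed can be sent as an explicit update message; the host, port, and handler path below are placeholders for your own deployment. The maxSegments attribute asks for a merge down to a single segment, which is what produces the handful of files Lucene users expect:

```xml
<!-- POST this to your update handler, e.g. http://localhost:8983/solr/update -->
<optimize maxSegments="1" waitSearcher="true"/>
```

The same request can be issued as URL parameters, e.g. curl 'http://localhost:8983/solr/update?optimize=true&maxSegments=1'.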
Re: help understanding match
On Tue, Sep 27, 2011 at 10:58 PM, tamanjit.bin...@yahoo.co.in wrote: Hi, 1. Just curious: you have your defaultSearchField, defaultquery, as not stored; how do you know that it contains what you think it contains? 2. The fieldType of defaultquery is query_text; I am not sure which analyzers you are using on this field's type, at both indexing time and query time. This could actually be the reason why stopwords were not used at either indexing or querying time. Thank you. This seemed to be the problem - I had started with a schema doc from another project and made this mistake. 3. Lastly, if you want the OR operator to work, don't use "" (quotes); instead use () brackets around your searchable term. The quotes were from some py code for creating the query string, and were only illustrative. Thanks again! Vijay -- Targeted direct marketing on Twitter - http://www.wisdomtap.com/
Re: strange performance issue with many shards on one server
Hmm, sorry, I don't know... My ideas:
- Tomcat generates this problem (for example: maxThreads, number of connections...)
- JVM options, especially GC
- index locks, possibly an open issue in Jira

Regards
Vadim

2011/9/28 Frederik Kraus frederik.kr...@gmail.com

I just had a look at the thread dump, pasting 3 examples here:

'pool-31-thread-8233' Id=11626, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.ms user time=20.ms
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643)
at org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423)
at org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430)
at org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422)
at org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892)
at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198)
at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158)
at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

'pool-31-thread-8232' Id=11625, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.ms user time=20.ms
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421)
at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

and

'http-8080-381' Id=6859, WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@2607b720, total cpu time=990.ms user time=920.ms
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
at java.util.concurrent.ExecutorCompletionService.take(ExecutorCompletionService.java:164)
at org.apache.solr.handler.component.HttpCommComponent.takeCompletedOrError(SearchHandler.java:469)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:271) at
FieldCollapsing don't return every groups
Hello, I'm using the field collapsing feature to group my products by merchant, and I don't understand why some merchants are missing from the result sent by Solr. My request is http://localhost:8983/solr/select/?q=merchant_name_t:*&version=2.2&start=0&rows=2000&indent=on&group=true&group.field=merchant_name_t&fl=merchant_name_t&wt=json. Currently the request returns 166 merchants and it should return more than that. Did I do something wrong in my query? Thank you, Remy
Re: FieldCollapsing don't return every groups
Hi Remy, could you paste the analyzer part of the field merchant_name_t please ? And when you say it should return more than that, could you explain why with examples ? If I'm not wrong, the field collapsing function is based on indexed values, so if your analyzer is complex (not string), Rémy Loubradou can be indexed as remy and loubradou. And Rémy NotLoubradou could be grouped with Rémy Loubradou. This could explain the behavior. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/FieldCollapsing-don-t-return-every-groups-tp3376036p3376089.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Sort five random Top Offers to the top
Hey Community. I wrote my first component and now I have a problem. Here is my code:

@Override
public void prepare(ResponseBuilder rb) throws IOException {
    try {
        rb.req.getParams().getBool("topoffers.show", true);
        String client = rb.req.getParams().get("client", "1");
        BooleanQuery[] queries = new BooleanQuery[2];
        queries[0] = (BooleanQuery) DisMaxQParser.getParser(
                rb.req.getParams().get("q"), DisMaxQParserPlugin.NAME, rb.req)
                .getQuery();
        queries[1] = new BooleanQuery();
        Occur occur = BooleanClause.Occur.MUST;
        queries[1].add(QueryParsing.parseQuery("ups_topoffer_" + client + ":true",
                rb.req.getSearcher().getSchema()), occur);
        Query q = Query.mergeBooleanQueries(queries[0], queries[1]);
        DocList ergebnis = rb.req.getSearcher().getDocList(q, null, null, 0, 5, 0);
        String[] machineIds = new String[5];
        int position = 0;
        DocIterator iter = ergebnis.iterator();
        while (iter.hasNext()) {
            int docID = iter.nextDoc();
            Document doc = rb.req.getSearcher().getReader().document(docID);
            for (String value : doc.getValues("machine_id")) {
                machineIds[position++] = value;
            }
        }
        Sort sort = rb.getSortSpec().getSort();
        if (sort == null) {
            rb.getSortSpec().setSort(new Sort());
            sort = rb.getSortSpec().getSort();
        }
        SortField[] newSortings = new SortField[sort.getSort().length + 5];
        int count = 0;
        for (String machineId : machineIds) {
            SortField sortMachineId = new SortField(
                    "map(machine_id," + machineId + "," + machineId + ",1,0) desc",
                    SortField.DOUBLE);
            newSortings[count++] = sortMachineId;
        }
        SortField[] sortings = sort.getSort();
        for (SortField sorting : sortings) {
            newSortings[count++] = sorting;
        }
        sort.setSort(newSortings);
        rb.getSortSpec().setSort(sort);
    } catch (ParseException e) {
        LoggerFactory.getLogger(Topoffers.class).error("Fehler bei den Topoffers!", this);
        LoggerFactory.getLogger(Topoffers.class).error(e.toString(), this);
    }
}

Why can't I manipulate the sort? Is there something I misunderstand? This search component is added as a first-component in the solrconfig.xml. Please, can anyone help me??
-- View this message in context: http://lucene.472066.n3.nabble.com/Sort-five-random-Top-Offers-to-the-top-tp3355469p3376166.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Still too many files after running solr optimization
Will it not merge the index? While merging on Windows, the old index files don't get deleted. (Windows has an issue where a file opened for reading cannot be deleted.) So, if you call optimize again, it will delete the older index files.

On Wed, Sep 28, 2011 at 6:43 PM, Vadim Kisselmann v.kisselm...@googlemail.com wrote:
if numDocs and maxDocs have the same number of docs, nothing will be deleted on optimize. You only rebuild your index. Regards Vadim

2011/9/28 Kissue Kissue kissue...@gmail.com
numDocs and maxDocs are the same size. I was worried because when I used to use only Lucene for the same indexing, before optimization there were many files but after optimization I always ended up with just 3 files in my index folder. Just want to find out if this was OK. Thanks

On Wed, Sep 28, 2011 at 1:23 PM, Vadim Kisselmann v.kisselm...@googlemail.com wrote:
why should the optimization reduce the number of files? That happens only when you index docs with the same unique key. Do you have differences in numDocs and maxDocs after optimize? If yes: what is your optimize command? Regards Vadim

2011/9/28 Manish Bafna manish.bafna...@gmail.com
Try to do optimize twice. The 2nd one will be quick and will delete a lot of files.

On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com wrote:
Hi, I am using Solr 3.3. I noticed that after indexing about 700,000 records and running optimization at the end, I still have about 91 files in my index directory. I thought that optimization was supposed to reduce the number of files. My settings are the defaults that came with Solr (mergeFactor, etc.). Any ideas what I could be doing wrong?
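For reference, the optimize the posters are discussing is usually triggered by posting an update message to Solr. A sketch follows; the URL and the waitFlush/waitSearcher values are assumptions for illustration, not taken from this thread:

```xml
<!-- POSTed to http://localhost:8983/solr/update (host and path assumed).
     waitSearcher="true" blocks the call until the merged index is visible. -->
<optimize waitFlush="true" waitSearcher="true"/>
```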
Re: FieldCollapsing don't return every groups
Hi Ludovic, I'm not sure I understand which piece of my schema exposes the analyzer, so you will find my whole schema here: https://github.com/lbdremy/solr-install/blob/master/conf/schema.xml. Hope this will be helpful :) The merchant_name_t is a dynamic field matching the *_t pattern, so this field is indexed and its type is text_general. When I said it should return more than that, I meant that the result sent by Solr contains 166 groups (= merchants) and it should return more than 166 groups (merchants). For example the merchant Cult Beauty Ltd. does not appear in the result, and no other merchant begins with Cult, so where has this merchant been grouped? Thank you very much for your help Ludovic. Rémy

On 28 September 2011 15:52, lboutros boutr...@gmail.com wrote:
Hi Remy, could you paste the analyzer part of the field merchant_name_t please? And when you say it should return more than that, could you explain why with examples? If I'm not wrong, the field collapsing function is based on indexed values, so if your analyzer is complex (not string), Rémy Loubradou can be indexed as remy and loubradou. And Rémy NotLoubradou could be grouped with Rémy Loubradou. This could explain the behavior. Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/FieldCollapsing-don-t-return-every-groups-tp3376036p3376089.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: strange performance issue with many shards on one server
On Wed, 2011-09-28 at 12:58 +0200, Frederik Kraus wrote:
> - 10 shards per server (needed for response times) running in a single tomcat instance

Have you tested that sharding actually decreases response times in your case? I see the idea in decreasing response times with sharding at the cost of decreasing throughput, but the added overhead of merging is non-trivial.

> - each query queries all 20 shards (distributed search)
> - each shard holds about 1.5 mio documents (small shards are needed due to rather complex queries)
> - all caches are warmed / high cache hit rates (99%) etc.
> Now for some reason we cannot seem to fully utilize all CPU power (no disk IO), ie. increasing concurrent users doesn't increase CPU-Load at a point, decreases throughput and increases the response times of the individual queries.

It sounds as if there's a hard limit on the number of concurrent users somewhere. I am no expert in httpclient, but the blocked threads in your thread dump seem to indicate that they wait for connections to be established rather than for results to be produced. I seem to remember that Tomcat has a default limit of 200 concurrent connections, and with 10 shards/search that is just 200 / (10 shard_connections + 1 incoming_connection) = 18 concurrent searches.

> Also 1-2% of the queries take significantly longer: avg somewhere at 100ms while 1-2% take 1.5s or longer.

Could be garbage collection, especially since it shows under high load, which might result in more old objects and thereby trigger full GC.
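If the guess about a connector limit above is right, the cap lives in Tomcat's server.xml. A sketch of a raised limit follows; the attribute values are illustrative assumptions, not a recommendation from this thread:

```xml
<!-- server.xml: each incoming distributed search also opens 10 internal
     shard connections, so the connector must be sized for roughly 11x the
     intended concurrent-search count. Values below are illustrative. -->
<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="1000"
           acceptCount="200"
           connectionTimeout="20000" />
```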
Re: Still too many files after running solr optimization
2011/9/28 Manish Bafna manish.bafna...@gmail.com

> Will it not merge the index?

yes

> While merging on Windows, the old index files don't get deleted. (Windows has an issue where a file opened for reading cannot be deleted.) So, if you call optimize again, it will delete the older index files.

no. During optimize you only delete docs which are flagged as deleted, no matter how old they are. If your numDocs and maxDocs have the same number of docs, you only rebuild and merge your index, but you delete nothing. Regards

On Wed, Sep 28, 2011 at 6:43 PM, Vadim Kisselmann v.kisselm...@googlemail.com wrote:
if numDocs and maxDocs have the same number of docs, nothing will be deleted on optimize. You only rebuild your index. Regards Vadim

2011/9/28 Kissue Kissue kissue...@gmail.com
numDocs and maxDocs are the same size. I was worried because when I used to use only Lucene for the same indexing, before optimization there were many files but after optimization I always ended up with just 3 files in my index folder. Just want to find out if this was OK. Thanks

On Wed, Sep 28, 2011 at 1:23 PM, Vadim Kisselmann v.kisselm...@googlemail.com wrote:
why should the optimization reduce the number of files? That happens only when you index docs with the same unique key. Do you have differences in numDocs and maxDocs after optimize? If yes: what is your optimize command? Regards Vadim

2011/9/28 Manish Bafna manish.bafna...@gmail.com
Try to do optimize twice. The 2nd one will be quick and will delete a lot of files.

On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com wrote:
Hi, I am using Solr 3.3. I noticed that after indexing about 700,000 records and running optimization at the end, I still have about 91 files in my index directory. I thought that optimization was supposed to reduce the number of files. My settings are the defaults that came with Solr (mergeFactor, etc.). Any ideas what I could be doing wrong?
Re: Still too many files after running solr optimization
We tested it so many times. The 1st time we optimize, the new index file is created (the merged one), but the existing index files are not deleted (because they might still be open for reading). The 2nd time we optimize, everything other than the new index file gets deleted. This is happening specifically on Windows.

On Wed, Sep 28, 2011 at 8:23 PM, Vadim Kisselmann v.kisselm...@googlemail.com wrote:
2011/9/28 Manish Bafna manish.bafna...@gmail.com

> Will it not merge the index?

yes

> While merging on Windows, the old index files don't get deleted. (Windows has an issue where a file opened for reading cannot be deleted.) So, if you call optimize again, it will delete the older index files.

no. During optimize you only delete docs which are flagged as deleted, no matter how old they are. If your numDocs and maxDocs have the same number of docs, you only rebuild and merge your index, but you delete nothing. Regards

On Wed, Sep 28, 2011 at 6:43 PM, Vadim Kisselmann v.kisselm...@googlemail.com wrote:
if numDocs and maxDocs have the same number of docs, nothing will be deleted on optimize. You only rebuild your index. Regards Vadim

2011/9/28 Kissue Kissue kissue...@gmail.com
numDocs and maxDocs are the same size. I was worried because when I used to use only Lucene for the same indexing, before optimization there were many files but after optimization I always ended up with just 3 files in my index folder. Just want to find out if this was OK. Thanks

On Wed, Sep 28, 2011 at 1:23 PM, Vadim Kisselmann v.kisselm...@googlemail.com wrote:
why should the optimization reduce the number of files? That happens only when you index docs with the same unique key. Do you have differences in numDocs and maxDocs after optimize? If yes: what is your optimize command? Regards Vadim

2011/9/28 Manish Bafna manish.bafna...@gmail.com
Try to do optimize twice. The 2nd one will be quick and will delete a lot of files.

On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com wrote:
Hi, I am using Solr 3.3. I noticed that after indexing about 700,000 records and running optimization at the end, I still have about 91 files in my index directory. I thought that optimization was supposed to reduce the number of files. My settings are the defaults that came with Solr (mergeFactor, etc.). Any ideas what I could be doing wrong?
Re: FieldCollapsing don't return every groups
Ok, thanks for the schema. The merchant Cult Beauty Ltd should be indexed like this: cult beauty ltd. I think some other merchants contain at least one of these words. You should try to group with a special field used for field collapsing:

<dynamicField name="*_t_group" type="string" indexed="true" stored="true"/>

I think you could even disable the stored value for this particular field (not sure, I have to check). Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/FieldCollapsing-don-t-return-every-groups-tp3376036p3376289.html Sent from the Solr - User mailing list archive at Nabble.com.
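A sketch of how such a grouping field could be wired up in schema.xml, assuming the *_t_group dynamic field suggested in this message (the copyField line and the concrete field names are illustrative, not from the thread):

```xml
<!-- Untokenized copy of the merchant name, used only for group.field -->
<dynamicField name="*_t_group" type="string" indexed="true" stored="true"/>
<copyField source="merchant_name_t" dest="merchant_name_t_group"/>
```

Grouping would then use group.field=merchant_name_t_group instead of the analyzed field, so each merchant name forms exactly one group.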
Re: Still too many files after running solr optimization
we had an understanding problem :) Docs are the docs in the index. Files are the files in the index directory (index parts). During the optimization you don't delete docs if they aren't flagged as deleted, but you merge your index and delete the files in your index directory, that's right. After a second optimize, the files are deleted which were still open for reading. Regards

2011/9/28 Manish Bafna manish.bafna...@gmail.com
We tested it so many times. The 1st time we optimize, the new index file is created (the merged one), but the existing index files are not deleted (because they might still be open for reading). The 2nd time we optimize, everything other than the new index file gets deleted. This is happening specifically on Windows.

On Wed, Sep 28, 2011 at 8:23 PM, Vadim Kisselmann v.kisselm...@googlemail.com wrote:
2011/9/28 Manish Bafna manish.bafna...@gmail.com

> Will it not merge the index?

yes

> While merging on Windows, the old index files don't get deleted. (Windows has an issue where a file opened for reading cannot be deleted.) So, if you call optimize again, it will delete the older index files.

no. During optimize you only delete docs which are flagged as deleted, no matter how old they are. If your numDocs and maxDocs have the same number of docs, you only rebuild and merge your index, but you delete nothing. Regards

On Wed, Sep 28, 2011 at 6:43 PM, Vadim Kisselmann v.kisselm...@googlemail.com wrote:
if numDocs and maxDocs have the same number of docs, nothing will be deleted on optimize. You only rebuild your index. Regards Vadim

2011/9/28 Kissue Kissue kissue...@gmail.com
numDocs and maxDocs are the same size. I was worried because when I used to use only Lucene for the same indexing, before optimization there were many files but after optimization I always ended up with just 3 files in my index folder. Just want to find out if this was OK. Thanks

On Wed, Sep 28, 2011 at 1:23 PM, Vadim Kisselmann v.kisselm...@googlemail.com wrote:
why should the optimization reduce the number of files? That happens only when you index docs with the same unique key. Do you have differences in numDocs and maxDocs after optimize? If yes: what is your optimize command? Regards Vadim

2011/9/28 Manish Bafna manish.bafna...@gmail.com
Try to do optimize twice. The 2nd one will be quick and will delete a lot of files.

On Wed, Sep 28, 2011 at 5:26 PM, Kissue Kissue kissue...@gmail.com wrote:
Hi, I am using Solr 3.3. I noticed that after indexing about 700,000 records and running optimization at the end, I still have about 91 files in my index directory. I thought that optimization was supposed to reduce the number of files. My settings are the defaults that came with Solr (mergeFactor, etc.). Any ideas what I could be doing wrong?
Re: FieldCollapsing don't return every groups
I just checked, you can disable the storing parameter and use this field:

<dynamicField name="*_t_group" type="string" indexed="true" stored="false"/>

Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/FieldCollapsing-don-t-return-every-groups-tp3376036p3376316.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: FieldCollapsing don't return every groups
You're right, one of the groups is 'ltd'. Thanks :) I fixed this issue using a field that I know is unique for each merchant (the merchant id). Again, thanks for your help Ludovic. Otherwise, is the weather nice in France? :)

On 28 September 2011 16:56, lboutros boutr...@gmail.com wrote:
Ok, thanks for the schema. The merchant Cult Beauty Ltd should be indexed like this: cult beauty ltd. I think some other merchants contain at least one of these words. You should try to group with a special field used for field collapsing:

<dynamicField name="*_t_group" type="string" indexed="true" stored="true"/>

I think you could even disable the stored value for this particular field (not sure, I have to check). Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/FieldCollapsing-don-t-return-every-groups-tp3376036p3376289.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: FieldCollapsing don't return every groups
Excellent! And yes, the weather is very nice in France :) - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/FieldCollapsing-don-t-return-every-groups-tp3376036p3376362.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching multiple fields
It will be nice if we can have dissum in addition to dismax. ;-) On Tue, Sep 27, 2011 at 9:26 AM, lee carroll lee.a.carr...@googlemail.comwrote: see http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html On 27 September 2011 16:04, Mark static.void@gmail.com wrote: I thought that a similarity class will only affect the scoring of a single field.. not across multiple fields? Can anyone else chime in with some input? Thanks. On 9/26/11 9:02 PM, Otis Gospodnetic wrote: Hi Mark, Eh, I don't have Lucene/Solr source code handy, but I *think* for that you'd need to write custom Lucene similarity. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Markstatic.void@gmail.com To: solr-user@lucene.apache.org Sent: Monday, September 26, 2011 8:12 PM Subject: Searching multiple fields I have a use case where I would like to search across two fields but I do not want to weight a document that has a match in both fields higher than a document that has a match in only 1 field. For example. Document 1 - Field A: Foo Bar - Field B: Foo Baz Document 2 - Field A: Foo Blarg - Field B: Something else Now when I search for Foo I would like document 1 and 2 to be similarly scored however document 1 will be scored much higher in this use case because it matches in both fields. I could create a third field and use copyField directive to search across that but I was wondering if there is an alternative way. It would be nice if we could search across some sort of virtual field that will use both underlying fields but not actually increase the size of the index. Thanks
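Setting aside a custom Similarity: the dismax parser's tie parameter may already give the behavior asked for here, since with tie=0.0 a document's score is the maximum of its per-field scores rather than their sum. A sketch of the request parameters (field names fieldA/fieldB mirror the example in the original question and are illustrative):

```text
q=Foo&defType=dismax&qf=fieldA fieldB&tie=0.0
```

With tie=0.0, a document matching in both fields scores no higher than one matching equally well in a single field; raising tie toward 1.0 gradually blends the other fields' scores back in.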
Re: strange performance issue with many shards on one server
Hi Frederik, I haven't directly run into this issue with Solr, but I have experienced similar issues in a related context. In my case, I had a custom webapp that made SolrJ requests and then generated some aggregated/analyzed results. During load testing, we ran into a few different issues...

1. The load test software itself had an issue with scaling - I'm assuming that's not the case for you, but I've seen it happen more than once. E.g. there's a limit to max parallel connections in the client being used to talk to Solr.

2. We needed to tune up the SolrJ settings for the HttpConnectionManager. Under heavy load, this was running out of free connections. Given you've got 20 shards, each request is going to spawn 20 HTTP connections. I don't know off the top of my head how solr.SearchHandler manages connections (and whether it's possible to tune this), but from the stack trace below it sure looks like you're blocked on getting free HTTP connections.

3. We needed to optimize our configuration for Jetty, Ubuntu, JVM GC, etc. There are lots of knobs to twiddle here, for better or worse.
-- Ken On Sep 28, 2011, at 5:21am, Frederik Kraus wrote: I just had a look at the thread-dump, pasting 3 examples here: 'pool-31-thread-8233' Id=11626, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.ms user time=20.ms at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643) at org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423) at org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430) at org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422) at org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892) at org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198) at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158) at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421) at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) 'pool-31-thread-8232' Id=11625, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.ms user time=20.ms at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421) at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at
synonym filtering at index time
Trying to add in synonyms at index time but it's not working as expected. Here's the schema and example from synonyms.txt.

synonyms.txt has:

watch, watches, watche, watchs

schema for the field:

<fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrement="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

When I run analysis, the index analyzer correctly shows watche => watch, which is then EdgeNGrammed. My understanding of how this is meant to work is that Solr will index all instances of 'watche' as 'watch' when expand=false. This doesn't seem to be happening, though. Any ideas on what I'm missing? I initially set the synonym filtering to run at query time, since it's user input, but that was returning the same results, so I thought it might be because those terms were already in the index and would therefore show up in the results.

Thanks
Doug

-- Become a Firebox Fan on Facebook: http://facebook.com/firebox And Follow us on Twitter: http://twitter.com/firebox Firebox has been nominated for Retailer of the Year in the 2011 Stuff Awards. Who will win? It's up to you! Visit http://www.stuff.tv/awards and place your vote. We'll do a special dance if it's us. Firebox HQ is MOVING HOUSE! We're migrating from Streatham Hill to shiny new digs in Shoreditch.
As of 3rd October please update your records to: Firebox.com, 6.10 The Tea Building, 56 Shoreditch High Street, London, E1 6JJ Global Head Office: Firebox House, Ardwell Road, London SW2 4RT Firebox.com Ltd is registered in England and Wales, company number 3874477 Registered Company Address: 41 Welbeck Street London W1G 8EA Firebox.com Any views expressed in this email are those of the individual sender, except where the sender expressly, and with authority, states them to be the views of Firebox.com Ltd.
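The expand=false behavior described above can be pictured outside Solr: every term in a synonym line is replaced by the first term of that line at index time. A plain-Python sketch of that mapping (illustrative only; this is not Solr's actual implementation, and it ignores multi-word synonyms and token positions):

```python
# Illustration of SynonymFilterFactory with expand=false:
# each synonyms.txt line maps every listed term to the line's first term.

def build_synonym_map(lines):
    """Parse synonyms.txt-style lines into {term: canonical_term}."""
    mapping = {}
    for line in lines:
        terms = [t.strip().lower() for t in line.split(",") if t.strip()]
        if not terms:
            continue
        canonical = terms[0]  # expand=false keeps only the first entry
        for term in terms:
            mapping[term] = canonical
    return mapping

def analyze(token, mapping):
    """Lowercase, then apply the synonym mapping, as the index analyzer would."""
    token = token.lower()
    return mapping.get(token, token)

synonyms = build_synonym_map(["watch, watches, watche, watchs"])
print(analyze("Watches", synonyms))  # -> watch
print(analyze("watche", synonyms))   # -> watch
```

Under this model both 'watche' and 'watchs' land in the index as 'watch', which is why the same filter must also run at query time (or the query term must already be canonical) for matches to line up.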
RE: strange performance issue with many shards on one server
That would still show up as the CPU being busy.

-----Original Message-----
From: Federico Fissore [mailto:feder...@fissore.org]
Sent: Wednesday, September 28, 2011 6:12 AM
To: solr-user@lucene.apache.org
Subject: Re: strange performance issue with many shards on one server

Frederik Kraus wrote on 28/09/2011 12:58:
> Hi, I am experiencing a strange issue doing some load tests. Our setup:

just because I've listened to JUG mates talking about that at the last meeting: could it be that your CPUs are spending their time getting things from RAM to CPU cache? Maybe that, say, 10% of CPU power is spent on the bus.

federico
Re: strange performance issue with many shards on one server
Hi Ken, the HttpConnectionManager was actually the first thing I looked at - and bumped the Solr default of 20 up to 50, 100, 400, 1 (which should be more or less unlimited ;) ). Unfortunately that didn't really solve anything. I don't know if the static HttpClient is a problem here, as it will be the same HttpConnectionManager for all shards … Obviously a way of validating this would be to spawn 20 tomcat (or jetty) instances, one for each shard and 10 per server - hopefully there is an easier way ;)

By the way: Ubuntu / GC / etc. are all tuned and shouldn't be a bottleneck here. The GC only spends about 50-100ms during a 10min load test, and never a full GC.

Just going through a jstack dump again, it looks like the HttpConnectionManager is actually waiting for a lock …

pool-31-thread-15776 prio=10 tid=0x7ef544249000 nid=0x50be waiting for monitor entry [0x7ef4d38fc000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:447)
- waiting to lock 0x7f07dd6bfa70 (a org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool)
at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:427)
….

Fred.

On Wednesday, September 28, 2011 at 17:48, Ken Krugler wrote: Hi Frederik, I haven't directly run into this issue with Solr, but I have experienced similar issues in a related context. In my case, I had a custom webapp that made SolrJ requests and then generated some aggregated/analyzed results.
During load testing, we ran into a few different issues... 1. The load test software itself had an issue with scaling - I'm assuming that's not the case for you, but I've seen it happen more than once. E.g. there's a limit to max parallel connections in the client being used to talk to Solr. 2. We needed to tune up the SolrJ settings for the HttpConnectionManager Under heavy load, this was running out of free connections. Given you've got 20 shards, each request is going to spawn 20 HTTP connections. I don't know off the top of my head how solr.SearchHandler manages connections (and whether it's possible to tune this), but from the stack trace below it sure looks like you're blocked on getting free HTTP connections. 3. We needed to optimize our configuration for Jetty, Ubuntu, JVM GC, etc. There are lots of knobs to twiddle here, for better or worse. -- Ken On Sep 28, 2011, at 5:21am, Frederik Kraus wrote: I just had a look at the thread-dump, pasting 3 examples here: 'pool-31-thread-8233' Id=11626, BLOCKED on lock=org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool@19dd10d9, total cpu time=20.ms user time=20.ms at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.freeConnection(MultiThreadedHttpConnectionManager.java:982) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.releaseConnection(MultiThreadedHttpConnectionManager.java:643) at org.apache.commons.httpclient.HttpConnection.releaseConnection(HttpConnection.java:1179) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.releaseConnection(MultiThreadedHttpConnectionManager.java:1423) at org.apache.commons.httpclient.HttpMethodBase.ensureConnectionRelease(HttpMethodBase.java:2430) at org.apache.commons.httpclient.HttpMethodBase.responseBodyConsumed(HttpMethodBase.java:2422) at org.apache.commons.httpclient.HttpMethodBase$1.responseConsumed(HttpMethodBase.java:1892) at 
org.apache.commons.httpclient.AutoCloseInputStream.notifyWatcher(AutoCloseInputStream.java:198) at org.apache.commons.httpclient.AutoCloseInputStream.close(AutoCloseInputStream.java:158) at org.apache.commons.httpclient.HttpMethodBase.releaseConnection(HttpMethodBase.java:1181) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:486) at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244) at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:421) at org.apache.solr.handler.component.HttpCommComponent$1.call(SearchHandler.java:393) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at
Re: strange performance issue with many shards on one server
On Wednesday, 28 September 2011 at 16:40, Toke Eskildsen wrote: On Wed, 2011-09-28 at 12:58 +0200, Frederik Kraus wrote: - 10 shards per server (needed for response times) running in a single tomcat instance Have you tested that sharding actually decreases response times in your case? I see the idea in decreasing response times with sharding at the cost of decreasing throughput, but the added overhead of merging is non-trivial. Yep unfortunately, the queries have huge boolean filterqueries for ACLs etc. which just take too long to compute in a single thread. - each query queries all 20 shards (distributed search) - each shard holds about 1.5 million documents (small shards are needed due to rather complex queries) - all caches are warmed / high cache hit rates (99%) etc. Now for some reason we cannot seem to fully utilize all CPU power (no disk IO), i.e. increasing concurrent users doesn't increase CPU load; at some point it decreases throughput and increases the response times of the individual queries. It sounds as if there's a hard limit on the number of concurrent users somewhere. I am no expert in httpclient, but the blocked threads in your thread dump seem to indicate that they wait for connections to be established rather than for results to be produced. I seem to remember that tomcat has a default limit of 200 concurrent connections, and with 10 shards/search, that is just 200 / (10 shard_connections + 1 incoming_connection) = 18 concurrent searches. I have gradually bumped all of this up to (almost) infinity with no effect ;) Also 1-2% of the queries take significantly longer: avg somewhere at 100ms while 1-2% take 1.5s or longer. Could be garbage collection, especially since it shows under high load which might result in more old objects and thereby trigger full gc. GC is only spending something like 50-100ms total for a 10min load test
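Toke's back-of-the-envelope calculation above can be sketched as follows (a hypothetical helper, not Solr code; the connection limit and shard count are the figures from the thread):

```java
public class ConcurrentSearchMath {
    public static void main(String[] args) {
        int maxContainerConnections = 200; // Tomcat's remembered default connection limit
        int shardsPerSearch = 10;          // shard sub-requests fanned out per incoming search
        int incomingConnection = 1;        // the client's own connection into the container

        // Each distributed search ties up one incoming connection plus one per shard
        int connectionsPerSearch = shardsPerSearch + incomingConnection;
        int maxConcurrentSearches = maxContainerConnections / connectionsPerSearch;

        System.out.println("connections per search: " + connectionsPerSearch);   // 11
        System.out.println("max concurrent searches: " + maxConcurrentSearches); // 18
    }
}
```

With 20 shards per search instead of 10, the same arithmetic drops the ceiling to 200 / 21 = 9 concurrent searches, which is why the limit matters more as shard counts grow.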
Date Faceting | Range Faceting patch not working
Hi, We extensively use date faceting in our application, but now since the index has become very big we are dividing it into shards. Since date/range faceting doesn't work on shards I was trying to apply the patch to my Solr, currently using 3.1 but planning for a 3.4 upgrade. https://issues.apache.org/jira/browse/SOLR-1709 The patch is not working on both the 3.1 and 3.4 versions; how else can I apply the patch? Regards, Rohit
Re: Solr messing up the UK GBP (pound) symbol in response, even though the Java environment variable for file encoding is set to UTF 8....
Thanks Chris. Yes, changing connector settings not just in solr but also in all webapps that were sending queries into it solved the problem! Appreciate the help. R On Tue, Sep 13, 2011 at 6:11 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Any idea why solr is unable to return the pound sign as-is? : : I tried typing in £ 1 million in the Solr admin GUI and got the following response. ... : <str name="q">£ 1 million</str> ... : Here is my Java Properties I got also from admin interface: ... : catalina.home = : /home/rbhagdev/SCCRepos/SCC_Platform/search/solr/target/ Looks like you are using tomcat, so I suspect you are getting bit by this... https://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config If that's not the problem, please try running the example/exampledocs/test_utf8.sh script against your Solr instance (you'll need to change the URL variable to match your host:port) -Hoss
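For reference, the connector change the linked wiki page describes is setting URIEncoding on Tomcat's HTTP connector in server.xml (the port and other attributes here are illustrative):

```xml
<!-- server.xml: tell Tomcat to decode request URIs as UTF-8
     instead of its ISO-8859-1 default -->
<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           URIEncoding="UTF-8"/>
```

As the reply above notes, the same encoding fix may need to be applied to every webapp that sends queries to Solr, not just to Solr's own container.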
Re: Solr 3.4 Grouping group.main=true results in java.lang.NoClassDefFound
Hi Frank, How is Solr deployed? And how did you upgrade? The commons-lang library (containing ArrayUtils) is included in the Solr war file. Martijn On 28 September 2011 09:16, Frank Romweber fr...@romweber.de wrote: I use drupal for accessing the solr search engine. After updating and creating my new index everything works as before. Then I activate group=true and group.field=site and solr delivers the wanted search results, but in Drupal nothing appears, just an empty search page. I found out that grouping changes the resultset names. No problem: solr offers the group.main=true parameter for this case. So I added this and got this 500 error. HTTP Status 500 - org/apache/commons/lang/ArrayUtils java.lang.NoClassDefFoundError: org/apache/commons/lang/ArrayUtils at org.apache.solr.search.Grouping$Command.createSimpleResponse(Grouping.java:573) at org.apache.solr.search.Grouping$CommandField.finish(Grouping.java:675) at org.apache.solr.search.Grouping.execute(Grouping.java:339) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:240) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) at java.lang.Thread.run(Thread.java:662) I found out that solr didn't find the class ArrayUtils.class. I tried a lot of things to get this to work: setting the JAVA_HOME and CLASSPATH vars, and I changed the jre, without any success. I am really wondering: all my other programs are still running, and even solr in the normal mode is working and accessible, but not the group.main=true function. So my question is now: what is necessary to get this to work? Any help is appreciated. Thx frank -- Kind regards, Martijn van Groningen
Solr Hanging While Building Suggester Index
We have a separate Java process indexing to Solr using SolrJ. We are using Solr 3.4.0, and Jetty version 8.0.1.v20110908. We experienced Solr hanging today. For a period of approximately 10 minutes, it did not respond to queries. Our indexer sends a query to build a spellcheck index after committing once it's added all new documents (because we have auto-commits that we don't want to trigger rebuilding the spellcheck, we don't use buildOnCommit), and then sends a query to build the suggest component index. We see this from the Solr log during the period it was hung (we attempted to send several queries during this time, but they do not appear in the log, or appear after waiting for several minutes): 2011-09-28 13:18:03,217 [qtp10884088-13] INFO org.apache.solr.core.SolrCore - [report] webapp= path=/select params={spellcheck=true&qt=dismax&wt=javabin&rows=0&spellcheck.build=true&version=2} hits=98772 status=0 QTime=173594 2011-09-28 13:28:18,857 [qtp10884088-89] INFO org.apache.solr.spelling.suggest.Suggester - build() ... 2011-09-28 13:29:02,873 [qtp10884088-89] INFO org.apache.solr.core.SolrCore - [report] webapp= path=/suggest params={spellcheck=true&qt=/suggest&wt=javabin&spellcheck.build=true&version=2} status=0 QTime=44016 In our indexer log, we see just after this (13:28:19,217) the call to build our suggestion index (which comes right after building the spellcheck index) times out and throws a NoHttpResponseException: The server localhost failed to respond. Any ideas? Anything else we should look at to help diagnose? --- Stephen Duncan Jr www.stephenduncanjr.com
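A sketch of the spellchecker setup the message describes, i.e. rebuilding only on an explicit spellcheck.build=true request rather than on every (auto)commit. The component and field names here are illustrative, not taken from the poster's config:

```xml
<!-- solrconfig.xml: buildOnCommit=false means auto-commits do NOT
     rebuild the spellcheck index; a client triggers a rebuild
     explicitly with a request carrying spellcheck.build=true -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>
```

The trade-off, visible in the log above (QTime=173594 for the build request), is that the explicit rebuild can be very expensive and may block other requests while it runs.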
Re: strange performance issue with many shards on one server
Jaeger, Jay - DOT wrote on 28/09/2011 18:40: That would still show up as the CPU being busy. I don't know how the program (top, htop, whatever) displays the value, but when the cpu has a cache miss that thread definitely sits and waits for a number of clock cycles. With 130GB of ram (per server?) I suspect cache misses as a rule. Just a suspicion however, nothing I'll bet on
Re: Still too many files after running solr optimization
: I was worried because when i used to use only Lucene for the same indexing, : before optimization there are many files but after optimization i always end : up with just 3 files in my index folder. Just want to find out if this was : ok. It sounds like you were most likely using the Compound File Format (which causes multiple per-field files to be encapsulated into a single file per segment) when you were using Lucene directly (i believe it is the default) but in Solr you are not. Check the useCompoundFile setting(s) in your solrconfig.xml https://lucene.apache.org/java/3_4_0/fileformats.html#Compound%20Files For most Solr users, the compound file format is a bad idea because it can decrease performance -- the only reason to use it is if you are in a heavily constrained setup where you need to be very restrictive about the number of open file handles. -Hoss
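The setting Hoss refers to looks like this in a stock 3.x solrconfig.xml (shown with the value most Solr users want; the exact section names may differ in a customized config):

```xml
<!-- solrconfig.xml: compound file format trades indexing/search
     performance for fewer open file handles per segment -->
<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
</indexDefaults>
```

Setting it to true would reproduce the few-files-per-segment behavior the poster saw with plain Lucene, at the performance cost Hoss describes.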
Trouble configuring multicore / accessing admin page
Hello, I am trying to get SOLR working with multiple cores and have a problem accessing the admin page once I configure multiple cores. Problem: When accessing the admin page via http://solrhost:8080/solr/admin, I get a 404, missing core name in path. Question: when using the multicore option, is the standard admin page still available? Environment: - solr 1.4.1 - Windows server 2008 R2 - Java SE 1.6u27 - Tomcat 6.0.33 - Solr Experience: none I have set -Dsolr.solr.home=c:\solr and within that I have a solr.xml with the following contents: <solr persistent="true" sharedLib="lib"> <cores adminPath="/admij/cores"> <core name="core0" instanceDir="cores/core0" /> <core name="core1" instanceDir="cores/core1" /> </cores> </solr> I have copied the example/solr directory to c:\solr and have populated that directory with the cores/{core{0,1}} as well as the proper configs and data directories within. When I restart tomcat, it shows a couple of exceptions related to queryElevationComponent and null pointers that I think are due to the DB not yet being available, but I see that the cores appear to initialize properly other than that. So the problem I'm looking to solve/clarify here is the admin page - should that remain available and usable when using the multicore configuration or am I doing something wrong? Do I need to use the CoreAdminHandler type requests to manage multicore instead? Thanks, -- Josh Miller Open Source Solutions Architect (425) 737-2590 http://itsecureadmin.com/
Re: Trouble configuring multicore / accessing admin page
On 9/28/2011 1:40 PM, Joshua Miller wrote: I am trying to get SOLR working with multiple cores and have a problem accessing the admin page once I configure multiple cores. Problem: When accessing the admin page via http://solrhost:8080/solr/admin, I get a 404, missing core name in path. Question: when using the multicore option, is the standard admin page still available? When you enable multiple cores, the URL syntax becomes a little different. On 1.4.1 and 3.2.0, I ran into a problem where the trailing / is required on this URL, but that problem seems to be fixed in 3.4.0: http://host:port/solr/corename/admin/ If you put defaultCoreName="somecore" into the <cores> tag in solr.xml, the original /solr/admin URL should work as well. I just tried it on Solr 3.4.0 and it does work. According to the wiki, it should work in 1.4 as well. I don't have a 1.4.1 server any more, so I can't verify that. http://wiki.apache.org/solr/CoreAdmin#cores Thanks, Shawn
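A sketch of the solr.xml change Shawn describes, using the core names from the original message (the adminPath shown is the conventional value, and defaultCoreName is the attribute being added):

```xml
<!-- solr.xml: defaultCoreName lets /solr/admin and other core-less
     URLs resolve to core0 instead of returning "missing core name in path" -->
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores" defaultCoreName="core0">
    <core name="core0" instanceDir="cores/core0" />
    <core name="core1" instanceDir="cores/core1" />
  </cores>
</solr>
```

Without defaultCoreName, the per-core form http://host:port/solr/core0/admin/ remains the way to reach each core's admin page.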
Re: Trouble configuring multicore / accessing admin page
Hi Joshua, Can you try updating your solr.xml as follows: Specify <core name="core0" instanceDir="/core0" /> instead of <core name="core0" instanceDir="cores/core0" /> Basically, remove the extra "cores" text from the instanceDir attribute of the core element. Just try it and let us know if it works. On Wed, Sep 28, 2011 at 3:40 PM, Joshua Miller jos...@itsecureadmin.com wrote: Hello, I am trying to get SOLR working with multiple cores and have a problem accessing the admin page once I configure multiple cores. Problem: When accessing the admin page via http://solrhost:8080/solr/admin, I get a 404, missing core name in path. Question: when using the multicore option, is the standard admin page still available? Environment: - solr 1.4.1 - Windows server 2008 R2 - Java SE 1.6u27 - Tomcat 6.0.33 - Solr Experience: none I have set -Dsolr.solr.home=c:\solr and within that I have a solr.xml with the following contents: <solr persistent="true" sharedLib="lib"> <cores adminPath="/admij/cores"> <core name="core0" instanceDir="cores/core0" /> <core name="core1" instanceDir="cores/core1" /> </cores> </solr> I have copied the example/solr directory to c:\solr and have populated that directory with the cores/{core{0,1}} as well as the proper configs and data directories within. When I restart tomcat, it shows a couple of exceptions related to queryElevationComponent and null pointers that I think are due to the DB not yet being available but I see that the cores appear to initialize properly other than that So the problem I'm looking to solve/clarify here is the admin page - should that remain available and usable when using the multicore configuration or am I doing something wrong? Do I need to use the CoreAdminHandler type requests to manage multicore instead? Thanks, -- Josh Miller Open Source Solutions Architect (425) 737-2590 http://itsecureadmin.com/ -- Thanks and Regards Rahul A. Warawdekar
Re: Trouble configuring multicore / accessing admin page
On Sep 28, 2011, at 1:03 PM, Shawn Heisey wrote: On 9/28/2011 1:40 PM, Joshua Miller wrote: I am trying to get SOLR working with multiple cores and have a problem accessing the admin page once I configure multiple cores. Problem: When accessing the admin page via http://solrhost:8080/solr/admin, I get a 404, missing core name in path. Question: when using the multicore option, is the standard admin page still available? When you enable multiple cores, the URL syntax becomes a little different. On 1.4.1 and 3.2.0, I ran into a problem where the trailing / is required on this URL, but that problem seems to be fixed in 3.4.0: http://host:port/solr/corename/admin/ If you put a defaultCoreName=somecore into the cores tag in solr.xml, the original /solr/admin URL should work as well. I just tried it on Solr 3.4.0 and it does work. According to the wiki, it should work in 1.4 as well. I don't have a 1.4.1 server any more, so I can't verify that. http://wiki.apache.org/solr/CoreAdmin#cores Hi Shawn, Thanks for the quick response. I can't get any of those combinations to work. I've added the defaultCoreName=core0 into the solr.xml and restarted and tried the following combinations: http://host:port/solr/admin http://host:port/solr/admin/ http://host:port/solr/core0/admin/ … (and many others) I'm stuck on 1.4.1 at least temporarily as I'm taking over an application from another resource and need to get it up and running before modifying anything so any help here would be greatly appreciated. Thanks, Josh Miller Open Source Solutions Architect (425) 737-2590 http://itsecureadmin.com/
Re: Trouble configuring multicore / accessing admin page
On Sep 28, 2011, at 1:17 PM, Rahul Warawdekar wrote: Can you try updating your solr.xml as follows: Specify core name=core0 instanceDir=/core0 / instead of core name=core0 instanceDir=cores/core0 / Basically remove the extra text cores in the core element from the instanceDir attribute. I gave that a try and it didn't change anything. Thanks, Josh
RE: Trouble configuring multicore / accessing admin page
Just go to localhost:8983 (or whatever other port you are using) and use this path to see all the cores available on the box: In your example this should give you a core list: http://solrhost:8080/solr/ -Original Message- From: Joshua Miller [mailto:jos...@itsecureadmin.com] Sent: Wednesday, September 28, 2011 1:18 PM To: solr-user@lucene.apache.org Subject: Re: Trouble configuring multicore / accessing admin page On Sep 28, 2011, at 1:03 PM, Shawn Heisey wrote: On 9/28/2011 1:40 PM, Joshua Miller wrote: I am trying to get SOLR working with multiple cores and have a problem accessing the admin page once I configure multiple cores. Problem: When accessing the admin page via http://solrhost:8080/solr/admin, I get a 404, missing core name in path. Question: when using the multicore option, is the standard admin page still available? When you enable multiple cores, the URL syntax becomes a little different. On 1.4.1 and 3.2.0, I ran into a problem where the trailing / is required on this URL, but that problem seems to be fixed in 3.4.0: http://host:port/solr/corename/admin/ If you put a defaultCoreName=somecore into the cores tag in solr.xml, the original /solr/admin URL should work as well. I just tried it on Solr 3.4.0 and it does work. According to the wiki, it should work in 1.4 as well. I don't have a 1.4.1 server any more, so I can't verify that. http://wiki.apache.org/solr/CoreAdmin#cores Hi Shawn, Thanks for the quick response. I can't get any of those combinations to work. I've added the defaultCoreName=core0 into the solr.xml and restarted and tried the following combinations: http://host:port/solr/admin http://host:port/solr/admin/ http://host:port/solr/core0/admin/ ... (and many others) I'm stuck on 1.4.1 at least temporarily as I'm taking over an application from another resource and need to get it up and running before modifying anything so any help here would be greatly appreciated. 
Thanks, Josh Miller Open Source Solutions Architect (425) 737-2590 http://itsecureadmin.com/
Facet mappings
Hi, I have a set of values which need to be mapped to a facet. For example, I want to map the codes SC, AC to the facet value 'Catering', HB to Half Board, and AI, IN to All\ inclusive. I tried creating the following in the schema file: <fieldType name="alpine_field_boardbasis" class="solr.TextField" sortMissingLast="true" omitNorms="true"> <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/> <filter class="solr.SynonymFilterFactory" synonyms="boardbasis_synonyms.txt" ignoreCase="true" expand="false"/> </analyzer> </fieldType> <copyField source="board_basis" dest="Board Basis" /> <field multiValued="false" name="Board Basis" type="field_boardbasis" stored="false"/> And in boardbasis_synonyms.txt: SC = Self\ Catering CA = Catered\ Chalet HB = Half\ Board FB = Full\ Board RO = Room\ only\ no\ kitchen\ facilities EM = Self\ catering\ with\ evening\ meal BB = Bed\ \ Breakfast AI, IN = All\ inclusive But when I do a query (http://localhost:/solr/collection1/select/?q=brochure_year%3A12&version=2.2&start=0&rows=1&indent=on&facet=true&facet.field=Board%20Basis), I get the following: <lst name="Board Basis"> <int name="catering">455</int> <int name="self">455</int> <int name="board">281</int> <int name="half">243</int> <int name="catered">114</int> <int name="chalet">114</int> <int name="">63</int> <int name="bed">63</int> <int name="breakfast">63</int> <int name="evening">45</int> <int name="meal">45</int> <int name="with">45</int> <int name="full">38</int> <int name="all">27</int> <int name="inclusive">27</int> <int name="facilities">9</int> <int name="kitchen">9</int> <int name="no">9</int> <int name="only">9</int> <int name="room">9</int> </lst> I am expecting to see something like <lst name="Board Basis"> <int name="Catered Chalet">455</int> <int name="Self Catering">455</int> <int name="Half Board">281</int> </lst> Thanks in advance, Srikanth NT -- View this message in context: http://lucene.472066.n3.nabble.com/Facet-mappings-tp3377317p3377317.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Trouble configuring multicore / accessing admin page
On Sep 28, 2011, at 1:24 PM, Robert Petersen wrote: Just go to localhost:8983 (or whatever other port you are using) and use this path to see all the cores available on the box: In your example this should give you a core list: http://solrhost:8080/solr/ I see Welcome to Solr! and Solr Admin below that as a link. When I click through the link, I get the 404 error, missing core name in path. Thanks, Josh Miller Open Source Solutions Architect (425) 737-2590 http://itsecureadmin.com/
Re: Trouble configuring multicore / accessing admin page
On 9/28/2011 2:24 PM, Robert Petersen wrote: Just go to localhost:8983 (or whatever other port you are using) and use this path to see all the cores available on the box: In your example this should give you a core list: http://solrhost:8080/solr/ Now this is interesting. If I have defaultCoreName in my solr.xml (on 3.4.0), the /solr URL only shows one admin link, which takes me to the /solr/admin/ page for my default core. On that page, I do have links to all the other core admin pages, as usual. If I don't have defaultCoreName, /solr shows admin links for all defined cores. A quick search didn't turn up any Jira issues for this. Is this intended behavior? Thanks, Shawn
Re: Solr Hanging While Building Suggester Index
Is this a huge index? Keep in mind that most spellchecker implementations rebuild the index, which can stall the entire process if there are millions of full text documents to process. There is a new implementation called DirectSolrSpellChecker that doesn't do a complete rebuild; I haven't tried it yet, but it should work with the SuggesterComponent. It's still experimental though. We have a separate Java process indexing to Solr using SolrJ. We are using Solr 3.4.0, and Jetty version 8.0.1.v20110908. We experienced Solr hanging today. For a period of approximately 10 minutes, it did not respond to queries. Our indexer sends a query to build a spellcheck index after committing once it's added all new documents (because we have auto-commits that we don't want to trigger rebuilding the spellcheck, we don't use buildOnCommit), and then sends a query to build the suggest component index. We see this from the Solr log during the period it was hung (we attempted to send several queries during this time, but they do not appear in the log, or appear after waiting for several minutes): 2011-09-28 13:18:03,217 [qtp10884088-13] INFO org.apache.solr.core.SolrCore - [report] webapp= path=/select params={spellcheck=true&qt=dismax&wt=javabin&rows=0&spellcheck.build=true&version=2} hits=98772 status=0 QTime=173594 2011-09-28 13:28:18,857 [qtp10884088-89] INFO org.apache.solr.spelling.suggest.Suggester - build() ... 2011-09-28 13:29:02,873 [qtp10884088-89] INFO org.apache.solr.core.SolrCore - [report] webapp= path=/suggest params={spellcheck=true&qt=/suggest&wt=javabin&spellcheck.build=true&version=2} status=0 QTime=44016 In our indexer log, we see just after this (13:28:19,217) the call to build our suggestion index (which comes right after building the spellcheck index) times out and throws a NoHttpResponseException: The server localhost failed to respond. Any ideas? Anything else we should look at to help diagnose? --- Stephen Duncan Jr www.stephenduncanjr.com
RE: strange performance issue with many shards on one server
Yes, that thread waits (in the sense that nothing useful gets done), but during that time, from the perspective of the applications and OS, that CPU is busy: it is not waiting in such a way that you can dispatch a different process. The point is, that if this was actually the problem, it would show up in a higher CPU utilization than the correspondent reported. -Original Message- From: Federico Fissore [mailto:feder...@fissore.org] Sent: Wednesday, September 28, 2011 2:04 PM To: solr-user@lucene.apache.org Subject: Re: strange performance issue with many shards on one server Jaeger, Jay - DOT wrote on 28/09/2011 18:40: That would still show up as the CPU being busy. i don't know how the program (top, htop, whatever) displays the value but when the cpu has a cache miss definitely that thread sits and waits for a number of clock cycles with 130GB of ram (per server?) I suspect caches miss as a rule just a suspicion however, nothing I'll bet on
RE: Trouble configuring multicore / accessing admin page
One time when we had that problem, it was because one or more cores had a broken XML configuration file. Another time, it was because solr/home was not set right in the servlet container. Another time it was because we had an older EAR pointing to a newer release Solr home directory. Given what you did, I suppose that is possible in your case, too. In all cases, the Solr log provided hints as to what was going wrong. JRJ -Original Message- From: Joshua Miller [mailto:jos...@itsecureadmin.com] Sent: Wednesday, September 28, 2011 2:41 PM To: solr-user@lucene.apache.org Subject: Trouble configuring multicore / accessing admin page Hello, I am trying to get SOLR working with multiple cores and have a problem accessing the admin page once I configure multiple cores. Problem: When accessing the admin page via http://solrhost:8080/solr/admin, I get a 404, missing core name in path. Question: when using the multicore option, is the standard admin page still available? Environment: - solr 1.4.1 - Windows server 2008 R2 - Java SE 1.6u27 - Tomcat 6.0.33 - Solr Experience: none I have set -Dsolr.solr.home=c:\solr and within that I have a solr.xml with the following contents: solr persistent=true sharedLib=lib cores adminPath=/admij/cores core name=core0 instanceDir=cores/core0 / core name=core1 instanceDir=cores/core1 / /cores /solr I have copied the example/solr directory to c:\solr and have populated that directory with the cores/{core{0,1}} as well as the proper configs and data directories within. When I restart tomcat, it shows a couple of exceptions related to queryElevationComponent and null pointers that I think are due to the DB not yet being available but I see that the cores appear to initialize properly other than that So the problem I'm looking to solve/clarify here is the admin page - should that remain available and usable when using the multicore configuration or am I doing something wrong? 
Do I need to use the CoreAdminHandler type requests to manage multicore instead? Thanks, -- Josh Miller Open Source Solutions Architect (425) 737-2590 http://itsecureadmin.com/
UIMA DictionaryAnnotator partOfSpeach
Hi all, I have the dictionary Annotator UIMA-solr running, used my own dictionary file and it works: it will match all the words (Nouns, Verbs and Adjectives) from my dictionary file. But now, if I only want to match Nouns (and ignore other parts of speech), how can I configure it? http://uima.apache.org/d/uima-addons-current/DictionaryAnnotator/DictionaryAnnotatorUserGuide.html From the above user guide, in section (3.3. Input Match Type Filters), I added the following code to my DictionaryAnnotatorDescriptor.xml: <nameValuePair> <name>InputMatchFilterFeaturePath</name> <value> <string>partOfSpeach</string> </value> </nameValuePair> <nameValuePair> <name>FilterConditionOperator</name> <value> <string>EQUALS</string> </value> </nameValuePair> <nameValuePair> <name>FilterConditionValue</name> <value> <string>noun</string> </value> </nameValuePair> but it fails, and the error said the featurePathElementNames partOfSpeach is invalid. org.apache.uima.annotator.dict_annot.impl.DictionaryAnnotatorProcessException: EXCEPTION MESSAGE LOCALIZATION FAILED: java.util.MissingResourceException: Can't find bundle for base name org.apache.uima.annotator.dict_annot.dictionaryAnnotatorMessages, locale en_US at org.apache.uima.annotator.dict_annot.impl.FeaturePathInfo_impl.typeSystemInit(FeaturePathInfo_impl.java:110) at org.apache.uima.annotator.dict_annot.impl.DictionaryAnnotator.typeSystemInit(DictionaryAnnotator.java:383) at org.apache.uima.analysis_component.CasAnnotator_ImplBase.checkTypeSystemChange(CasAnnotator_ImplBase.java:100) at org.apache.uima.analysis_component.CasAnnotator_ImplBase.process(CasAnnotator_ImplBase.java:55) at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377) at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295) at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567) at 
org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.init(ASB_impl.java:409) at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342) at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267) at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267) at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:280) Any idea please, Thanks in advance.. Frankie -- View this message in context: http://lucene.472066.n3.nabble.com/UIMA-DictionaryAnnotator-partOfSpeach-tp3377440p3377440.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Trouble configuring multicore / accessing admin page
cores adminPath=/admij/cores Was that a cut and paste? If so, the /admij/cores is presumably incorrect, and ought to be /admin/cores -Original Message- From: Jaeger, Jay - DOT [mailto:jay.jae...@dot.wi.gov] Sent: Wednesday, September 28, 2011 4:10 PM To: solr-user@lucene.apache.org Subject: RE: Trouble configuring multicore / accessing admin page One time when we had that problem, it was because one or more cores had a broken XML configuration file. Another time, it was because solr/home was not set right in the servlet container. Another time it was because we had an older EAR pointing to a newer release Solr home directory. Given what you did, I suppose that is possible in your case, too. In all cases, the Solr log provided hints as to what was going wrong. JRJ -Original Message- From: Joshua Miller [mailto:jos...@itsecureadmin.com] Sent: Wednesday, September 28, 2011 2:41 PM To: solr-user@lucene.apache.org Subject: Trouble configuring multicore / accessing admin page Hello, I am trying to get SOLR working with multiple cores and have a problem accessing the admin page once I configure multiple cores. Problem: When accessing the admin page via http://solrhost:8080/solr/admin, I get a 404, missing core name in path. Question: when using the multicore option, is the standard admin page still available? Environment: - solr 1.4.1 - Windows server 2008 R2 - Java SE 1.6u27 - Tomcat 6.0.33 - Solr Experience: none I have set -Dsolr.solr.home=c:\solr and within that I have a solr.xml with the following contents: solr persistent=true sharedLib=lib cores adminPath=/admij/cores core name=core0 instanceDir=cores/core0 / core name=core1 instanceDir=cores/core1 / /cores /solr I have copied the example/solr directory to c:\solr and have populated that directory with the cores/{core{0,1}} as well as the proper configs and data directories within. 
When I restart Tomcat, it shows a couple of exceptions related to QueryElevationComponent and null pointers that I think are due to the DB not yet being available, but I see that the cores appear to initialize properly other than that. So the problem I'm looking to solve/clarify here is the admin page: should that remain available and usable when using the multicore configuration, or am I doing something wrong? Do I need to use the CoreAdminHandler type requests to manage multicore instead? Thanks, -- Josh Miller Open Source Solutions Architect (425) 737-2590 http://itsecureadmin.com/
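For comparison, a working two-core solr.xml for the layout described above might look like the following sketch (same core names as in the question; the only change is the corrected adminPath):

```xml
<!-- c:\solr\solr.xml - a sketch; core names match the setup described above -->
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="core0" instanceDir="cores/core0" />
    <core name="core1" instanceDir="cores/core1" />
  </cores>
</solr>
```

With multiple cores in Solr 1.4 there is, as far as I recall, no single /solr/admin page; each core serves its own admin UI at /solr/core0/admin/ and /solr/core1/admin/, while the adminPath above exposes CoreAdminHandler requests (STATUS, CREATE, RELOAD, ...) at /solr/admin/cores.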
Re: Trouble configuring multicore / accessing admin page
On Sep 28, 2011, at 2:11 PM, Jaeger, Jay - DOT wrote: <cores adminPath="/admij/cores"> Was that a cut and paste? If so, the /admij/cores is presumably incorrect, and ought to be /admin/cores. No, that was a typo -- the config file is correct with admin/cores. Thanks for pointing out the mistake here. Josh Miller Open Source Solutions Architect (425) 737-2590 http://itsecureadmin.com/
Re: strange performance issue with many shards on one server
Yep, I'm not getting more than 50-60% CPU during those load tests.

On Wednesday, September 28, 2011 at 23:01, Jaeger, Jay - DOT wrote: Yes, that thread waits (in the sense that nothing useful gets done), but during that time, from the perspective of the applications and OS, that CPU is busy: it is not waiting in such a way that you can dispatch a different process. The point is that if this was actually the problem, it would show up in a higher CPU utilization than the correspondent reported.

-Original Message- From: Federico Fissore [mailto:feder...@fissore.org] Sent: Wednesday, September 28, 2011 2:04 PM To: solr-user@lucene.apache.org (mailto:solr-user@lucene.apache.org) Subject: Re: strange performance issue with many shards on one server

Jaeger, Jay - DOT wrote on 28/09/2011 18:40: That would still show up as the CPU being busy. I don't know how the program (top, htop, whatever) displays the value, but when the CPU has a cache miss, that thread definitely sits and waits for a number of clock cycles. With 130GB of RAM (per server?) I suspect cache misses as a rule. Just a suspicion, however; nothing I'll bet on.
Re: Solr Hanging While Building Suggester Index
No, this is on a test system that is still smallish, approx 100,000 records of dummy data with Wikipedia articles as content at the time this occurred. I wouldn't expect rebuilding the index to stall the entire JVM; that seems excessive... Stephen Duncan Jr www.stephenduncanjr.com

On Wed, Sep 28, 2011 at 4:43 PM, Markus Jelsma markus.jel...@openindex.io wrote: Is this a huge index? Keep in mind that most spellchecker implementations rebuild the index, which can stall the entire process if there are millions of full-text documents to process. There is a new implementation called DirectSolrSpellchecker that doesn't do a complete rebuild; I haven't tried it yet, but it should work with the SuggesterComponent. It's still experimental though.

We have a separate Java process indexing to Solr using SolrJ. We are using Solr 3.4.0, and Jetty version 8.0.1.v20110908. We experienced Solr hanging today. For a period of approximately 10 minutes, it did not respond to queries. Our indexer sends a query to build a spellcheck index after committing once it's added all new documents (because we have auto-commits that we don't want to trigger rebuilding the spellcheck, we don't use buildOnCommit), and then sends a query to build the suggest component index. We see this from the Solr log during the period it was hung (we attempted to send several queries during this time, but they do not appear in the log, or appear after waiting for several minutes): 2011-09-28 13:18:03,217 [qtp10884088-13] INFO org.apache.solr.core.SolrCore - [report] webapp= path=/select params={spellcheck=true&qt=dismax&wt=javabin&rows=0&spellcheck.build=true&version=2} hits=98772 status=0 QTime=173594 2011-09-28 13:28:18,857 [qtp10884088-89] INFO org.apache.solr.spelling.suggest.Suggester - build() ...
2011-09-28 13:29:02,873 [qtp10884088-89] INFO org.apache.solr.core.SolrCore - [report] webapp= path=/suggest params={spellcheck=true&qt=/suggest&wt=javabin&spellcheck.build=true&version=2} status=0 QTime=44016 In our indexer log, we see just after this (13:28:19,217) that the call to build our suggestion index (which comes right after building the spellcheck index) times out and throws a NoHttpResponseException: The server localhost failed to respond. Any ideas? Anything else we should look at to help diagnose? --- Stephen Duncan Jr www.stephenduncanjr.com
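For readers following along, the kind of setup the poster describes - a Suggester that is built on demand via spellcheck.build=true rather than on every commit - might look roughly like this in a Solr 3.4 solrconfig.xml (component/handler names and the source field are illustrative, not taken from the thread):

```xml
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <!-- field to pull suggestion terms from (assumed name) -->
    <str name="field">title</str>
    <!-- build only when spellcheck.build=true is sent, not on auto-commits -->
    <str name="buildOnCommit">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

Note that, as discussed above, explicit builds of large dictionaries can still be expensive; the QTime values in the log show the build requests themselves taking minutes.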
Re: Facet mappings
(11/09/29 5:38), ntsrikanth wrote: Hi, I got a set of values which need to be mapped to a facet. For example, I want to map the codes SC, AC to the facet value 'Catering', HB to 'Half Board', and AI, IN to 'All inclusive'. I tried creating the following in the schema file:

<fieldType name="alpine_field_boardbasis" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="boardbasis_synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>

Use KeywordTokenizerFactory instead of StandardTokenizerFactory. The tokenizer factory should also be specified in the <filter/> for the synonym filter, like:

<filter class="solr.SynonymFilterFactory" tokenizerFactory="solr.KeywordTokenizerFactory" .../>

as it uses WhitespaceTokenizerFactory to analyze synonyms.txt if a tokenizer factory is not specified. koji -- Check out Query Log Visualizer for Apache Solr http://www.rondhuit-demo.com/loganalyzer/loganalyzer.html http://www.rondhuit.com/en/
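Putting Koji's two corrections together, the field type from the question would become something like the following sketch (the synonyms file contents are illustrative, based on the mappings the poster described):

```xml
<fieldType name="alpine_field_boardbasis" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer type="index">
    <!-- keep each value as one token so multi-word codes map cleanly -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="boardbasis_synonyms.txt"
            ignoreCase="true" expand="false"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

with boardbasis_synonyms.txt along the lines of:

```
SC,AC => Catering
HB => Half Board
AI,IN => All inclusive
```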
Re: Boost Exact matches on Specific Fields
I will give str_category more weight than ts_category because we want str_category to win if they have exact matches (you converted to lowercase).

On Mon, Sep 26, 2011 at 10:23 PM, Balaji S mcabal...@gmail.com wrote: Hi, You mean to say copy the String field to a Text field, or the reverse? This is the approach I am currently following:

Step 1: Created a FieldType

<fieldType name="string_lower" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

Step 2: <field name="str_category" type="string_lower" indexed="true" stored="true"/>

Step 3: <copyField source="ts_category" dest="str_category"/>

And in the Solr query, planning to use q=hospitals&qf=body^4.0 title^5.0 ts_category^10.0 str_category^8.0

The one question I have here is: all the above mentioned fields will have Hospital present in them; will the above approach work to get the exact match on the top and bring Hospitalization below in the results? Thanks, Balaji

On Tue, Sep 27, 2011 at 9:38 AM, Way Cool way1.wayc...@gmail.com wrote: If I were you, probably I would try defining two fields: 1. ts_category as a string type 2. ts_category1 as a text_en type. Make sure to copy ts_category to ts_category1. You can use the following as qf in your dismax: qf=body^4.0 title^5.0 ts_category^10.0 ts_category1^5.0 or something like that. YH http://thetechietutorials.blogspot.com/

On Mon, Sep 26, 2011 at 2:06 PM, balaji mcabal...@gmail.com wrote: Hi all, I am new to SOLR and have a doubt on boosting the exact terms to the top on a particular field. For example: I have a text field named ts_category and I want to give more boost to this field rather than other fields, so in my query I pass the following in the qf params: qf=body^4.0 title^5.0 ts_category^21.0 and also sort on score desc. When I do a search against Hospitals, I get Hospitalization Management, Hospital Equipment Supplies on top rather than the exact matches of Hospitals. So it would be great if I could be helped over here. Thanks in Advance, Balaji -- View this message in context: http://lucene.472066.n3.nabble.com/Boost-Exact-matches-on-Specific-Fields-tp3370513p3370513.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: strange performance issue with many shards on one server
Frederik Kraus wrote on 28/09/2011 23:16: Yep, I'm not getting more than 50-60% CPU during those load tests. I would try reducing the number of shards. Apart from the memory discussion, this really seems to me a concurrency issue: too many threads waiting for other threads to complete, too many context switches... Recently, on a lots-of-cores database server, we INCREASED speed by REDUCING the number of cores/threads each query was allowed to use (making sense of our customer's investment). Maybe you can get a similar effect by reducing the number of pieces your distributed search has to merge. My 2 eurocents, federico
Re: strange performance issue with many shards on one server
Some cache hit problems can be fixed with the Large Pages feature. http://www.google.com/search?q=large+pages On Wed, Sep 28, 2011 at 3:30 PM, Federico Fissore feder...@fissore.org wrote: Frederik Kraus wrote on 28/09/2011 23:16: Yep, I'm not getting more than 50-60% CPU during those load tests. I would try reducing the number of shards. Apart from the memory discussion, this really seems to me a concurrency issue: too many threads waiting for other threads to complete, too many context switches... Recently, on a lots-of-cores database server, we INCREASED speed by REDUCING the number of cores/threads each query was allowed to use (making sense of our customer's investment). Maybe you can get a similar effect by reducing the number of pieces your distributed search has to merge. My 2 eurocents, federico -- Lance Norskog goks...@gmail.com
Re: Questions about LocalParams syntax
: 1.) How should I deal with repeating parameters? If I use multiple : boost queries, it seems that only the last one listed is used... for : example:
:
: ((_query_:"{!dismax qf=\"title^500 author^300 allfields\" bq=\"format:Book^50\" bq=\"format:Journal^150\"}test"))

Hmmm... that's either a bug or a silly limitation in the local params parsing -- I've filed a Jira for it, but I have no idea what the fix is (or if it was intentional for some odd reason): https://issues.apache.org/jira/browse/SOLR-2798 ... if you are interested in digging into the code to see what the cause might be and helping to work on a patch, that would be awesome.

: 2.) What is the proper way to escape quotes? Since there are multiple : nested layers of double quotes, things get ugly and it's easy to end up : with syntax errors. I found that this syntax doesn't cause an error: ...
: ((_query_:"{!dismax qf=\"title^500 author^300 allfields\" bq=\"format:\\\"Book\\\"^50\" bq=\"format:\\\"Journal\\\"^150\"}test"))

Backslash escaping should work, but you need to keep in mind that both the LocalParams syntax and most query parsers treat '"' and '\' as significant characters, so you may have to escape them more times than you think. For instance, even w/o local params, if you wanted a bq that contained a literal '"', you'd need to escape it for the lucene query parser...

bq=foo_s:inner\"quote OR foo_s:other

if you then wanted to use that bq as a quoted local param, you'd need to escape both the '\' and the original '"' again ...

q={!dismax bq="foo_s:inner\\\"quote OR foo_s:other"}foo

...and if you then wanted to use that entire {!dismax ... } string inside of a quoted expression using the _query_ hook of the Lucene QParser (which is what it looks like you are doing), you would need to escape *all* of those '\' and '"' characters once more...

q=bob OR _query_:"{!dismax bq=\"foo_s:inner\\\"quote OR foo_s:other\"}foo"

...and it should work (it does for me). But the other thing you can do to make your life a *lot* simpler is to leverage parameter dereferencing and put each logical query string into its own parameter...

q=bob OR _query_:"{!dismax bq=$myBq}foo"
myBq=foo_s:inner\"quote OR foo_s:other

...or really make your life easy...

qq=foo
q=bob OR _query_:"{!dismax bq=$myBq v=$qq}"
myBq=foo_s:inner\"quote OR foo_s:other

-Hoss
Re: Boost Exact matches on Specific Fields
Yeah, I will change the weight for str_category and make it higher. I converted it to lowercase because we cannot expect users to type them in the correct case. Thanks, Balaji

On Thu, Sep 29, 2011 at 3:52 AM, Way Cool way1.wayc...@gmail.com wrote: I will give str_category more weight than ts_category because we want str_category to win if they have exact matches (you converted to lowercase).

On Mon, Sep 26, 2011 at 10:23 PM, Balaji S mcabal...@gmail.com wrote: Hi, You mean to say copy the String field to a Text field, or the reverse? This is the approach I am currently following:

Step 1: Created a FieldType

<fieldType name="string_lower" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

Step 2: <field name="str_category" type="string_lower" indexed="true" stored="true"/>

Step 3: <copyField source="ts_category" dest="str_category"/>

And in the Solr query, planning to use q=hospitals&qf=body^4.0 title^5.0 ts_category^10.0 str_category^8.0

The one question I have here is: all the above mentioned fields will have Hospital present in them; will the above approach work to get the exact match on the top and bring Hospitalization below in the results? Thanks, Balaji

On Tue, Sep 27, 2011 at 9:38 AM, Way Cool way1.wayc...@gmail.com wrote: If I were you, probably I would try defining two fields: 1. ts_category as a string type 2. ts_category1 as a text_en type. Make sure to copy ts_category to ts_category1. You can use the following as qf in your dismax: qf=body^4.0 title^5.0 ts_category^10.0 ts_category1^5.0 or something like that.

YH http://thetechietutorials.blogspot.com/

On Mon, Sep 26, 2011 at 2:06 PM, balaji mcabal...@gmail.com wrote: Hi all, I am new to SOLR and have a doubt on boosting the exact terms to the top on a particular field. For example: I have a text field named ts_category and I want to give more boost to this field rather than other fields, so in my query I pass the following in the qf params: qf=body^4.0 title^5.0 ts_category^21.0 and also sort on score desc. When I do a search against Hospitals, I get Hospitalization Management, Hospital Equipment Supplies on top rather than the exact matches of Hospitals. So it would be great if I could be helped over here. Thanks in Advance, Balaji -- View this message in context: http://lucene.472066.n3.nabble.com/Boost-Exact-matches-on-Specific-Fields-tp3370513p3370513.html Sent from the Solr - User mailing list archive at Nabble.com.
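Summing up the thread, a sketch of the final dismax request (weights are illustrative; the lowercased keyword-tokenized copy field gets the highest weight so exact matches win):

```
q=hospitals
defType=dismax
qf=body^4.0 title^5.0 ts_category^10.0 str_category^21.0
```

Because str_category is analyzed with KeywordTokenizerFactory plus LowerCaseFilterFactory, it only matches documents whose entire category value equals "hospitals" (case-insensitively), which pushes those documents above partial matches like "Hospitalization Management" scored via the tokenized fields.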
Re: LocalParams, bq, and highlighting
: I've run into another strange behavior related to LocalParams syntax in : Solr 1.4.1. If I apply Dismax boosts using bq in LocalParams syntax, : the contents of the boost queries get used by the highlighter. : Obviously, when I use bq as a separate parameter, this is not an issue. ... : Is this a known limitation of the highlighter, or is it a bug? Is this : issue resolved in newer versions of Solr?

I *think* what you're encountering here is just an inherent property of how the highlighter works. HighlightComponent asks the QueryComponent and/or default QParser for the highlight query to extract terms from for highlighting. With a request like this...

http://localhost:8983/solr/select?defType=dismax&q=solr&hl=true&fl=name&hl.fl=name&bq=server

...DismaxQParser is the default query parser, and because of how it is designed to work (and designed to be used) it assumes that the main query should just be what's in the q param, and not the other clauses like bq that were added to it for searching. In a query like this however...

http://localhost:8983/solr/select?q=inStock:true+AND+_query_:"{!dismax}solr"&hl=true&fl=name&hl.fl=name&bq=server

...LuceneQParser is the default query parser, and it doesn't know/care what all of the subclauses are, or where they came from, or whether they are significant enough to the user that they should be included in the highlighting or not. It just knows that it has a query, so it gives it to the highlighter. So it is what it is.

This is definitely an interesting case that I don't think anyone ever really considered before. It seems like a strong argument in favor of adding an hl.q param that the HighlightingComponent would use as an override for whatever the QueryComponent thinks the highlighting query should be; that way people expressing complex queries like the ones you describe could do something like...

qq=solr
q=inStock:true AND _query_:"{!dismax v=$qq}"
hl.q={!v=$qq}
hl=true
fl=name
hl.fl=name
bq=server

...what do you think? Wanna file a Jira requesting this as a feature? Pretty sure the change would only require a few lines of code (but of course we'd also need JUnit tests, which would probably be several dozen lines of code). -Hoss
Re: strategy for post-processing answer set
: it looks to me as if Solr just brings back the URLs. what I want to do is to : get the actual documents in the answer set, simplify their HTML and remove : all the javascript, ads, etc., and append them into a single document. : : Now ... does Nutch already have the documents? can I get them from its db? : or do I have to go get the documents again with something like a wget? i *think* what you are saying is that: a) you built your index using nutch b) when you query Solr, you only get back a url field for each matching document c) what you want is to combine the whole text of webpages corresponding to all of those urls into one massive html page. If that's the case, then you should either: 1) ask on the nutch-user mailing list about how to store the whole content of web pages that nutch crawls so you can build up a page like this (nutch may already be doing it, i don't know -- depends on the schema) 2) write custom client code (probably outside of the scope of velocity) to re-fetch these urls at query time and parse them and combine them as you see fit. which approach is right for you all depends on your goals and use case -- but solr can only give you back the fields you store in it. -Hoss
Re: autosuggest combination of data from documents and popular queries
: If user starts typing m i will show mango as suggestion. And other : suggestions should come from the document title in index. So if I have a : document in index with title Man .. so suggestions would be : mango : man ... : Is this doable ? any options ? It's totally doable, and you've already done the hard part by building up a database of the popular queries you want to seed the suggestions with, and building up a suggestion index where each document corresponds to a single suggestion. but in order to also have suggestions come from the fields of your main index, you'll need to also add them as individual documents to that same suggestion index. you could either get those field values from whatever original source you used, or you crawl your own solr index. If you want individual *terms* from the index to be added as suggestions, then the LukeRequestHandler or the TermsComponent would probably be the easiest way to extract them. -Hoss
Re: Solr Cloud Number of Shard Limitation?
Thanks Mark, found the TODO in ZkStateReader.java: // TODO: - possibly: incremental update rather than reread everything Was there a patch they provided back to address this? On Tue, Sep 27, 2011 at 9:20 PM, Mark Miller markrmil...@gmail.com wrote: On Sep 26, 2011, at 11:42 AM, Jamie Johnson wrote: Is there any limitation, be it technical or for sanity reasons, on the number of shards that can be part of a solr cloud implementation? The loggly guys ended up hitting a limit somewhere. Essentially, whenever the cloud state is updated, info is read about each shard to update the state (from zookeeper). There is a TODO that I put in there that says something like, consider updating this incrementally - usually the data on most shards has not changed, so no reason to read it all. They implemented that today in their own code, but we have not yet done this in trunk. What that places the upper limit at, I don't know - I imagine it takes quite a few shards before it ends up being too much of a problem - they shard by user I believe, so lots of shards. - Mark Miller lucidimagination.com 2011.lucene-eurocon.org | Oct 17-20 | Barcelona
Re: Solr Cloud Number of Shard Limitation?
No, we don't have any patches for it yet. You might make a JIRA issue for it? I think the big win is a fairly easy one - basically, right now when we update the cloud state, we look at the children of the 'shards' node, and then we read the data at each node individually. I imagine this is the part that breaks down :) We likely already have most of that info though - really, you should just have to compare the children of the 'shards' node with the list we already have from the last time we got the cloud state - remove any that are no longer in the list, read the data for those not in the list, and get your new state efficiently. - Mark Miller lucidimagination.com 2011.lucene-eurocon.org | Oct 17-20 | Barcelona On Sep 28, 2011, at 10:35 PM, Jamie Johnson wrote: Thanks Mark, found the TODO in ZkStateReader.java // TODO: - possibly: incremental update rather than reread everything Was there a patch they provided back to address this? On Tue, Sep 27, 2011 at 9:20 PM, Mark Miller markrmil...@gmail.com wrote: On Sep 26, 2011, at 11:42 AM, Jamie Johnson wrote: Is there any limitation, be it technical or for sanity reasons, on the number of shards that can be part of a solr cloud implementation? The loggly guys ended up hitting a limit somewhere. Essentially, whenever the cloud state is updated, info is read about each shard to update the state (from zookeeper). There is a TODO that I put in there that says something like, consider updating this incrementally - usually the data on most shards has not changed, so no reason to read it all. They implemented that today in their own code, but we have not yet done this in trunk. What that places the upper limit at, I don't know - I imagine it takes quite a few shards before it ends up being too much of a problem - they shard by user I believe, so lots of shards. - Mark Miller lucidimagination.com 2011.lucene-eurocon.org | Oct 17-20 | Barcelona
Re: autosuggest combination of data from documents and popular queries
hi hoss, This helps.. But as I understand, TermsComponent does not allow sort on popularity.. just count|index. Or am I missing something? If TermsComponent allows custom sorting I don't even have to use ngrams. Any thoughts? abhay -- View this message in context: http://lucene.472066.n3.nabble.com/autosuggest-combination-of-data-from-documents-and-popular-queries-tp3360657p3378096.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud Number of Shard Limitation?
I'll definitely create a JIRA for this. Looking at the code in CloudState, I think we could do the following: as we iterate over shardIdNames we check to see if the oldCloudState had the slice already; if so, get the state from there, otherwise do what is already happening. Something like the following:

for (String shardIdZkPath : shardIdNames) {
  Slice slice = null;
  if (oldCloudState.liveNodesContain(shardIdZkPath)) {
    slice = oldCloudState.getCollectionStates().get(collection).get(shardIdZkPath);
  }
  if (slice == null) {
    Map<String,ZkNodeProps> shardsMap = readShards(zkClient, shardIdPaths + "/" + shardIdZkPath);
    slice = new Slice(shardIdZkPath, shardsMap);
  }
  slices.put(shardIdZkPath, slice);
}

I don't see a need to remove the old states since we only keep the states that are already in oldCloudState and read new ones. Does that make sense?

On Wed, Sep 28, 2011 at 11:01 PM, Mark Miller markrmil...@gmail.com wrote: No, we don't have any patches for it yet. You might make a JIRA issue for it? I think the big win is a fairly easy one - basically, right now when we update the cloud state, we look at the children of the 'shards' node, and then we read the data at each node individually. I imagine this is the part that breaks down :) We likely already have most of that info though - really, you should just have to compare the children of the 'shards' node with the list we already have from the last time we got the cloud state - remove any that are no longer in the list, read the data for those not in the list, and get your new state efficiently. - Mark Miller lucidimagination.com 2011.lucene-eurocon.org | Oct 17-20 | Barcelona On Sep 28, 2011, at 10:35 PM, Jamie Johnson wrote: Thanks Mark, found the TODO in ZkStateReader.java // TODO: - possibly: incremental update rather than reread everything Was there a patch they provided back to address this?
On Tue, Sep 27, 2011 at 9:20 PM, Mark Miller markrmil...@gmail.com wrote: On Sep 26, 2011, at 11:42 AM, Jamie Johnson wrote: Is there any limitation, be it technical or for sanity reasons, on the number of shards that can be part of a solr cloud implementation? The loggly guys ended up hitting a limit somewhere. Essentially, whenever the cloud state is updated, info is read about each shard to update the state (from zookeeper). There is a TODO that I put in there that says something like, consider updating this incrementally - usually the data on most shards has not changed, so no reason to read it all. They implemented that today in their own code, but we have not yet done this in trunk. What that places the upper limit at, I don't know - I imagine it takes quite a few shards before it ends up being too much of a problem - they shard by user I believe, so lots of shards. - Mark Miller lucidimagination.com 2011.lucene-eurocon.org | Oct 17-20 | Barcelona
Re: UIMA DictionaryAnnotator partOfSpeach
At first glance it seems like a simple localization issue as indicated by this: org.apache.uima.annotator.dict_annot.impl.DictionaryAnnotatorProcessException: EXCEPTION MESSAGE LOCALIZATION FAILED: java.util.MissingResourceException: Can't find bundle for base name org.apache.uima.annotator.dict_annot.dictionaryAnnotatorMessages, locale en_US

Perhaps you can get the source code for UIMA and run the server hosting Solr in debug mode, then remote connect to it via eclipse or some other IDE and use a breakpoint to figure out which resource is the issue. After that it would be a UIMA-specific solution, I think.

On Wed, Sep 28, 2011 at 4:11 PM, chanhangfai chanhang...@hotmail.com wrote: Hi all, I have the DictionaryAnnotator UIMA-Solr integration running. I used my own dictionary file and it works; it will match all the words (nouns, verbs and adjectives) from my dictionary file. *But now, if I only want to match nouns* (ignoring other parts of speech), how can I configure it?

http://uima.apache.org/d/uima-addons-current/DictionaryAnnotator/DictionaryAnnotatorUserGuide.html

From the above user guide, in section (3.3. Input Match Type Filters), I added the following code to my DictionaryAnnotatorDescriptor.xml:

<nameValuePair>
  <name>InputMatchFilterFeaturePath</name>
  <value>
    <string>partOfSpeach</string>
  </value>
</nameValuePair>
<nameValuePair>
  <name>FilterConditionOperator</name>
  <value>
    <string>EQUALS</string>
  </value>
</nameValuePair>
<nameValuePair>
  <name>FilterConditionValue</name>
  <value>
    <string>noun</string>
  </value>
</nameValuePair>

but it fails, and the error says featurePathElementNames *partOfSpeach* is invalid.

org.apache.uima.annotator.dict_annot.impl.DictionaryAnnotatorProcessException: EXCEPTION MESSAGE LOCALIZATION FAILED: java.util.MissingResourceException: Can't find bundle for base name org.apache.uima.annotator.dict_annot.dictionaryAnnotatorMessages, locale en_US
at org.apache.uima.annotator.dict_annot.impl.FeaturePathInfo_impl.typeSystemInit(FeaturePathInfo_impl.java:110)
at org.apache.uima.annotator.dict_annot.impl.DictionaryAnnotator.typeSystemInit(DictionaryAnnotator.java:383)
at org.apache.uima.analysis_component.CasAnnotator_ImplBase.checkTypeSystemChange(CasAnnotator_ImplBase.java:100)
at org.apache.uima.analysis_component.CasAnnotator_ImplBase.process(CasAnnotator_ImplBase.java:55)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.init(ASB_impl.java:409)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:280)

Any ideas please? Thanks in advance.. Frankie -- View this message in context: http://lucene.472066.n3.nabble.com/UIMA-DictionaryAnnotator-partOfSpeach-tp3377440p3377440.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: basic solr cloud questions
@Darren: I feel that the question itself is misleading. Creating shards is meant to separate out the data ... not keep the exact same copy of it. I think the two-node setup that was attempted by Sam misled him and us into thinking that configuring two nodes which are to be named shard1 ... somehow means that they are instantly replicated too ... this is not the case! I can see how this misunderstanding can develop, as I too was confused until Yury cleared it up.

@Sam: If you are interested in performing a quick exercise to understand the pieces involved for replication rather than sharding ... perhaps this link would be of help in taking you through it: http://pulkitsinghal.blogspot.com/2011/09/setup-solr-master-slave-replication.html - Pulkit

2011/9/27 Yury Kats yuryk...@yahoo.com: On 9/27/2011 5:16 PM, Darren Govoni wrote: On 09/27/2011 05:05 PM, Yury Kats wrote: You need to either submit the docs to both nodes, or have a replication setup between the two. Otherwise they are not in sync. I hope that's not the case. :/ My understanding (or hope maybe) is that the new Solr Cloud implementation will support auto-sharding and distributed indexing. This means that shards will receive different documents regardless of which node received the submitted document (spread evenly based on a hash-node assignment). Distributed queries will thus merge all the solr shard/node responses. All cores in the same shard must somehow have the same index. Only then can you continue servicing searches when individual cores fail. Auto-sharding and distributed indexing don't have anything to do with this. In the future, SolrCloud may be managing replication between cores in the same shard automatically. But right now it does not.
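For readers following along, the master/slave replication that the thread recommends boils down to two small solrconfig.xml snippets (hostname and poll interval here are illustrative; see the linked tutorial for a full walkthrough):

```xml
<!-- solrconfig.xml on the master -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- ship a new index snapshot to slaves after each commit -->
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- solrconfig.xml on the slave -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <!-- how often the slave polls the master for a newer index version -->
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

With this in place, documents are submitted only to the master, and the slave pulls index changes on each poll - which is what keeps the two copies in sync, in contrast to sharding, where each node intentionally holds different documents.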
Re: Why I can't take an full-import with entity name?
Can you monitor the DB side to see what results it returned for that query? 2011/8/30 于浩 yuhao.1...@gmail.com: I am using solr1.3. I update the solr index through solr delta-import every two hours, but the delta-import is wasteful of database connections. So I want to use full-import with an entity name instead of delta-import. My db-data-config.xml file:

<entity name="article" pk="Article_ID" query="select Article_ID,Article_Title,Article_Abstract from Article_Detail">
  <field name="Article_ID" column="Article_ID" />
</entity>
<entity name="delta_article" pk="Article_ID" rootEngity="false" query="select Article_ID,Article_Title,Article_Abstract from Article_Detail where Article_ID &gt; '${dataimporter.request.minID}' and Article_ID &lt;= '{dataimporter.request.maxID}'">
  <field name="Article_ID" column="Article_ID" />
</entity>

then I use http://192.168.1.98:8081/solr/db_article/dataimport?command=full-import&entity=delta_article&commit=true&clean=false&maxID=1000&minID=10 but the solr finishes nearly instantly, and there is no record imported, even though in fact there are many records that meet the condition of maxID and minID. The tomcat log:

INFO: [db_article] webapp=/solr path=/dataimport params={maxID=6737277&clean=false&commit=true&entity=delta_article&command=full-import&minID=6736841} status=0 QTime=0 2011-8-29 19:00:03 org.apache.solr.handler.dataimport.DataImporter doFullImport INFO: Starting Full Import 2011-8-29 19:00:03 org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties INFO: Read dataimport.properties 2011-8-29 19:00:03 org.apache.solr.handler.dataimport.SolrWriter persistStartTime INFO: Wrote last indexed time to dataimport.properties 2011-8-29 19:00:03 org.apache.solr.handler.dataimport.DocBuilder commit INFO: Full Import completed successfully Is there somebody who can help or has some advice?
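An aside for anyone hitting the same symptom: in the config above, the upper bound is written as '{dataimporter.request.maxID}' without the leading $, so DIH would pass that literal string to the database instead of substituting the request parameter, and the WHERE clause could match nothing - which would explain an instant "completed successfully" with zero rows. A corrected sketch of that entity (names taken from the thread; the attribute is spelled rootEntity):

```xml
<entity name="delta_article" pk="Article_ID" rootEntity="false"
        query="select Article_ID,Article_Title,Article_Abstract from Article_Detail
               where Article_ID &gt; '${dataimporter.request.minID}'
                 and Article_ID &lt;= '${dataimporter.request.maxID}'">
  <field name="Article_ID" column="Article_ID" />
</entity>
```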
Re: SolrCloud: is there a programmatic way to create an ensemble
Did you find out about this? 2011/8/2 Yury Kats yuryk...@yahoo.com: I have multiple SolrCloud instances, each running its own Zookeeper (Solr launched with -DzkRun). I would like to create an ensemble out of them. I know about -DzkHost parameter, but can I achieve the same programmatically? Either with SolrJ or REST API? Thanks, Yury
Re: Solr Cloud Number of Shard Limitation?
So I tested what I wrote, and man was that wrong. I have updated it and created a JIRA for this issue. I also attached a patch which patches CloudState to address this issue. Feedback is appreciated. https://issues.apache.org/jira/browse/SOLR-2799

On Wed, Sep 28, 2011 at 11:46 PM, Jamie Johnson jej2...@gmail.com wrote:
I'll definitely create a JIRA for this. Looking at the code in CloudState, I think we could do the following: as we iterate over shardIdNames, we check whether the old cloud state already had the slice; if so, we take the state from there, otherwise we do what is already happening. Something like the following:

for (String shardIdZkPath : shardIdNames) {
  Slice slice = null;
  if (oldCloudState.liveNodesContain(shardIdZkPath)) {
    slice = oldCloudState.getCollectionStates().get(collection).get(shardIdZkPath);
  }
  if (slice == null) {
    Map<String,ZkNodeProps> shardsMap = readShards(zkClient, shardIdPaths + "/" + shardIdZkPath);
    slice = new Slice(shardIdZkPath, shardsMap);
  }
  slices.put(shardIdZkPath, slice);
}

I don't see a need to remove the old states, since we only keep the states that are already in oldCloudState and read new ones. Does that make sense?

On Wed, Sep 28, 2011 at 11:01 PM, Mark Miller markrmil...@gmail.com wrote:
No, we don't have any patches for it yet. You might make a JIRA issue for it? I think the big win is a fairly easy one. Basically, right now when we update the cloud state, we look at the children of the 'shards' node, and then we read the data at each node individually. I imagine this is the part that breaks down :) We likely already have most of that info, though. Really, you should just have to compare the children of the 'shards' node with the list we already have from the last time we got the cloud state: remove any that are no longer in the list, read the data for those not in the list, and get your new state efficiently.
- Mark Miller lucidimagination.com 2011.lucene-eurocon.org | Oct 17-20 | Barcelona

On Sep 28, 2011, at 10:35 PM, Jamie Johnson wrote:
Thanks Mark, I found the TODO in ZkStateReader.java:

// TODO: - possibly: incremental update rather than reread everything

Was there a patch they provided back to address this?

On Tue, Sep 27, 2011 at 9:20 PM, Mark Miller markrmil...@gmail.com wrote:
On Sep 26, 2011, at 11:42 AM, Jamie Johnson wrote:
Is there any limitation, be it technical or for sanity reasons, on the number of shards that can be part of a SolrCloud implementation?

The loggly guys ended up hitting a limit somewhere. Essentially, whenever the cloud state is updated, info is read about each shard (from ZooKeeper) to update the state. There is a TODO that I put in there that says something like: consider updating this incrementally, since usually the data on most shards has not changed, so there's no reason to read it all. They implemented that today in their own code, but we have not yet done this in trunk. Where that places the upper limit, I don't know. I imagine it takes quite a few shards before it ends up being too much of a problem; they shard by user, I believe, so lots of shards.

- Mark Miller lucidimagination.com 2011.lucene-eurocon.org | Oct 17-20 | Barcelona
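This is not the actual SolrCloud code (the class and method names below are made up for illustration), but the incremental refresh Mark and Jamie describe — reuse cached per-shard data when the shard is still listed, read only the shards that are new, and drop the ones that disappeared — can be sketched generically like this:

```java
import java.util.*;
import java.util.function.Function;

public class IncrementalState {
    /**
     * Rebuild the shardId -> data map from the current list of children of
     * the 'shards' node, performing the expensive per-shard read only for
     * shards not present in the previous state.
     */
    public static Map<String, String> update(
            Map<String, String> oldState,        // last known shardId -> data
            List<String> currentShardIds,        // children of 'shards' right now
            Function<String, String> readShard)  // expensive read (e.g. from ZooKeeper)
    {
        Map<String, String> newState = new HashMap<>();
        for (String id : currentShardIds) {
            String data = oldState.get(id);      // reuse cached data when present
            if (data == null) {
                data = readShard.apply(id);      // read only shards we haven't seen
            }
            newState.put(id, data);
        }
        // Shards no longer listed are dropped simply by never being copied over.
        return newState;
    }
}
```

With N shards of which only a handful changed between updates, this turns N reads per state refresh into a read per new shard, which is the win the loggly folks implemented in their own code.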
Query failing because of omitTermFreqAndPositions
Hi All, My schema contained a field textForQuery, defined as:

<field name="textForQuery" type="text" indexed="true" stored="false" multiValued="true"/>

After indexing about a million (10 lakh) documents, I changed the field to:

<field name="textForQuery" type="text" indexed="true" stored="false" multiValued="true" *omitTermFreqAndPositions="true"*/>

So documents indexed after that omitted the position information of the terms. As a result, I was no longer able to run searches that rely on position information, e.g. "coke studio at mtv", even though the phrase is present in some documents. So I changed the field textForQuery back to:

<field name="textForQuery" type="text" indexed="true" stored="false" multiValued="true"/>

But now, even for newly added documents, queries requiring position information still fail. For example, I reindexed certain documents containing "coke studio at mtv", but the query *textForQuery:"coke studio at mtv"* still returns no documents. Can anyone please help me understand why this is happening?

-- Thanks Regards, Isan Fulia.
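A likely explanation, offered tentatively: at the Lucene level, omitTermFreqAndPositions is "sticky" per field — once any segment has been written with positions omitted, segment merges keep them omitted for that field, so flipping the flag back in schema.xml and re-adding a few documents is not enough. Phrase queries such as "coke studio at mtv" need position data to match, which is why they silently return nothing. The usual fix is to restore the field definition and rebuild the index from scratch (delete the index directory or index into a fresh core):

```xml
<!-- schema.xml: positions restored; requires a full reindex into a clean index -->
<field name="textForQuery" type="text" indexed="true" stored="false"
       multiValued="true" omitTermFreqAndPositions="false"/>
```

(omitTermFreqAndPositions="false" is the default for text fields, so simply removing the attribute is equivalent.)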
Re: SolrCloud: is there a programmatic way to create an ensemble
I'm not a SolrCloud guru, but why not start your ZooKeeper quorum separately? I also believe you can specify a zoo.cfg file that will create a ZK quorum outside of Solr. An example zoo.cfg (from http://zookeeper.apache.org/doc/current/zookeeperStarted.html#sc_RunningReplicatedZooKeeper):

tickTime=2000
dataDir=/var/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888:3888
server.2=zoo2:2888:3888
server.3=zoo3:2888:3888

On Thu, Sep 29, 2011 at 12:17 AM, Pulkit Singhal pulkitsing...@gmail.com wrote:
Did you find out about this?

2011/8/2 Yury Kats yuryk...@yahoo.com:
I have multiple SolrCloud instances, each running its own Zookeeper (Solr launched with -DzkRun). I would like to create an ensemble out of them. I know about the -DzkHost parameter, but can I achieve the same programmatically? Either with SolrJ or the REST API? Thanks, Yury
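On the programmatic part of the original question: as far as I know there is no SolrJ or REST call to merge already-running -DzkRun instances into one quorum after the fact, because ensemble membership is fixed at startup by the server.N lines in zoo.cfg (or by the -DzkHost list). With a standalone quorum like the one above, each Solr instance would then be pointed at it on its command line; host names here are placeholders:

```
java -DzkHost=zoo1:2181,zoo2:2181,zoo3:2181 -jar start.jar
```

Since the zoo.cfg and the startup flags are just files and process arguments, "programmatic" creation amounts to generating those files and launching the processes from your own tooling.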