Re: exceeded limit of maxWarmingSearchers ERROR
of a document that was used. You could copy/paste that to try it out.

4. JVM tuning and performance result based on a multithreaded environment. 5. Machine details (RAM, CPU, and settings from a SOLR perspective).

Default Solr settings with the shipped jetty container. The startup script used is available when you download Solr 3.3 with RankingAlgorithm. It has -Xmx set to 2GB and uses the default collector with parallel collection enabled for the young generation. The system is an x86_64 Linux (2.6 kernel), 2 core (2.5GHz), and uses internal disks for indexing.

My suggestion would be to download a version of Solr 3.3 with RankingAlgorithm and give it a try to see if any changes are needed from your existing setup.

Regards,
- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

Hoping that you are getting my point. We want to benchmark the performance. If you can involve me in your group, that would be great.

Thanks
Naveen

2011/8/15 Nagendra Nagarajayya nnagaraja...@transaxtions.com

Bill: I did look at Mark's performance tests. Looks very interesting. Here is the Apache Solr 3.3 with RankingAlgorithm NRT performance: http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x

Regards
- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 8/14/2011 7:47 PM, Bill Bell wrote: I understand. Have you looked at Mark's patch? From his performance tests, it looks pretty good. When would RA work better? Bill

On 8/14/11 8:40 PM, Nagendra Nagarajayya nnagaraja...@transaxtions.com wrote: Bill: The technical details of the NRT implementation in Apache Solr with RankingAlgorithm (SOLR-RA) are available here: http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf (Some changes for Solr 3.x, but for the most part it is as above.) Regarding support for 4.0 trunk, it should happen sometime soon.

Regards
- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 8/14/2011 7:11 PM, Bill Bell wrote: OK, I'll ask the elephant in the room... What is the difference between the new UpdateHandler from Mark and the SOLR-RA? The UpdateHandler works with 4.0; does SOLR-RA work with 4.0 trunk? Pros/cons?

On 8/14/11 8:10 PM, Nagendra Nagarajayya nnagaraja...@transaxtions.com wrote: Naveen: NRT with Apache Solr 3.3 and RankingAlgorithm does not need a commit for a document to become searchable. Any document that you add through an update becomes immediately searchable. So there is no need to commit from within your update client code. Since there is no commit, the cache does not have to be cleared or the old searchers closed or new searchers opened and warmed (the error that you are facing).
Regards
- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 8/14/2011 10:37 AM, Naveen Gupta wrote: Hi Mark/Erick/Nagendra, I was not very confident about NRT at that point of time, when we started the project almost 1 year ago; definitely I would try NRT and see the performance. The current requirement was working fine while we were using commitWithin 10 millisecs in the XML document which we were posting to SOLR. But due to that, we were getting very poor performance (almost 3 mins for 15,000 docs) per user. There are many parallel users committing to our SOLR. So we removed the commitWithin, and hence performance was much, much better. But then we are getting this maxWarmingSearchers error, because we are committing separately as a curl request once the entire doc is submitted for indexing. The question here is: what is the difference between commitWithin and commit (apart from the fact that commit takes memory and processes and additional hardware usage)? Why do we want it to be visible as soon as possible? Since we are applying many business rules on top of the results (older indexes as well as new ones) and applying different filters, up to 5 mins is fine for us; beyond that we need to think about other optimizations. We will definitely try NRT
Re: exceeded limit of maxWarmingSearchers ERROR
Nagendra, You wrote: Naveen: *NRT with Apache Solr 3.3 and RankingAlgorithm does need a commit for a document to become searchable*. Any document that you add through update becomes immediately searchable. So no need to commit from within your update client code. Since there is no commit, the cache does not have to be cleared or the old searchers closed or new searchers opened, and warmed (error that you are facing).

Looking at the link you mentioned, it is clearly what we wanted. But the real issue is that you say RA does need a commit for a document to become searchable (please take a look at the bold sentence). In the future, for more load, can it cater to Master/Slave (replication) etc. to scale and perform better? If yes, we would like to go for NRT, and the performance described in the article is acceptable. We were expecting the same real-time performance for a single user. What about multiple users: should we wait for 1-2 secs before calling the curl request to make SOLR perform better, or will it internally handle multiple requests (multithreaded, etc.)? What doc size (10,000 docs?) would allow the JVM to perform better? Have you done any kind of benchmarking in terms of multithreaded and multi-user load for NRT, and also JVM tuning in terms of SOLR server performance? Any kind of performance analysis would help us decide quickly whether to switch over to NRT.

Questions in terms of switching over to NRT:
1. Should we upgrade to SOLR 4.x?
2. Any benchmarking (10,000 docs/sec)? The question here is more specifically about the details of an individual doc (fields, number of fields, field sizes, parameters affecting performance with faceting or w/o faceting).
3. What about multiple users? A user in real time might have a large doc count of 0.1 million. How to break it up and analyze which approach is better (though it is our task to do); still, any kind of breakdown will help us. Imagine a user inbox.
4. JVM tuning and performance results based on a multithreaded environment.
5. Machine details (RAM, CPU, and settings from a SOLR perspective).

Hoping that you are getting my point. We want to benchmark the performance. If you can involve me in your group, that would be great.

Thanks
Naveen

2011/8/15 Nagendra Nagarajayya nnagaraja...@transaxtions.com

Bill: I did look at Mark's performance tests. Looks very interesting. Here is the Apache Solr 3.3 with RankingAlgorithm NRT performance: http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x

Regards
- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 8/14/2011 7:47 PM, Bill Bell wrote: I understand. Have you looked at Mark's patch? From his performance tests, it looks pretty good. When would RA work better? Bill

On 8/14/11 8:40 PM, Nagendra Nagarajayya nnagaraja...@transaxtions.com wrote: Bill: The technical details of the NRT implementation in Apache Solr with RankingAlgorithm (SOLR-RA) are available here: http://solr-ra.tgels.com/papers/NRT_Solr_RankingAlgorithm.pdf (Some changes for Solr 3.x, but for the most part it is as above.) Regarding support for 4.0 trunk, it should happen sometime soon.

Regards
- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 8/14/2011 7:11 PM, Bill Bell wrote: OK, I'll ask the elephant in the room...
What is the difference between the new UpdateHandler from Mark and the SOLR-RA? The UpdateHandler works with 4.0; does SOLR-RA work with 4.0 trunk? Pros/cons?

On 8/14/11 8:10 PM, Nagendra Nagarajayya nnagaraja...@transaxtions.com wrote: Naveen: NRT with Apache Solr 3.3 and RankingAlgorithm does not need a commit for a document to become searchable. Any document that you add through an update becomes immediately searchable. So there is no need to commit from within your update client code. Since there is no commit, the cache does not have to be cleared or the old searchers closed or new searchers opened and warmed (the error that you are facing).

Regards
- Nagendra Nagarajayya
http://solr-ra.tgels.org
http://rankingalgorithm.tgels.org

On 8/14/2011 10:37 AM, Naveen Gupta wrote: Hi Mark/Erick/Nagendra, I was not very confident about NRT at that point of time, when we started the project almost 1 year ago; definitely I would try NRT and see the performance. The current requirement was working fine while we were using commitWithin 10 millisecs in the XML document which we were posting to SOLR. But due to that, we were getting very poor performance (almost 3 mins for 15,000 docs) per user. There are many parallel users committing to our SOLR. So we removed the commitWithin, and hence
Re: exceeded limit of maxWarmingSearchers ERROR
Hi Mark/Erick/Nagendra, I was not very confident about NRT at that point of time, when we started the project almost 1 year ago; definitely I would try NRT and see the performance. The current requirement was working fine while we were using commitWithin 10 millisecs in the XML document which we were posting to SOLR. But due to that, we were getting very poor performance (almost 3 mins for 15,000 docs) per user. There are many parallel users committing to our SOLR. So we removed the commitWithin, and hence performance was much, much better. But then we are getting this maxWarmingSearchers error, because we are committing separately as a curl request once the entire doc is submitted for indexing. The question here is: what is the difference between commitWithin and commit (apart from the fact that commit takes memory and processes and additional hardware usage)? Why do we want it to be visible as soon as possible? Since we are applying many business rules on top of the results (older indexes as well as new ones) and applying different filters, up to 5 mins is fine for us; beyond that we need to think about other optimizations. We will definitely try NRT. But please tell me what other options we can apply in order to optimize.

Thanks
Naveen

On Sun, Aug 14, 2011 at 9:42 PM, Erick Erickson erickerick...@gmail.com wrote: Ah, thanks, Mark... I must have been looking at the wrong JIRAs. Erick

On Sun, Aug 14, 2011 at 10:02 AM, Mark Miller markrmil...@gmail.com wrote: On Aug 14, 2011, at 9:03 AM, Erick Erickson wrote: You either have to go to near real time (NRT), which is under development, but not committed to trunk yet. NRT support is committed to trunk. - Mark Miller lucidimagination.com
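For reference, the two commit styles being compared differ only in where the commit is expressed. A minimal sketch with curl (host, port, and field values are placeholders, not taken from the thread):

  # commitWithin: ask Solr to make the docs visible within N milliseconds of the add
  curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
       --data-binary '<add commitWithin="10000"><doc><field name="id">doc1</field></doc></add>'

  # explicit commit: post batches with no commit, then commit once at the end
  curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' \
       --data-binary '<add><doc><field name="id">doc1</field></doc></add>'
  curl 'http://localhost:8983/solr/update?commit=true'

Both paths end in the same index commit; commitWithin lets Solr batch and schedule it, while many overlapping explicit commits from parallel clients are what trigger the maxWarmingSearchers error discussed in this thread.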
exceeded limit of maxWarmingSearchers ERROR
Hi, Most of the settings are default. We have a single node (memory 1 GB, index size 4 GB). We have a requirement where we are doing very fast commits. This is a kind of real-time requirement where we are polling many threads from a third party and indexing them into our system. We want these results to be available soon. We are committing for each user (a user may have 10k threads and inside that 1 thread may have 10 messages). So overall, documents per user will be around 0.1 million (100k). Earlier we were using commitWithin 10 milliseconds inside the document, but that was slowing the indexing, and we were not getting any error. As we removed the commitWithin, indexing became very fast, but after that we started experiencing the following error in the system. As I read in many forums, everybody said that this is happening because of a very fast commit rate, but what is the solution for our problem? We are using CURL to post the data and commit. Also, till now we are using the default solrconfig.

Aug 14, 2011 12:12:04 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error opening new searcher. exceeded limit of maxWarmingSearchers=2, try again later.
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1052)
at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:424)
at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:85)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:177)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)
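The error means a commit arrived while the allowed number of searchers were already warming. The relevant knobs live in the query section of solrconfig.xml; a sketch with illustrative values (the usual fix is committing less often and keeping warming cheap, rather than simply raising the limit):

  <query>
    <!-- how many searchers may warm concurrently before commits are rejected -->
    <maxWarmingSearchers>2</maxWarmingSearchers>
    <!-- autowarmCount=0 makes each new searcher cheap to open -->
    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries"/> <!-- keep the warming query list minimal -->
    </listener>
  </query>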
Re: LockObtainFailedException
Hi Peter, I found the issue. Actually we were getting this exception because of JVM space. I allocated 512 MB -Xms and 1024 MB -Xmx, and finally increased the time limit for the write lock to 20 secs. Things are working fine, though the timeout increase by itself did not help. On closer analysis of the doc which we were indexing, we were using commitWithin as 10 secs, which was the root cause of indexing taking so long because of so many segments being committed. A separate commit command using curl solved the issue. The performance improved from 3 mins to 1.5 secs :)

Thanks a lot
Naveen

On Thu, Aug 11, 2011 at 6:27 PM, Peter Sturge peter.stu...@gmail.com wrote: Optimizing indexing time is a very different question. I'm guessing the 3mins+ time you refer to is the commit time. There are a whole host of things to take into account regarding indexing, like: number of segments, schema, how many fields, storing fields, omitting norms, caching, autowarming, search activity etc. - the list goes on... The trouble is, you can look at 100 different Solr installations with slow indexing, and find 200 different reasons why each is slow. The best place to start is to get a full understanding of precisely how your data is being stored in the index, starting with adding docs, going through your schema, Lucene segments, solrconfig.xml etc., looking at caches, commit triggers etc. - really getting to know how each step is affecting performance. Once you really have a handle on all the indexing steps, you'll be able to spot the bottlenecks that relate to your particular environment. An index of 4.5GB isn't that big (but the number of documents tends to have more of an effect than the physical size), so the bottleneck(s) should be findable once you trace through the indexing operations.

On Thu, Aug 11, 2011 at 1:02 PM, Naveen Gupta nkgiit...@gmail.com wrote: Yes, this was happening because of JVM heap size. But the real issue is that if our index size is growing (very high), then indexing is taking very long (using streaming). Earlier, for indexing 15,000 docs at a time (commit after 15,000 docs), it was taking 3 mins 20 secs; after deleting the index data, it is taking 9 secs. What would be the approach to get better indexing performance while also keeping the index size in check? The index size was around 4.5 GB.

Thanks
Naveen

On Thu, Aug 11, 2011 at 3:47 PM, Peter Sturge peter.stu...@gmail.com wrote: Hi, When you get this exception with no other error or explanation in the logs, this is almost always because the JVM has run out of memory. Have you checked/profiled your mem usage/GC during the stream operation?
On Thu, Aug 11, 2011 at 3:18 AM, Naveen Gupta nkgiit...@gmail.com wrote: Hi, We are doing streaming updates to solr for multiple users. We are getting:

Aug 10, 2011 11:56:55 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/lib/solr/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127
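The two fixes described at the top of this thread map to a JVM startup flag and a solrconfig.xml setting. A sketch, assuming the stock Jetty start.jar (the values are the ones mentioned above):

  java -Xms512m -Xmx1024m -jar start.jar

and, in solrconfig.xml inside the indexDefaults section (Solr 3.x), the write-lock wait in milliseconds:

  <writeLockTimeout>20000</writeLockTimeout>

Note the lock timeout only masks the symptom; the LockObtainFailedException goes away for good once the index writer stops stalling (here, the out-of-memory and commitWithin issues).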
Re: LockObtainFailedException
Yes, this was happening because of JVM heap size. But the real issue is that if our index size is growing (very high), then indexing is taking very long (using streaming). Earlier, for indexing 15,000 docs at a time (commit after 15,000 docs), it was taking 3 mins 20 secs; after deleting the index data, it is taking 9 secs. What would be the approach to get better indexing performance while also keeping the index size in check? The index size was around 4.5 GB.

Thanks
Naveen

On Thu, Aug 11, 2011 at 3:47 PM, Peter Sturge peter.stu...@gmail.com wrote: Hi, When you get this exception with no other error or explanation in the logs, this is almost always because the JVM has run out of memory. Have you checked/profiled your mem usage/GC during the stream operation?

On Thu, Aug 11, 2011 at 3:18 AM, Naveen Gupta nkgiit...@gmail.com wrote: Hi, We are doing streaming updates to solr for multiple users. We are getting:

Aug 10, 2011 11:56:55 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/lib/solr/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint

Aug 10, 2011 12:00:16 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/lib/solr/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252
LockObtainFailedException
Hi, We are doing streaming updates to solr for multiple users. We are getting:

Aug 10, 2011 11:56:55 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/lib/solr/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint

Aug 10, 2011 12:00:16 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/lib/solr/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)
Re: indexing taking very long time
Hi Erick, We have a requirement where we have almost 100,000 documents to be indexed (at least 20 fields). None of these fields is longer than 10 KB. Also, we are running parallel searches on the same index. We found that it is taking almost 3 min to index the entire set of documents. The strategy we are using is: we make a commit after 15,000 docs (a single large XML doc) (update streaming using curl in PHP). We have a merge factor of 10 as of now. I am wondering if increasing the merge factor to 25 or 50 would increase the performance. Also, what about the RAM buffer size (default is 32 MB)? Which other factors do we need to consider? When should we consider optimize? Any other deviation from the defaults would help us in achieving the target. We are allocating a JVM max heap size of 512 MB; the default concurrent mark sweep is set for garbage collection. One more thing: we have CPU utilization of 20-25% in all 4 cores (using htop).

Thanks
Naveen

On Thu, Aug 4, 2011 at 7:05 AM, Erick Erickson erickerick...@gmail.com wrote: What version of Solr are you using? If it's a recent version, then optimizing is not that essential; you can do it during off hours, perhaps nightly or weekly. As far as indexing speed, have you profiled your application to see whether it's Solr or your indexing process that's the bottleneck? A quick check would be to monitor the CPU utilization on the server and see if it's high. As far as multithreading, one option is to simply have multiple clients indexing simultaneously. But you haven't indicated how the indexing is being done. Are you using DIH? SolrJ? Streaming documents to Solr? You have to provide those kinds of details to get meaningful help. Best, Erick

On Aug 2, 2011 8:06 AM, Naveen Gupta nkgiit...@gmail.com wrote: Hi, We have a requirement where we are indexing all the messages of a thread; a thread may have attachments too. We are adding them to solr for indexing and searching, in order to apply a few business rules. For a user, we have many threads (almost 100k), and each thread may have 10-20 messages. Now what we are finding is that it is taking 30 mins to index the entire set of threads. When we run optimize, it gets faster. The question here is: how frequently should this optimize be called, and when? Please note that we are following a commit strategy (that is, after every 10k threads, commit is called); we are not calling commit after every doc. Secondly, how can we use multithreading from the solr perspective in order to improve JVM and other utilization?

Thanks
Naveen
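The two knobs being asked about sit in the indexDefaults section of solrconfig.xml (Solr 3.x); a sketch with illustrative values, not a recommendation:

  <indexDefaults>
    <!-- more segments accumulate before a merge: faster indexing, slower searches until merged/optimized -->
    <mergeFactor>25</mergeFactor>
    <!-- buffer more docs in RAM before flushing a segment (default 32) -->
    <ramBufferSizeMB>128</ramBufferSizeMB>
  </indexDefaults>

A higher merge factor defers merge I/O at the cost of more open segments, and a larger RAM buffer cuts flush frequency but must fit inside the JVM heap, which matters given the 512 MB heap mentioned above.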
Re: indexing taking very long time
Hi Erick, Version of SOLR: 3.0. We are indexing the data using a CURL call from a C interface to the SOLR server using REST. We are merging 15,000 docs into a single XML doc and directly using CURL to index the data and then calling commit (update). For each of the clients, we are creating a new connection (a PHP script uses the exec() command to start a new C process for every user) and hitting the SOLR server. We are using the default solrconfig except for a few field changes in schema.xml. Max JVM heap allocation is 512 MB RAM (512 MB RAM is for the linux box as well). Initially I increased the merge factor to 50 and the RAM buffer size to 50 MB, but I needed to reduce them since we were getting java.lang.OutOfMemoryError: Java heap space. It is taking 3 mins to index 15,000 docs (a client can have 100,000 docs, and we have many multiple clients). Also, we run search queries from other clients against this index in parallel. It's the time between when curl was called and when the response came back. When we commit, CPU usage goes up to 25% (not all the cores, but a few of them). The total number of cores is 4. Can you please advise where to start from a tuning perspective? A blog I was going through clearly says that it should take 40 secs to index 100,000 docs (if you have 10-12 fields defined). I forgot the link. They talked about increasing the merge factor.

Thanks
Naveen

On Thu, Aug 4, 2011 at 7:05 AM, Erick Erickson erickerick...@gmail.com wrote: What version of Solr are you using? If it's a recent version, then optimizing is not that essential; you can do it during off hours, perhaps nightly or weekly. As far as indexing speed, have you profiled your application to see whether it's Solr or your indexing process that's the bottleneck? A quick check would be to monitor the CPU utilization on the server and see if it's high. As far as multithreading, one option is to simply have multiple clients indexing simultaneously. But you haven't indicated how the indexing is being done. Are you using DIH? SolrJ? Streaming documents to Solr? You have to provide those kinds of details to get meaningful help. Best, Erick

On Aug 2, 2011 8:06 AM, Naveen Gupta nkgiit...@gmail.com wrote: Hi, We have a requirement where we are indexing all the messages of a thread; a thread may have attachments too. We are adding them to solr for indexing and searching, in order to apply a few business rules. For a user, we have many threads (almost 100k), and each thread may have 10-20 messages. Now what we are finding is that it is taking 30 mins to index the entire set of threads. When we run optimize, it gets faster. The question here is: how frequently should this optimize be called, and when? Please note that we are following a commit strategy (that is, after every 10k threads, commit is called); we are not calling commit after every doc. Secondly, how can we use multithreading from the solr perspective in order to improve JVM and other utilization?

Thanks
Naveen
merge factor performance
Hi, We have a requirement where we have almost 100,000 documents to be indexed (at least 20 fields). None of these fields is longer than 10 KB. Also, we are running parallel searches on the same index. We found that it is taking almost 3 min to index the entire set of documents. The strategy we are using is: we make a commit after 15,000 docs (a single large XML doc). We have a merge factor of 10 as of now. I am wondering if increasing the merge factor to 25 or 50 would increase the performance. Also, what about the RAM buffer size (default is 32 MB)? Which other factors do we need to consider? When should we consider optimize? Any other deviation from the defaults would help us in achieving the target. We are allocating a JVM max heap size of 512 MB; the default concurrent mark sweep is set for garbage collection.

Thanks
Naveen
Re: merge factor performance
Sorry, correction: for 15k docs, it is taking 3 mins.

On Thu, Aug 4, 2011 at 10:07 PM, Naveen Gupta nkgiit...@gmail.com wrote: Hi, We have a requirement where we have almost 100,000 documents to be indexed (at least 20 fields). None of these fields is longer than 10 KB. Also, we are running parallel searches on the same index. We found that it is taking almost 3 min to index the entire set of documents. The strategy we are using is: we make a commit after 15,000 docs (a single large XML doc). We have a merge factor of 10 as of now. I am wondering if increasing the merge factor to 25 or 50 would increase the performance. Also, what about the RAM buffer size (default is 32 MB)? Which other factors do we need to consider? When should we consider optimize? Any other deviation from the defaults would help us in achieving the target. We are allocating a JVM max heap size of 512 MB; the default concurrent mark sweep is set for garbage collection.

Thanks
Naveen
indexing taking very long time
Hi, We have a requirement where we are indexing all the messages of a thread; a thread may have attachments too. We are adding them to solr for indexing and searching, in order to apply a few business rules. For a user, we have many threads (almost 100k), and each thread may have 10-20 messages. Now what we are finding is that it is taking 30 mins to index the entire set of threads. When we run optimize, it gets faster. The question here is: how frequently should this optimize be called, and when? Please note that we are following a commit strategy (that is, after every 10k threads, commit is called); we are not calling commit after every doc. Secondly, how can we use multithreading from the solr perspective in order to improve JVM and other utilization?

Thanks
Naveen
Re: IMP: indexing taking very long time
Can somebody answer this? What would be the best strategy for optimize (when we are indexing millions of messages for a new registered user)?

Thanks
Naveen

On Tue, Aug 2, 2011 at 5:36 PM, Naveen Gupta nkgiit...@gmail.com wrote: Hi, We have a requirement where we are indexing all the messages of a thread; a thread may have attachments too. We are adding them to solr for indexing and searching, in order to apply a few business rules. For a user, we have many threads (almost 100k), and each thread may have 10-20 messages. Now what we are finding is that it is taking 30 mins to index the entire set of threads. When we run optimize, it gets faster. The question here is: how frequently should this optimize be called, and when? Please note that we are following a commit strategy (that is, after every 10k threads, commit is called); we are not calling commit after every doc. Secondly, how can we use multithreading from the solr perspective in order to improve JVM and other utilization?

Thanks
Naveen
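For reference, optimize is just another update command, so it can be scheduled off-peak with the same curl mechanism used elsewhere in these threads (host and port are placeholders):

  curl 'http://localhost:8983/solr/update?optimize=true'

  # equivalently, post the XML form:
  curl 'http://localhost:8983/solr/update' -H 'Content-Type: text/xml' --data-binary '<optimize/>'

Optimize rewrites the whole index into a single segment, so it is I/O-heavy; running it nightly or weekly, as suggested in the quoted reply, keeps that cost out of the indexing path.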
relevant result for query with boost factor on parameters
Hi, I am trying to achieve this use case with the following expectations. Three fields:

1. field1
2. field2
3. field3

field1 should have the max relevance, field2 should be next, and field3 is last. The term will be entered by the end user (say "rock roll").

I want to show the results which contain both "rock" and "roll" in field1 first, then the results which contain both in field2. These should only be returned for a given field3 (x...@gmail.com). But (special attention) if field1 does not contain both the terms "rock" and "roll", then the field2 results should take priority (show the results which have both the terms first, and then show the results with respect to boost factor or relevance). If neither field contains these terms together, show them as normal, with field1 having more relevance than field2. How do I join the results with field3? That is, for a given field3, the above results should be filtered.

I am trying this one; it is giving satisfactory results, but not the best ones:

field1:(rock roll)^20 field2:(rock roll)^4 field3:x...@gmail.com

I was thinking of giving field1, field2, field3 as an ordered preference, but it is not working. Can you help in this regard? What other config should I consider in the given context?

Thanks
Naveen
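One common way to express this kind of tiered field weighting is the dismax query parser, with the mailbox constraint moved into a filter query so it does not affect scoring. A sketch (field names follow the question; the host and address are placeholders):

  http://localhost:8983/solr/select?defType=dismax
      &q=rock roll
      &qf=field1^20 field2^4
      &mm=2
      &fq=field3:"user@gmail.com"

Here qf weights field1 matches over field2, mm=2 requires both query terms to match, and fq filters to one field3 value without contributing to the score. Note mm=2 enforces the both-terms requirement outright; the softer fallback described above (prefer both terms, then degrade gracefully) would instead use mm=1 plus a boost query (bq) on the exact phrase.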
Re: tika integration exception and other related queries
Hi Gary, We are doing a similar thing, but we are not creating an XML doc; rather we are leaving TIKA to extract the content, and we depend on dynamic fields. We are not storing the text as well, though I am not sure if in the future that would be the case. What about Microsoft Office 2007 and later attachments? Is this working for you? We are always getting a number format exception. I posted to the community as well, but till now no response has come.

Thanks
Naveen

On Thu, Jun 9, 2011 at 6:43 PM, Gary Taylor g...@inovem.com wrote: Naveen, Not sure our requirement matches yours, but one of the things we index is a comment item that can have one or more files attached to it. To index the whole thing as a single Solr document, we create a zip file containing a file with the comment details in it and any additional attached files. This is submitted to Solr as a TEXT field in an XML doc, along with other meta-data fields from the comment. In our schema the TEXT field is indexed but not stored, so when we search and get a match back it doesn't contain all of the contents from the attached files etc., only the stored fields in our schema. Admittedly, the user can therefore get back a comment match with no indication as to WHERE the match occurred (ie. was it in the meta-data or the contents of the attached files), but at the moment we're only interested in getting appropriate matches, not explaining where the match is. Hope that helps. Kind regards, Gary.

On 09/06/2011 03:00, Naveen Gupta wrote: Hi Gary, It started working. Though I did not test with zip files, for rar files it is working fine. The only thing I wanted to do is to index the metadata (text mapped to content), not store the data. Also, in search results I want to filter the stuffs, and that started working fine. I don't want to show the content to the end user, since the way it extracts the information is not very helpful to the user. Although we can apply a few of the analyzers and filters to remove the unnecessary tags, the information would still not be of much help. Looking for your opinion: what did you do in order to filter out the content, or are you showing the extracted content to the end user? Even in the case where we are showing the text part to the end user, how can I limit the number of characters while querying the search results? Is there any feature where we can achieve this; the concept of a snippet kind of thing?

Thanks
Naveen

On Wed, Jun 8, 2011 at 1:45 PM, Gary Taylor g...@inovem.com wrote: Naveen, For indexing Zip files with Tika, take a look at the following thread: http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html I got it to work with the 3.1 source and a couple of patches. Hope this helps. Regards, Gary.

On 08/06/2011 04:12, Naveen Gupta wrote: Hi, Can somebody answer this? 3. Can somebody tell me an idea how to do indexing for a zip file? 1. While sending docx, we are getting the following error.
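The snippet question in this thread maps to Solr's highlighting parameters. A sketch (host and field name are placeholders); note that highlighting can only fragment fields that are stored, so it would not work on the indexed-but-not-stored content field described above:

  http://localhost:8983/solr/select?q=rock roll
      &hl=true
      &hl.fl=content
      &hl.snippets=1
      &hl.fragsize=100

hl.fragsize caps each returned fragment at roughly 100 characters, which gives exactly the snippet-style display being asked about.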
ERROR on posting update request using CURL in php
Hi, This is my document in php:

$xmldoc = '<add><doc><field name="id">F_146</field><field name="userid">74</field><field name="groupuseid">gmail.com</field><field name="attachment_size">121</field><field name="attachment_name">sample.pptx</field></doc></add>';

$ch = curl_init("http://localhost:8080/solr/update");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: text/xml"));
curl_setopt($ch, CURLOPT_POSTFIELDS, $xmldoc);
$result = curl_exec($ch);
if (!curl_errno($ch)) {
    $info = curl_getinfo($ch);
    $header = substr($result, 0, $info['header_size']);
    echo 'Took ' . $info['total_time'] . ' seconds to send a request to ' . $info['url'];
} else {
    print_r('no idea');
}
echo 'result of query - ' . $result;

It is throwing this error (Tomcat 6.0.18 error page):

HTTP Status 400 - Unexpected character ''' (code 39) in prolog; expected '<' at [row,col {unknown-source}]: [1,1]
description: The request sent by the client was syntactically incorrect (Unexpected character ''' (code 39) in prolog; expected '<' at [row,col {unknown-source}]: [1,1]).

Thanks
Naveen
Re: ERROR on posting update request using CURL in php
Hi,

curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<add><doc><field name="id">testdoc</field></doc></add>'

Regards
Naveen

On Fri, Jun 10, 2011 at 10:18 AM, Naveen Gupta nkgiit...@gmail.com wrote: Hi, This is my document in php:

$xmldoc = '<add><doc><field name="id">F_146</field><field name="userid">74</field><field name="groupuseid">gmail.com</field><field name="attachment_size">121</field><field name="attachment_name">sample.pptx</field></doc></add>';

$ch = curl_init("http://localhost:8080/solr/update");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: text/xml"));
curl_setopt($ch, CURLOPT_POSTFIELDS, $xmldoc);
$result = curl_exec($ch);
if (!curl_errno($ch)) {
    $info = curl_getinfo($ch);
    $header = substr($result, 0, $info['header_size']);
    echo 'Took ' . $info['total_time'] . ' seconds to send a request to ' . $info['url'];
} else {
    print_r('no idea');
}
echo 'result of query - ' . $result;

It is throwing this error (Tomcat 6.0.18 error page):

HTTP Status 400 - Unexpected character ''' (code 39) in prolog; expected '<' at [row,col {unknown-source}]: [1,1]
description: The request sent by the client was syntactically incorrect (Unexpected character ''' (code 39) in prolog; expected '<' at [row,col {unknown-source}]: [1,1]).

Thanks
Naveen
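Translated back into the PHP idiom used in the question, the working request would look like this; a sketch, with host, port, and the doc contents as placeholders:

<?php
// post a well-formed <add> body and commit in the same request
$xmldoc = '<add><doc><field name="id">testdoc</field></doc></add>';
$ch = curl_init('http://localhost:8983/solr/update?commit=true');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Type: text/xml'));
curl_setopt($ch, CURLOPT_POSTFIELDS, $xmldoc);
$result = curl_exec($ch);
curl_close($ch);
echo $result;
?>

The parser error in the original post says the request body began with a stray quote character rather than '<', so the key is making sure CURLOPT_POSTFIELDS receives the raw XML string with no surrounding quote characters.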
Re: tika integration exception and other related queries
Hi Gary, It started working. Though I did not test with zip files, for rar files it is working fine. The only thing I wanted to do is to index the metadata (text mapped to content), not store the data. Also, in search results I want to filter the stuffs, and that started working fine. I don't want to show the content to the end user, since the way it extracts the information is not very helpful to the user. Although we can apply a few of the analyzers and filters to remove the unnecessary tags, the information would still not be of much help. Looking for your opinion: what did you do in order to filter out the content, or are you showing the extracted content to the end user? Even in the case where we are showing the text part to the end user, how can I limit the number of characters while querying the search results? Is there any feature where we can achieve this; the concept of a snippet kind of thing?

Thanks
Naveen

On Wed, Jun 8, 2011 at 1:45 PM, Gary Taylor g...@inovem.com wrote: Naveen, For indexing Zip files with Tika, take a look at the following thread: http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html I got it to work with the 3.1 source and a couple of patches. Hope this helps. Regards, Gary.

On 08/06/2011 04:12, Naveen Gupta wrote: Hi, Can somebody answer this? 3. Can somebody tell me an idea how to do indexing for a zip file? 1. While sending docx, we are getting the following error.
getting numberformat exception while using tika
Hi, We are using the ExtractingRequestHandler and we are getting the following error. We are giving a Microsoft docx file for indexing. I think this is something to do with the date field definition, but I am not very sure. What field type should we use? 2. We are trying to index jpg (when we search on the name of the jpg, it does not come back, though I am passing an id). 3. What about zip files or rar files? Does tika with solr handle those?

java.lang.NumberFormatException: For input string: "2011-01-27T07:18:00Z"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Long.parseLong(Long.java:412)
at java.lang.Long.parseLong(Long.java:461)
at org.apache.solr.schema.TrieField.createField(TrieField.java:434)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:198)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:238)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)

Thanks
Naveen
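The trace shows a Trie numeric field calling Long.parseLong on an ISO-8601 date, i.e. Tika's date metadata is landing in a long-typed field. One fix is to route that metadata to a date-typed field in schema.xml; a sketch (the type and dynamic-field names are assumptions, not from the thread):

  <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
  <dynamicField name="*_dt" type="tdate" indexed="true" stored="true"/>

The extracting handler's fmap.* request parameters (for example, a hypothetical fmap.Creation-Date=creation_dt) can then redirect the offending Tika metadata field onto the date-typed field instead of the numeric one.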
tika integration exception and other related queries
Hi, Can somebody answer this? 3. Can somebody tell me an idea how to do indexing for a zip file? 1. While sending docx, we are getting the following error.

java.lang.NumberFormatException: For input string: "2011-01-27T07:18:00Z"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Long.parseLong(Long.java:412)
at java.lang.Long.parseLong(Long.java:461)
at org.apache.solr.schema.TrieField.createField(TrieField.java:434)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:198)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:238)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)

Thanks
Naveen

On Tue, Jun 7, 2011 at 3:33 PM, Naveen Gupta nkgiit...@gmail.com wrote: Hi, We are using the ExtractingRequestHandler and we are getting the following error. We are giving a Microsoft docx file for indexing. I think this is something to do with the date field definition, but I am not very sure. What field type should we use? 2. We are trying to index jpg (when we search on the name of the jpg, it does not come back, though I am passing an id). 3. What about zip files or rar files? Does tika with solr handle those?
java.lang.NumberFormatException: For input string: "2011-01-27T07:18:00Z"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
at java.lang.Long.parseLong(Long.java:412)
at java.lang.Long.parseLong(Long.java:461)
at org.apache.solr.schema.TrieField.createField(TrieField.java:434)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:98)
at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:204)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:277)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:121)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:198)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:238)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360
Re: TIKA INTEGRATION PERFORMANCE
Hi Tomas,

1. Regarding SolrInputDocument: we are not using the Java client; rather we are using a PHP Solr client, and I am not sure how to wrap content in a SolrInputDocument from the PHP client. In this case, we need the tika-related jars to get at the metadata such as content; we certainly don't want to handle all these things in the PHP client.

Secondly, what I was asking about the commit strategy: suppose you have 100 docs, you iterate over 99 docs and fire curl without commit in the URL, and for the 100th doc you use commit. In doing so, will it also commit the indexes for the previous 99 docs?

while (up to 99) { curl_command = url without commit; }
when i = 100, the url includes commit

I wanted to achieve something similar to an optimize kind of thing. Why aren't these kinds of general-purpose use cases included in the examples (especially for other languages; Java guys can do it easily using the API)? I am basically a Java guy, so I can feel the problem.

Thanks
Naveen

2011/6/6 Tomás Fernández Löbbe tomasflo...@gmail.com

1. About the commit strategy: all the ExtractingRequestHandler (the request handler that uses Tika to extract content from the input file) will do is extract the content of your file and add it to a SolrInputDocument. The commit strategy should not change because of this, compared to other documents you might be indexing. It is usually not recommended to commit on every new/updated document.

2. Don't know if I understand the question. You can add all the static fields you want to the document by adding the literal. prefix to the names of the fields when using the ExtractingRequestHandler (as you are doing with literal.id). You can also leave fields empty if they are not marked as required in the schema.xml file. See: http://wiki.apache.org/solr/ExtractingRequestHandler#Literals

3. Solr cores can work almost as completely different Solr instances. You could tell one core to replicate from another core. I don't think this would be of any help here. If you want to separate the indexing operations from the query operations, you could probably use different machines; that's usually a better option. Configure the indexing box as master and the query box as slave. Here you have some more information about it: http://wiki.apache.org/solr/SolrReplication

Were these the answers you were looking for, or did I misunderstand your questions? Tomás

On Mon, Jun 6, 2011 at 2:54 AM, Naveen Gupta nkgiit...@gmail.com wrote: Hi, Since it is php, we are using solphp for making curl-based calls. My concern here is that for each user, we might have 20-40 attachments needing to be indexed each day, and there are various users; daily we are targeting around 500-1000 users. Right now, this is what we do:

<?php
$ch = curl_init('http://localhost:8010/solr/update/extract?literal.id=doc2&commit=true');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, array('myfile' => '@paper.pdf'));
$result = curl_exec($ch);
?>

Also, we are planning to use other fields which are to be indexed and stored. There are a couple of questions here:

1. What would be the best strategy for commits? If we take all the documents in an array, iterate one by one and fire the curl, and commit only for the last doc, will it work, or do we need to commit for each doc?

2. We have several fields which are already defined in the schema, and a few of them were required earlier, but for this purpose we don't want them; how do we have the two requirements together in the same schema?
since it is frequent commit, how to use solr multicore for write and read operations separately ? Thanks Naveen
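A minimal sketch of the batch-then-commit strategy discussed above, in the same PHP/curl style used in this thread; the endpoint, document IDs, and file names are illustrative, and the '@' upload syntax is the PHP/curl idiom of that era, as used earlier in the thread:

<?php
// Index a batch of files without committing, then issue a single commit.
$files = array('doc1' => 'paper1.pdf', 'doc2' => 'paper2.pdf'); // hypothetical batch
foreach ($files as $id => $path) {
    // No commit=true here: each request only adds the document.
    $ch = curl_init("http://localhost:8010/solr/update/extract?literal.id=$id");
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, array('myfile' => '@' . $path));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_exec($ch);
    curl_close($ch);
}
// One commit at the end makes every document added above searchable.
$ch = curl_init('http://localhost:8010/solr/update?commit=true');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_exec($ch);
curl_close($ch);
?>

The single commit at the end makes all previously added documents visible at once, which answers the 99-plus-1 question: the last commit covers the uncommitted 99 as well.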
TIKA INTEGRATION PERFORMANCE
Hi,

Since it is PHP, we are using SolPHP for the curl-based calls. My concern is that for each user we might have 20-40 attachments to index each day, and there are many users; we are targeting around 500-1000 users daily. Right now we do:

<?php
$ch = curl_init('http://localhost:8010/solr/update/extract?literal.id=doc2&commit=true');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, array('myfile' => '@paper.pdf'));
$result = curl_exec($ch);
?>

We are also planning to index and store other fields. There are a couple of questions here:

1. What would be the best strategy for commits? If we take all the documents in an array, iterate over them one by one firing curl, and commit only for the last doc, will that work, or do we need to commit for each doc?

2. We have several fields already defined in the schema, and a few of them were required earlier but are not wanted for this purpose. How can we have the two requirements together in the same schema?

3. Since commits are frequent, how can we use Solr multicore to separate write and read operations?

Thanks
Naveen
different indexes for multitenant approach
Hi,

I want to implement a different index strategy where we keep indexes per tenant and maintain them separately:

- first level of category: company name
- second level of category: company name + fields to be indexed
- further categories: groups of different company names based on some heuristic (hashing), if it grows further

I want to do this in the same Solr instance. Is that possible?

Thanks
Naveen
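One common way to get per-tenant indexes inside a single Solr instance is one core per tenant, created through the CoreAdmin API. A hedged PHP sketch follows; the host, port, instanceDir layout, and tenant name are assumptions, and dynamic core creation presumes your solr.xml is set up to allow it:

<?php
// Hypothetical sketch: create one core per tenant via the CoreAdmin API.
function createTenantCore($tenant) {
    $url = 'http://localhost:8010/solr/admin/cores?action=CREATE'
         . '&name=' . urlencode($tenant)
         . '&instanceDir=' . urlencode("/var/solr/$tenant"); // per-tenant conf/data dir
    return file_get_contents($url); // returns the CoreAdmin XML response
}
createTenantCore('acme_corp');
?>

Each core then has its own schema and index, which maps naturally onto the first two category levels described above; the hashing-based grouping would decide which core a document is routed to.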
Re: How to display search results of solr in to other application.
Hi Romi,

In my view, you first need to understand how AJAX with jQuery works, then JSON, and then JSONP (needed if you are fetching from a different domain).

The query here is the dynamic query you hit Solr with; it could be simple text or a more advanced query string. See http://wiki.apache.org/solr/CommonQueryParameters

The callback is the method name you define. After Solr responds, this method is called (the callback mechanism) with the response from Solr in JSON format, and in it you show or analyze the response as your business needs require.

Thanks
Naveen

On Fri, Jun 3, 2011 at 12:00 PM, Romi romijain3...@gmail.com wrote:

$.getJSON("http://[server]:[port]/solr/select/?jsoncallback=?", {q: queryString, version: 2.2, start: 0, rows: 10, indent: on, json.wrf: callbackFunctionToDoSomethingWithOurData, wt: json, fl: field1});

Would you please explain what queryString and json.wrf: callbackFunctionToDoSomethingWithOurData are? And what if I want to change my query string each time?

Thanks & Regards
Romi
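For comparison, json.wrf is only needed for browser-side JSONP; a server-side client can use plain wt=json and build the query string dynamically. A minimal PHP sketch of that server-side analogue, with host, port, and field names as assumptions:

<?php
// Hypothetical sketch: build the Solr query dynamically and decode the JSON response.
$userInput = 'solr performance'; // the dynamic query string changes per request
$params = array(
    'q'     => $userInput,
    'start' => 0,
    'rows'  => 10,
    'wt'    => 'json', // ask Solr for a JSON response (no JSONP wrapper needed here)
    'fl'    => 'field1',
);
$url = 'http://localhost:8010/solr/select?' . http_build_query($params);
$response = json_decode(file_get_contents($url), true);
foreach ($response['response']['docs'] as $doc) {
    print_r($doc); // analyze or display each hit per your business rules
}
?>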
php library for ExtractingRequestHandler
Hi,

We want to post some files (rtf, doc, etc.) to the Solr server using PHP. One way is to post using curl. Is there any client like the Java client (Solr Cell)? URLs would also help.

Thanks
Naveen
Re: Strategy -- Frequent updates in our application
Hi Pravesh,

We don't have that setup right now. We are thinking of having one instance for writes and another for reads. Do you have another design in mind? Kindly share.

Thanks
Naveen

On Fri, Jun 3, 2011 at 2:50 PM, pravesh suyalprav...@yahoo.com wrote:

You can use the DataImportHandler for your full/incremental indexing. How near-real-time the indexing needs to be varies with business requirements (the delay could be 5, 10, 15, or 30 minutes). It also depends on how much volume will be indexed incrementally. By the way, are you running a master+slave SOLR setup?
Re: php library for ExtractingRequestHandler
Yes, that is the one I used, and it is working fine. Thanks to Nabble.

Thanks
Naveen

On Fri, Jun 3, 2011 at 4:02 PM, Gora Mohanty g...@mimirtech.com wrote:

On Fri, Jun 3, 2011 at 3:55 PM, Naveen Gupta nkgiit...@gmail.com wrote:

Hi, we want to post some files (rtf, doc, etc.) to the Solr server using PHP. One way is to post using curl.

I do not normally use PHP, and have not tried it myself. However, there is a PHP extension for Solr: http://wiki.apache.org/solr/SolPHP http://php.net/manual/en/book.solr.php

Regards, Gora
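A minimal sketch of the PECL Solr extension Gora points at (http://php.net/manual/en/book.solr.php); the host, port, and field values are assumptions. Note that this extension indexes field data directly; for rich files you would still either extract content first or post to /update/extract via curl as shown earlier in these threads:

<?php
// Hypothetical sketch: index a document through the PECL Solr extension.
$client = new SolrClient(array(
    'hostname' => 'localhost',
    'port'     => 8010,
    'path'     => '/solr',
));
$doc = new SolrInputDocument();
$doc->addField('id', 'doc42');
$doc->addField('text', 'extracted content would go here');
$client->addDocument($doc); // buffered on the server until a commit
$client->commit();          // make the document searchable
?>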
tika and solr 3.1 integration
Hi,

I am trying to integrate Solr 3.1 and Tika (which comes by default with this version). When indexing a few documents with a curl command, I get an error: the attr_meta field is unknown. I checked solrconfig and it looks fine to me; can you please tell me what I am missing? I copied all the jars from contrib/extraction/lib to the solr/lib folder, which is in the same place as conf.

I am using the request handler that comes with the defaults:

<requestHandler name="/update/extract" startup="lazy" class="solr.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- All the main content goes into "text"... if you need to return the extracted text or do highlighting, use a stored field. -->
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

curl "http://dev.grexit.com:8080/solr1/update/extract?literal.id=who.pdf&uprefix=attr_&fmap.content=attr_content&commit=true" -F myfile=@/root/apache-solr-3.1.0/docs/who.pdf

Tomcat (6.0.18) returns: HTTP Status 400 - ERROR: unknown field 'attr_meta' (the request sent by the client was syntactically incorrect).

Please note that I integrated Apache Tika 0.9 with apache-solr-1.4 locally on a Windows machine, and calling it through Solr Cell works fine without any configuration changes.

Thanks
Naveen
Re: tika and solr 3.1 integration
Hi,

This is fixed. Yes, schema.xml was the culprit, and I fixed it by looking at the sample schema provided with the example.

But on Windows I am getting an slf4j illegal-access exception, which looks like a jar problem. The fix suggested in their FAQ is to use version 1.5.5, which is already in the lib folder. I have had to deploy a lot of jars, and I am afraid that is causing the problem. Has anyone experienced the same?

Thanks
Naveen

On Fri, Jun 3, 2011 at 2:41 AM, Juan Grande juan.gra...@gmail.com wrote:

Hi Naveen,

Check if there is a dynamic field named attr_* in the schema. The uprefix=attr_ parameter means that if Solr can't find an extracted field in the schema, it'll add the prefix attr_ and try again.

*Juan*
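For reference, this is the kind of schema.xml declaration Juan's suggestion points at; a sketch only, since the field type is an assumption (use one that actually exists in your schema, such as the textgen type shipped in the 3.1 example schema):

<dynamicField name="attr_*" type="textgen" indexed="true" stored="true" multiValued="true"/>

With this in place, any extracted field Solr cannot match is prefixed with attr_ and absorbed by the dynamic field instead of raising the "unknown field 'attr_meta'" error.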
Strategy -- Frequent updates in our application
Hi,

We have an application where every 10 minutes we index each user's document repository, and if a new thread is added to a particular discussion, we need to index that thread again. (Please note we are not blindly re-indexing each time; we have various rules to filter out which threads are new and therefore candidates for indexing, plus the newly arrived ones.) So we are doing updates for each user's document repository, and so far the performance is not looking very good.

In the future we are going to get hits in volume (1,000 to 10,000 hits per minute), so we are looking for a strategy to tune Solr to index the data in real time. What about NRT: is it a good fit for this scenario? I have read that Solr NRT performance is not very good, but I am not going to believe that, since Solr is one of the best open source projects and this problem is surely going to be sorted out in the near future. If any benchmark exists, kindly share it with me; we would like to analyze it against our requirements.

Is there any way to add incremental indexes, as we generally find in other search engines like Endeca? I don't know Solr in much detail; since I am a newbie, can you please tell me if there are settings that can keep track of incremental indexing?

Thanks
Naveen