to reduce indexing time
Before indexing, this was the memory layout:
System memory: 63.2%, 2.21 GB; JVM memory: 8.3%, 81.60 MB of 981.38 MB

I have indexed 700 documents of total size 12 MB. These are the results I get:
Qtime: 8122, system time: 00:00:12.7318648
System memory: 65.4%, 2.29 GB; JVM memory: 15.3%, 148.32 MB of 981.38 MB

After indexing 7,000 documents:
Qtime: 51817, system time: 00:01:12.6028320
System memory: 69.4%, 2.43 GB; JVM memory: *26.5%*, 266.60 MB

After indexing 70,000 documents of 1200 MB size, these are the results:
Qtime: 511447, system time: 00:11:14.0398768
System memory: 82.7%, 2.89 GB; JVM memory: *11.8%*, 118.46 MB

Here the JVM usage decreases compared to the 7,000-document run. Why is that?

This is solrconfig.xml:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.document.log.dir:}</str>
  </updateLog>
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
  <autoCommit>
    <maxTime>60</maxTime>
    <openSearcher>true</openSearcher>
  </autoCommit>
</updateHandler>

I am indexing through solrnet, one document at a time:

var res = solr.Add(doc); // Doc doc = new Doc();

How do I reduce the time for indexing, as the size of the data indexed is quite small? Will batch indexing reduce the indexing time? But then, do I need to make changes in solrconfig.xml? Also, I want the documents to be searchable within 1 second of indexing. Is it true that if a soft commit is done, then faceting cannot be done on the data?

--
View this message in context: http://lucene.472066.n3.nabble.com/to-reduce-indexing-time-tp4121391.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: to reduce indexing time
Hi,

Batch/bulk indexing is the way to go for speed.

* Disable the autoSoftCommit feature for the bulk indexing.
* Disable the transaction log for the bulk indexing.

After you finish bulk indexing, you can re-enable the above. Also, you are being too generous with a 1-second refresh rate (autoSoftCommit maxTime). Here is an excellent write-up:
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

By the way, for auto hard commit, openSearcher=false is advised. The hard commit is used to flush tlogs.

On Wednesday, March 5, 2014 4:48 PM, sweety sweetyshind...@yahoo.com wrote:
> How do I reduce the time for indexing, as the size of the data indexed is quite small? Will batch indexing reduce the indexing time?
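To make the advice concrete, here is a sketch of what the updateHandler section from the original post might look like during a bulk load. The maxTime of 60000 ms is an illustrative assumption, not a value from this thread; the idea is simply: no soft commits, no tlog, and hard commits that flush without opening a searcher.

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- transaction log disabled (commented out) for the duration of the bulk load -->
  <!--
  <updateLog>
    <str name="dir">${solr.document.log.dir:}</str>
  </updateLog>
  -->

  <!-- autoSoftCommit disabled: no maxTime element at all during bulk indexing -->

  <!-- periodic hard commit to flush segments, without opening a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

After the bulk load finishes, the updateLog and autoSoftCommit sections would be restored for normal near-real-time operation.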
Re: to reduce indexing time
On 3/5/2014 7:47 AM, sweety wrote:
> Here the JVM usage decreases compared to the 7,000-document run. Why is that?

Ahmet already addressed your configuration and how to speed things up. Here's the answer to your memory question: the simple fact is that during *all* of those tests, at least one of your heap memory pools will have reached maximum size and then been reduced by garbage collection. The memory graph in jconsole shows what happens with Java heap memory:

http://docs.oracle.com/javase/1.5.0/docs/guide/management/jconsole.html#memory

After indexing, your final example was simply at one of the low points in the graph, while the other examples were at higher points.

Thanks,
Shawn
Re: to reduce indexing time
Now I have batch indexed, with batches of 250 documents. These were the results.

After 7,000 documents:
Qtime: 46894, system time: 00:00:55.9384892
JVM memory: 249.02 MB, 24.8%
This shows quite a reduction in timing.

After 70,000 documents:
Qtime: 480435, system time: 00:09:29.5206727
System memory: 82.8%, 2.90 GB
JVM memory: 82%, 818.06 MB
// Here, the memory usage has increased, though the timing has reduced.

After disabling soft commit and the tlog, for 70,000 contracts:
Qtime: 461331, system time: 00:09:09.7930326
JVM memory: 62.4%, 623.42 MB
// Memory usage is less.

What causes this memory usage to change, if the data to be indexed is the same?
Re: to reduce indexing time
It doesn't sound like you have much of an understanding of Java's garbage collection. You might read

http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html

to get a better understanding of how it works and why you're seeing different levels of memory utilization at any given point in time. The thing to note is that while doing anything, Java applications create objects that reside in the Java heap and take up space. After a while, those objects which are no longer in use and have been orphaned by their creating objects get collected by the JVM, which lowers the amount of heap used. In most cases you shouldn't expect memory usage in a JVM to be static unless it's sitting around idle.

Thanks,
Greg

On Mar 5, 2014, at 11:58 AM, sweety sweetyshind...@yahoo.com wrote:
> What causes this memory usage to change, if the data to be indexed is the same?
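The orphaning-and-collection cycle Greg describes can be seen in a few lines of plain Java, with no Solr involved. This is only an illustrative sketch: System.gc() is a hint to the JVM rather than a guarantee, though typical HotSpot configurations honor it.

```java
import java.lang.ref.WeakReference;

public class GcSketch {
    // Allocates a large object, drops the only strong reference to it,
    // requests a collection, and reports whether the object is still reachable.
    static boolean reachableAfterGc() {
        Object payload = new byte[16 * 1024 * 1024]; // ~16 MB on the heap
        WeakReference<Object> ref = new WeakReference<>(payload);
        payload = null;  // orphan the object: no strong references remain
        System.gc();     // a hint, not a guarantee, that the JVM should collect
        return ref.get() != null;
    }

    public static void main(String[] args) {
        // Heap usage rises while the payload is live and falls once it is
        // collected, which is why snapshots taken at different moments differ.
        System.out.println("still reachable after gc: " + reachableAfterGc());
    }
}
```

This is the sawtooth pattern Shawn mentioned: the JVM memory reading depends entirely on where in that rise-and-fall cycle the snapshot happens to land.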
Re: to reduce indexing time
I will surely read about JVM garbage collection. Thanks a lot, all of you.

But is the time required for my indexing good enough? I don't know about the ideal timings; I think that my indexing is taking more time than it should.
Re: to reduce indexing time
Hi,

One thing to consider: I think solrnet uses XML updates, and there is XML parsing overhead with that. Switching to SolrJ or CSV can yield an additional gain.

http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

Ahmet

On Wednesday, March 5, 2014 10:13 PM, sweety sweetyshind...@yahoo.com wrote:
> But is the time required for my indexing good enough? I don't know about the ideal timings.
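Whatever client is used, the batching pattern recommended earlier in the thread boils down to grouping documents and sending one update request per group. Here is a small, library-free sketch of just that grouping step; the Batcher class is hypothetical, and the flush callback stands in for a real client call such as SolrJ's add(collection) mentioned above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class Batcher {
    // Splits items into fixed-size batches and hands each batch to flush,
    // mirroring "send 250 documents per update request" instead of issuing
    // one request per document.
    public static <T> int sendInBatches(List<T> items, int batchSize,
                                        Consumer<List<T>> flush) {
        int requests = 0;
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            flush.accept(new ArrayList<>(items.subList(start, end)));
            requests++;
        }
        // Number of round trips: items.size()/batchSize (rounded up),
        // versus items.size() round trips for per-document adds.
        return requests;
    }
}
```

For 700 documents in batches of 250, this issues 3 requests instead of 700, which is where the timing reduction reported earlier in the thread comes from.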
Re: to reduce indexing time
I believe SolrJ uses XML under the covers. If so, I don't think you would improve performance by switching to SolrJ, since the client would convert it to XML before sending it on the wire.

Toby

***
Toby Lazar
Capital Technology Group
Email: tla...@capitaltg.com
Mobile: 646-469-5865
***

On Wed, Mar 5, 2014 at 3:25 PM, Ahmet Arslan iori...@yahoo.com wrote:
> I think solrnet uses XML updates, and there is XML parsing overhead with that. Switching to SolrJ or CSV can yield an additional gain.
Re: to reduce indexing time
Hi Toby,

SolrJ uses javabin by default.

Ahmet

On Wednesday, March 5, 2014 11:31 PM, Toby Lazar tla...@capitaltg.com wrote:
> I believe SolrJ uses XML under the covers. If so, I don't think you would improve performance by switching to SolrJ, since the client would convert it to XML before sending it on the wire.
Re: to reduce indexing time
Thanks, Ahmet, for the correction. I used Wireshark to capture an UpdateRequest to Solr and saw this XML:

<add><doc boost="1.0"><field name="caseID">123</field><field name="caseName">blah</field></doc></add>

and figured that javabin was only for the responses. Does wt apply to how SolrJ sends requests to Solr? Could this HTTP content be in javabin format?

Toby

On Wed, Mar 5, 2014 at 4:34 PM, Ahmet Arslan iori...@yahoo.com wrote:
> SolrJ uses javabin by default.
Re: to reduce indexing time
On 3/5/2014 2:31 PM, Toby Lazar wrote:
> I believe SolrJ uses XML under the covers. If so, I don't think you would improve performance by switching to SolrJ, since the client would convert it to XML before sending it on the wire.

Until recently, SolrJ always used XML by default for requests and javabin for responses. That is moving to javabin for both. This is already the case in the newest versions for CloudSolrServer. HttpSolrServer is still using the XML RequestWriter by default, but you can change this very easily to BinaryRequestWriter. If you plan to use SolrJ, it's a change I would highly recommend.

Thanks,
Shawn
Re: to reduce indexing time
OK, I was using HttpSolrServer since I haven't yet migrated to CloudSolrServer. I added the line

solrServer.setRequestWriter(new BinaryRequestWriter());

after creating the server object and now see the difference through Wireshark. Is it fair to assume that this usage is multi-thread safe?

Thank you, Shawn and Ahmet,

Toby

***
Toby Lazar
Capital Technology Group
Email: tla...@capitaltg.com
Mobile: 646-469-5865
***

On Wed, Mar 5, 2014 at 4:46 PM, Shawn Heisey s...@elyograg.org wrote:
> HttpSolrServer is still using the XML RequestWriter by default, but you can change this very easily to BinaryRequestWriter.
Re: to reduce indexing time
On 3/5/2014 2:58 PM, Toby Lazar wrote:
> I added the line solrServer.setRequestWriter(new BinaryRequestWriter()); after creating the server object. Is it fair to assume that this usage is multi-thread safe?

Yes, SolrServer is entirely threadsafe. You can have one SolrServer object (for each Solr core) and use it throughout your entire application.

Thanks,
Shawn