to reduce indexing time

2014-03-05 Thread sweety
Before indexing, this was the memory layout:

System Memory: 63.2%, 2.21 GB
JVM Memory: 8.3%, 81.60 MB of 981.38 MB

I have indexed 700 documents with a total size of 12 MB.
These are the results I get:
Qtime: 8122, System time: 00:00:12.7318648
System Memory: 65.4%, 2.29 GB
JVM Memory: 15.3%, 148.32 MB of 981.38 MB

After indexing 7,000 documents:
Qtime: 51817, System time: 00:01:12.6028320
System Memory: 69.4%, 2.43 GB
JVM Memory: *26.5%*, 266.60 MB

After indexing 70,000 documents of 1200 MB total size, these are the results:
Qtime: 511447, System time: 00:11:14.0398768
System Memory: 82.7%, 2.89 GB
JVM Memory: *11.8%*, 118.46 MB

Here the JVM memory usage is lower than after the 7,000-document run; why is that?

This is the *solrconfig.xml* update handler section:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.document.log.dir:}</str>
  </updateLog>
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
  <autoCommit>
    <maxTime>60</maxTime>
    <openSearcher>true</openSearcher>
  </autoCommit>
</updateHandler>
 
I am indexing through SolrNet, adding one document at a time:  var res =
solr.Add(doc); // where Doc doc = new Doc();

How do I reduce the indexing time, given that the amount of data indexed is
quite small? Will batch indexing reduce the indexing time? If so, do I need
to make changes in solrconfig.xml?
Also, I want the documents to be searchable within 1 second of indexing.
Is it true that if a soft commit is done, then faceting cannot be done on
the data?





Re: to reduce indexing time

2014-03-05 Thread Ahmet Arslan
Hi,

Batch/bulk indexing is the way to go for speed. 

* Disable the autoSoftCommit feature for the bulk indexing.
* Disable the transaction log (updateLog) for the bulk indexing.

After you finish bulk indexing, you can re-enable both. Also, a 1-second
refresh rate (autoSoftCommit maxTime) is quite demanding.

Here is an excellent write up : 
http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

By the way, for auto hard commits, openSearcher=false is advised; the hard
commit is then used to flush the tlogs.
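
As a rough illustration of the batching idea, here is a minimal sketch in
SolrJ/Java (the client discussed later in this thread); the OP is on SolrNet,
where (I believe) an Add overload that accepts a collection of documents plays
the same role. The URL, field names and batch size below are placeholders:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>();

        for (int i = 0; i < 70000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));      // hypothetical fields
            doc.addField("title", "document " + i);
            batch.add(doc);

            if (batch.size() == 250) {   // send batches instead of one add() per document
                solr.add(batch);
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            solr.add(batch);
        }
        solr.commit();   // one explicit hard commit once the bulk load is done
    }
}

With autoSoftCommit and the tlog disabled during the load, that single commit
at the end makes everything visible at once.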




On Wednesday, March 5, 2014 4:48 PM, sweety sweetyshind...@yahoo.com wrote:
Before indexing, this was the memory layout:

System Memory: 63.2%, 2.21 GB
JVM Memory: 8.3%, 81.60 MB of 981.38 MB

I have indexed 700 documents with a total size of 12 MB.
These are the results I get:
Qtime: 8122, System time: 00:00:12.7318648
System Memory: 65.4%, 2.29 GB
JVM Memory: 15.3%, 148.32 MB of 981.38 MB

After indexing 7,000 documents:
Qtime: 51817, System time: 00:01:12.6028320
System Memory: 69.4%, 2.43 GB
JVM Memory: *26.5%*, 266.60 MB

After indexing 70,000 documents of 1200 MB total size, these are the results:
Qtime: 511447, System time: 00:11:14.0398768
System Memory: 82.7%, 2.89 GB
JVM Memory: *11.8%*, 118.46 MB

Here the JVM memory usage is lower than after the 7,000-document run; why is that?

This is the *solrconfig.xml* update handler section:

<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.document.log.dir:}</str>
  </updateLog>
  <autoSoftCommit>
    <maxTime>1000</maxTime>
  </autoSoftCommit>
  <autoCommit>
    <maxTime>60</maxTime>
    <openSearcher>true</openSearcher>
  </autoCommit>
</updateHandler>

I am indexing through SolrNet, adding one document at a time:  var res =
solr.Add(doc); // where Doc doc = new Doc();

How do I reduce the indexing time, given that the amount of data indexed is
quite small? Will batch indexing reduce the indexing time? If so, do I need
to make changes in solrconfig.xml?
Also, I want the documents to be searchable within 1 second of indexing.
Is it true that if a soft commit is done, then faceting cannot be done on
the data?






Re: to reduce indexing time

2014-03-05 Thread Shawn Heisey
On 3/5/2014 7:47 AM, sweety wrote:
 Before indexing, this was the memory layout:
 
 System Memory: 63.2%, 2.21 GB
 JVM Memory: 8.3%, 81.60 MB of 981.38 MB
 
 I have indexed 700 documents with a total size of 12 MB.
 These are the results I get:
 Qtime: 8122, System time: 00:00:12.7318648
 System Memory: 65.4%, 2.29 GB
 JVM Memory: 15.3%, 148.32 MB of 981.38 MB
 
 After indexing 7,000 documents:
 Qtime: 51817, System time: 00:01:12.6028320
 System Memory: 69.4%, 2.43 GB
 JVM Memory: *26.5%*, 266.60 MB
 
 After indexing 70,000 documents of 1200 MB total size, these are the results:
 Qtime: 511447, System time: 00:11:14.0398768
 System Memory: 82.7%, 2.89 GB
 JVM Memory: *11.8%*, 118.46 MB
 
 Here the JVM memory usage is lower than after the 7,000-document run; why is that?

Ahmet already addressed your configuration and how to speed things up.
Here's the answer to your memory question:

The simple fact is that during *all* of those tests, at least one of
your heap memory pools will have reached maximum size and then been
reduced by garbage collection.  This memory graph from jconsole shows
what happens with Java heap memory:

http://docs.oracle.com/javase/1.5.0/docs/guide/management/jconsole.html#memory

After indexing, your final example was simply at one of the low points
in the graph, but the other examples were at a higher point.
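
For anyone who wants to see this effect outside Solr, here is a minimal
sketch (plain Java, nothing Solr-specific, all numbers arbitrary) that prints
the used heap while allocating short-lived garbage; the figure climbs and then
drops each time a collection runs:

public class HeapWatch {
    public static void main(String[] args) throws InterruptedException {
        Runtime rt = Runtime.getRuntime();
        for (int i = 0; i < 20; i++) {
            // Allocate some short-lived garbage so the collector has work to do.
            byte[][] junk = new byte[1000][];
            for (int j = 0; j < junk.length; j++) {
                junk[j] = new byte[10 * 1024];
            }
            long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
            System.out.println("used heap: " + usedMb + " MB");
            Thread.sleep(200);
        }
    }
}

Whether a given snapshot lands on a peak or a trough is mostly a matter of
timing, which is why the 70,000-document reading can show less used heap than
the 7,000-document one.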

Thanks,
Shawn



Re: to reduce indexing time

2014-03-05 Thread sweety
Now I have batch indexed, with batches of 250 documents. These were the results.
After 7,000 documents:
Qtime: 46894, System time: 00:00:55.9384892
JVM Memory: 249.02 MB, 24.8%
This shows quite a reduction in timing.

After 70,000 documents:
Qtime: 480435, System time: 00:09:29.5206727
System Memory: 82.8%, 2.90 GB
JVM Memory: 82%, 818.06 MB // Here the memory usage has increased, though
the timing has reduced.

After disabling soft commit and the tlog, for the 70,000 contracts:
Qtime: 461331, System time: 00:09:09.7930326
JVM Memory: 62.4%, 623.42 MB // Memory usage is lower here.

What causes this memory usage to change, if the data being indexed is the same?





Re: to reduce indexing time

2014-03-05 Thread Greg Walters
It doesn't sound like you have much of an understanding of Java's garbage
collection. You might read
http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html
to get a better understanding of how it works and why you're seeing different
levels of memory utilization at any given point in time. The thing to note is
that while doing anything, Java applications create objects that reside in the
Java heap and take up space. After a while, the objects that are no longer in
use and have been orphaned by the objects that created them are collected by
the JVM, which lowers the amount of heap used. In most cases you shouldn't
expect memory usage in a JVM to be static unless the application is sitting
around idle.

Thanks,
Greg

On Mar 5, 2014, at 11:58 AM, sweety sweetyshind...@yahoo.com wrote:

 Now I have batch indexed, with batches of 250 documents. These were the results.
 After 7,000 documents:
 Qtime: 46894, System time: 00:00:55.9384892
 JVM Memory: 249.02 MB, 24.8%
 This shows quite a reduction in timing.
 
 After 70,000 documents:
 Qtime: 480435, System time: 00:09:29.5206727
 System Memory: 82.8%, 2.90 GB
 JVM Memory: 82%, 818.06 MB // Here the memory usage has increased, though
 the timing has reduced.
 
 After disabling soft commit and the tlog, for the 70,000 contracts:
 Qtime: 461331, System time: 00:09:09.7930326
 JVM Memory: 62.4%, 623.42 MB // Memory usage is lower here.
 
 What causes this memory usage to change, if the data being indexed is the same?
 
 
 



Re: to reduce indexing time

2014-03-05 Thread sweety
I will surely read about JVM garbage collection. Thanks a lot, all of you.

But is the time required for my indexing good enough? I don't know what the
ideal timings are.
I think my indexing is taking more time than it should.





Re: to reduce indexing time

2014-03-05 Thread Ahmet Arslan
Hi,

One thing to consider: I think SolrNet uses XML updates, and there is XML
parsing overhead with that.
Switching to SolrJ or to CSV updates could bring an additional gain.

http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

Ahmet


On Wednesday, March 5, 2014 10:13 PM, sweety sweetyshind...@yahoo.com wrote:
I will surely read about JVM garbage collection. Thanks a lot, all of you.

But is the time required for my indexing good enough? I don't know what the
ideal timings are.
I think my indexing is taking more time than it should.






Re: to reduce indexing time

2014-03-05 Thread Toby Lazar
I believe SolrJ uses XML under the covers.  If so, I don't think you would
improve performance by switching to SolrJ, since the client would convert
it to XML before sending it on the wire.

Toby

***
  Toby Lazar
  Capital Technology Group
  Email: tla...@capitaltg.com
  Mobile: 646-469-5865
***


On Wed, Mar 5, 2014 at 3:25 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 One thing to consider: I think SolrNet uses XML updates, and there is XML
 parsing overhead with that.
 Switching to SolrJ or to CSV updates could bring an additional gain.

 http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

 Ahmet


 On Wednesday, March 5, 2014 10:13 PM, sweety sweetyshind...@yahoo.com
 wrote:
 I will surely read about JVM garbage collection. Thanks a lot, all of you.

 But is the time required for my indexing good enough? I don't know what the
 ideal timings are.
 I think my indexing is taking more time than it should.







Re: to reduce indexing time

2014-03-05 Thread Ahmet Arslan
Hi Toby,

SolrJ uses javabin by default.

Ahmet


On Wednesday, March 5, 2014 11:31 PM, Toby Lazar tla...@capitaltg.com wrote:
I believe SolrJ uses XML under the covers.  If so, I don't think you would
improve performance by switching to SolrJ, since the client would convert
it to XML before sending it on the wire.

Toby

***
  Toby Lazar
  Capital Technology Group
  Email: tla...@capitaltg.com
  Mobile: 646-469-5865
***



On Wed, Mar 5, 2014 at 3:25 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi,

 One thing to consider: I think SolrNet uses XML updates, and there is XML
 parsing overhead with that.
 Switching to SolrJ or to CSV updates could bring an additional gain.

 http://wiki.apache.org/lucene-java/ImproveIndexingSpeed

 Ahmet


 On Wednesday, March 5, 2014 10:13 PM, sweety sweetyshind...@yahoo.com
 wrote:
 I will surely read about JVM garbage collection. Thanks a lot, all of you.

 But is the time required for my indexing good enough? I don't know what the
 ideal timings are.
 I think my indexing is taking more time than it should.








Re: to reduce indexing time

2014-03-05 Thread Toby Lazar
Thanks Ahmet for the correction.  I used Wireshark to capture an
UpdateRequest to Solr and saw this XML:

<add><doc boost="1.0"><field name="caseID">123</field><field
name="caseName">blah</field></doc></add>

and figured that javabin was only for the responses.  Does wt apply to how
SolrJ sends requests to Solr?  Could this HTTP content be in javabin format?

Toby


On Wed, Mar 5, 2014 at 4:34 PM, Ahmet Arslan iori...@yahoo.com wrote:

 Hi Toby,

 SolrJ uses javabin by default.

 Ahmet


 On Wednesday, March 5, 2014 11:31 PM, Toby Lazar tla...@capitaltg.com
 wrote:
 I believe SolrJ uses XML under the covers.  If so, I don't think you would
 improve performance by switching to SolrJ, since the client would convert
 it to XML before sending it on the wire.

 Toby

 ***
   Toby Lazar
   Capital Technology Group
   Email: tla...@capitaltg.com
   Mobile: 646-469-5865
 ***



 On Wed, Mar 5, 2014 at 3:25 PM, Ahmet Arslan iori...@yahoo.com wrote:

  Hi,
 
  One thing to consider: I think SolrNet uses XML updates, and there is XML
  parsing overhead with that.
  Switching to SolrJ or to CSV updates could bring an additional gain.
 
  http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
 
  Ahmet
 
 
  On Wednesday, March 5, 2014 10:13 PM, sweety sweetyshind...@yahoo.com
  wrote:
  I will surely read about JVM garbage collection. Thanks a lot, all of
  you.
 
  But is the time required for my indexing good enough? I don't know what
  the ideal timings are.
  I think my indexing is taking more time than it should.
 
 
 
 
 




Re: to reduce indexing time

2014-03-05 Thread Shawn Heisey

On 3/5/2014 2:31 PM, Toby Lazar wrote:

I believe SolrJ uses XML under the covers.  If so, I don't think you would
improve performance by switching to SolrJ, since the client would convert
it to XML before sending it on the wire.


Until recently, SolrJ always used XML by default for requests and 
javabin for responses.  That is moving to javabin for both.  This is 
already the case in the newest versions for CloudSolrServer.  
HttpSolrServer is still using the XML RequestWriter by default, but you 
can change this very easily to BinaryRequestWriter.  If you plan to use 
SolrJ, it's a change I would highly recommend.
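
A minimal sketch of that change, assuming the SolrJ 4.x class names discussed
in this thread and a placeholder core URL:

import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class SolrClientFactory {
    public static HttpSolrServer create() {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
        // Send update requests as javabin instead of the default XML.
        server.setRequestWriter(new BinaryRequestWriter());
        return server;
    }
}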


Thanks,
Shawn



Re: to reduce indexing time

2014-03-05 Thread Toby Lazar
OK, I was using HttpSolrServer since I haven't yet migrated to
CloudSolrServer.  I added the line:

   solrServer.setRequestWriter(new BinaryRequestWriter());

after creating the server object and now see the difference through
Wireshark.  Is it fair to assume that this usage is thread-safe?

Thank you Shawn and Ahmet,

Toby

***
  Toby Lazar
  Capital Technology Group
  Email: tla...@capitaltg.com
  Mobile: 646-469-5865
***


On Wed, Mar 5, 2014 at 4:46 PM, Shawn Heisey s...@elyograg.org wrote:

 On 3/5/2014 2:31 PM, Toby Lazar wrote:

 I believe SolrJ uses XML under the covers.  If so, I don't think you would
 improve performance by switching to SolrJ, since the client would convert
 it to XML before sending it on the wire.


 Until recently, SolrJ always used XML by default for requests and javabin
 for responses.  That is moving to javabin for both.  This is already the
 case in the newest versions for CloudSolrServer.  HttpSolrServer is still
 using the XML RequestWriter by default, but you can change this very easily
 to BinaryRequestWriter.  If you plan to use SolrJ, it's a change I would
 highly recommend.

 Thanks,
 Shawn




Re: to reduce indexing time

2014-03-05 Thread Shawn Heisey

On 3/5/2014 2:58 PM, Toby Lazar wrote:

OK, I was using HttpSolrServer since I haven't yet migrated to
CloudSolrServer.  I added the line:

solrServer.setRequestWriter(new BinaryRequestWriter());

after creating the server object and now see the difference through
Wireshark.  Is it fair to assume that this usage is thread-safe?


Yes, SolrServer is entirely threadsafe.  You can have one SolrServer 
object (for each Solr core) and use it throughout your entire application.
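
A minimal sketch of that pattern, assuming a placeholder core URL and
hypothetical field names: one HttpSolrServer instance shared by several
indexing threads, with a single commit after they all finish.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SharedServerExample {
    // One client instance per Solr core, reused by every thread.
    private static final HttpSolrServer SOLR =
            new HttpSolrServer("http://localhost:8983/solr/collection1");

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int t = 0; t < 4; t++) {
            final int threadId = t;
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", "thread-" + threadId);  // hypothetical field
                        SOLR.add(doc);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        SOLR.commit();   // make the additions visible once all threads are done
    }
}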


Thanks,
Shawn