Re: Indexing huge data onto solr

2020-05-26 Thread Erick Erickson
ch one of parent tuples and execute the child entity SQLs (with the where condition of the parent) to create one Solr document? Won’t that put more load on the database by executing more SQLs? Is there an optimum solution? Thanks, Srinivas From: Erick Erickson Sent: 22 May 2
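One common way to avoid running one child query per parent row (the extra database load the question worries about) is to run the parent query and the child query once each, both ordered by the parent key, and stitch the results together in a single pass. The sketch below is a hedged illustration, not necessarily what the thread went on to recommend. It assumes both result sets can be sorted by a shared parent key, and the `Row` class is a hypothetical stand-in for a JDBC row:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one JDBC row: the parent key plus one payload column.
final class Row {
    final long parentId;
    final String value;

    Row(long parentId, String value) {
        this.parentId = parentId;
        this.value = value;
    }
}

class ParentChildJoin {
    /**
     * Merge-joins two lists that are both sorted ascending by parentId
     * (e.g. one "... ORDER BY parent_id" query per table), so every parent
     * gets its children without issuing a separate query per parent.
     */
    static Map<Long, List<String>> join(List<Row> parents, List<Row> children) {
        Map<Long, List<String>> docs = new LinkedHashMap<>();
        int c = 0;
        for (Row parent : parents) {
            List<String> kids = new ArrayList<>();
            // Skip orphan children whose key sorts before the current parent.
            while (c < children.size() && children.get(c).parentId < parent.parentId) {
                c++;
            }
            // Collect every child that belongs to the current parent.
            while (c < children.size() && children.get(c).parentId == parent.parentId) {
                kids.add(children.get(c).value);
                c++;
            }
            docs.put(parent.parentId, kids);
        }
        return docs;
    }
}
```

Each map entry would then become one Solr document. A streaming variant would emit each document as soon as its parent is finished instead of holding the whole map in memory.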

RE: Indexing huge data onto solr

2020-05-25 Thread Srinivas Kashyap
22:52 To: solr-user@lucene.apache.org Subject: Re: Indexing huge data onto solr You have a lot more control over the speed and form of importing data if you just do the initial load in SolrJ. Here’s an example; taking the Tika parts out is easy: https://lucidworks.com/post/indexing-with-solrj

Re: Indexing huge data onto solr

2020-05-22 Thread matthew sporleder
I can index (without nested entities, ofc ;) ) 100M records in about 6-8 hours on a pretty low-powered machine using vanilla DIH -> MySQL, so it is probably worth looking at why it is going slow before writing your own indexer (which we are finally having to do). On Fri, May 22, 2020 at 1:22 PM Erick

Re: Indexing huge data onto solr

2020-05-22 Thread Erick Erickson
You have a lot more control over the speed and form of importing data if you just do the initial load in SolrJ. Here’s an example; taking the Tika parts out is easy: https://lucidworks.com/post/indexing-with-solrj/ It’s especially instructive to comment out just the call to CloudSolrClient.add(d
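The idea of doing the initial load in your own client, and timing it with the add call disabled, can be sketched without SolrJ on the classpath. In this minimal sketch the `DocSink` interface and `BatchingLoader` class are hypothetical. In a real loader the sink implementation would call SolrJ's `SolrClient.add(Collection<SolrInputDocument>)` on a `CloudSolrClient`. Documents are modeled as plain maps, and `dryRun` skips the send so that the remaining run time is pure data acquisition and document building:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical abstraction over the Solr call; a real implementation
// would convert the maps to SolrInputDocuments and call solrClient.add(...).
interface DocSink {
    void add(List<Map<String, Object>> batch) throws Exception;
}

class BatchingLoader {
    private final DocSink sink;
    private final int batchSize;
    private final boolean dryRun; // true = "comment out the add call"

    BatchingLoader(DocSink sink, int batchSize, boolean dryRun) {
        this.sink = sink;
        this.batchSize = batchSize;
        this.dryRun = dryRun;
    }

    /** Sends docs in fixed-size batches and returns how many were processed. */
    long load(Iterator<Map<String, Object>> docs) throws Exception {
        List<Map<String, Object>> batch = new ArrayList<>();
        long count = 0;
        while (docs.hasNext()) {
            batch.add(docs.next());
            count++;
            if (batch.size() == batchSize) {
                flush(batch);
            }
        }
        flush(batch); // send the final partial batch, if any
        return count;
    }

    private void flush(List<Map<String, Object>> batch) throws Exception {
        if (!batch.isEmpty() && !dryRun) {
            sink.add(new ArrayList<>(batch)); // copy, because the buffer is reused
        }
        batch.clear();
    }
}
```

Running once with `dryRun = true` and once with `false` over the same source separates acquisition cost from indexing cost. The batch size is a tuning knob to measure, not a fixed recommendation.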

Indexing huge data onto solr

2020-05-22 Thread Srinivas Kashyap
Hi All, We are running Solr 8.4.1. We have a database table which has more than 100 million records. Until now we have been using DIH to do full imports on the tables. But for this table, a full import via DIH takes more than 3-4 days to complete and also consumes a fair bit of JVM
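Converting the two DIH run times reported in this thread into rates shows how far apart they are. This takes 100 million records at face value for both and ignores that the 6-8 hour figure explicitly excludes nested entities:

```latex
\begin{align*}
\text{3--4 days:}\quad & \frac{10^8}{4 \times 86\,400\ \mathrm{s}} \approx 289
  \ \text{to}\ \frac{10^8}{3 \times 86\,400\ \mathrm{s}} \approx 386\ \text{docs/s} \\
\text{6--8 hours:}\quad & \frac{10^8}{8 \times 3\,600\ \mathrm{s}} \approx 3\,472
  \ \text{to}\ \frac{10^8}{6 \times 3\,600\ \mathrm{s}} \approx 4\,630\ \text{docs/s}
\end{align*}
```

A gap of roughly 9x to 16x suggests that something specific to this import (such as the nested child queries) dominates, which is why it is worth diagnosing before rewriting the indexer.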

Re: Indexing huge data

2014-03-08 Thread Rallavagu
[mailto:erickerick...@gmail.com] Sent: Wednesday, March 05, 2014 8:03 PM To: solr-user@lucene.apache.org Subject: Re: Indexing huge data Here's the easiest thing to try to figure out where to concentrate your energies. Just comment out the server.add call in your SolrJ program. Well, and a

Re: Indexing huge data

2014-03-07 Thread Erick Erickson
nto Solr using the CSV handler & curl. This will give you the pure indexing time & the differences. Thanks, Susheel -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.

Re: Indexing huge data

2014-03-06 Thread Kranti Parisa
al Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, March 05, 2014 8:03 PM To: solr-user@lucene.apache.org Subject: Re: Indexing huge data Here's the easiest thing to try to figure out where to concentrate your

Re: Indexing huge data

2014-03-06 Thread Rallavagu
into Solr using the CSV handler & curl. This will give you the pure indexing time & the differences. Thanks, Susheel -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, March 05, 2014 8:03 PM To: solr-user@lucene.apache.org Subject: Re: Indexing h
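The CSV baseline (load an export straight into Solr to measure pure indexing time) is normally run with curl. To stay in the SolrJ/Java world, here is a hedged sketch using the JDK's `java.net.http.HttpClient` (Java 11+). It assumes a Solr node at `localhost:8983` and a hypothetical collection named `mycollection`, and relies on the `/update` handler accepting `application/csv` bodies. Check the CSV parameters (separator, header, field mapping) against your Solr version's documentation:

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

class CsvBaseline {
    // Builds a POST of a CSV file to Solr's update handler, committing once at the end.
    // The base URL and collection name are assumptions for this sketch.
    static HttpRequest buildRequest(String solrBase, String collection, Path csvFile)
            throws FileNotFoundException {
        URI uri = URI.create(solrBase + "/solr/" + collection + "/update?commit=true");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/csv")
                .POST(HttpRequest.BodyPublishers.ofFile(csvFile))
                .build();
    }

    // Sends the request and returns wall-clock milliseconds: the "pure indexing" baseline.
    static long timeUpload(HttpRequest request) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        long start = System.nanoTime();
        HttpResponse<String> resp = client.send(request, HttpResponse.BodyHandlers.ofString());
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (resp.statusCode() != 200) {
            throw new IllegalStateException("Solr returned " + resp.statusCode() + ": " + resp.body());
        }
        return elapsedMs;
    }
}
```

Comparing this time with a full SolrJ run over the same documents shows how much of the total is Solr itself and how much is data gathering.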

Re: Indexing huge data

2014-03-06 Thread Rallavagu
avagu Sent: Wednesday, March 5, 2014 2:37 PM To: solr-user@lucene.apache.org Subject: Indexing huge data All, Wondering about best practices/common practices to index/re-index a huge amount of data in Solr. The data is about 6 million entries in the DB and other sources (data is not located in on

RE: Indexing huge data

2014-03-05 Thread Susheel Kumar
m: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, March 05, 2014 8:03 PM To: solr-user@lucene.apache.org Subject: Re: Indexing huge data Here's the easiest thing to try to figure out where to concentrate your energies. Just comment out the server.add call in your SolrJ progra

Re: Indexing huge data

2014-03-05 Thread Erick Erickson
dd. Committing every few minutes, or every few hundred or few thousand documents, is sufficient. You can set up auto commit in solrconfig.xml. -- Jack Krupansky -Original Message- From: Rallavagu Sent: Wednesday, March 5, 2014 2:37 PM To: solr-user@lucene.ap
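The commit advice here is usually applied server-side through the `<autoCommit>` settings `maxDocs` and `maxTime` in `solrconfig.xml`. If the client drives commits instead, the same "every N documents or every T minutes" rule looks roughly like the hypothetical `CommitPolicy` below. The thresholds are illustrative, and the clock is injected so the logic can be tested:

```java
import java.util.function.LongSupplier;

// Hypothetical client-side equivalent of autoCommit's maxDocs / maxTime:
// signal a commit when either enough documents or enough time has accumulated.
class CommitPolicy {
    private final long maxDocs;
    private final long maxMillis;
    private final LongSupplier clockMillis;
    private long docsSinceCommit = 0;
    private long lastCommitAt;

    CommitPolicy(long maxDocs, long maxMillis, LongSupplier clockMillis) {
        this.maxDocs = maxDocs;
        this.maxMillis = maxMillis;
        this.clockMillis = clockMillis;
        this.lastCommitAt = clockMillis.getAsLong();
    }

    /** Records n newly added docs; returns true if the caller should commit now. */
    boolean onAdded(long n) {
        docsSinceCommit += n;
        long now = clockMillis.getAsLong();
        if (docsSinceCommit >= maxDocs || now - lastCommitAt >= maxMillis) {
            docsSinceCommit = 0;
            lastCommitAt = now;
            return true;
        }
        return false;
    }
}
```

In a loader, a `true` result would trigger `solrClient.commit()`. Letting the server's autoCommit handle this removes the code entirely, which is the simpler option suggested here.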

Re: Indexing huge data

2014-03-05 Thread Jack Krupansky
rch 5, 2014 2:37 PM To: solr-user@lucene.apache.org Subject: Indexing huge data All, Wondering about best practices/common practices to index/re-index a huge amount of data in Solr. The data is about 6 million entries in the DB and other sources (data is not located in one resource). Trying with solrj

Re: Indexing huge data

2014-03-05 Thread Otis Gospodnetic
Hi, Each doc is 100K? That's on the big side, yes, and the server seems on the small side, yes. Hence the "speed". :) Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Wed, Mar 5, 2014 at 3:37 PM, Rallavagu wrote: > Otis

Re: Indexing huge data

2014-03-05 Thread Rallavagu
Otis, Good points. I guess you are suggesting that it depends on the resources. Each document is 100k, and the pre-processing server is a 2-CPU VM running with 4 GB of RAM. So, could that be a relatively "small" machine for processing such an amount of data? On 3/5/14, 12:27 PM, Otis Gospodnetic wrote:
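Rough arithmetic shows the scale involved, assuming "100k" means about 100 KB of source data per document and using the 6 million entries from the original post:

```latex
6 \times 10^{6}\ \text{docs} \times 100\ \text{KB/doc} = 6 \times 10^{8}\ \text{KB} = 600\ \text{GB}
```

Gathering and pushing on the order of 600 GB through a 2-CPU VM with 4 GB of RAM is likely to take a long time however Solr itself is tuned.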

Re: Indexing huge data

2014-03-05 Thread Otis Gospodnetic
Hi, It depends. Are docs huge or small? Server single core or 32 core? Heap big or small? etc. etc. Otis On Wed, Mar 5, 2014 at 3:02 PM, Rallavagu wrote: > It seems the latency
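Two of these questions (how many cores, how big a heap) can be answered from inside the indexing JVM itself. A small sketch using only `java.lang.Runtime`. Note that `maxMemory()` reports the JVM's heap ceiling (governed by `-Xmx`), not the machine's physical RAM, and can be `Long.MAX_VALUE` if the JVM has no limit:

```java
class IndexerResources {
    // Reports the CPU count and maximum heap visible to this JVM.
    static String describe() {
        Runtime rt = Runtime.getRuntime();
        int cpus = rt.availableProcessors();
        long maxHeapMb = rt.maxMemory() / (1024 * 1024);
        return "cpus=" + cpus + ", maxHeapMB=" + maxHeapMb;
    }
}
```

Logging this at the start of an indexing run makes it easy to compare throughput numbers across machines.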

Re: Indexing huge data

2014-03-05 Thread Rallavagu
It seems the latency is introduced by collecting the data from different sources, putting it together, and then the actual Solr indexing. I would say all these activities contribute equally, though I would say So, is it normal to expect indexing to run for a long time? Wondering what to expect in

Re: Indexing huge data

2014-03-05 Thread Otis Gospodnetic
Hi, 6M is really not huge these days. 6B is big, though also still not huge any more. What seems to be the bottleneck? Solr or DB or network or something else? Otis On Wed, Mar 5,
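One way to answer "Solr or DB or network or something else?" is to time each stage of the pipeline separately instead of only the whole run. A minimal sketch. The `StageTimer` class is hypothetical, and stage names such as `"db"`, `"merge"` or `"index"` are placeholders. In a real loader the `"index"` stage would wrap the SolrJ add call:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Supplier;

// Accumulates wall-clock time per named pipeline stage.
class StageTimer {
    private final Map<String, Long> nanosByStage = new LinkedHashMap<>();

    // Runs the work, adds its elapsed time to the stage total, and returns its result.
    <T> T time(String stage, Supplier<T> work) {
        long start = System.nanoTime();
        try {
            return work.get();
        } finally {
            nanosByStage.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    // Total milliseconds spent in a stage so far (0 if never timed).
    long millis(String stage) {
        return nanosByStage.getOrDefault(stage, 0L) / 1_000_000;
    }

    // Copy of the per-stage totals, in first-seen order.
    Map<String, Long> snapshotNanos() {
        return new LinkedHashMap<>(nanosByStage);
    }
}
```

Printing `snapshotNanos()` after a run shows whether fetching from the sources, merging them, or the Solr call dominates the total.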

Indexing huge data

2014-03-05 Thread Rallavagu
All, Wondering about best practices/common practices to index/re-index a huge amount of data in Solr. The data is about 6 million entries in the DB and other sources (data is not located in one resource). Trying with a SolrJ-based solution to collect data from different resources to index into So