> Does DIH fetch each one of the parent
> tuples and execute the child entity SQLs (with a where condition on the parent) to
> create one Solr document? Won't that put more load on the database by executing more
> SQLs? Is there an optimum solution?
>
> Thanks,
> Srinivas
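To sketch one answer to Srinivas's question: instead of one child SQL per parent row (N+1 queries), a single JOIN ordered by the parent key lets each document be assembled in one pass over the result set. The table and field names here (parent, child, id, name, tag) are hypothetical:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.common.SolrInputDocument;

public class JoinLoader {
    // One ordered JOIN instead of one child query per parent (avoids N+1 queries).
    public static List<SolrInputDocument> load(Connection db) throws Exception {
        List<SolrInputDocument> docs = new ArrayList<>();
        try (Statement st = db.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT p.id, p.name, c.tag FROM parent p " +
                 "LEFT JOIN child c ON c.parent_id = p.id ORDER BY p.id")) {
            SolrInputDocument doc = null;
            String lastId = null;
            while (rs.next()) {
                String id = rs.getString("id");
                if (!id.equals(lastId)) {          // new parent: start a new document
                    doc = new SolrInputDocument();
                    doc.addField("id", id);
                    doc.addField("name", rs.getString("name"));
                    docs.add(doc);                 // in practice, batch-add to Solr as you go
                    lastId = id;
                }
                String tag = rs.getString("tag");  // null when the parent has no children
                if (tag != null) doc.addField("tag", tag);  // accumulates as a multivalued field
            }
        }
        return docs;
    }
}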
From: Erick Erickson
Sent: 22 May 2020 22:52
To: solr-user@lucene.apache.org
Subject: Re: Indexing huge data onto solr
You have a lot more control over the speed and form of importing data if
you just do the initial load in SolrJ. Here's an example; taking the Tika
parts out is easy:
https://lucidworks.com/post/indexing-with-solrj
I can index (without nested entities ofc ;) ) 100M records in about
6-8 hours on a pretty low-powered machine using vanilla DIH -> MySQL,
so it is probably worth looking at why it is going slow before writing
your own indexer (which we are finally having to do).
On Fri, May 22, 2020 at 1:22 PM Erick Erickson wrote:
> You have a lot more control over the speed and form of importing data if
> you just do the initial load in SolrJ. Here's an example; taking the Tika
> parts out is easy:
> https://lucidworks.com/post/indexing-with-solrj/
> It's especially instructive to comment out just the call to
> CloudSolrClient.add(doc); that tells you whether the time goes into
> acquiring the data or into Solr itself.
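To make that concrete, here is a minimal SolrJ bulk-load sketch along those lines. The collection name ("bigcollection"), table ("items"), and JDBC details are made-up placeholders; commenting out the solr.add(...) line, as suggested above, isolates the database read time:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BulkLoader {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient solr = new CloudSolrClient.Builder(
                 List.of("http://localhost:8983/solr")).build();
             // useCursorFetch=true lets MySQL honor setFetchSize and stream rows
             Connection db = DriverManager.getConnection(
                 "jdbc:mysql://localhost/mydb?useCursorFetch=true", "user", "pass");
             Statement st = db.createStatement()) {
            st.setFetchSize(10_000);          // stream instead of buffering 100M rows
            ResultSet rs = st.executeQuery("SELECT id, title, body FROM items");
            List<SolrInputDocument> batch = new ArrayList<>(1000);
            while (rs.next()) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", rs.getString("id"));
                doc.addField("title", rs.getString("title"));
                doc.addField("body", rs.getString("body"));
                batch.add(doc);
                if (batch.size() >= 1000) {
                    solr.add("bigcollection", batch);  // comment out to measure pure DB read time
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) solr.add("bigcollection", batch);
            solr.commit("bigcollection");     // single explicit commit at the end
        }
    }
}

Batching on the order of a thousand docs per add call, and committing once at the end (or relying on autoCommit), is usually far faster than per-document adds.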
Hi All,
We are running Solr 8.4.1. We have a database table which has more than 100
million records. Till now we were using DIH to do full-import on the tables.
But for this table, when we do full-import via DIH it is taking more than 3-4
days to complete and also consumes a fair bit of JVM memory.
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, March 05, 2014 8:03 PM
To: solr-user@lucene.apache.org
Subject: Re: Indexing huge data
Here's the easiest thing to try to figure out where to concentrate your
energies. Just comment out the server.add call in your SolrJ program.
Well, and any commits you're issuing from SolrJ.
>>> You could also dump the data to CSV and then load it into Solr
>>> using CSV handler & curl. This will give you the pure indexing time & the
>>> differences.
>>>
>>> Thanks,
>>> Susheel
>>>
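For reference, the CSV route Susheel describes looks roughly like this, assuming a collection named bigcollection and a data.csv whose first line names the fields:

curl 'http://localhost:8983/solr/bigcollection/update?commit=true' \
  -H 'Content-type: text/csv' --data-binary @data.csv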
From: Rallavagu
Sent: Wednesday, March 5, 2014 2:37 PM
To: solr-user@lucene.apache.org
Subject: Indexing huge data
All,
Wondering about best practices/common practices to index/re-index a huge
amount of data in Solr. The data is about 6 million entries in the db
and other sources (the data is not located in one resource). Trying with a
solrj based solution to collect data from different resources to index
into Solr.
> There's no need to commit after every add. Commit
> every few minutes or every few hundred or few thousand documents is
> sufficient. You can set up auto commit in solrconfig.xml.
>
> -- Jack Krupansky
>
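The auto commit Jack refers to lives in solrconfig.xml; the values below are illustrative, not recommendations:

<!-- solrconfig.xml: let Solr commit on a schedule instead of committing per add -->
<autoCommit>
  <maxDocs>25000</maxDocs>            <!-- hard commit after this many docs ... -->
  <maxTime>60000</maxTime>            <!-- ... or after 60s, whichever comes first -->
  <openSearcher>false</openSearcher>  <!-- flush to disk without opening a new searcher -->
</autoCommit>
<autoSoftCommit>
  <maxTime>300000</maxTime>           <!-- make new docs visible to search every 5 minutes -->
</autoSoftCommit>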
Hi,
Each doc is 100K? That's on the big side, yes, and the server seems on the
small side, yes. Hence the "speed". :)
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
On Wed, Mar 5, 2014 at 3:37 PM, Rallavagu wrote:
Otis,
Good points. I guess you are suggesting that it depends on the
resources. The documents are 100k each and the pre-processing server is a
2-CPU VM running with 4G RAM. So, could that be a relatively "small"
machine for processing such an amount of data?
On 3/5/14, 12:27 PM, Otis Gospodnetic wrote:
Hi,
It depends. Are docs huge or small? Server single core or 32 core? Heap
big or small? etc. etc.
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/
On Wed, Mar 5, 2014 at 3:02 PM, Rallavagu wrote:
It seems the latency is introduced by collecting the data from different
sources, putting it together, and then the actual Solr indexing. I would
say all these activities are contributing equally. So, is it normal to
expect indexing to run this long? Wondering what to expect in such cases.
Hi,
6M is really not huge these days. 6B is big, though also still not huge
any more. What seems to be the bottleneck? Solr or DB or network or
something else?
Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/