Re: How to do multi-threading indexing on huge volume of JSON files?

2018-05-08 Thread Erick Erickson
I'd seriously consider a SolrJ program rather than posting files. Posting files is really intended as a simple way to get started; when it comes to indexing large volumes, it's not very efficient. As a comparison, I index 3-4K docs/second (Wikipedia dump) on my MacBook Pro. Note that if each of your
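
For illustration only, here is a minimal sketch of the batching-plus-threads pattern a SolrJ indexing program typically follows. It uses only the Java standard library so it runs as-is. The class name `BatchIndexer`, the batch size, the thread count, and the `sink` callback are all assumptions, not anything from the message above. In a real program the sink would probably wrap a SolrJ call such as `SolrClient.add(Collection<SolrInputDocument>)`, with the JSON parsed into documents first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class BatchIndexer {

    /** Splits docs into consecutive batches of at most batchSize items. */
    public static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < docs.size(); i += batchSize) {
            int end = Math.min(i + batchSize, docs.size());
            batches.add(new ArrayList<>(docs.subList(i, end)));
        }
        return batches;
    }

    /**
     * Hands each batch to the sink on a fixed-size thread pool and returns
     * the number of documents the sink accepted. The sink is a hypothetical
     * stand-in for the call that sends one batch to Solr; it must be
     * thread-safe. Exceptions thrown by the sink are not reported here.
     */
    public static <T> int indexAll(List<T> docs, int batchSize, int threads,
                                   Consumer<List<T>> sink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger sent = new AtomicInteger();
        for (List<T> batch : partition(docs, batchSize)) {
            pool.submit(() -> {
                sink.accept(batch);
                sent.addAndGet(batch.size());
            });
        }
        pool.shutdown();
        try {
            // Wait for the queued batches to finish (the timeout is arbitrary).
            pool.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sent.get();
    }

    public static void main(String[] args) {
        // Stand-in documents; real input would be the parsed JSON files.
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            docs.add("{\"id\":\"" + i + "\"}");
        }
        AtomicInteger batches = new AtomicInteger();
        int sent = indexAll(docs, 1000, 4, b -> batches.incrementAndGet());
        System.out.println(sent + " docs in " + batches.get() + " batches");
        // prints "2500 docs in 3 batches"
    }
}
```

Sending documents in batches from several threads avoids one HTTP request per file, which is usually where the simple file-posting approach loses time. Suitable batch sizes and thread counts depend on the documents and hardware and would need to be measured.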

How to do multi-threading indexing on huge volume of JSON files?

2018-05-08 Thread Raymond Xie
I have a huge number of JSON files to index in Solr. It takes 22 minutes to index 300,000 JSON files, which were generated from a single bz2 file. This is only 0.25% of the total amount of data from the same business flow, and there are 100+ business flows to be indexed. I absolutely need a
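
To put the figures quoted above in perspective, the following back-of-the-envelope sketch converts them into a throughput and a projected duration. It assumes the 0.25% figure refers to the file count of a single business flow and that the indexing rate stays constant. The class and method names are hypothetical.

```java
public class ThroughputEstimate {

    /** Documents indexed per second, given a count and elapsed minutes. */
    public static double docsPerSecond(long docs, double minutes) {
        return docs / (minutes * 60.0);
    }

    /** Hours needed to index docs at a constant rate. */
    public static double hoursFor(long docs, double docsPerSecond) {
        return docs / docsPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        long sample = 300_000;                      // files indexed in the test run
        double rate = docsPerSecond(sample, 22);    // 22 minutes, from the post
        long total = Math.round(sample / 0.0025);   // sample is 0.25% of one flow
        System.out.printf("%.0f docs/s, %d files, %.0f hours%n",
                rate, total, hoursFor(total, rate));
        // prints "227 docs/s, 120000000 files, 147 hours"
    }
}
```

Under those assumptions one business flow would take about six days at the current rate. At a sustained 3,000 docs/second the same volume would take roughly 11 hours, though that comparison ignores any difference in document size.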