Re: Slow Indexing scaling issue

2019-08-19 Thread Furkan KAMACI
Hi Parmeshwor, 2 hours for 3 GB of data seems too slow. We scale up to PBs this way: 1) Ignore all commits from clients via IgnoreCommitOptimizeUpdateProcessorFactory. 2) Heavy processing is done on an external Tika server instead of Solr Cell with its embedded Tika. 3) Adjust autocommit, sof

Re: Slow Indexing scaling issue

2019-08-13 Thread Erick Erickson
Here’s some sample SolrJ code using Tika outside of Solr’s Extracting Request Handler, along with some info about why loading Solr with the job of extracting text is not optimal speed-wise: https://lucidworks.com/post/indexing-with-solrj/ > On Aug 13, 2019, at 12:15 PM, Jan Høydahl wrote: > >

Re: Slow Indexing scaling issue

2019-08-13 Thread Jan Høydahl
You may want to review https://cwiki.apache.org/confluence/display/SOLR/SolrPerformanceProblems#SolrPerformanceProblems-SlowIndexing for some hints. Make sure to index with multiple parallel threads. Also remember that using /extract on the Solr side is resource-intensive and may make your clus
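
A minimal sketch of the "multiple parallel threads" advice above, not code from the thread. It assumes documents are already plain dicts (text extracted elsewhere, not via /extract) and posts them in batches as JSON arrays to Solr's /update handler. The URL, collection name, batch size, and worker count are placeholders, and the helper names (chunked, post_batch, index_parallel) are hypothetical. No commit is sent from the client; commits are left to the server-side autocommit settings.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed endpoint: host, port, and collection name are placeholders.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycollection/update"


def chunked(docs, size):
    """Yield successive lists of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def post_batch(batch, url=SOLR_UPDATE_URL):
    """POST one batch as a JSON array of documents; no commit is requested."""
    request = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.status


def index_parallel(docs, send=post_batch, batch_size=500, workers=4):
    """Send batches from several threads; return the number of docs sent."""
    batches = list(chunked(docs, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all batches and re-raises any worker exception.
        list(pool.map(send, batches))
    return sum(len(batch) for batch in batches)
```

Passing a different `send` callable makes it possible to try the batching logic without a running Solr instance. Reasonable values for `batch_size` and `workers` depend on document size and cluster capacity and would need to be measured.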

Slow Indexing scaling issue

2019-08-13 Thread Parmeshwor Thapa
Hi, We are having some issues scaling Solr indexing and are looking for suggestions. Setup: We have two SolrCloud (7.4) instances running in separate cloud VMs with an external ZooKeeper ensemble. We are sending async / non-blocking HTTP requests to index documents in Solr. 2 cloud VMs ( 4 core * 3