I narrowed down the cause, and it is a character issue! The .msg file content I'm extracting with the Tika parser contains this text: (daños). If I remove the n with the tilde, it works.
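A Java `String` is UTF-16 in memory, and SolrJ does the byte encoding when it sends the request. So a useful first step may be to find out exactly which code point(s) Tika emitted for the "n with the tilde", and to clean the text before it goes into the document. Below is a minimal JDK-only sketch. `ExtractedTextCleaner` and its methods are hypothetical names, not part of Tika or SolrJ. The sketch assumes the Bad Request is caused by an unusual or invalid character and not by something else, such as the schema. It lists non-ASCII code points and drops characters that are not allowed in XML 1.0 (control characters, unpaired surrogates, U+FFFE/U+FFFF). It then applies NFC normalization, so a decomposed "n" + U+0303 (combining tilde) becomes the single code point U+00F1.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Tika or SolrJ). Assumes the Bad Request is
 * caused by the text itself, so it inspects and cleans Tika output before the
 * text is added to a SolrInputDocument.
 */
final class ExtractedTextCleaner {

    private ExtractedTextCleaner() {}

    /** Returns every non-ASCII code point as "U+XXXX" so the odd character can be identified. */
    static List<String> nonAsciiCodePoints(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints()
            .filter(cp -> cp > 0x7F)
            .forEach(cp -> out.add(String.format("U+%04X", cp)));
        return out;
    }

    /**
     * Drops code points that are not allowed in XML 1.0 (most control
     * characters, unpaired surrogates, U+FFFE and U+FFFF), then applies NFC so
     * that "n" followed by U+0303 COMBINING TILDE becomes the single code point U+00F1.
     */
    static String clean(String text) {
        StringBuilder kept = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isXmlChar(cp)) {
                kept.appendCodePoint(cp);
            }
            i += Character.charCount(cp); // 2 for a valid surrogate pair, else 1
        }
        return Normalizer.normalize(kept, Normalizer.Form.NFC);
    }

    /** The XML 1.0 "Char" production: tab, LF, CR and the listed Unicode ranges. */
    private static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }
}
```

In the ingestion loop this might look like `doc.addField("content", ExtractedTextCleaner.clean(tikaText))`, where `content` and `tikaText` are placeholder names. Logging `nonAsciiCodePoints` for the failing .msg first shows whether the character is a plain U+00F1, a decomposed sequence, or something else entirely.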
Should I explicitly convert the text to UTF-8 before sending it to Solr?

Erick - I'm in the QA phase. I'll be ingesting around 800K documents in total (Word, PDF, Excel, .msg, TXT, etc.). When we first go to prod at the end of the month, I'm planning daily updates, i.e., capturing all new and modified documents each day and updating Solr. Once we get a grasp of things, we want to move to near real time.

Thanks for the link to your post. It is very helpful.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Sunday, July 12, 2015 11:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr cloud error during document ingestion

Probably not related to your problem, but if you're sending lots of docs to Solr, committing every 100 is very aggressive. I'm assuming you're committing from the client, which, while OK, doesn't scale very well if you ever decide to have more than one client sending docs.

I'd recommend setting your hard commit to a minute or so and just leaving it at that if possible, with soft commits to make the docs visible.

Here's more than you ever wanted to know about soft commits, hard commits and such:
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Sun, Jul 12, 2015 at 8:40 AM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> I suggest checking the logs of
> http://10.222.238.35:8983/solr/serviceorder_shard1_replica2
> <http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983%2Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2>
> to find the root cause.
>
> On Sun, Jul 12, 2015 at 6:33 AM, Tarala, Magesh <mtar...@bh.com> wrote:
>
>> I'm using Solr 4.10.2 in a 3-node SolrCloud setup. I have a collection
>> with 3 shards and 2 replicas each.
>> I'm ingesting Solr documents via SolrJ.
>>
>> While ingesting the documents, I get the following error:
>>
>> 264147944 [updateExecutor-1-thread-268] ERROR org.apache.solr.update.StreamingSolrServers – error
>> org.apache.solr.common.SolrException: Bad Request
>>
>> request: http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983%2Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2
>>     at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
>>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>     at java.lang.Thread.run(Thread.java:745)
>>
>> I commit after every 100 documents in SolrJ.
>> I also have the following setting in solrconfig.xml:
>> <autoCommit>
>>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>>   <openSearcher>false</openSearcher>
>> </autoCommit>
>>
>> IMO, the tlogs (for serviceorder_shard1_replica2) are not too big:
>> -rw-r--r-- 1 solr users 8338 Jul 11 21:40 tlog.0000000000000000364
>> -rw-r--r-- 1 solr users 6385 Jul 11 21:40 tlog.0000000000000000365
>> -rw-r--r-- 1 solr users 10221 Jul 11 21:41 tlog.0000000000000000366
>> -rw-r--r-- 1 solr users 5981 Jul 11 21:41 tlog.0000000000000000367
>> -rw-r--r-- 1 solr users 2682 Jul 11 21:41 tlog.0000000000000000368
>> -rw-r--r-- 1 solr users 8515 Jul 11 21:42 tlog.0000000000000000369
>> -rw-r--r-- 1 solr users 7373 Jul 11 21:42 tlog.0000000000000000370
>> -rw-r--r-- 1 solr users 6907 Jul 11 21:42 tlog.0000000000000000371
>> -rw-r--r-- 1 solr users 5524 Jul 11 21:42 tlog.0000000000000000372
>> -rw-r--r-- 1 solr users 5600 Jul 11 21:43 tlog.0000000000000000373
>>
>> So far I've not been able to resolve this issue. Any ideas / pointers
>> would be greatly appreciated!
>>
>> Thanks,
>> Magesh
>>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>
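Erick's suggestion above was to stop committing every 100 documents from the client. Instead, a hard commit of roughly a minute plus soft commits would handle durability and visibility. On the client side, this amounts to sending batches with no `commit()` call at all. Below is a minimal JDK-only sketch. `BatchingIndexer` is a hypothetical class. The `sendBatch` callback stands in for a real SolrJ call such as `SolrServer.add(Collection<SolrInputDocument>)`, so the example runs without a cluster.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of client-side batching with NO client-side commits.
 * sendBatch stands in for something like solrServer.add(batch) in a SolrJ 4.x
 * client; visibility (soft commit) and durability (hard commit) are assumed to
 * be handled by autoSoftCommit / autoCommit in solrconfig.xml.
 */
final class BatchingIndexer<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> sendBatch;
    private final List<T> buffer;

    BatchingIndexer(int batchSize, Consumer<List<T>> sendBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sendBatch = sendBatch;
        this.buffer = new ArrayList<>(batchSize);
    }

    /** Buffers one document and sends a batch once the buffer is full. */
    void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends whatever is buffered. Deliberately does not commit. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Hand over an immutable copy so the buffer can be reused safely.
        sendBatch.accept(Collections.unmodifiableList(new ArrayList<>(buffer)));
        buffer.clear();
    }

    /** Sends the final partial batch; still no commit. */
    @Override
    public void close() {
        flush();
    }
}
```

With this shape, the server decides when documents become durable and visible. For example, `solr.autoCommit.maxTime` (15000 ms in the quoted solrconfig.xml) could be raised to around 60000, and an `<autoSoftCommit>` interval could be added to match how fresh search results need to be. Those values are assumptions to tune; the thread itself only says "a minute or so" for the hard commit.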