I narrowed down the cause. And it is a character issue! 

The .msg file content I'm extracting with the Tika parser contains this text: (daños)
If I remove the ñ (n with a tilde), it works. 

Should I explicitly convert the text to UTF-8 before sending it to Solr?
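
One way to look into the UTF-8 question is to check what the extracted text actually contains before it goes to Solr. A Java String is UTF-16 in memory, so a conversion to UTF-8 only happens when bytes are produced. The sketch below is a diagnostic, not a fix. It uses only the JDK, and the class name Utf8Check and its method names are made up for illustration. It reports whether a string can be encoded as UTF-8 at all (an unpaired surrogate cannot) and prints the UTF-8 bytes so they can be compared with what is expected for a character like ñ.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting text extracted by a parser such as Tika.
public class Utf8Check {

    // True if the text encodes to UTF-8 without errors.
    // Unpaired surrogates are reported as malformed input.
    public static boolean isEncodableAsUtf8(String text) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            encoder.encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Space-separated lowercase hex of the UTF-8 bytes, for logging.
    public static String utf8Hex(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00f1 is the n with a tilde; the escape avoids source-file encoding issues.
        String sample = "da\u00f1os";
        System.out.println(isEncodableAsUtf8(sample));
        System.out.println(utf8Hex(sample));
    }
}
```

If the check passes and the bytes look right, the string itself is probably fine and the problem more likely lies elsewhere, for example in how the field is handled on the Solr side. The replica log Mikhail pointed to should say which.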

Erick - I'm in the QA phase. I'll be ingesting around 800K documents in total 
(Word, PDF, Excel, .msg, txt, etc.). For now I'm considering daily updates when 
we first go to prod at the end of the month, i.e., capturing all the new and 
modified documents on a daily basis and updating Solr. Once we get a grasp of 
things, we want to go near real time. Thanks for the link to your post. It is 
very helpful. 



-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Sunday, July 12, 2015 11:24 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr cloud error during document ingestion

Probably not related to your problem, but if you're sending lots of docs at 
Solr, committing every 100 is very aggressive.
I'm assuming you're committing from the client, which, while OK doesn't scale 
very well if you ever decide to have more than
1 client sending docs.

I'd recommend setting your hard commit to a minute or so and just leaving it at 
that if possible, with soft committing to make the docs visible.

Here's more than you ever wanted to know about soft commits, hard commits and 
such:
https://lucidworks.com/blog/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

On Sun, Jul 12, 2015 at 8:40 AM, Mikhail Khludnev <mkhlud...@griddynamics.com> 
wrote:
> I suggest to check
> http://10.222.238.35:8983/solr/serviceorder_shard1_replica2
> <http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?up
> date.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983%2
> Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2>
> logs to find root cause.
>
> On Sun, Jul 12, 2015 at 6:33 AM, Tarala, Magesh <mtar...@bh.com> wrote:
>
>> I'm using Solr 4.10.2 in a 3-node SolrCloud setup. I have a collection 
>> with 3 shards and 2 replicas each.
>> I'm ingesting solr documents via solrj.
>>
>> While ingesting the documents, I get the following error:
>>
>> 264147944 [updateExecutor-1-thread-268] ERROR 
>> org.apache.solr.update.StreamingSolrServers  ? error
>> org.apache.solr.common.SolrException: Bad Request
>>
>> request:
>> http://10.222.238.35:8983/solr/serviceorder_shard1_replica2/update?update.distrib=TOLEADER&distrib.from=http%3A%2F%2F10.222.238.36%3A8983%2Fsolr%2Fserviceorder_shard2_replica1%2F&wt=javabin&version=2
>>         at
>> org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:241)
>>         at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>         at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:745)
>>
>> I commit after every 100 documents in solrj.
>> And I also have the following solrconfig.xml setting:
>>      <autoCommit>
>>        <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>>        <openSearcher>false</openSearcher>
>>      </autoCommit>
>>
>>
>> IMO, tlogs (for serviceorder_shard1_replica2) are not too big
>> -rw-r--r-- 1 solr users  8338 Jul 11 21:40 tlog.0000000000000000364
>> -rw-r--r-- 1 solr users  6385 Jul 11 21:40 tlog.0000000000000000365
>> -rw-r--r-- 1 solr users 10221 Jul 11 21:41 tlog.0000000000000000366
>> -rw-r--r-- 1 solr users  5981 Jul 11 21:41 tlog.0000000000000000367
>> -rw-r--r-- 1 solr users  2682 Jul 11 21:41 tlog.0000000000000000368
>> -rw-r--r-- 1 solr users  8515 Jul 11 21:42 tlog.0000000000000000369
>> -rw-r--r-- 1 solr users  7373 Jul 11 21:42 tlog.0000000000000000370
>> -rw-r--r-- 1 solr users  6907 Jul 11 21:42 tlog.0000000000000000371
>> -rw-r--r-- 1 solr users  5524 Jul 11 21:42 tlog.0000000000000000372
>> -rw-r--r-- 1 solr users  5600 Jul 11 21:43 tlog.0000000000000000373
>>
>>
>> So far I've not been able to resolve this issue. Any ideas / pointers 
>> would be greatly appreciated!
>>
>> Thanks,
>> Magesh
>>
>>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
> <mkhlud...@griddynamics.com>
