On Mon, Sep 24, 2012 at 11:03 AM, dan sutton <danbsut...@gmail.com> wrote:
> Hi,
>
> This appears to happen in trunk too.
>
> It appears that the add command's request parameters get sent to the
> replica nodes. If I comment these out like so for add and commit:
>
> core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java
>
> -    params = new ModifiableSolrParams(req.getParams());
> +    //params = new ModifiableSolrParams(req.getParams());
> +    params = new ModifiableSolrParams();
>
> then things work as expected.
>
> Otherwise, params like stream.url get sent to the replica nodes, which
> causes a failure if the file is missing there, or worse, repeatedly
> imports the same file if it does exist on a replica.
>
> This might not be the right thing to do, though ... what should be sent
> here for a streaming CSV import?
>
> Dan
>
>
> On Thu, Sep 20, 2012 at 4:32 PM, dan sutton <danbsut...@gmail.com> wrote:
>> Hi,
>>
>> I'm using Solr 4.0-BETA and trying to import a CSV file as follows:
>>
>> curl http://localhost:8080/solr/<core>/update -d overwrite=false \
>>   -d commit=true -d stream.contentType='text/csv;charset=utf-8' \
>>   -d stream.url=file:///dir/file.csv
>>
>> I have two Tomcat servers running on different machines and a separate
>> ZooKeeper quorum (3 zoo servers, 2 on the same machine). This is a
>> one-shard core, replicated to the other machine.
>>
>> It seems that for a 255K-line file I have 170 docs on the server that
>> issued the command, but on the other, the index seems to grow without
>> bound?
>>
>> Has anyone seen this, or been successful in using the CSV import
>> with SolrCloud?
Yikes! Thanks for investigating; this looks pretty serious. Could you
open a JIRA issue for this bug?

-Yonik
http://lucidworks.com
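
For a streaming CSV import, one possible middle ground (an untested
sketch, not a committed fix) would be to copy the original request
params but strip the stream.* content-stream parameters before the
request is forwarded to replicas, rather than dropping all params as in
the patch quoted above. The StreamParamFilter class and withoutStreams
helper below are made up for illustration and are not part of Solr; the
stream.url / stream.file / stream.body / stream.contentType strings are
the standard content-stream parameter names used in the curl command in
the thread.

    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.params.SolrParams;

    // Illustrative helper, not Solr code: copy the incoming request
    // params but drop the content-stream params so replicas do not try
    // to fetch and re-import stream.url themselves.
    public class StreamParamFilter {

        private static final String[] STREAM_PARAMS = {
            "stream.url", "stream.file", "stream.body", "stream.contentType"
        };

        /** Returns a copy of the request params with stream.* removed. */
        public static ModifiableSolrParams withoutStreams(SolrParams reqParams) {
            ModifiableSolrParams params = new ModifiableSolrParams(reqParams);
            for (String name : STREAM_PARAMS) {
                params.remove(name);
            }
            return params;
        }
    }

If something like this were used inside DistributedUpdateProcessor, it
would replace the "params = new ModifiableSolrParams(req.getParams())"
line from the patch, assuming the rest of the forwarding logic stays
the same; whether stripping only these params is the right behavior is
exactly the open question above.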