solrcloud and csv import hangs

2012-09-24 Thread dan sutton
Hi,

This appears to happen in trunk too.

It appears that the original request parameters get sent along with the
add command to the other nodes. If I clear them like so for add and commit:

core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java

-  params = new ModifiableSolrParams(req.getParams());
+  //params = new ModifiableSolrParams(req.getParams());
+  params = new ModifiableSolrParams();

Then things work as expected.

Otherwise, params like stream.url get sent to the replica nodes, which
causes a failure if the file is missing there, or worse, repeatedly
imports the same file if it exists on a replica.

Clearing the params might not be the right fix, though. What should be
sent here for a streaming CSV import?

Dan
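For illustration, the forwarding problem and one candidate fix can be sketched without the Solr classes. This is a simplified stand-in, not the real DistributedUpdateProcessor (which works on ModifiableSolrParams, not a Map); the class and method names here are invented for the sketch. The idea is: forward everything except the content-stream params, so a replica never tries to fetch stream.url itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified, hypothetical stand-in for the param handling in
// DistributedUpdateProcessor; Solr's real code uses ModifiableSolrParams.
public class DistribParamSketch {

    // Copy the incoming request params for the forwarded add/commit,
    // but drop content-stream params (stream.url, stream.file, ...) so
    // replicas don't try to re-read the CSV file locally.
    public static Map<String, String> paramsForReplica(Map<String, String> reqParams) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : reqParams.entrySet()) {
            if (!e.getKey().startsWith("stream.")) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

Under this sketch, a forwarded commit would keep commit=true and overwrite=false but lose stream.url, which matches the behavior Dan's patch produces by clearing all params, while still forwarding the non-stream ones.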


On Thu, Sep 20, 2012 at 4:32 PM, dan sutton danbsut...@gmail.com wrote:
 Hi,

 I'm using Solr 4.0-BETA and trying to import a CSV file as follows:

 curl http://localhost:8080/solr/core/update -d overwrite=false -d
 commit=true -d stream.contentType='text/csv;charset=utf-8' -d
 stream.url=file:///dir/file.csv

 I have 2 Tomcat servers running on different machines and a separate
 ZooKeeper quorum (3 ZooKeeper servers, 2 on the same machine). This is a
 1-shard core, replicated to the other machine.

 It seems that for a 255K-line file I have 170 docs on the server that
 issued the command, but on the other, the index seems to grow without
 bound.

 Has anyone else seen this, or been successful using the CSV import
 with SolrCloud?

 Cheers,
 Dan


Re: solrcloud and csv import hangs

2012-09-24 Thread Yonik Seeley
On Mon, Sep 24, 2012 at 11:03 AM, dan sutton danbsut...@gmail.com wrote:
 Hi,

 This appears to happen in trunk too.

 It appears that the original request parameters get sent along with the
 add command to the other nodes. If I clear them like so for add and commit:

 core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java

 -  params = new ModifiableSolrParams(req.getParams());
 +  //params = new ModifiableSolrParams(req.getParams());
 +  params = new ModifiableSolrParams();

 Then things work as expected.

 Otherwise, params like stream.url get sent to the replica nodes, which
 causes a failure if the file is missing there, or worse, repeatedly
 imports the same file if it exists on a replica.

 Clearing the params might not be the right fix, though. What should be
 sent here for a streaming CSV import?

 Dan


Yikes! Thanks for investigating, this looks pretty serious.
Could you open a JIRA issue for this bug?

-Yonik
http://lucidworks.com


Re: solrcloud and csv import hangs

2012-09-24 Thread Yonik Seeley
https://issues.apache.org/jira/browse/SOLR-3883

-Yonik
http://lucidworks.com




solrcloud and csv import hangs

2012-09-20 Thread dan sutton
Hi,

I'm using Solr 4.0-BETA and trying to import a CSV file as follows:

curl http://localhost:8080/solr/core/update -d overwrite=false -d
commit=true -d stream.contentType='text/csv;charset=utf-8' -d
stream.url=file:///dir/file.csv

I have 2 Tomcat servers running on different machines and a separate
ZooKeeper quorum (3 ZooKeeper servers, 2 on the same machine). This is a
1-shard core, replicated to the other machine.

It seems that for a 255K-line file I have 170 docs on the server that
issued the command, but on the other, the index seems to grow without
bound.

Has anyone else seen this, or been successful using the CSV import
with SolrCloud?

Cheers,
Dan
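For reference, the curl command in this message is just a single form-encoded POST; the repeated -d flags become one application/x-www-form-urlencoded body, and Solr fetches stream.url server-side, so the CSV only needs to exist on the node receiving the request. A minimal sketch of building that body (the class and method names are illustrative, not a Solr API):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative helper: build the form body that curl's repeated -d flags
// produce, e.g. "overwrite=false&commit=true&stream.url=...".
public class CsvImportBody {

    public static String formBody(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }
}
```

POSTing such a body to /solr/core/update with Content-Type application/x-www-form-urlencoded is equivalent to the curl invocation above; note that curl does not URL-encode -d values itself, which is why URLs passed via stream.url should be encoded when built programmatically.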