solrcloud and csv import hangs
Hi,

This appears to happen in trunk too. It appears that the add command's request parameters get sent to the nodes. If I comment these out like so for add and commit, in core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java:

  - params = new ModifiableSolrParams(req.getParams());
  + //params = new ModifiableSolrParams(req.getParams());
  + params = new ModifiableSolrParams();

then things work as expected. Otherwise, params like stream.url get sent to the replica nodes, which causes a failure if the file is missing there, or worse, repeatedly imports the same file if it does exist on a replica. This might not be the right thing to do, though ... what should be sent here for a streaming CSV import?

Dan

On Thu, Sep 20, 2012 at 4:32 PM, dan sutton danbsut...@gmail.com wrote:
> Hi,
>
> I'm using Solr 4.0-BETA and trying to import a CSV file as follows:
>
>   curl http://localhost:8080/solr/core/update -d overwrite=false \
>     -d commit=true -d stream.contentType='text/csv;charset=utf-8' \
>     -d stream.url=file:///dir/file.csv
>
> I have 2 Tomcat servers running on different machines and a separate
> ZooKeeper quorum (3 zoo servers, 2 on the same machine). This is a
> 1-shard core, replicated to the other machine.
>
> It seems that for a 255K-line file I have 170 docs on the server that
> issued the command, but on the other, the index seems to grow unbounded?
>
> Has anyone seen this, or been successful in using the CSV import with
> SolrCloud?
>
> Cheers,
> Dan
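The idea behind Dan's patch is that per-request content-stream parameters (stream.url, stream.file, stream.body, stream.contentType) must not be replayed when the leader forwards an add to its replicas — the replicas should receive the parsed documents, not an instruction to re-fetch the source file. A minimal, self-contained sketch of that filtering idea (hypothetical helper name; the real DistributedUpdateProcessor works on Solr's ModifiableSolrParams, not a plain Map):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ParamFilter {

    // Copy the request parameters, dropping any "stream.*" key so a
    // forwarded update cannot trigger a fresh import on the receiving node.
    // Sketch only: illustrates the filtering idea, not the actual Solr fix.
    static Map<String, String> stripStreamParams(Map<String, String> params) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!e.getKey().startsWith("stream.")) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> req = new LinkedHashMap<>();
        req.put("overwrite", "false");
        req.put("commit", "true");
        req.put("stream.contentType", "text/csv;charset=utf-8");
        req.put("stream.url", "file:///dir/file.csv");
        // Only overwrite and commit survive the filter.
        System.out.println(stripStreamParams(req));
    }
}
```

This sits between the two extremes in Dan's diff: forwarding all params (the buggy behavior) and forwarding none (his workaround, which also drops legitimate params like overwrite).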
Re: solrcloud and csv import hangs
On Mon, Sep 24, 2012 at 11:03 AM, dan sutton danbsut...@gmail.com wrote:
> Hi,
>
> This appears to happen in trunk too. It appears that the add command's
> request parameters get sent to the nodes. [...]
>
> Otherwise, params like stream.url get sent to the replica nodes, which
> causes a failure if the file is missing there, or worse, repeatedly
> imports the same file if it does exist on a replica. This might not be
> the right thing to do, though ... what should be sent here for a
> streaming CSV import?

Yikes! Thanks for investigating, this looks pretty serious.
Could you open a JIRA issue for this bug?

-Yonik
http://lucidworks.com
Re: solrcloud and csv import hangs
https://issues.apache.org/jira/browse/SOLR-3883

-Yonik
http://lucidworks.com

On Mon, Sep 24, 2012 at 11:42 AM, Yonik Seeley yo...@lucidworks.com wrote:
> On Mon, Sep 24, 2012 at 11:03 AM, dan sutton danbsut...@gmail.com wrote:
>> This appears to happen in trunk too. It appears that the add command's
>> request parameters get sent to the nodes. [...]
>
> Yikes! Thanks for investigating, this looks pretty serious.
> Could you open a JIRA issue for this bug?
solrcloud and csv import hangs
Hi,

I'm using Solr 4.0-BETA and trying to import a CSV file as follows:

  curl http://localhost:8080/solr/core/update -d overwrite=false \
    -d commit=true -d stream.contentType='text/csv;charset=utf-8' \
    -d stream.url=file:///dir/file.csv

I have 2 Tomcat servers running on different machines and a separate ZooKeeper quorum (3 zoo servers, 2 on the same machine). This is a 1-shard core, replicated to the other machine.

It seems that for a 255K-line file I have 170 docs on the server that issued the command, but on the other, the index seems to grow unbounded?

Has anyone seen this, or been successful in using the CSV import with SolrCloud?

Cheers,
Dan