On Mon, Sep 24, 2012 at 11:03 AM, dan sutton <danbsut...@gmail.com> wrote:
> Hi,
>
> This appears to happen in trunk too.
>
> It appears that the add command's request parameters get sent to the
> replica nodes. If I comment these out like so for add and commit:
>
> core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java
>
> -    params = new ModifiableSolrParams(req.getParams());
> +    //params = new ModifiableSolrParams(req.getParams());
> +    params = new ModifiableSolrParams();
>
> then things work as expected.
>
> Otherwise, params like stream.url get sent to the replica nodes, which
> causes a failure if the file is missing there, or worse, repeatedly
> imports the same file if it does exist on a replica.
>
> This might not be the right thing to do, though ... what should be sent
> here for a streaming CSV import?
>
> Dan
>
>
> On Thu, Sep 20, 2012 at 4:32 PM, dan sutton <danbsut...@gmail.com> wrote:
>> Hi,
>>
>> I'm using Solr 4.0-BETA and trying to import a CSV file as follows:
>>
>> curl http://localhost:8080/solr/<core>/update -d overwrite=false \
>>   -d commit=true -d stream.contentType='text/csv;charset=utf-8' \
>>   -d stream.url=file:///dir/file.csv
>>
>> I have two Tomcat servers running on different machines and a separate
>> ZooKeeper quorum (3 zoo servers, 2 on the same machine). This is a
>> one-shard core, replicated to the other machine.
>>
>> It seems that for a 255K-line file I have 170 docs on the server that
>> issued the command, but on the other, the index seems to grow without
>> bound?
>>
>> Has anyone seen this, or been successful in using the CSV import
>> with SolrCloud?
Yikes! Thanks for investigating; this looks pretty serious. Could you
open a JIRA issue for this bug?

-Yonik
http://lucidworks.com
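
For a streaming CSV import, one possible middle ground (an untested
sketch, not a committed fix) would be to copy the original request
params but strip the stream.* content-stream parameters before the
request is forwarded to replicas, rather than dropping all params as in
the patch quoted above. The StreamParamFilter class and withoutStreams
helper below are made up for illustration and are not part of Solr; the
stream.url / stream.file / stream.body / stream.contentType strings are
the standard content-stream parameter names used in the curl command in
the thread.

    import org.apache.solr.common.params.ModifiableSolrParams;
    import org.apache.solr.common.params.SolrParams;

    // Illustrative helper, not Solr code: copy the incoming request
    // params but drop the content-stream params so replicas do not try
    // to fetch and re-import stream.url themselves.
    public class StreamParamFilter {

        private static final String[] STREAM_PARAMS = {
            "stream.url", "stream.file", "stream.body", "stream.contentType"
        };

        /** Returns a copy of the request params with stream.* removed. */
        public static ModifiableSolrParams withoutStreams(SolrParams reqParams) {
            ModifiableSolrParams params = new ModifiableSolrParams(reqParams);
            for (String name : STREAM_PARAMS) {
                params.remove(name);
            }
            return params;
        }
    }

If something like this were used inside DistributedUpdateProcessor, it
would replace the "params = new ModifiableSolrParams(req.getParams())"
line from the patch, assuming the rest of the forwarding logic stays
the same; whether stripping only these params is the right behavior is
exactly the open question above.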