On 5/6/2013 10:48 AM, Kevin Osborn wrote:
I am looking to export a large amount of data from Solr. This export will
be done by a Java application and then written to a file. Initially, I was
thinking of making direct HTTP calls and using the CSV response writer, so
that my Java application can quickly parse each line from a stream.

But with SolrCloud, I would prefer to use SolrJ because of its ZooKeeper
integration. Is there any way to use the CSV response writer with SolrJ?

Would the overhead of SolrJ's binary "javabin" format make it much slower
than the CSV response writer?

What do you intend to do with the exported data? If you're going to use it to import into a new Solr index, you might be better off using the DataImportHandler with SolrEntityProcessor. Just point it at one of your servers and include the collection name in the URL.
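For reference, a minimal data-config.xml for that approach might look something like this. It's an untested sketch: the host, collection name, and rows value are placeholders you'd need to adjust for your setup.

    <dataConfig>
      <document>
        <entity name="sourceIndex"
                processor="SolrEntityProcessor"
                url="http://source-host:8983/solr/collection1"
                query="*:*"
                rows="1000"/>
      </document>
    </dataConfig>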

If the export will have other uses and the CSV format will work for you, letting Solr write the CSV directly would probably be more efficient than something you could whip together quickly with SolrJ. If you've got really excellent Java skills and a lot of time to work on it, you might be able to write something just as efficient, but Solr can already do it.
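If you do go the plain HTTP route you described, the streaming side is only a few lines of standard Java. A rough sketch (untested; the host, collection, field list, and row count are placeholders, and a real query string should be URL-encoded):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class CsvExport {
        public static void main(String[] args) throws Exception {
            // Placeholder host, collection, fields, and row count -- adjust for your index.
            String url = "http://solr-host:8983/solr/collection1/select"
                    + "?q=*:*&wt=csv&fl=id,name,price&rows=1000000";
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream(), "UTF-8"));
            String line;
            while ((line = reader.readLine()) != null) {
                // The first line is the CSV header; every line after that is one document.
                System.out.println(line);
            }
            reader.close();
        }
    }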

If you plan to page through your data rather than grab it all with one query, it is MUCH more efficient to use a range query on a field with sequential data than to use the start and rows parameters. Deep paging with start forces every shard to collect and sort start+rows documents for each request, so requests get slower the further you page, while a range query keeps each request cheap. This is *especially* true if you're using a sharded index, which is typically the case with SolrCloud.
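As a concrete example, here is roughly what range-based paging looks like with SolrJ against SolrCloud. This is an untested sketch assuming SolrJ 4.x (CloudSolrServer); the ZooKeeper hosts, collection name, and the sequential numeric field (called "seq" here) are all placeholders for whatever you actually have:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    public class RangeExport {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper ensemble and collection name.
            CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
            server.setDefaultCollection("collection1");

            // Find the highest value of the sequential field so we know when to stop.
            SolrQuery maxQuery = new SolrQuery("*:*");
            maxQuery.setRows(1);
            maxQuery.addSort("seq", SolrQuery.ORDER.desc);
            long maxSeq = Long.parseLong(server.query(maxQuery).getResults()
                    .get(0).getFieldValue("seq").toString());

            final int batchSize = 10000;
            for (long start = 0; start <= maxSeq; start += batchSize) {
                long end = start + batchSize - 1;
                SolrQuery query = new SolrQuery("*:*");
                // Walk the sequential field in fixed-size ranges instead of using start/rows.
                query.addFilterQuery("seq:[" + start + " TO " + end + "]");
                query.setRows(batchSize);
                query.addSort("seq", SolrQuery.ORDER.asc);

                SolrDocumentList docs = server.query(query).getResults();
                for (SolrDocument doc : docs) {
                    // Write each document out to your file here.
                }
            }
            server.shutdown();
        }
    }

If the sequence has gaps, some ranges will simply come back empty, which is harmless; the loop still terminates once it passes the maximum value of the field.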

By the way, I am assuming that this process will be a one-time (or very rare) thing for migration purposes, or possibly something that you occasionally do for some kind of index verification. If this is something that you'll be doing all the time, then you probably want to develop a SolrJ application.

Thanks,
Shawn
