On 5/6/2013 10:48 AM, Kevin Osborn wrote:
I am looking to export a large amount of data from Solr. This export will
be done by a Java application and then written to a file. Initially, I was
thinking of making direct HTTP calls and using the CSV response writer, so
that my Java application can quickly parse each line from a stream.

But with SolrCloud, I would prefer to use SolrJ because of its ZooKeeper
integration. Is there any way to use the CSV response writer with SolrJ?

Would the overhead of SolrJ's binary "javabin" format make it much slower
than the CSV response writer?

What do you intend to do with the exported data? If you're going to use it to import into a new Solr index, you might be better off using the DataImportHandler with SolrEntityProcessor. Just point it at one of your servers and include the collection name in the URL.
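For reference, a minimal data-config.xml for that approach might look something like this. It's an untested sketch: the host, collection name, and rows value are placeholders you'd need to adjust for your setup.

    <dataConfig>
      <document>
        <entity name="sourceIndex"
                processor="SolrEntityProcessor"
                url="http://source-host:8983/solr/collection1"
                query="*:*"
                rows="1000"/>
      </document>
    </dataConfig>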

If the export will have other uses and the CSV format will work for you, letting Solr write the CSV directly would probably be more efficient than something you could whip together quickly with SolrJ. If you've got really excellent Java skills and a lot of time to work on it, you might be able to write something just as efficient, but Solr can already do it.
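If you do go the plain HTTP route you described, the streaming side is only a few lines of standard Java. A rough sketch (untested; the host, collection, field list, and row count are placeholders, and a real query string should be URL-encoded):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;

    public class CsvExport {
        public static void main(String[] args) throws Exception {
            // Placeholder host, collection, fields, and row count -- adjust for your index.
            String url = "http://solr-host:8983/solr/collection1/select"
                    + "?q=*:*&wt=csv&fl=id,name,price&rows=1000000";
            BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new URL(url).openStream(), "UTF-8"));
            String line;
            while ((line = reader.readLine()) != null) {
                // The first line is the CSV header; every line after that is one document.
                System.out.println(line);
            }
            reader.close();
        }
    }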

If you plan to page through your data rather than grab it all with one query, it is MUCH more efficient to use a range query on a field with sequential data than to use the start and rows parameters. Deep paging with start forces every shard to collect and sort start+rows documents for each request, so requests get slower the further you page, while a range query keeps each request cheap. This is *especially* true if you're using a sharded index, which is typically the case with SolrCloud.
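As a concrete example, here is roughly what range-based paging looks like with SolrJ against SolrCloud. This is an untested sketch assuming SolrJ 4.x (CloudSolrServer); the ZooKeeper hosts, collection name, and the sequential numeric field (called "seq" here) are all placeholders for whatever you actually have:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CloudSolrServer;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;

    public class RangeExport {
        public static void main(String[] args) throws Exception {
            // Placeholder ZooKeeper ensemble and collection name.
            CloudSolrServer server = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
            server.setDefaultCollection("collection1");

            // Find the highest value of the sequential field so we know when to stop.
            SolrQuery maxQuery = new SolrQuery("*:*");
            maxQuery.setRows(1);
            maxQuery.addSort("seq", SolrQuery.ORDER.desc);
            long maxSeq = Long.parseLong(server.query(maxQuery).getResults()
                    .get(0).getFieldValue("seq").toString());

            final int batchSize = 10000;
            for (long start = 0; start <= maxSeq; start += batchSize) {
                long end = start + batchSize - 1;
                SolrQuery query = new SolrQuery("*:*");
                // Walk the sequential field in fixed-size ranges instead of using start/rows.
                query.addFilterQuery("seq:[" + start + " TO " + end + "]");
                query.setRows(batchSize);
                query.addSort("seq", SolrQuery.ORDER.asc);

                SolrDocumentList docs = server.query(query).getResults();
                for (SolrDocument doc : docs) {
                    // Write each document out to your file here.
                }
            }
            server.shutdown();
        }
    }

If the sequence has gaps, some ranges will simply come back empty, which is harmless; the loop still terminates once it passes the maximum value of the field.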

By the way, I am assuming that this process will be a one-time (or very rare) thing for migration purposes, or possibly something that you occasionally do for some kind of index verification. If this is something that you'll be doing all the time, then you probably want to develop a SolrJ application.

Thanks,
Shawn
