Hi Nick and Shawn,

Thanks so much for the pointers. I will try that out. Thank you again!

On Tue, Aug 9, 2016 at 9:40 AM, Nick Vasilyev <nick.vasily...@gmail.com>
wrote:

> Hi, I work on a Python Solr client library
> <http://solrclient.readthedocs.io/en/latest/>, and it has a reindexing
> helper module that you can use if you are on Solr 4.9+. I use it all the
> time and I think it works pretty well. You can re-index all documents from
> a collection into another collection or dump them to the filesystem as
> JSON. It also supports parallel execution and can run independently on
> each shard. There is also a way to resume if your job craps out halfway
> through, provided your existing schema is set up with a good date field
> and a unique id.
>
> You can read the documentation here:
> http://solrclient.readthedocs.io/en/latest/Reindexer.html
>
> Code is pretty short and is here:
> https://github.com/moonlitesolutions/SolrClient/blob/master/SolrClient/helpers/reindexer.py
>
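> Here is a sample:
> from SolrClient import SolrClient
> from SolrClient.helpers import Reindexer
>
> # Connect to the source and destination Solr instances.
> source = SolrClient('http://source_solr:8983/solr')
> dest = SolrClient('http://destination_solr:8983/solr')
>
> # Copy every document from source_collection into destination-collection.
> r = Reindexer(source, dest, source_coll='source_collection',
>               dest_coll='destination-collection')
> r.reindex()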
>
> On Tue, Aug 9, 2016 at 9:56 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 8/9/2016 1:48 AM, bharath.mvkumar wrote:
> > > What would be the best way to re-index the data in the SOLR cloud? We
> > > have around 65 million documents and we are planning to change the
> > > schema by changing the unique key type from long to string. How long
> > > does it take to re-index 65 million documents in SOLR and can you
> > > please suggest how to do that?
> >
> > There is no magic bullet.  And there's no way for anybody but you to
> > determine how long it's going to take.  There are people who have
> > achieved over 50K inserts per second, and others who have difficulty
> > reaching 1000 per second.  Many factors affect indexing speed, including
> > the size of your documents, the complexity of your analysis, the
> > capabilities of your hardware, and how many threads/processes you are
> > using at the same time when you index.
> >
> > Here's some more detailed info about reindexing, but it's probably not
> > what you wanted to hear:
> >
> > https://wiki.apache.org/solr/HowToReindex
> >
> > Thanks,
> > Shawn
> >
> >
>
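
For a rough sense of the range Shawn describes, here is a back-of-the-envelope sketch in Python. It assumes a steady indexing rate for the whole job, which real runs rarely hold, and the function name estimated_hours is only illustrative. The two rates are the figures quoted above; the document count is the one from my original question.

```python
def estimated_hours(num_docs, docs_per_second):
    """Return the wall-clock hours needed to index num_docs at a steady rate."""
    return num_docs / docs_per_second / 3600


NUM_DOCS = 65_000_000  # collection size from the original question

# Rates quoted in the thread: the slow and fast ends of the range.
for rate in (1_000, 50_000):
    hours = estimated_hours(NUM_DOCS, rate)
    print(f"{rate:>6} docs/s -> {hours:.2f} hours")
```

At 1000 documents per second this comes to roughly 18 hours, and at 50K per second to about 22 minutes, so the only way to get a real number is to measure the rate on our own hardware with a sample of the data.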



-- 
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"
