In case this helps...

Assuming you have the resources to build a copy of your production
environment and assuming you have the time, you don't need to take your
production down - or even affect its processing...

What I've done (with admittedly smaller data sets) is build a separate
environment (usually on VMs) and, once it's set up, do the new indexing
according to the new "rules" (like your change of long to string).

Then, in a sense, I don't care how long it takes because it is not
affecting Prod.
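
For a rough sense of scale, here is a back-of-the-envelope sketch using the
65 million document count from the original question and the two indexing
rates quoted further down in this thread. The function name is made up for
illustration, and real throughput depends on your documents, analysis and
hardware, so treat the result as an order of magnitude only.

```python
def reindex_hours(doc_count, docs_per_second):
    """Estimate wall-clock hours to index doc_count documents at a steady rate."""
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return doc_count / docs_per_second / 3600.0


DOCS = 65_000_000  # document count from the original question

# The two rates mentioned in the quoted reply: a slow case and a fast case.
for rate in (1_000, 50_000):
    print(f"{rate:>6} docs/s -> {reindex_hours(DOCS, rate):.2f} hours")
```

That works out to roughly 18 hours at the slow end and about 22 minutes at
the fast end, assuming the rate holds steady for the whole run.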

When it's done, I simply switch my load balancer to point to the new
environment and shut down the old one.

To users, this could be seamless if you handle the load balancer correctly
and have it refuse new connections to the old servers while routing all new
connections to the new Solr servers...
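
Before flipping the load balancer, it may be worth checking that the new
environment holds the same number of documents as the old one. Below is a
minimal sketch of that check, assuming both environments expose the standard
/select handler with JSON responses. The function names are hypothetical,
and the host and collection names in the usage comment are placeholders.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get_json(url):
    """Fetch a URL and decode the JSON response body."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def num_found(base_url, collection, fetch=http_get_json):
    """Return the document count reported by a match-all query with rows=0."""
    params = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    data = fetch(f"{base_url}/{collection}/select?{params}")
    return data["response"]["numFound"]


def counts_match(old_count, new_count):
    """True when both environments report the same number of documents."""
    return old_count == new_count


# Usage (placeholder hosts and collection names, not real servers):
#   old = num_found("http://old_solr:8983/solr", "source_collection")
#   new = num_found("http://new_solr:8983/solr", "destination_collection")
#   print("ok to switch" if counts_match(old, new) else f"mismatch: {old} vs {new}")
```

If Prod keeps receiving updates while the new index is being built, the
counts will drift apart, so a check like this only makes sense once those
late updates have also been applied to the new environment.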

On Tue, Aug 9, 2016 at 3:04 PM, Bharath Kumar <bharath.mvku...@gmail.com>
wrote:

> Hi Nick and Shawn,
>
> Thanks so much for the pointers. I will try that out. Thank you again!
>
> On Tue, Aug 9, 2016 at 9:40 AM, Nick Vasilyev <nick.vasily...@gmail.com>
> wrote:
>
> > Hi, I work on a Python Solr Client
> > <http://solrclient.readthedocs.io/en/latest/> library and there is a
> > reindexing helper module that you can use if you are on Solr 4.9+. I use it
> > all the time and I think it works pretty well. You can re-index all
> > documents from a collection into another collection or dump them to the
> > filesystem as JSON. It also supports parallel execution and can run
> > independently on each shard. There is also a way to resume if your job
> > craps out halfway through, provided your existing schema is set up with a
> > good date field and unique id.
> >
> > You can read the documentation here:
> > http://solrclient.readthedocs.io/en/latest/Reindexer.html
> >
> > Code is pretty short and is here:
> > https://github.com/moonlitesolutions/SolrClient/blob/master/SolrClient/helpers/reindexer.py
> >
> > Here is a sample:
> > from SolrClient import SolrClient
> > from SolrClient.helpers import Reindexer
> >
> > # Copy every document from the source collection into the destination.
> > r = Reindexer(SolrClient('http://source_solr:8983/solr'),
> >               SolrClient('http://destination_solr:8983/solr'),
> >               source_coll='source_collection',
> >               dest_coll='destination-collection')
> > r.reindex()
> >
> > On Tue, Aug 9, 2016 at 9:56 AM, Shawn Heisey <apa...@elyograg.org> wrote:
> >
> > > On 8/9/2016 1:48 AM, bharath.mvkumar wrote:
> > > > What would be the best way to re-index the data in the SOLR cloud? We
> > > > have around 65 million documents and we are planning to change the schema
> > > > by changing the unique key type from long to string. How long does it
> > > > take to re-index 65 million documents in SOLR and can you please
> > > > suggest how to do that?
> > >
> > > There is no magic bullet.  And there's no way for anybody but you to
> > > determine how long it's going to take.  There are people who have
> > > achieved over 50K inserts per second, and others who have difficulty
> > > reaching 1000 per second.  Many factors affect indexing speed, including
> > > the size of your documents, the complexity of your analysis, the
> > > capabilities of your hardware, and how many threads/processes you are
> > > using at the same time when you index.
> > >
> > > Here's some more detailed info about reindexing, but it's probably not
> > > what you wanted to hear:
> > >
> > > https://wiki.apache.org/solr/HowToReindex
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>
>
>
> --
> Thanks & Regards,
> Bharath MV Kumar
>
> "Life is short, enjoy every moment of it"
>
