Hi Nick and Shawn,

Thanks so much for the pointers. I will try that out. Thank you again!

On Tue, Aug 9, 2016 at 9:40 AM, Nick Vasilyev <nick.vasily...@gmail.com> wrote:

> Hi, I work on a python Solr Client
> <http://solrclient.readthedocs.io/en/latest/> library and there is a
> reindexing helper module that you can use if you are on Solr 4.9+. I use it
> all the time and I think it works pretty well. You can re-index all
> documents from a collection into another collection or dump them to the
> filesystem as JSON. It also supports parallel execution and can run
> independently on each shard. There is also a way to resume if your job
> craps out half way through if your existing schema is set up with a good
> date field and unique id.
>
> You can read the documentation here:
> http://solrclient.readthedocs.io/en/latest/Reindexer.html
>
> Code is pretty short and is here:
> https://github.com/moonlitesolutions/SolrClient/blob/master/SolrClient/helpers/reindexer.py
>
> Here is sample:
> from SolrClient import SolrClient
> from SolrClient.helpers import Reindexer
>
> r = Reindexer(SolrClient('http://source_solr:8983/solr'),
>               SolrClient('http://destination_solr:8983/solr'),
>               source_coll='source_collection',
>               dest_coll='destination-collection')
> r.reindex()
>
> On Tue, Aug 9, 2016 at 9:56 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 8/9/2016 1:48 AM, bharath.mvkumar wrote:
> > > What would be the best way to re-index the data in the SOLR cloud? We
> > > have around 65 million data and we are planning to change the schema
> > > by changing the unique key type from long to string. How long does it
> > > take to re-index 65 million documents in SOLR and can you please
> > > suggest how to do that?
> >
> > There is no magic bullet. And there's no way for anybody but you to
> > determine how long it's going to take. There are people who have
> > achieved over 50K inserts per second, and others who have difficulty
> > reaching 1000 per second. Many factors affect indexing speed, including
> > the size of your documents, the complexity of your analysis, the
> > capabilities of your hardware, and how many threads/processes you are
> > using at the same time when you index.
> >
> > Here's some more detailed info about reindexing, but it's probably not
> > what you wanted to hear:
> >
> > https://wiki.apache.org/solr/HowToReindex
> >
> > Thanks,
> > Shawn

--
Thanks & Regards,
Bharath MV Kumar

"Life is short, enjoy every moment of it"
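
For a rough sense of scale, the two rates quoted in the thread (1000 and 50K inserts per second) can be turned into bounds for 65 million documents. This is only a back-of-the-envelope sketch: it assumes a constant sustained rate, which real indexing rarely achieves, and the function name reindex_hours is made up for illustration. As Shawn says, only a test on your own hardware will give a real number.

```python
def reindex_hours(doc_count: int, docs_per_second: float) -> float:
    """Hours needed to index doc_count documents at a constant rate.

    Hypothetical helper for illustration; assumes the rate never changes.
    """
    if docs_per_second <= 0:
        raise ValueError("docs_per_second must be positive")
    return doc_count / docs_per_second / 3600


DOC_COUNT = 65_000_000  # document count from the original question

# 1,000/s and 50,000/s are the two rates mentioned in the thread.
for rate in (1_000, 50_000):
    hours = reindex_hours(DOC_COUNT, rate)
    print(f"{rate:>6} docs/s -> {hours:.2f} hours")
```

At those rates the job would take somewhere between roughly 22 minutes and 18 hours, which is why measuring the throughput of a small sample batch first is worthwhile.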