Re: Questions about the IndexUpgrader tool.
Earlier I recommended that clients set stored=false wherever they could in order to save index space, but now I do the opposite (well, either stored or docValues) to prepare for a smooth re-index from the existing Solr install into a new cluster. That is, of course, unless you have the source data readily available and re-indexing from it is fairly quick. Sometimes you have the source repo but indexing from it takes two weeks; in that case a Solr-to-Solr reindex may be much faster!

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 19 Dec 2018 at 09:56, Charlie Hull wrote:
>
> On 18/12/2018 17:40, Erick Erickson wrote:
>> You are far better off re-indexing totally.
>
> I would add '...if you have the original data'. Not everyone *can* re-index,
> and there are various hairy ways of updating an index in place, but they
> require deep-level magic.
>
> But if you have the original source data, you should re-index.
>
> Cheers
>
> Charlie
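The Solr-to-Solr reindex described above can be sketched roughly as follows. This is a minimal illustration, not a tested migration tool: it assumes every field you need is stored or has docValues (the point of the advice above), that the uniqueKey field is called `id`, and that paging uses Solr's cursorMark deep paging. The function names and the injected `fetch_page` / `send_docs` callables (standing in for HTTP calls to the old cluster's `/select` and the new cluster's `/update`) are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Doc = Dict[str, object]
# fetch_page(cursor_mark) -> (docs, next_cursor_mark); hypothetical wrapper around /select
FetchPage = Callable[[str], Tuple[List[Doc], str]]
# send_docs(docs) -> None; hypothetical wrapper around /update on the new cluster
SendDocs = Callable[[List[Doc]], None]


def select_params(cursor_mark: str, unique_key: str = "id", rows: int = 500) -> Dict[str, str]:
    """Query parameters for one page of a cursorMark export."""
    return {
        "q": "*:*",
        "fl": "*",                    # stored fields, plus docValues fields the schema exposes as stored
        "sort": f"{unique_key} asc",  # cursorMark needs a sort that includes the uniqueKey field
        "rows": str(rows),
        "cursorMark": cursor_mark,    # "*" on the first request
        "wt": "json",
    }


def prepare_doc(doc: Doc) -> Doc:
    """Drop _version_ so the target cluster assigns its own value."""
    return {k: v for k, v in doc.items() if k != "_version_"}


def reindex(fetch_page: FetchPage, send_docs: SendDocs) -> int:
    """Copy all documents page by page; returns the number of documents copied."""
    cursor = "*"
    copied = 0
    while True:
        docs, next_cursor = fetch_page(cursor)
        if docs:
            send_docs([prepare_doc(d) for d in docs])
            copied += len(docs)
        if next_cursor == cursor:     # an unchanged cursor means the export is complete
            return copied
        cursor = next_cursor
```

Keeping the HTTP calls behind callables is only a convenience for trying the loop out with fake data; a real run would also need error handling, commits on the target, and a check that no required field is missing from the exported documents.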
Re: Questions about the IndexUpgrader tool.
On 18/12/2018 17:40, Erick Erickson wrote:
> You are far better off re-indexing totally.

I would add '...if you have the original data'. Not everyone *can* re-index, and there are various hairy ways of updating an index in place, but they require deep-level magic.

But if you have the original source data, you should re-index.

Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.flax.co.uk
Re: Questions about the IndexUpgrader tool.
See more in this Jira issue: https://issues.apache.org/jira/browse/SOLR-12281

One potential showstopper is deprecation/removal of functionality that your 4.x index relies on. E.g. you may use TrieIntField in 4.x, but that is deprecated in 7.x and may eventually disappear. That means you need to re-index into e.g. the new IntPointField. Other changes could be subtle changes to e.g. analysis, which will cause new documents to produce different values in the index compared to your "upgraded" one.

So all in all, if your index is very simple and you do due diligence when preparing and understand all changes, it MAY work for your use case, but it may also fail in mysterious ways :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
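As part of the due diligence mentioned in Jan's reply above, one cheap check is to scan the old schema for Trie-based field types before planning an upgrade. The sketch below is only an illustration: the helper name and the sample schema are made up, and it looks only at `fieldType` elements and only at the Trie classes, not at analysis changes or other removed functionality.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Trie field classes deprecated in Solr 7.x and their point-based counterparts.
TRIE_TO_POINT = {
    "solr.TrieIntField": "solr.IntPointField",
    "solr.TrieLongField": "solr.LongPointField",
    "solr.TrieFloatField": "solr.FloatPointField",
    "solr.TrieDoubleField": "solr.DoublePointField",
    "solr.TrieDateField": "solr.DatePointField",
}


def deprecated_field_types(schema_xml: str) -> Dict[str, str]:
    """Map each fieldType name that uses a Trie class to the suggested point class."""
    root = ET.fromstring(schema_xml.strip())
    found = {}
    for field_type in root.iter("fieldType"):
        cls = field_type.get("class", "")
        if cls in TRIE_TO_POINT:
            found[field_type.get("name", "")] = TRIE_TO_POINT[cls]
    return found


# Hypothetical schema fragment, for illustration only.
SAMPLE_SCHEMA = """
<schema name="example" version="1.6">
  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8"/>
  <fieldType name="string" class="solr.StrField"/>
  <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6"/>
</schema>
"""

print(deprecated_field_types(SAMPLE_SCHEMA))
```

Any field type this reports has to be re-indexed into the new type; changing the class in the schema does not convert data already in the index.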
Re: Questions about the IndexUpgrader tool.
You are far better off re-indexing totally.

Using the IndexUpgrader tool has never guaranteed compatibility across multiple major releases. I.e. if you have an index built with 4.x, using that tool will work for 5.x, but then going from 5.x to 6.x _even after the entire index is rewritten from the 4.x format_ has never been guaranteed to work. By "guaranteed to work" here, I mean that there can be subtle problems, regardless of appearances.

The two most succinct statements as to why this is true follow. I will not second guess _anything_ these two people have to say about how Lucene works ;)

From Mike McCandless:
“This really is the difference between an index and a database: we do not store, precisely, the original documents. We store an efficient derived/computed index from them.”

From Robert Muir:
“I think the key issue here is Lucene is an index not a database. Because it is a lossy index and does not retain all of the user's data, its not possible to safely migrate some things automagically... The function is y = f(x) and if x is not available its not possible, so lucene can't do it.”

As of 6.x, a marker is written into each segment and the lowest version is retained when segments are merged. 8.x will refuse to start if it detects a 6.x marker, so this will be enforced soon.

Best,
Erick

On Mon, Dec 17, 2018 at 12:27 PM Pushkar Raste wrote:
>
> Hi,
> I have questions about the IndexUpgrader tool.
>
> - I want to upgrade from Solr 4 to Solr 7. Can I upgrade the index from
> 4 to 5, then 5 to 6, and finally 6 to 7, using the appropriate version of
> IndexUpgrader at each step, but without loading the index in Solr at all
> during the successive upgrades?
>
> - The note in the tool says "This tool only keeps last commit in an index".
> Does this mean I have to optimize the index before running the tool?
>
> - There is another note about a partially upgraded index. How can the index
> be partially upgraded? One scenario I can think of is: 'If I upgraded, let's
> say, from Solr 5 to Solr 6 and then added some documents, the new documents
> will be in Lucene 6 format already, while old documents will still be in
> Solr 5 format.' Is my understanding correct?
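The segment marker behaviour Erick describes above can be modelled with a small toy example. This is not Lucene's actual data structure or API: the `Segment` class, its field names, and the functions are hypothetical, and the rule in `can_open` (a release accepts markers from its own or the previous major version only) is an assumption generalised from the statement that 8.x rejects a 6.x marker.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    format_major: int       # major version whose file format the segment is written in
    min_created_major: int  # marker: lowest major version that contributed data


def merge(segments: List[Segment], writer_major: int) -> Segment:
    """Merging rewrites data in the writer's format but retains the lowest marker."""
    return Segment(
        format_major=writer_major,
        min_created_major=min(s.min_created_major for s in segments),
    )


def can_open(segment: Segment, reader_major: int) -> bool:
    """Assumed rule: accept markers from this major version or the previous one only."""
    return segment.min_created_major >= reader_major - 1


original = [Segment(6, 6), Segment(6, 6)]
upgraded = merge(original, writer_major=7)                # rewritten in 7.x format
mixed = merge([upgraded, Segment(7, 7)], writer_major=7)  # later merged with fresh 7.x data

print(upgraded, can_open(upgraded, 7), can_open(upgraded, 8))
print(mixed, can_open(mixed, 8))
```

In this model the rewritten index is in the 7.x file format but still carries the 6 marker, and merging it with segments created natively in 7.x does not raise the marker. That is the sense in which a chain of upgrades looks fine at each step and is still refused later.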
Questions about the IndexUpgrader tool.
Hi,
I have questions about the IndexUpgrader tool.

- I want to upgrade from Solr 4 to Solr 7. Can I upgrade the index from 4 to 5, then 5 to 6, and finally 6 to 7, using the appropriate version of IndexUpgrader at each step, but without loading the index in Solr at all during the successive upgrades?

- The note in the tool says "This tool only keeps last commit in an index". Does this mean I have to optimize the index before running the tool?

- There is another note about a partially upgraded index. How can the index be partially upgraded? One scenario I can think of is: 'If I upgraded, let's say, from Solr 5 to Solr 6 and then added some documents, the new documents will be in Lucene 6 format already, while old documents will still be in Solr 5 format.' Is my understanding correct?