Re: Questions about the IndexUpgrader tool.

2018-12-19 Thread Jan Høydahl
Earlier I recommended that clients set stored=false wherever they could in
order to save index space, but now I do the opposite (well, either stored or
docValues) to prepare for a smooth re-index from the existing Solr install
into a new cluster.

That is, of course, unless you have the source data readily available and
re-indexing from it is fairly quick. Sometimes you have the source repo but
indexing from it takes two weeks; in that case a Solr-to-Solr re-index may be
much faster!
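A Solr-to-Solr re-index like the one described above can be sketched with Solr's cursorMark deep paging. The sketch reads every document from the old collection's /select handler, sorted on the uniqueKey, and posts the documents in batches to the new collection's /update handler. It is a minimal sketch that assumes the following:

- The uniqueKey is named `id`.
- Every field you need is stored, or is docValues and returned by `fl=*`. docValues-only fields come back only when `useDocValuesAsStored` applies.
- The base URLs are hypothetical.
- The `drop_fields` default is an assumption. Stored copyField targets may also need dropping so they are not indexed twice.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterable, List, Tuple

Doc = Dict[str, object]
Fetch = Callable[[str], Tuple[List[Doc], str]]
Send = Callable[[List[Doc]], None]


def http_fetch(base_url: str, rows: int = 500, sort: str = "id asc") -> Fetch:
    """Page through the old collection with cursorMark.

    base_url is hypothetical, e.g. http://old-host:8983/solr/mycollection.
    """
    def fetch(cursor: str) -> Tuple[List[Doc], str]:
        params = urllib.parse.urlencode({
            "q": "*:*",
            "fl": "*",
            "sort": sort,        # cursorMark requires the uniqueKey in the sort
            "rows": rows,
            "cursorMark": cursor,
            "wt": "json",
        })
        with urllib.request.urlopen(f"{base_url}/select?{params}") as resp:
            body = json.load(resp)
        return body["response"]["docs"], body["nextCursorMark"]
    return fetch


def http_send(base_url: str) -> Send:
    """Post a batch of documents as JSON to the new collection's /update handler."""
    def send(docs: List[Doc]) -> None:
        req = urllib.request.Request(
            f"{base_url}/update",
            data=json.dumps(docs).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            resp.read()
    return send


def reindex(fetch: Fetch, send: Send,
            drop_fields: Iterable[str] = ("_version_",)) -> int:
    """Copy every document from fetch() to send(); return how many were copied."""
    drop = set(drop_fields)
    cursor = "*"  # Solr's initial cursorMark value
    copied = 0
    while True:
        docs, next_cursor = fetch(cursor)
        # _version_ belongs to the old index; sending it to the new collection
        # could trigger optimistic-concurrency conflicts, so it is dropped.
        batch = [{k: v for k, v in d.items() if k not in drop} for d in docs]
        if batch:
            send(batch)
            copied += len(batch)
        if next_cursor == cursor:  # Solr repeats the mark when results are exhausted
            break
        cursor = next_cursor
    return copied
```

A typical call would be `reindex(http_fetch(OLD_URL), http_send(NEW_URL))`, followed by a commit on the new collection (for example `/update?commit=true`). Passing `fetch` and `send` in as functions keeps the paging logic testable without a running Solr.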

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 19 Dec 2018 at 09:56, Charlie Hull wrote:
> 
> On 18/12/2018 17:40, Erick Erickson wrote:
>> You are far better off re-indexing totally.
> 
> I would add '...if you have the original data'. Not everyone *can* re-index, 
> and there are various hairy ways of updating an index in place, but they 
> require deep-level magic.
> 
> But if you have the original source data, you should re-index.
> 
> Cheers
> 
> Charlie


Re: Questions about the IndexUpgrader tool.

2018-12-19 Thread Charlie Hull

On 18/12/2018 17:40, Erick Erickson wrote:

> You are far better off re-indexing totally.


I would add '...if you have the original data'. Not everyone *can* 
re-index, and there are various hairy ways of updating an index in 
place, but they require deep-level magic.


But if you have the original source data, you should re-index.

Cheers

Charlie


--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Questions about the IndexUpgrader tool.

2018-12-19 Thread Jan Høydahl
See more in this JIRA issue: https://issues.apache.org/jira/browse/SOLR-12281

One potential showstopper is deprecation or removal of functionality that
your 4.x index relies on. E.g. you may use TrieIntField in 4.x, but it is
deprecated in 7.x and may eventually disappear. That means you need to
re-index into e.g. the new IntPointField. Other changes could be subtle,
e.g. changes to analysis, which would cause new documents to produce
different values in the index compared to your "upgraded" one.
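To make the analysis point concrete, here is a toy sketch. The two analyzers are hypothetical and much simpler than real Lucene analysis chains. Terms written by the old chain survive a format-only upgrade unchanged. Queries and new documents, however, go through the new chain, so the same input stops matching the "upgraded" documents.

```python
import re
from typing import Callable, List, Set

Analyzer = Callable[[str], List[str]]


def analyzer_v1(text: str) -> List[str]:
    """Hypothetical 'old' chain: split on whitespace, then lowercase."""
    return [t.lower() for t in text.split()]


def analyzer_v2(text: str) -> List[str]:
    """Hypothetical 'new' chain: also strips punctuation and English possessives."""
    tokens = re.findall(r"[\w']+", text.lower())
    return [re.sub(r"'s$", "", t) for t in tokens]


def matches(indexed_terms: Set[str], query: str, analyzer: Analyzer) -> bool:
    """True if every analyzed query term occurs in the document's indexed terms."""
    return all(term in indexed_terms for term in analyzer(query))


# Same source text, indexed once by the old chain and once by the new one.
old_doc_terms = set(analyzer_v1("Solr's index"))  # {"solr's", "index"}
new_doc_terms = set(analyzer_v2("Solr's index"))  # {"solr", "index"}
```

A query for "Solr's" analyzed with `analyzer_v2` becomes the term "solr". That term matches the newly indexed document but not the old one, and rewriting the old segments into a newer file format does not change this.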

So all in all, if your index is very simple, and you do your due diligence
when preparing and understand all the changes, it MAY work for your use case,
but it may also fail in mysterious ways :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 18 Dec 2018 at 18:40, Erick Erickson wrote:
> 
> You are far better off re-indexing totally.



Re: Questions about the IndexUpgrader tool.

2018-12-18 Thread Erick Erickson
You are far better off re-indexing totally.

Using the IndexUpgrader tool has never guaranteed compatibility
across multiple major releases. I.e. if you have an index built
with 4x, using that tool will work for 5x, but then going from 5x
to 6x _even after the entire index is rewritten from 4x format_
has never been guaranteed to work. By "guaranteed to work"
here, I mean that there can be subtle problems, regardless
of appearances.

The two most succinct statements as to why this is true follow.
I will not second-guess _anything_ these two people have to
say about how Lucene works ;)

 From Mike McCandless:
“This really is the difference between an index and a database:
we do not store, precisely, the original documents.  We store an
efficient derived/computed index from them.”

 From Robert Muir:
“I think the key issue here is Lucene is an index not a database.
Because it is a lossy index and does not retain all of the user's
data, it's not possible to safely migrate some things automagically...
The function is y = f(x) and if x is not available it's not possible, so
Lucene can't do it.”
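The y = f(x) point can be illustrated with a toy inverted index. The analyzer and stopword list are hypothetical and far simpler than Lucene's. Two different source texts x produce exactly the same index y, so nothing that reads only the index can tell which x it came from. Only the source data (or stored fields) can.

```python
import re
from typing import Dict, List, Set

STOPWORDS = {"the", "a", "an", "of"}  # hypothetical stopword list


def analyze(text: str) -> List[str]:
    """Toy analysis chain: lowercase, split on non-alphanumerics, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """The lossy y = f(x): map each term to the ids of documents containing it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


# Different originals, identical index: case, punctuation and "The" are gone.
y1 = build_index({1: "The Quick Brown Fox"})
y2 = build_index({1: "quick, BROWN fox!"})
```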

As of 6x, a marker is written into each segment, and the lowest
version is retained when segments are merged. 8x will refuse
to start if it detects a 6x marker, so this will be enforced soon.
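The marker rule above can be modelled in a few lines. This is a toy model under stated assumptions. The `Segment` class, its `min_major` field and the "at most one major version back" check are illustrative, not Lucene's actual classes or logic. Because a merge keeps the lowest marker, merging a 6x segment with a 7x segment still leaves a 6 behind, and that is what 8x would reject.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    name: str
    min_major: int  # hypothetical: oldest major version whose data went into this segment


def merge(segments: List[Segment], name: str) -> Segment:
    """Merging keeps the lowest version marker among the input segments."""
    return Segment(name, min(s.min_major for s in segments))


def can_open(segments: List[Segment], reader_major: int) -> bool:
    """Hypothetical check: refuse if any marker is older than reader_major - 1."""
    return min(s.min_major for s in segments) >= reader_major - 1


merged = merge([Segment("_0", 6), Segment("_1", 7)], "_2")
```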

Best,
Erick

On Mon, Dec 17, 2018 at 12:27 PM Pushkar Raste  wrote:
>
> Hi,
> I have questions about the IndexUpgrader tool.


Questions about the IndexUpgrader tool.

2018-12-17 Thread Pushkar Raste
Hi,
I have questions about the IndexUpgrader tool.

- I want to upgrade from Solr 4 to Solr 7. Can I upgrade the index from
4 to 5, then 5 to 6, and finally 6 to 7 using the appropriate version of the
IndexUpgrader, without loading the index into Solr at all during the
successive upgrades?

- The note in the tool says "This tool only keeps last commit in an index".
Does this mean I have to optimize the index before running the tool?

- There is another note about a partially upgraded index. How can an index
be partially upgraded? One scenario I can think of: 'If I upgraded, let's
say, from Solr 5 to Solr 6 and then added some documents, the new documents
will already be in Lucene 6 format, while the old documents will still be in
Solr 5 format.' Is my understanding correct?
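For reference, the chained approach asked about would look roughly like the sketch below. It only builds the command lines and makes these assumptions:

- The jar paths for each Lucene version are hypothetical placeholders. The backward-codecs jar is included so each step can read the previous format.
- It passes IndexUpgrader's `-delete-prior-commits` option. That option lets the tool drop all but the last commit; without it, the tool refuses to run on an index with more than one commit.

As the replies explain, a chain like this is not guaranteed to yield a sound index across several major versions, so re-indexing remains the safer path.

```python
import os
import shlex
from typing import Dict, List, Sequence

# Hypothetical jar locations; substitute the real Lucene release for each step.
JARS: Dict[int, List[str]] = {
    5: ["lucene-core-5.x.jar", "lucene-backward-codecs-5.x.jar"],
    6: ["lucene-core-6.x.jar", "lucene-backward-codecs-6.x.jar"],
    7: ["lucene-core-7.x.jar", "lucene-backward-codecs-7.x.jar"],
}


def upgrader_command(major: int, index_dir: str,
                     delete_prior_commits: bool = False) -> List[str]:
    """Build one IndexUpgrader invocation using the (hypothetical) jars for `major`."""
    cmd = ["java", "-cp", os.pathsep.join(JARS[major]),
           "org.apache.lucene.index.IndexUpgrader"]
    if delete_prior_commits:
        cmd.append("-delete-prior-commits")
    cmd.append(index_dir)
    return cmd


def upgrade_chain(index_dir: str, majors: Sequence[int] = (5, 6, 7)) -> List[List[str]]:
    """One command per major version, meant to be run strictly in order."""
    return [upgrader_command(m, index_dir, delete_prior_commits=True) for m in majors]


if __name__ == "__main__":
    for cmd in upgrade_chain("/path/to/core/data/index"):
        print(shlex.join(cmd))
```

The commands are kept as argument lists rather than shell strings, so each step can be run with `subprocess.run(cmd, check=True)` without quoting issues. If any step fails, stop and inspect the index (for example with Lucene's CheckIndex) before going further.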