[jira] [Commented] (LUCENE-8961) CheckIndex: pre-exorcise document id salvage

2019-09-09 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926078#comment-16926078
 ] 

Adrien Grand commented on LUCENE-8961:
--

Agreed, it is awkward. When I said "on top of CheckIndex", I was thinking of 
running CheckIndex programmatically and then looking at the return value to 
understand which segments might need salvaging. A separate stand-alone tool 
sounds good to me too.
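
The triage step described above could be sketched roughly as follows. This is a minimal, self-contained illustration and not Lucene code: {{SegmentReport}} and its fields are hypothetical stand-ins for whatever per-segment information a programmatic CheckIndex run reports, and a real tool would populate them from that return value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types are Lucene classes.
final class SalvageTriage {

  /** Hypothetical stand-in for the per-segment result of an index check. */
  static final class SegmentReport {
    final String name;
    final int docCount;
    final boolean healthy;

    SegmentReport(String name, int docCount, boolean healthy) {
      this.name = name;
      this.docCount = docCount;
      this.healthy = healthy;
    }
  }

  /** Returns the names of the segments that failed the check, in report order. */
  static List<String> segmentsToSalvage(List<SegmentReport> reports) {
    List<String> names = new ArrayList<>();
    for (SegmentReport report : reports) {
      if (!report.healthy) {
        names.add(report.name);
      }
    }
    return names;
  }

  public static void main(String[] args) {
    // Made-up sample data, for illustration only.
    List<SegmentReport> reports = Arrays.asList(
        new SegmentReport("_0", 1200, true),
        new SegmentReport("_1", 300, false),
        new SegmentReport("_2", 45, true));
    System.out.println("Segments to salvage: " + segmentsToSalvage(reports));
  }
}
```

Keeping the triage outside the checker means the checker itself only reports, and any salvage logic lives in the calling tool.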

> CheckIndex: pre-exorcise document id salvage
> 
>
> Key: LUCENE-8961
> URL: https://issues.apache.org/jira/browse/LUCENE-8961
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Christine Poerschke
> Priority: Minor
> Attachments: LUCENE-8961.patch, LUCENE-8961.patch
>
>
> The 
> [CheckIndex|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.2.0/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java]
>  tool supports the exorcising of corrupt segments from an index.
> This ticket proposes to add an extra option which could first be used to 
> potentially salvage the document ids of the segment(s) about to be exorcised. 
> Re-ingestion for those documents could then be arranged so as to repair the 
> data damage caused by the exorcising.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8961) CheckIndex: pre-exorcise document id salvage

2019-09-03 Thread Christine Poerschke (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921391#comment-16921391
 ] 

Christine Poerschke commented on LUCENE-8961:
-

Thanks [~jpountz] for your input.

The latest attached patch facilitates potential salvaging of terms by making 
the {{CheckIndex}} class extensible so that developers' own subclasses 
could:
 * customise the checkIntegrity call
 * filter the fields being checked
 * intercept any (field,term) pairs e.g. for logging purposes
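
The three extension points listed above follow a template-method pattern, which might look roughly like the sketch below. It is an assumption-laden illustration only: the class names, the hook names ({{checkIntegrity}}, {{shouldCheckField}}, {{onTerm}}) and the in-memory map of terms are all hypothetical and do not reflect the actual patch or the real {{CheckIndex}} class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the extension-point pattern; not Lucene's CheckIndex.
final class SalvageHooksDemo {

  /** Base checker exposing three overridable hooks. */
  static class ExtensibleChecker {

    /** Hook 1: subclasses may customise (or skip) the up-front integrity check. */
    protected void checkIntegrity() {
      // A real checker would verify checksums here and throw on a mismatch.
    }

    /** Hook 2: subclasses may restrict which fields are visited. */
    protected boolean shouldCheckField(String field) {
      return true;
    }

    /** Hook 3: subclasses may observe each (field, term) pair, e.g. to log it. */
    protected void onTerm(String field, String term) {
      // No-op by default.
    }

    /** Walks the terms of every accepted field, invoking the hooks above. */
    public final void check(Map<String, List<String>> termsByField) {
      checkIntegrity();
      for (Map.Entry<String, List<String>> entry : termsByField.entrySet()) {
        if (!shouldCheckField(entry.getKey())) {
          continue;
        }
        for (String term : entry.getValue()) {
          onTerm(entry.getKey(), term);
        }
      }
    }
  }

  /** Subclass that collects the terms of a single id field. */
  static final class IdSalvagingChecker extends ExtensibleChecker {
    private final String idField;
    final List<String> salvagedIds = new ArrayList<>();

    IdSalvagingChecker(String idField) {
      this.idField = idField;
    }

    @Override
    protected boolean shouldCheckField(String field) {
      return idField.equals(field);
    }

    @Override
    protected void onTerm(String field, String term) {
      salvagedIds.add(term);
    }
  }

  public static void main(String[] args) {
    // Made-up field and term names, for illustration only.
    Map<String, List<String>> termsByField = new LinkedHashMap<>();
    termsByField.put("id", Arrays.asList("doc1", "doc2"));
    termsByField.put("body", Arrays.asList("lucene", "index"));

    IdSalvagingChecker checker = new IdSalvagingChecker("id");
    checker.check(termsByField);
    System.out.println("Salvaged ids: " + checker.salvagedIds);
  }
}
```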

It seems a rather awkward change to me, though, and if out-of-the-box 
{{CheckIndex}} will not support id salvaging, then a stand-alone tool just for 
that purpose might be a cleaner solution. Either way, I won't have the bandwidth 
to pursue this further in the near future; I am just sharing things 'as is' in 
case they help others in the meantime.




[jira] [Commented] (LUCENE-8961) CheckIndex: pre-exorcise document id salvage

2019-09-02 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920958#comment-16920958
 ] 

Adrien Grand commented on LUCENE-8961:
--

This feels too unsafe to me for CheckIndex. For instance, what if idField is 
the corrupt field? You could end up with missing ids or the wrong ids. I'm fine 
with adding more information to the CheckIndex status in order to make it 
easier to do this kind of hack on top of CheckIndex, but I'd like to keep 
CheckIndex rock solid.
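
The risk can be made concrete with a small sketch. Assuming, hypothetically, that every document carries exactly one unique, non-empty id term and that the number of documents in the segment is known, a salvage result can be sanity-checked against that count. All names below are made up for illustration. Note that passing the check proves little: a corrupt id field could still yield the right number of wrong ids.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names come from Lucene or the patch.
final class SalvageSanityCheck {

  enum Verdict { PLAUSIBLE, MISSING_IDS, SUSPECT_IDS }

  /**
   * Compares salvaged ids against the number of documents the segment is
   * believed to hold. Assumes one unique, non-empty id per document.
   */
  static Verdict assess(List<String> salvagedIds, int expectedDocCount) {
    Set<String> distinct = new HashSet<>(salvagedIds);
    if (distinct.contains("") || distinct.size() > expectedDocCount) {
      return Verdict.SUSPECT_IDS; // empty or surplus ids cannot be trusted
    }
    if (distinct.size() < expectedDocCount) {
      return Verdict.MISSING_IDS; // some documents could not be identified
    }
    return Verdict.PLAUSIBLE; // counts agree; the ids may still be wrong
  }

  public static void main(String[] args) {
    // Made-up ids and counts, for illustration only.
    System.out.println(assess(Arrays.asList("a", "b"), 2));
    System.out.println(assess(Arrays.asList("a"), 2));
    System.out.println(assess(Arrays.asList("a", "b", "c"), 2));
  }
}
```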




[jira] [Commented] (LUCENE-8961) CheckIndex: pre-exorcise document id salvage

2019-09-02 Thread Christine Poerschke (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920917#comment-16920917
 ] 

Christine Poerschke commented on LUCENE-8961:
-

Attached an outline work-in-progress patch:
* a new {{-skipCheckIntegrity}} option would allow the tool to proceed past the 
initial integrity checks (which would otherwise fail, e.g. due to a footer 
checksum failure)
* a new {{-idField F}} option would identify the field from which terms are to 
be salvaged
* the salvaged terms are currently printed to stderr (this would obviously 
have to change)
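
To illustrate how the two proposed options might be read from the command line, here is a minimal stand-alone sketch. The {{-skipCheckIntegrity}} and {{-idField F}} flags are the ones proposed in this patch, not options of any released CheckIndex. The {{SalvageOptions}} class, its fields and the positional index path are assumptions made for this example.

```java
// Hypothetical sketch of option parsing; not taken from the attached patch.
final class SalvageOptions {
  final boolean skipCheckIntegrity;
  final String idField;   // null when -idField was not given
  final String indexPath; // assumed positional argument

  private SalvageOptions(boolean skipCheckIntegrity, String idField, String indexPath) {
    this.skipCheckIntegrity = skipCheckIntegrity;
    this.idField = idField;
    this.indexPath = indexPath;
  }

  /** Parses the arguments, rejecting unknown options and missing values. */
  static SalvageOptions parse(String[] args) {
    boolean skip = false;
    String idField = null;
    String indexPath = null;
    for (int i = 0; i < args.length; i++) {
      String arg = args[i];
      if ("-skipCheckIntegrity".equals(arg)) {
        skip = true;
      } else if ("-idField".equals(arg)) {
        if (i + 1 >= args.length) {
          throw new IllegalArgumentException("-idField requires a field name");
        }
        idField = args[++i]; // consume the option's value
      } else if (arg.startsWith("-")) {
        throw new IllegalArgumentException("Unknown option: " + arg);
      } else if (indexPath != null) {
        throw new IllegalArgumentException("Only one index path is allowed");
      } else {
        indexPath = arg;
      }
    }
    if (indexPath == null) {
      throw new IllegalArgumentException("Missing index path");
    }
    return new SalvageOptions(skip, idField, indexPath);
  }

  public static void main(String[] args) {
    // Made-up arguments, for illustration only.
    SalvageOptions options = parse(
        new String[] {"-skipCheckIntegrity", "-idField", "id", "/path/to/index"});
    System.out.println("skipCheckIntegrity=" + options.skipCheckIntegrity
        + " idField=" + options.idField + " indexPath=" + options.indexPath);
  }
}
```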
