[jira] [Commented] (LUCENE-8961) CheckIndex: pre-exorcise document id salvage
[ https://issues.apache.org/jira/browse/LUCENE-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16926078#comment-16926078 ]

Adrien Grand commented on LUCENE-8961:

Agreed it is awkward. When I said "on top of CheckIndex", I was rather thinking of running CheckIndex programmatically and then looking at the return value to understand what segments might need salvaging. A separate stand-alone tool sounds good to me too.

> CheckIndex: pre-exorcise document id salvage
>
> Key: LUCENE-8961
> URL: https://issues.apache.org/jira/browse/LUCENE-8961
> Project: Lucene - Core
> Issue Type: New Feature
> Reporter: Christine Poerschke
> Priority: Minor
> Attachments: LUCENE-8961.patch, LUCENE-8961.patch
>
> The [CheckIndex|https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.2.0/lucene/core/src/java/org/apache/lucene/index/CheckIndex.java] tool supports the exorcising of corrupt segments from an index.
> This ticket proposes to add an extra option which could first be used to potentially salvage the document ids of the segment(s) about to be exorcised.
> Re-ingestion for those documents could then be arranged so as to repair the data damage caused by the exorcising.

--
This message was sent by Atlassian Jira (v8.3.2#803003)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-8961) CheckIndex: pre-exorcise document id salvage
[ https://issues.apache.org/jira/browse/LUCENE-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16921391#comment-16921391 ]

Christine Poerschke commented on LUCENE-8961:

Thanks [~jpountz] for your input.

The latest attached patch facilitates potential salvaging of terms by making the {{CheckIndex}} class extensible so that developers' own subclasses could:
* customise the checkIntegrity call
* filter the fields being checked
* intercept any (field,term) pairs e.g. for logging purposes

It seems to me to be a rather awkward change though, and if out-of-the-box {{CheckIndex}} would not support id salvaging, then a stand-alone tool just for that purpose might be a cleaner solution?

Either way, I won't have bandwidth to pursue this further in the near future, i.e. I am just sharing things 'as is' in case it might help others in the meantime.
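The three extension points listed in the comment above can be pictured with a small, self-contained sketch. It does not use Lucene and is not the code from the attached patch: `ExtensibleChecker`, `IdSalvagingChecker`, and all method names are hypothetical, and a segment is modelled as a plain map from field name to terms purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an extensible checker; NOT the Lucene CheckIndex API.
class ExtensibleChecker {
    /** Hook 1: decide whether the up-front integrity check runs for a segment. */
    protected boolean shouldCheckIntegrity(String segmentName) {
        return true;
    }

    /** Hook 2: decide which fields are visited. */
    protected boolean acceptField(String field) {
        return true;
    }

    /** Hook 3: called once for every (field, term) pair that is visited. */
    protected void onTerm(String segmentName, String field, String term) {
        // default: do nothing
    }

    /** Placeholder; a real check would verify checksums and throw on corruption. */
    protected void verifyIntegrity(String segmentName) {
    }

    /** Walks an in-memory model of one segment and returns the number of terms visited. */
    final int checkSegment(String segmentName, Map<String, List<String>> termsByField) {
        if (shouldCheckIntegrity(segmentName)) {
            verifyIntegrity(segmentName);
        }
        int visited = 0;
        for (Map.Entry<String, List<String>> entry : termsByField.entrySet()) {
            if (!acceptField(entry.getKey())) {
                continue; // field filtered out by the subclass
            }
            for (String term : entry.getValue()) {
                onTerm(segmentName, entry.getKey(), term);
                visited++;
            }
        }
        return visited;
    }
}

// Hypothetical subclass: skips the integrity check and collects terms of one id field.
class IdSalvagingChecker extends ExtensibleChecker {
    private final String idField;
    final List<String> salvagedIds = new ArrayList<>();

    IdSalvagingChecker(String idField) {
        this.idField = idField;
    }

    @Override
    protected boolean shouldCheckIntegrity(String segmentName) {
        return false; // proceed even though the segment is known to be damaged
    }

    @Override
    protected boolean acceptField(String field) {
        return idField.equals(field);
    }

    @Override
    protected void onTerm(String segmentName, String field, String term) {
        salvagedIds.add(term);
    }
}
```

A caller would construct `new IdSalvagingChecker("id")`, pass each suspect segment to `checkSegment`, and read `salvagedIds` afterwards. As noted elsewhere in this thread, if the id field itself is the corrupt one, such a list may be incomplete or wrong, so it should be treated as a hint for re-ingestion rather than as ground truth.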
[jira] [Commented] (LUCENE-8961) CheckIndex: pre-exorcise document id salvage
[ https://issues.apache.org/jira/browse/LUCENE-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920958#comment-16920958 ]

Adrien Grand commented on LUCENE-8961:

This feels too unsafe to me for CheckIndex. For instance, what if idField is the corrupt field? You could end up with missing ids or the wrong ids. I'm fine with adding more information to the CheckIndex status in order to make it easier to do this kind of hack on top of CheckIndex, but I'd like to keep CheckIndex something that is rock solid.
[jira] [Commented] (LUCENE-8961) CheckIndex: pre-exorcise document id salvage
[ https://issues.apache.org/jira/browse/LUCENE-8961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16920917#comment-16920917 ]

Christine Poerschke commented on LUCENE-8961:

Attached outline work-in-progress patch:
* a new {{-skipCheckIntegrity}} option would allow the tool to proceed past the initial integrity checks (which would otherwise fail, e.g. due to a footer checksum failure)
* a new {{-idField F}} option would identify the field from which terms are to be salvaged
* the salvaged terms are currently printed to stderr (and this obviously would have to be changed somehow)
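For readers who want to picture the two proposed options, here is a minimal, self-contained sketch of how they might be parsed. The flag names `-skipCheckIntegrity` and `-idField` come from the comment above; they are proposals from the work-in-progress patch, not options of the released CheckIndex tool. The `SalvageOptions` class, its fields, and the error message are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for the two proposed options; not part of Lucene.
final class SalvageOptions {
    final boolean skipCheckIntegrity; // true when -skipCheckIntegrity was given
    final String idField;             // value of -idField, or null when absent
    final List<String> remaining;     // all other arguments, e.g. the index path

    private SalvageOptions(boolean skipCheckIntegrity, String idField, List<String> remaining) {
        this.skipCheckIntegrity = skipCheckIntegrity;
        this.idField = idField;
        this.remaining = remaining;
    }

    /** Parses the proposed flags and passes every other argument through untouched. */
    static SalvageOptions parse(String[] args) {
        boolean skip = false;
        String idField = null;
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if ("-skipCheckIntegrity".equals(arg)) {
                skip = true;
            } else if ("-idField".equals(arg)) {
                if (i == args.length - 1) {
                    throw new IllegalArgumentException("missing field name for -idField option");
                }
                idField = args[++i]; // consume the field name that follows the flag
            } else {
                rest.add(arg);
            }
        }
        return new SalvageOptions(skip, idField, rest);
    }
}
```

Calling `SalvageOptions.parse(new String[] {"/path/to/index", "-skipCheckIntegrity", "-idField", "id"})` would yield `skipCheckIntegrity == true`, `idField` equal to `"id"`, and the index path left in `remaining`.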