[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13675907#comment-13675907 ] Grant Ingersoll commented on LUCENE-4055: - Hmm, Mike, CODEC_FILE_PATTERN is package access only. Easy enough to replicate/fix, any reason not too? Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0-ALPHA Attachments: LUCENE-4055.patch After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13676083#comment-13676083 ] Michael McCandless commented on LUCENE-4055: Hmm looks like it's package private in 4.3 but is (will be) public in 4.x/trunk. Just replicate for now :) Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0-ALPHA Attachments: LUCENE-4055.patch After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672556#comment-13672556 ] Grant Ingersoll commented on LUCENE-4055: - [~mikemccand] what's the replacement strategy for IndexFileNameFilter? I'm updating some old code from 3.x to 4.3 (MAHOUT-944) and not sure on what the equivalent approach is, or whether I even need it. (Note, I'm still trying to figure out whether the patch for MAHOUT-944 is even the best way to do what it is trying to do, but I want to at least get it compiling first) Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0-ALPHA Attachments: LUCENE-4055.patch After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13672622#comment-13672622 ] Michael McCandless commented on LUCENE-4055: Hi [~gsingers], I think you can use the IndexFileNames.CODEC_FILE_PATTERN in your filter? You may need to add in segments_N and segments.gen as well ... Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Core Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0-ALPHA Attachments: LUCENE-4055.patch After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
Thanks Robert for the answers, I'll investigate this approach. -- Renaud Delbru On 28/05/12 21:59, Robert Muir (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284553#comment-13284553 ] Robert Muir commented on LUCENE-4055: - Well you can do postingsFormat instanceof PerFieldPostingsFormat + postingsFormat.getPostingsFormatForField if you really want. But keep in mind PerFieldPostingsFormat is not really special and just one we provide for convenience, obviously one could write their own PostingsFormat that implements the same thing in a different way. Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-4055.patch After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284523#comment-13284523 ] Robert Muir commented on LUCENE-4055: - Renaud, what you are talking about is not really related to this issue: this issue is about allowing codecs to add metadata to SegmentInfo and FieldInfo. what you are talking about is the indexing chain, if you want to customize that, just set a custom one on IndexWriterConfig. Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-4055.patch After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284533#comment-13284533 ] Renaud Delbru commented on LUCENE-4055: --- Hi Robert, sorry if it seemed a bit out of context, but I am trying to understand how to properly do it. Indeed, I can create my own indexing chain which includes my TermsHashConsumer customisation. However, I would still need the codec metadata for every field. But from what you told me, it seems that this codec specific metadata could be now added to FeildInfo. Is that correct ? Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-4055.patch After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284537#comment-13284537 ] Robert Muir commented on LUCENE-4055: - {quote} However, I would still need the codec metadata for every field. {quote} What codec metadata? this change only allows codecs to add additional things to fieldinfo/segmentinfo so they can later be read when the segment is opened. E.g. a CompressingStoredFieldsWriter could put in the segmentinfo an additional key-value like CompressingStoredFieldsWriter.algorithm=deflate These are private to that component basically: i don't understand why your indexing chain would care about this? its at a level above all that. Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-4055.patch After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284544#comment-13284544 ] Renaud Delbru commented on LUCENE-4055: --- {quote} What codec metadata? {quote} Metadata that indicates which codec is used for a particular field. Let say I want to have a specific TermsHashConsumerPerField depending on the codec used by a field. For example, for field A and field B which use the Lucen40 codec, we need to use the FreqProxTermsWriterPerField. And for field C that uses my own specific codec, I need to use the MyOwnTermsWriterPerField. My current understanding tells me that to do this, the only way is to customise the IndexingChain with a new TermsHashConsumer that overrides the method TermsHashConsumer#addField(TermsHashPerField termsHashPerField, FieldInfo fieldInfo). This method addField will be able to instantiate the correct TermsHashConsumerPerField if and only if there is codec metadata in the FieldInfo parameter. That's why I am interested of using a customised FieldInfo to store codec-related metadata about a field. Or is there a better way to get codec-related information about a field in the IndexingChain ? Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-4055.patch After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284548#comment-13284548 ] Robert Muir commented on LUCENE-4055: - Codecs are not per-field, they encode the entire inverted index segment. Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-4055.patch After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284551#comment-13284551 ] Renaud Delbru commented on LUCENE-4055: --- Sorry, I meant PostingsFormat instead of Codec. Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-4055.patch After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284553#comment-13284553 ] Robert Muir commented on LUCENE-4055: - Well you can do postingsFormat instanceof PerFieldPostingsFormat + postingsFormat.getPostingsFormatForField if you really want. But keep in mind PerFieldPostingsFormat is not really special and just one we provide for convenience, obviously one could write their own PostingsFormat that implements the same thing in a different way. Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-4055.patch After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13284153#comment-13284153 ] Renaud Delbru commented on LUCENE-4055: --- Does this patch allows FreqProxTermsWriterPerField to be dependent to the Codec/Field ? I have a use case where I need my own FreqProxTermsWriterPerField for certain fields. I am asking this because I see in your patch the following line: {code} final FreqProxTermsWriterPerField fieldWriter = allFields.get(fieldNumber); {code} which seems to retrieve a particular FreqProxTermsWriterPerField for each field type. Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-4055.patch After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283725#comment-13283725 ] Andrzej Bialecki commented on LUCENE-4055: --- +1, this looks very good. One comment re. SegmentInfoPerCommit. This class is not extensible and contains a fixed set of attributes. In LUCENE-3837 this or similar place would be the ideal mechanism to carry info about stacked segments, since this information is specific to a commit point. Unfortunately, there are no MapString,String attributes on this level, so I guess for now this type of aux data will have to be put in SegmentInfos.userData even though it's not index global but segment-specific. Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-4055.patch After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283738#comment-13283738 ] Robert Muir commented on LUCENE-4055: - Right, but I think this is correct: the codec should be responsible for encode/decode of inverted index segments only (the whole problem here originally was trying to have it also look after commits). So it really shouldn't be customizing things about the commit, as that creates a confusing impedance mismatch. I think things like stacked segments in LUCENE-3837 that need to do things other than implement encoding/decoding of segment should be above the codec level: since its a separate concern, if someone wants to have updatable fields thats unrelated to the integer compression algorithm used or what not. Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-4055.patch After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283755#comment-13283755 ] Andrzej Bialecki commented on LUCENE-4055: --- bq. stacked segments in LUCENE-3837 that need to do things other than implement encoding/decoding of segment should be above the codec level .. Certainly, that's why it would make sense to put this extended info in SegmentInfoPerCommit and not in any file handled by Codec. Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-4055.patch After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13283793#comment-13283793 ] Robert Muir commented on LUCENE-4055: - {quote} My comment was about the lack of easy extensibility of the codec-independent per-segment data (SegmentInfoPerCommit - info about stacked data is per-segment and per-commit), so LUCENE-3837 will need to use for now the codec-independent index-global data (SegmentInfos). {quote} Well SegmentInfos is just the list of SegmentInfoPerCommit, so I think in the case of LUCENE-3837 we would just place this data alongside the only other per-segment-per-commit data: deletes (e.g. we would add something like updatesGen and updatesCount or whatever). Sure, we have to bump the file header and what not, but the file is so simple now in the sense its just a list of segment names, the codec to decode them, with their deletes, that it wouldn't be a big deal. And I think per-segment-per-commit data is pretty rare: we only have deletes, and in the future updates, but I can't imagine lots of other stuff belonging in this category (versions the per-segment metadata in SI which has been pretty volatile in the past). Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0 Attachments: LUCENE-4055.patch After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13281105#comment-13281105 ] Robert Muir commented on LUCENE-4055: - Just some updates from the work in the branch (scary changes but proceeding nicely since Mike jumped in and did a lot of it). Here's a list of the current progress: * on disk, the segments_N is reduced to the stuff that actually is per-commit: a list of segments and deleted gens/counts, etc. * per-segment metadata (doc count, diagnostics, etc) that is write-once is encoded by the codec, e.g. for 4.0's codec this is in the .si file. * removed backwards-seeking on segments_N. so appendingcodec still works but doesn't need any special hacks. * flush/merge order is changed so that fieldinfos are written last so codecs have a chance to add metadata to it. * fieldinfo has a codec metadata api that codec components can use, and that metadata will be available on reading the segment. this metadata is for the codec to use to extend fieldinfo, its not carried along during merge or anything. * PerFieldPostingsFormat is changed to use the fieldinfo metadata api rather than a separate .per file (e.g. it records that the id field uses Pulsing). * all the hairiness involving files() is really nice now, instead we simply track which files were created, and add them to the .si file. Previously there was a lot of logic to compute this in a symmetric way at both read and write time, and if you had a bug, your punishment was FNFE. not yet done: * add metadata api to segmentinfo too, so that codec components can record per-segment information that they care about. * see if we can implement 3.x's shared doc stores support with segmentinfo metadata api. This is tricky to do and for addIndexes/indexSplitter etc which do sneaky things to still work. * see if we can implement 3.x normGen (separate norms) with segmentinfo metadata. while in 3.x lucene this was actually per-commit, since 3.x support is read-only we can effectively treat it as per-segment this way. * rename stuff so that we have a clearer naming for some of these classes. I'm also probably missing a few other things. In general I'm pretty happy with the metadata key-value attributes api versus subclassing. I tried to make subclassing work, but subclassing turned really ugly fast and made various codec components too tightly-coupled, e.g. if someone wants to combine a CompressedStoredFields with a PerFieldPostingsFormat and SpecialTermVectors, what would the impls be :). So the overly simple MapString,String avoids these issues, and hey its just metadata after all so I don't think anything more complex is really needed. Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0 After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13276363#comment-13276363 ] Robert Muir commented on LUCENE-4055: - Branch location for this issue and LUCENE-4050: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene4055 Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0 After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13274412#comment-13274412 ] Andrzej Bialecki commented on LUCENE-4055: --- +1. Per commit information could be named CommitInfo (which it really is). I like SegmentMetadata - if we left SegmentInfo as a name it would be confusing with its current functionality. Introspection could return MapString,Object to avoid converting e.g. numeric values back and forth. Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0 After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible
[ https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13274401#comment-13274401 ] Robert Muir commented on LUCENE-4055: - rough plan: * create branch for this and LUCENE-4050, the work is related really. * separate per commit information (SegmentList? Commit?) from per-segment metadata (SegmentInfo? SegmentMetadata?). the latter is basically si and fi. these names can change. maybe fi is still preserved mostly asis. * clean up the way per-segment metadata is used in such a way that it is abstract and minimal to what IndexWriter/MP needs, not full of codec specific stuff. * ensure codecs can privately write any metadat they need (eg hasProx) and that its accessible via all codec apis. * try to ensure codecs write things they need like hasProx to compute files() without iterating over fieldinfos. this would be a bonus, its currently a concern of mine. * add a basic introspection api (like mapstring,string) for tools like luke to be able to display codec private values. Refactor SegmentInfo / FieldInfo to make them extensible Key: LUCENE-4055 URL: https://issues.apache.org/jira/browse/LUCENE-4055 Project: Lucene - Java Issue Type: Improvement Components: core/codecs Reporter: Andrzej Bialecki Assignee: Robert Muir Fix For: 4.0 After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes should be made abstract so that they can be extended by Codec-s. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org