[jira] [Commented] (LUCENE-5992) Version should not be encoded as a String in the index
[ https://issues.apache.org/jira/browse/LUCENE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160155#comment-14160155 ]

Uwe Schindler commented on LUCENE-5992:

About the fixed format issue (if, for example, we remove/add components in the future): we could theoretically write another prefix byte as a marker for the version format. I am not sure this is really good; it would just be one possible way to change the encoding format.

About the public Version ctor: I don't like this. Initially I thought CodecUtil was part of the util package too, so the ctor could be pkg-protected, but this is not the case :-( In general I would prefer to have the ctor hidden again, but add a static method with a good comment telling people not to use it. A ctor always makes people try to use it, especially because of Eclipse autocomplete. So better use Version.buildFromComponents(major, minor,...).

Version should not be encoded as a String in the index

Key: LUCENE-5992
URL: https://issues.apache.org/jira/browse/LUCENE-5992
Project: Lucene - Core
Issue Type: Improvement
Reporter: Michael McCandless
Assignee: Michael McCandless
Fix For: 5.0, Trunk
Attachments: LUCENE-5992.patch

The version is really just 3 (maybe 4) ints under the hood, but today we write it as a String, which then requires spooky string tokenization/parsing when we open the index. I think it should be encoded directly as ints. In LUCENE-5952 I had tried to make this change, but it was controversial, and got booted. Then in LUCENE-5969, I tried again, but that issue has morphed (nicely!) into fixing all sorts of things *except* these three ints. Maybe 3rd time's a charm ;)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
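The prefix-byte idea from the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name VersionFormatSketch, the marker value 0, and the use of java.nio.ByteBuffer are all hypothetical and are not taken from the patch or from Lucene's actual index format.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: not Lucene's on-disk format and not a Lucene API.
final class VersionFormatSketch {

  // Assumed marker meaning "three ints follow". A later encoding would
  // pick a different value so readers can tell the formats apart.
  static final byte FORMAT_THREE_INTS = 0;

  // Writes the marker byte followed by the three version components.
  static byte[] write(int major, int minor, int bugfix) {
    ByteBuffer buf = ByteBuffer.allocate(1 + 3 * Integer.BYTES);
    buf.put(FORMAT_THREE_INTS);
    buf.putInt(major).putInt(minor).putInt(bugfix);
    return buf.array();
  }

  // Reads the marker first and rejects encodings this reader does not know.
  static int[] read(byte[] encoded) {
    ByteBuffer buf = ByteBuffer.wrap(encoded);
    byte format = buf.get();
    if (format != FORMAT_THREE_INTS) {
      throw new IllegalArgumentException("unknown version format marker: " + format);
    }
    return new int[] {buf.getInt(), buf.getInt(), buf.getInt()};
  }
}
```

The cost is one extra byte per written version; the benefit is that adding or removing a component later only needs a new marker value rather than a guess based on length.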
[jira] [Commented] (LUCENE-5992) Version should not be encoded as a String in the index
[ https://issues.apache.org/jira/browse/LUCENE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160167#comment-14160167 ]

Michael McCandless commented on LUCENE-5992:

If we add the format byte then we could move read/write back into Version.java and keep the ctor private ... I think I like that option.
[jira] [Commented] (LUCENE-5992) Version should not be encoded as a String in the index
[ https://issues.apache.org/jira/browse/LUCENE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160202#comment-14160202 ]

Uwe Schindler commented on LUCENE-5992:

I am fine with both options. For me it is important that Version has no public ctor (under no circumstances), so that people using Eclipse autocomplete do not naturally use the ctor to pass version constants to analyzers or other stuff in other public APIs. This will cause endless bug reports. A separate static method should be preferred here, because it has a method name that explains what it does. The Eclipse user has to choose carefully and cannot automatically use the worst option.
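A minimal sketch of the "private ctor plus named static method" shape argued for above. The class SketchVersion and its LATEST constant are hypothetical, and buildFromComponents is only the name floated earlier in this thread, not a confirmed Lucene method.

```java
// Hypothetical sketch: shows the API shape only, not Lucene's Version class.
final class SketchVersion {
  final int major;
  final int minor;
  final int bugfix;

  // Ordinary callers are expected to use predefined constants like this one.
  static final SketchVersion LATEST = new SketchVersion(5, 0, 0);

  // Private: "new SketchVersion(...)" is never offered by autocomplete
  // outside this class.
  private SketchVersion(int major, int minor, int bugfix) {
    this.major = major;
    this.minor = minor;
    this.bugfix = bugfix;
  }

  /**
   * Rebuilds a version from components read back from an index.
   * Not meant for passing ad hoc versions to analyzers or other public APIs;
   * use the predefined constants for that.
   */
  static SketchVersion buildFromComponents(int major, int minor, int bugfix) {
    if (major < 0 || minor < 0 || bugfix < 0) {
      throw new IllegalArgumentException("version components must be non-negative");
    }
    return new SketchVersion(major, minor, bugfix);
  }
}
```

The named method gives a place for validation and for a warning in the Javadoc, which a bare constructor does not make as visible.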
[jira] [Commented] (LUCENE-5992) Version should not be encoded as a String in the index
[ https://issues.apache.org/jira/browse/LUCENE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160240#comment-14160240 ]

Robert Muir commented on LUCENE-5992:

What are you defending against? In LUCENE-5969 we fixed addIndexes(Dir) to just copy the .si file too, so old SI writers are read-only; instead, files() is just always stripped, like CFS. So if we need to write 4 ints, we just make a new SI that does that? An SI no longer has to write for versions in the future :)
[jira] [Commented] (LUCENE-5992) Version should not be encoded as a String in the index
[ https://issues.apache.org/jira/browse/LUCENE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160283#comment-14160283 ]

Jack Krupansky commented on LUCENE-5992:

What about versions of an index during the development process, like each time a change to the index format is committed? Such as the alpha and beta stages in 4.0? I'd be happier with four version ints: major, minor, patch, change. In theory we shouldn't be changing the index format in either minor or patch releases, but bug fixes for indexing can be valid changes as well.

Now, the question is whether change should reset to zero each time we branch, or should really just be an ever-increasing index format version number. The latter may make sense, but either is fine. The latter also makes sense given that successive releases might not introduce any index incompatibilities. I lean towards the latter, but it still makes sense to defensively record which release wrote an index.
[jira] [Commented] (LUCENE-5992) Version should not be encoded as a String in the index
[ https://issues.apache.org/jira/browse/LUCENE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160339#comment-14160339 ]

Michael McCandless commented on LUCENE-5992:

> What about versions of an index during the development process, like each time a change to the index format is committed?

We are freely allowed to completely break the index format within one release. So it's only on releasing, that we commit to a published version of the format ... I don't think we should add the 4th change int, at least not on this issue. It's hard enough just writing and reading the 3 ints!
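For readers skimming the thread, the difference between the String encoding and the three-int encoding can be illustrated like this. The helper names are hypothetical, java.nio.ByteBuffer stands in for Lucene's own I/O classes, and the parser assumes exactly three dot-separated components, which real version strings may not always have.

```java
import java.nio.ByteBuffer;

// Illustrative only: these helpers are hypothetical and are not Lucene APIs.
final class ThreeIntsSketch {

  // String path: the reader must tokenize and parse before it has numbers.
  static int[] parseDotted(String version) {
    String[] parts = version.split("\\.");
    if (parts.length != 3) {
      throw new IllegalArgumentException("expected major.minor.bugfix, got: " + version);
    }
    int[] out = new int[3];
    for (int i = 0; i < 3; i++) {
      out[i] = Integer.parseInt(parts[i]); // throws NumberFormatException on junk
    }
    return out;
  }

  // Int path: three fixed-width ints, 12 bytes in total.
  static byte[] encode(int major, int minor, int bugfix) {
    return ByteBuffer.allocate(3 * Integer.BYTES)
        .putInt(major).putInt(minor).putInt(bugfix).array();
  }

  // Reading back needs no tokenization or number parsing.
  static int[] decode(byte[] encoded) {
    ByteBuffer buf = ByteBuffer.wrap(encoded);
    return new int[] {buf.getInt(), buf.getInt(), buf.getInt()};
  }
}
```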
[jira] [Commented] (LUCENE-5992) Version should not be encoded as a String in the index
[ https://issues.apache.org/jira/browse/LUCENE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160373#comment-14160373 ]

Ryan Ernst commented on LUCENE-5992:

Patch LGTM.

Attachments: LUCENE-5992.patch, LUCENE-5992.patch
[jira] [Commented] (LUCENE-5992) Version should not be encoded as a String in the index
[ https://issues.apache.org/jira/browse/LUCENE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160398#comment-14160398 ]

Uwe Schindler commented on LUCENE-5992:

The documentation of Version#write is now incorrect. It still says vint.
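The distinction flagged above matters because a variable-length int and a fixed-width int occupy different numbers of bytes, so the documentation has to match what is actually written. The sketch below is a standalone illustration of the two encodings (7 data bits per byte with a continuation bit, versus four big-endian bytes). The class and method names are hypothetical; this is not the code from the patch.

```java
import java.io.ByteArrayOutputStream;

// Illustrative only: a standalone comparison, not the patch's code.
final class VIntSketch {

  // Variable-length encoding: low-order 7 bits first, high bit set on every
  // byte except the last. Small values take 1 byte, large ones up to 5.
  static byte[] writeVInt(int value) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
    return out.toByteArray();
  }

  // Fixed-width encoding: always 4 bytes, most significant byte first.
  static byte[] writeInt(int value) {
    return new byte[] {
      (byte) (value >>> 24), (byte) (value >>> 16), (byte) (value >>> 8), (byte) value
    };
  }
}
```

For a version component such as 5, the variable-length form is a single byte while the fixed-width form is four, so a reader following stale documentation would consume the wrong number of bytes.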
[jira] [Commented] (LUCENE-5992) Version should not be encoded as a String in the index
[ https://issues.apache.org/jira/browse/LUCENE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160551#comment-14160551 ]

Michael McCandless commented on LUCENE-5992:

+1 for this patch: it's much simpler, and takes away the whole versioning of the version problem!

Attachments: LUCENE-5992.patch, LUCENE-5992.patch, LUCENE-5992.patch, LUCENE-5992.patch
[jira] [Commented] (LUCENE-5992) Version should not be encoded as a String in the index
[ https://issues.apache.org/jira/browse/LUCENE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160574#comment-14160574 ]

Uwe Schindler commented on LUCENE-5992:

+1, I like the static method name!
[jira] [Commented] (LUCENE-5992) Version should not be encoded as a String in the index
[ https://issues.apache.org/jira/browse/LUCENE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160611#comment-14160611 ]

Ryan Ernst commented on LUCENE-5992:

+1 to the latest patch. Can we add {{@lucene.internal}} to the docs for the {{Version.fromBits}} method?
[jira] [Commented] (LUCENE-5992) Version should not be encoded as a String in the index
[ https://issues.apache.org/jira/browse/LUCENE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14160983#comment-14160983 ]

Michael McCandless commented on LUCENE-5992:

I'll commit Rob's last patch, and add @lucene.internal...
[jira] [Commented] (LUCENE-5992) Version should not be encoded as a String in the index
[ https://issues.apache.org/jira/browse/LUCENE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161009#comment-14161009 ]

ASF subversion and git services commented on LUCENE-5992:

Commit 1629769 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1629769 ]

LUCENE-5992: encode version using 3 ints, not String, for Lucene 5.x indices
[jira] [Commented] (LUCENE-5992) Version should not be encoded as a String in the index
[ https://issues.apache.org/jira/browse/LUCENE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161032#comment-14161032 ]

Michael McCandless commented on LUCENE-5992:

Thanks Rob, those tests and the removal of FIS from SIWriter look great ... I'll commit shortly and backport to 5.x.

Attachments: LUCENE-5992.patch, LUCENE-5992.patch, LUCENE-5992.patch, LUCENE-5992.patch, LUCENE-5992_tests.patch
[jira] [Commented] (LUCENE-5992) Version should not be encoded as a String in the index
[ https://issues.apache.org/jira/browse/LUCENE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161084#comment-14161084 ]

ASF subversion and git services commented on LUCENE-5992:

Commit 1629774 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1629774 ]

LUCENE-5992: add SI tests; remove FieldInfos from SegmentInfosWrite.write
[jira] [Commented] (LUCENE-5992) Version should not be encoded as a String in the index
[ https://issues.apache.org/jira/browse/LUCENE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14161101#comment-14161101 ]

ASF subversion and git services commented on LUCENE-5992:

Commit 1629775 from [~mikemccand] in branch 'dev/branches/branch_5x'
[ https://svn.apache.org/r1629775 ]

LUCENE-5992: encode version using 3 ints, not String, for Lucene 5.x indices; add SI tests; remove FieldInfos from SegmentInfosWrite.write