[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090023#comment-13090023 ] Uwe Schindler commented on LUCENE-3218: --- I would also rename CFIndexInput to SliceIndexInput, it's private so does not matter, but wozuld be nice to have. Otherwise I agree with committing to trunk. As far as I see, the format did not change in trunk, so once we get this back into 3.x we are at the state pre-revert? Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090320#comment-13090320 ] Simon Willnauer commented on LUCENE-3218: - I committed this to trunk. I will leave this issue open until we decide to backport to 3.x. simon Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089631#comment-13089631 ] Michael McCandless commented on LUCENE-3218: This approach looks nice! Maybe rename IndexInputHandle to IndexInputProvider? IndexInputSlicer? SliceCreator? Maybe rename CSIndexInput - SlicedIndexInput? In SimpleFSDir we may as well move that static Descriptor class out? Rather than having to import it to itself. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089639#comment-13089639 ] Mark Miller commented on LUCENE-3218: - bq. this seems close, the question is if we want to backport this to 3.x too? Why don't we get it committed to trunk and let it chill for a while, let it hit random testing for a while, get used by adventurous users, and then make the decision? Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089653#comment-13089653 ] Simon Willnauer commented on LUCENE-3218: - I don't really like the name IndexInputHandle what about * IndexInputFactory * IndexInputProducer * IndexInputSlicer more ideas? Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089651#comment-13089651 ] Simon Willnauer commented on LUCENE-3218: - bq. Why don't we get it committed to trunk and let it chill for a while, let it hit random testing for a while, get used by adventurous users, and then make the decision? +1 Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088640#comment-13088640 ] Simon Willnauer commented on LUCENE-3218: - FYI - I backed out the changes from 3.x Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088734#comment-13088734 ] Uwe Schindler commented on LUCENE-3218: --- Showuld we send an email to java-user as the index format in the stable branch changed by this (indexes with new CFS files can no longer be read)? Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088796#comment-13088796 ] Simon Willnauer commented on LUCENE-3218: - bq. Showuld we send an email to java-user as the index format in the stable branch changed by this (indexes with new CFS files can no longer be read)? I will do Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088826#comment-13088826 ] Uwe Schindler commented on LUCENE-3218: --- Hi Simon, thanks for taking care. This looks really nice and easier to understand. I agree, the problem with the RAF open file is hard to manage (especially when to close it). One small suggestion: Currently the CFS file is opened twice: One time to read the contents and a second time to read the actual files using the handle (and for new format to read the CFE file, but thats unavoidable - once we nuke old index support in Lucene 5, we can always open the cfe first and read the contents, but until then we need to do both). Why not open the IndexInputHandle at the beginning and then simply request a full slice for the directory initialization (or ideally only that part that contains the directory)? The slice can then be closed afterwards as before. So very cool work! Greetings from Berkeley! Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088321#comment-13088321 ] Simon Willnauer commented on LUCENE-3218: - bq. I think we should back out these CFS changes for now. -1 I think we are over reacting here, especially robert gets too crazy about this. Honestly I think CFS should be detached from directory and we should make it a delegating directory if at all. That way we would always operate on the right directory, can safely create two files and keep Directory itself clean. We can still add the ability to partially map a certain file (offset, length) into memory like we do now in the specialized CFS Dirs. This entire think is not a problem of appending at all IMO. how does that sound? I think this would solve all the problems we are having and keeps it appendable. simon Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088335#comment-13088335 ] Uwe Schindler commented on LUCENE-3218: --- Hi when thinking about the whole stuff one more time again, I may have a solution to again decouple CFS from the parent directory, so one can create any CFS using one single class (but perhaps the factory in directory is still an idea to make it customizable). There are several solutions, but most of them have customization problems: - The current approach was discussed already, nothing more to say - A possibility to make it possible for MMap to map certain parts of the file is to move the getIndexInputSlice up to the abstract Directory base class and make the default implementation the current CFIndexInput from the default CFS impl. This would be even backwards compatible. So the CFS impl can simply ask the parent directory it warps for a slice. The problem here is easy: Current CFS impl opens the CFS file exactly one time and consumes exactly one file handle. The slices work on the same file handle. If we move the slice handling up to the directory, the state is gone, so handling the all-the-time open CFS file cannot be managed anymore. When using a new file handle for each slice, we gain nothing (CFS is to reduce file handles). - Last night I had one idea that might fix this issue. Lets move the slice handling into the abstract IndexInput base class, again the default impl would simply use the current CFIndexInput to return a slice. In the case of MMapIndexInput it would simply return a remapped slice on the current file handle. The only thing that would change is that the RAF would kept open the wohle time (like MMapCFDirectory does), in contrast to curren, where th RAF is closed directly after mapping. This approach would allow it for the CFS impl to simply ask it parant directory for an IndexInput to handle the SFC file itsself and for each sub-slice ask this IndexInput for this. The last approach seems reasonable, but we need some more checks how to implement that. The last approach keeps both features of CFS: - One OS file handle - possibility for certain directory implementations to return sliced IndexInputs in an optimal way. The current IndexInput have a clone method, in this case we would need a similar method, where you can give offset and length. On the other hand, we can remove the factory for CFS files from directory, we can go back to a simple new CFSDirectory(parentDirectory, cfsName). Does this sound reasonable? Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088338#comment-13088338 ] Robert Muir commented on LUCENE-3218: - None of this is reasonable. When something goes wrong with an optimization, and multiple people ask for you to back it out, back it out. then later we can discuss how to re-implement it. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088344#comment-13088344 ] Uwe Schindler commented on LUCENE-3218: --- But we can still consider this as solutions to solve the issue later? I just dont want to make suggestions with lots of brainwork and sleepless nights involved, if it's not considered and just be backed out with None of this is reasonable.. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088359#comment-13088359 ] Michael McCandless commented on LUCENE-3218: I think both Simon's and Uwe's ideas are good and should be explored! With all these ideas we will find a clean way to get CFS reading/writing integrated into Directory. But I think that exploration should just be outside of trunk and 3.x, eg on a branch. Once we iterate to a good point again we can commit it back to trunk, let it bake/age, then merge back to 3.x if it seems stable. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088379#comment-13088379 ] Mark Miller commented on LUCENE-3218: - +1 on backing out of 3.x at least - this is our stable branch...I can't imagine this optimization belongs in our stable branch given all of this discussion... Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088392#comment-13088392 ] Simon Willnauer commented on LUCENE-3218: - its all yours do whatever you think needs to be done. have fun ;) Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088395#comment-13088395 ] Simon Willnauer commented on LUCENE-3218: - bq. None of this is reasonable. your unreasonable comments here are totally counter productive IMO. Just my $0.05 Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088166#comment-13088166 ] Robert Muir commented on LUCENE-3218: - I think the situation here is too complicated already, we are discussing all kinds of complicated stuff and I dont think appendable CFS is worth any of this. I think we should back out these CFS changes for now. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088206#comment-13088206 ] Michael McCandless commented on LUCENE-3218: bq. I think we should back out these CFS changes for now. +1 Generally if we add a cool optimization and it turns out that optimization risks even just apparent index corruption and/or adds scary traps / confusing complexity to the API I think we should pull the change and iterate on the issue / branch until these problems are addressed? We had a similar experience with copyBytes, but that time it was real corruption. Optimizations aren't worth such risks I think, especially if it's only an index-time opto? Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 3.4, 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Priority: Blocker Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087700#comment-13087700 ] Michael McCandless commented on LUCENE-3218: Maybe we can avoid making a separate _X.cfe file? We did this because previously the CFS stored the header in the front of the file (I think)? Could we, instead, put the header at the end of the file, but place a long pointer at the start of the file saying where the header is located (I'd rather not rely on file.length())? Then we could have a single (_X.cfs) file again and we can not use the Dir impl for delegation? Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087702#comment-13087702 ] Simon Willnauer commented on LUCENE-3218: - bq. We did this because previously the CFS stored the header in the front of the file (I think)? is this really the problem here? I mean this problem is in FileSwitch / NRT Directory. The CFS uses a directory to write files, I would expect that if we use for instance NRT directory it gets the NRT directory instead of either of of its sub directories. its not really a CFS problem IMO and we should rather fix the actual directory rather than reverting the small optimization having the header in a separate file. i think we should prevent the seek if not absolutely necessary. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087704#comment-13087704 ] Marvin Humphrey commented on LUCENE-3218: - I don't fully grok Robert's concern, but with regards to Mike's suggestion of inlining the metadata: Why not put that file pointer at the very end of the file? So that the read-time sequence of actions is: seek to 8 bytes before the end, read the file pointer, seek back to beginning of metadata. That way you don't need to seek backwards during writing, which IIRC used to be an issue for Hadoop. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087712#comment-13087712 ] Robert Muir commented on LUCENE-3218: - This is definitely not a bug in the directory, and its a serious issue (i think a blocker for release myself). I'll try to explain the issue again a little better than I did on https://issues.apache.org/jira/browse/LUCENE-3380?focusedCommentId=13086872page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13086872 This is just an example of the API problem, with FileSwitchDirectory. In Lucene we have FileSwitchDirectory which is a Directory that lets you switch between 2 different directory implementations based on file extension. So conceptually it looks like this: {code} FileSwitchDirectory extends Directory { Directory a; Directory b; Set extensions; // these are the file extensions that go to a, all other ones are handled by b } {code} Imagine you configure this directory to put all *.cfs in a, and everything else in b. So when FileSwitchDirectory is asked where to put 1.cfs, it forwards the request to a. But the 1.cfe file is actually wrongly created in a also, causing FileNotFoundExceptions later when the file is to be read, because its in the wrong directory. This is because of how the compound file mechanism works now, it calls a.createOutput(1.cfe) instead of fileswitchdirectory.createOutput(1.cfe). So this is a serious problem for any Directories that delegate responsibility like this, not just the ones in Lucene. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087716#comment-13087716 ] Uwe Schindler commented on LUCENE-3218: --- Thanks Robert for explaining this again, I agree 100% with you, the current cfe/cfs discussion is really serious and heavy broken. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087719#comment-13087719 ] Robert Muir commented on LUCENE-3218: - {quote} Maybe we can avoid making a separate _X.cfe file? {quote} +1, this sounds great (however it can be done, ideally with Marvin's idea to support appendable-only filesystems also), and would end the confusion here. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087723#comment-13087723 ] Uwe Schindler commented on LUCENE-3218: --- bq. i think we should prevent the seek if not absolutely necessary. You have a very small optimization here that only affects opening the CFS. But because we need to fix the wrong behaviour in FileSwitch (and also NRTCaching dir, which is in my opinion more serious), FileSwitch and NRTCachingDir now use the default CompoundFileImpl. If you wrap MMapDir by FileSwitch or NRTCaching, the whole custom impl of the compound file in MMap that speeds up even further is obsolete, as not used (you can use the compound file with really no slowdown at all as we can map parts of the CFS file into memeory and need no offset calculations and can also save mapping costs). This is gone now, just because a one-time seek at opening time is prevented. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087724#comment-13087724 ] Robert Muir commented on LUCENE-3218: - Right, the fix I applied is really a hack, but I didnt want to leave our codebase broken while we figure this out. Its not just a problem from a performance perspective, I think its just bad to make assumptions about how the inner directory works. In this case with fileswitchdirectory etc, it really should be fully delegating this stuff down, and be clueless about how its implemented by the underlying sub directory. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087760#comment-13087760 ] Michael McCandless commented on LUCENE-3218: {quote} Why not put that file pointer at the very end of the file? So that the read-time sequence of actions is: seek to 8 bytes before the end, read the file pointer, seek back to beginning of metadata. {quote} I would rather not rely on metadata (file length) when reading, only the contents of the file. I think append-only filesystems (eg HDFS) can make their own impl that uses the file length instead (like AppendingCodecc). Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087963#comment-13087963 ] Andrzej Bialecki commented on LUCENE-3218: --- bq. I think append-only filesystems (eg HDFS) can make their own impl that uses the file length instead (like AppendingCodecc). AppendingCodec solves only one issue, that of postings and SegmentInfos. I'm worried that adding seek+rewrite tricks in other places that are not under the control of Codec or under any other configurable implementation (such as CFS) will ultimately prevent the efficient use of Lucene on Hadoop. Unless we put those places under the control of a Codec (or some other configurable interface). Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087972#comment-13087972 ] Uwe Schindler commented on LUCENE-3218: --- The trick with the latest updates to compound files is that the CompoundFileWriter/Reader is returned by the directory implementation - and this is broken and the discussion is about this. So this would be the place, where you theoretically could completely make another CFS on-disk format or e.g. write the stuff to a ZIP file :-) Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087987#comment-13087987 ] Simon Willnauer commented on LUCENE-3218: - personally I think we should try to be append only on general. So eventually this is about creating the cfe and cfs file from the right directory. What we could do to use the parent ie. FileSwitchDir etc. is add a protected method that allows passing the parent dir to the createCompoundOutput / openCompoundInput which is then in turn used to create the actual files. We can call this method from the public createCompoundOutput / openCompoundInput versions with this as the directory to create files. How does that sound? Lemme know if I miss something... Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087993#comment-13087993 ] Robert Muir commented on LUCENE-3218: - I disagree, we don't need to compensate for hadoop's problems. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087998#comment-13087998 ] Uwe Schindler commented on LUCENE-3218: --- If we want append only, we should also remove seek methods from IndexOutput... I DISAGREE, too! Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088005#comment-13088005 ] Robert Muir commented on LUCENE-3218: - {quote} So eventually this is about creating the cfe and cfs file from the right directory. {quote} That's not the only issue: while that is the primary reason I reopened this issue I also have concerns about the API being complicated and non-intuitive. Making the API even more complicated because Filesystem X can only write WingDings or cannot seek doesn't seem to be a good solution to me. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088067#comment-13088067 ] Uwe Schindler commented on LUCENE-3218: --- When looking into the CompoundFileDirectory code I also found a small bug in version handling. readEntries() reads the first VInt and uses it for version checking (if negative). This check has 2 problems: - if the VInt is smaller then FORMAT_CURRENT it should throw IndexTooNewException - the comparison should not be against FORMAT_CURRENT itsself (this constant should only be used for writing CFS files), it should compare against real version numbers. This would otherwise break on later additions of new formats. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086896#comment-13086896 ] Robert Muir commented on LUCENE-3218: - Can CFS reading/writing not take a parent directory, instead of: CompoundFileDirectory(Directory parent, ) I think it should be CompoundFileDirectory(IndexInput cfs, IndexInput cfe) And directory.createOutput etc should take *both* filenames, this would remove this backdooring completely. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 3.4, 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053181#comment-13053181 ] Robert Muir commented on LUCENE-3218: - Thanks Simon, I feel better now that we get our open-files-for-write tracking back. Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052597#comment-13052597 ] Michael McCandless commented on LUCENE-3218: Patch looks great! Can we name it createCompoundOutput? Emphasizes that we are write-once (this file shouldn't exist), and matches createOutput. On checkAbort... we could not send that to the CFW and instead call checkAbort in the outer loops? (Ie, where we .copy the files in). The existing CFW already only checks once-per-file anyway... Maybe instead of asserts for the mis-use of the CFD API (eg no entries, something is still open), we should make these real exceptions (ie, thrown even when assertions are off)? This comment looks stale (in CFW.java)?: {noformat} // Close the output stream. Set the os to null before trying to // close so that if an exception occurs during the close, the // finally clause below will not attempt to close the stream // the second time. {noformat} openCompoundOutput needs javadoc. CFD.createOutput's jdoc says Not Implememented but it is. The new test cases in TestCompoundFile names its file d.csf ;) Column stride fields lives on!! Too many tlas... Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052633#comment-13052633 ] Simon Willnauer commented on LUCENE-3218: - Committed in revision 1138063. I will try to backport this to 3.x if possible Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3218) Make CFS appendable
[ https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1305#comment-1305 ] Michael McCandless commented on LUCENE-3218: Patch looks cool! So the CFW will take the first output opened against it and let it write directly into the actual CFS file, and then if another file is opened while that first one is still open, the 2nd file will write to separate file and then will copy in on close. We may want to delegate the separate files too? So that on close they copy themselves into the CFS and remove the original? This way IW won't have to separately create CFS in the end. Somehow we need IW to add the biggest sub-file first... s/compund/compound CFW.close should assert currentOutput != null (and, if we delegate sep entries, that they are also all closed)? You might need to sync the CompoundFileWriter.this.currentOutput test / setting to null? Though... Lucene is always single threaded in writing files for the same segment, today anyway. Can we make a separate createCompoundOutput? (Ie, instaed of passing OpenMode to openCompoundInput). And: I'm assuming a given compound output can only be opened once, appended to / separate files copied into, closed and then never opened again for writing? (Ie, still write once at the file level). Make CFS appendable - Key: LUCENE-3218 URL: https://issues.apache.org/jira/browse/LUCENE-3218 Project: Lucene - Java Issue Type: Improvement Components: core/index Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-3218.patch Currently CFS is created once all files are written during a flush / merge. Once on disk the files are copied into the CFS format which is basically a unnecessary for some of the files. We can at any time write at least one file directly into the CFS which can save a reasonable amount of IO. For instance stored fields could be written directly during indexing and during a Codec Flush one of the written files can be appended directly. This optimization is a nice sideeffect for lucene indexing itself but more important for DocValues and LUCENE-3216 we could transparently pack per field files into a single file only for docvalues without changing any code once LUCENE-3216 is resolved. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org