[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-24 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090023#comment-13090023
 ] 

Uwe Schindler commented on LUCENE-3218:
---

I would also rename CFIndexInput to SliceIndexInput. It's private, so it does 
not matter, but it would be nice to have.

Otherwise I agree with committing to trunk. As far as I can see, the format did 
not change in trunk, so once we get this back into 3.x, we are back at the 
pre-revert state?

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Priority: Blocker
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, LUCENE-3218_tests.patch


 Currently CFS is created once all files are written during a flush / merge. 
 Once on disk, the files are copied into the CFS format, which is basically 
 unnecessary for some of the files. We can always write at least one file 
 directly into the CFS, which can save a reasonable amount of IO. For instance, 
 stored fields could be written directly during indexing, and during a Codec 
 flush one of the written files can be appended directly. This optimization is 
 a nice side effect for Lucene indexing itself, but it matters more for 
 DocValues and LUCENE-3216: once LUCENE-3216 is resolved, we could 
 transparently pack the per-field DocValues files into a single file without 
 changing any code.
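To make the idea concrete, here is a toy, in-memory model of an appendable compound file. It is a minimal sketch under stated assumptions: the class `ToyCompoundFile`, its `Entry` record, and the methods `append`/`read` are hypothetical and are not Lucene's `CompoundFileWriter` API. It only shows the core bookkeeping: each finished sub-file is appended once and recorded as an (offset, length) entry, instead of being written as a separate file and copied in later.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model, NOT Lucene's CompoundFileWriter: finished sub-files
// are appended to one buffer and indexed by (offset, length) entries.
class ToyCompoundFile {
  /** One table-of-contents entry. */
  record Entry(long offset, long length) {}

  private final ByteArrayOutputStream data = new ByteArrayOutputStream();
  private final Map<String, Entry> entries = new LinkedHashMap<>();

  /** Appends a completed sub-file directly instead of copying it in later. */
  void append(String name, byte[] contents) {
    if (entries.containsKey(name)) {
      throw new IllegalArgumentException("duplicate file: " + name);
    }
    entries.put(name, new Entry(data.size(), contents.length));
    data.write(contents, 0, contents.length);
  }

  /** Returns a copy of one sub-file's bytes, located via its entry. */
  byte[] read(String name) {
    Entry e = entries.get(name);
    if (e == null) {
      throw new IllegalArgumentException("no such file: " + name);
    }
    int from = (int) e.offset();
    return Arrays.copyOfRange(data.toByteArray(), from, from + (int) e.length());
  }

  /** The table of contents, in append order. */
  Map<String, Entry> entries() {
    return entries;
  }
}
```

A caller would `append("_0.fdt", bytes)` as soon as that file is complete and later `read("_0.fdt")` it back; the file name is only illustrative.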

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-24 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13090320#comment-13090320
 ] 

Simon Willnauer commented on LUCENE-3218:
-

I committed this to trunk. I will leave this issue open until we decide to 
backport to 3.x.

simon

[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089631#comment-13089631
 ] 

Michael McCandless commented on LUCENE-3218:


This approach looks nice!  Maybe rename IndexInputHandle to
IndexInputProvider?  IndexInputSlicer?  SliceCreator?

Maybe rename CSIndexInput to SlicedIndexInput?

In SimpleFSDir, we may as well move that static Descriptor class out, 
rather than having to import it into itself?


[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089639#comment-13089639
 ] 

Mark Miller commented on LUCENE-3218:
-

bq. This seems close; the question is whether we want to backport this to 3.x too?

Why don't we get it committed to trunk and let it chill for a while, let it hit 
random testing for a while, get used by adventurous users, and then make the 
decision?

[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-23 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089653#comment-13089653
 ] 

Simon Willnauer commented on LUCENE-3218:
-

I don't really like the name IndexInputHandle. What about:
 * IndexInputFactory
 * IndexInputProducer
 * IndexInputSlicer
 
more ideas?

[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-23 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13089651#comment-13089651
 ] 

Simon Willnauer commented on LUCENE-3218:
-

bq. Why don't we get it committed to trunk and let it chill for a while, let it 
hit random testing for a while, get used by adventurous users, and then make 
the decision?
+1

[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-22 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088640#comment-13088640
 ] 

Simon Willnauer commented on LUCENE-3218:
-

FYI - I backed out the changes from 3.x 

[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-22 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088734#comment-13088734
 ] 

Uwe Schindler commented on LUCENE-3218:
---

Should we send an email to java-user, as this changed the index format in the 
stable branch (indexes with new CFS files can no longer be read)?

[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-22 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088796#comment-13088796
 ] 

Simon Willnauer commented on LUCENE-3218:
-

bq. Should we send an email to java-user, as this changed the index format in 
the stable branch (indexes with new CFS files can no longer be read)?
Will do.

[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-22 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088826#comment-13088826
 ] 

Uwe Schindler commented on LUCENE-3218:
---

Hi Simon,

thanks for taking care of this. This looks really nice and is easier to 
understand. I agree, the open RAF file is hard to manage (especially deciding 
when to close it).

One small suggestion: currently the CFS file is opened twice: once to read the 
contents and a second time to read the actual files using the handle (and, for 
the new format, to read the CFE file, but that's unavoidable; once we nuke old 
index support in Lucene 5, we can always open the CFE first and read the 
contents, but until then we need to do both). Why not open the IndexInputHandle 
at the beginning and then simply request a full slice for the directory 
initialization (or ideally only the part that contains the directory)? The 
slice can then be closed afterwards as before.

So very cool work!
Greetings from Berkeley!
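A minimal sketch of the "open once, slice many times" suggestion, assuming plain `java.nio` positional reads stand in for Lucene's IndexInput machinery. `FileSlicer`, `Slice`, `openSlice` and `openFullSlice` are hypothetical names, not the API from the patch. The single `FileChannel` is opened once; a full slice can be used to read the CFS directory and then closed, while the underlying handle stays open for the sub-file slices.

```java
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch (NOT Lucene's API): one FileChannel is opened once and
// handed out as many bounded, independently positioned slices.
class FileSlicer implements Closeable {
  private final FileChannel channel; // the single OS file handle

  FileSlicer(Path path) throws IOException {
    channel = FileChannel.open(path, StandardOpenOption.READ);
  }

  /** A view of [offset, offset + length) that shares the one channel. */
  Slice openSlice(long offset, long length) throws IOException {
    if (offset < 0 || length < 0 || offset + length > channel.size()) {
      throw new IllegalArgumentException("slice out of bounds");
    }
    return new Slice(offset, length);
  }

  /** A slice over the whole file, e.g. for reading the CFS directory first. */
  Slice openFullSlice() throws IOException {
    return openSlice(0, channel.size());
  }

  @Override
  public void close() throws IOException {
    channel.close();
  }

  final class Slice implements Closeable {
    private final long start;
    private final long length;
    private long pos; // relative to start

    private Slice(long start, long length) {
      this.start = start;
      this.length = length;
    }

    byte readByte() throws IOException {
      if (pos >= length) {
        throw new EOFException("read past end of slice");
      }
      ByteBuffer one = ByteBuffer.allocate(1);
      while (one.hasRemaining()) {
        // Positional read: leaves the shared channel's own position untouched.
        if (channel.read(one, start + pos) < 0) {
          throw new EOFException("unexpected end of file");
        }
      }
      pos++;
      return one.get(0);
    }

    long length() {
      return length;
    }

    /** Closing a slice does not close the shared file handle. */
    @Override
    public void close() {}
  }
}
```

Positional `FileChannel.read(ByteBuffer, long)` calls do not modify the channel's own position, which is what lets several slices share one handle without coordinating a common file pointer.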

[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-21 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088321#comment-13088321
 ] 

Simon Willnauer commented on LUCENE-3218:
-

bq. I think we should back out these CFS changes for now.

-1 

I think we are overreacting here; Robert especially is getting too crazy about 
this. Honestly, I think CFS should be detached from Directory, and if anything 
we should make it a delegating directory. That way we would always operate on 
the right directory, could safely create two files, and keep Directory itself 
clean. We can still add the ability to partially map a certain file (offset, 
length) into memory like we do now in the specialized CFS Dirs. This entire 
thing is not a problem of appending at all, IMO.

How does that sound? I think this would solve all the problems we are having 
and keep it appendable.

simon
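A sketch of the delegating-directory shape described above, assuming a deliberately tiny stand-in interface. `ToyDirectory`, `MemoryDirectory` and `CompoundDirectory` are hypothetical and are not Lucene's `Directory` classes. The compound directory owns nothing but a table of (offset, length) entries; every read of a sub-file is translated into a range read on the single CFS file in the parent directory, which is also where a partial (offset, length) mapping would plug in.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Directory: NOT Lucene's API. It only exposes the
// range read a compound directory needs from its parent.
interface ToyDirectory {
  /** Reads len bytes of a file starting at offset (a "partial map"). */
  byte[] readRange(String name, long offset, int len);

  long fileLength(String name);
}

/** Trivial in-memory parent directory. */
final class MemoryDirectory implements ToyDirectory {
  private final Map<String, byte[]> files = new HashMap<>();

  void put(String name, byte[] bytes) {
    files.put(name, bytes.clone());
  }

  @Override
  public byte[] readRange(String name, long offset, int len) {
    byte[] file = lookup(name);
    if (offset < 0 || len < 0 || offset + len > file.length) {
      throw new IndexOutOfBoundsException("range outside " + name);
    }
    byte[] out = new byte[len];
    System.arraycopy(file, (int) offset, out, 0, len);
    return out;
  }

  @Override
  public long fileLength(String name) {
    return lookup(name).length;
  }

  private byte[] lookup(String name) {
    byte[] file = files.get(name);
    if (file == null) {
      throw new IllegalArgumentException("no such file: " + name);
    }
    return file;
  }
}

/** Wraps a parent; sub-file reads become range reads on the one CFS file. */
final class CompoundDirectory implements ToyDirectory {
  private final ToyDirectory parent;
  private final String cfsName;
  private final Map<String, long[]> entries; // name -> {offset, length}

  CompoundDirectory(ToyDirectory parent, String cfsName, Map<String, long[]> entries) {
    this.parent = parent;
    this.cfsName = cfsName;
    this.entries = new HashMap<>(entries);
  }

  @Override
  public byte[] readRange(String name, long offset, int len) {
    long[] e = entry(name);
    if (offset < 0 || len < 0 || offset + len > e[1]) {
      throw new IndexOutOfBoundsException("range outside " + name);
    }
    // Translate into the compound file's coordinates and delegate.
    return parent.readRange(cfsName, e[0] + offset, len);
  }

  @Override
  public long fileLength(String name) {
    return entry(name)[1];
  }

  private long[] entry(String name) {
    long[] e = entries.get(name);
    if (e == null) {
      throw new IllegalArgumentException("no such file in " + cfsName + ": " + name);
    }
    return e;
  }
}
```

Because `CompoundDirectory` only talks to its parent through the interface, the parent stays unaware of compound files, which is the "keep Directory itself clean" property the comment argues for.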

[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-21 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088335#comment-13088335
 ] 

Uwe Schindler commented on LUCENE-3218:
---

Hi, after thinking about the whole thing once more, I may have a solution to 
decouple CFS from the parent directory again, so that any CFS can be created 
using one single class (though a factory in Directory might still be useful to 
keep it customizable). There are several solutions, but most of them have 
customization problems:
- The current approach has already been discussed; nothing more to say.
- One way to let MMap map certain parts of the file is to move 
getIndexInputSlice up to the abstract Directory base class and make the current 
CFIndexInput from the default CFS impl the default implementation. This would 
even be backwards compatible. The CFS impl could then simply ask the parent 
directory it wraps for a slice. The problem here is simple: the current CFS 
impl opens the CFS file exactly once and consumes exactly one file handle, and 
all slices work on that same file handle. If we move the slice handling up to 
the directory, this state is gone, so the permanently open CFS file can no 
longer be managed. If we use a new file handle for each slice, we gain nothing 
(the point of CFS is to reduce file handles).
- Last night I had an idea that might fix this issue. Let's move the slice 
handling into the abstract IndexInput base class; again, the default impl would 
simply use the current CFIndexInput to return a slice. In the case of 
MMapIndexInput, it would simply return a remapped slice on the current file 
handle. The only change is that the RAF would be kept open the whole time (like 
MMapCFDirectory does), in contrast to the current code, where the RAF is closed 
directly after mapping. This approach would allow the CFS impl to simply ask 
its parent directory for an IndexInput for the CFS file itself, and then ask 
this IndexInput for each sub-slice.

The last approach seems reasonable, but we need to check further how to 
implement it. It keeps both features of CFS:
- One OS file handle
- The possibility for certain directory implementations to return sliced 
IndexInputs in an optimal way. The current IndexInput has a clone method; in 
this case we would need a similar method that takes an offset and a length.

On the other hand, we could remove the factory for CFS files from Directory and 
go back to a simple new CFSDirectory(parentDirectory, cfsName).

Does this sound reasonable?
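The third option can be sketched as follows, assuming a simplified stand-in for IndexInput. `ToyIndexInput`, `ArrayInput`, `ByteBufferInput` and `slice(offset, len)` are hypothetical names; the proposal itself only asks for a clone-like method with offset and length on IndexInput. The base class supplies a generic wrapper-based slice (playing the role of CFIndexInput), while a buffer-backed input, standing in for MMapIndexInput, overrides it to return a narrower view of the same memory, so no extra file handle is needed.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;

// Hypothetical stand-in for IndexInput: NOT Lucene's class. Slicing lives on
// the input base class, with a generic default that subclasses may override.
abstract class ToyIndexInput {
  private long pointer; // this instance's own sequential read position

  /** Absolute read; must not depend on any shared, mutable position. */
  abstract byte readByteAt(long pos) throws IOException;

  abstract long length();

  final byte readByte() throws IOException {
    if (pointer >= length()) {
      throw new EOFException("read past end");
    }
    return readByteAt(pointer++);
  }

  final void seek(long pos) {
    if (pos < 0 || pos > length()) {
      throw new IllegalArgumentException("seek out of bounds: " + pos);
    }
    pointer = pos;
  }

  /** Default slice: a generic wrapper that adds an offset to every read. */
  ToyIndexInput slice(long offset, long len) {
    checkSlice(offset, len);
    final ToyIndexInput base = this;
    return new ToyIndexInput() {
      @Override
      byte readByteAt(long pos) throws IOException {
        if (pos < 0 || pos >= len) {
          throw new EOFException("read past end of slice");
        }
        return base.readByteAt(offset + pos);
      }

      @Override
      long length() {
        return len;
      }
    };
  }

  final void checkSlice(long offset, long len) {
    if (offset < 0 || len < 0 || offset + len > length()) {
      throw new IllegalArgumentException("slice out of bounds");
    }
  }
}

/** A plain input that keeps the default, wrapper-based slice(). */
final class ArrayInput extends ToyIndexInput {
  private final byte[] bytes;

  ArrayInput(byte[] bytes) {
    this.bytes = bytes.clone();
  }

  @Override
  byte readByteAt(long pos) throws IOException {
    if (pos < 0 || pos >= bytes.length) {
      throw new EOFException("read past end");
    }
    return bytes[(int) pos];
  }

  @Override
  long length() {
    return bytes.length;
  }
}

/** Stands in for an mmap-backed input: a slice is a narrower buffer view. */
final class ByteBufferInput extends ToyIndexInput {
  private final ByteBuffer buf; // e.g. a MappedByteBuffer in a real directory

  ByteBufferInput(ByteBuffer buf) {
    this.buf = buf;
  }

  @Override
  byte readByteAt(long pos) throws IOException {
    if (pos < 0 || pos >= buf.limit()) {
      throw new EOFException("read past end");
    }
    return buf.get((int) pos); // absolute get: no shared position is touched
  }

  @Override
  long length() {
    return buf.limit();
  }

  @Override
  ToyIndexInput slice(long offset, long len) {
    checkSlice(offset, len);
    ByteBuffer view = buf.duplicate(); // same memory, independent position
    view.position((int) offset);
    view.limit((int) (offset + len));
    return new ByteBufferInput(view.slice());
  }
}
```

Routing sequential reads through an absolute `readByteAt` keeps each slice's position independent, so slices can share their parent without the pitfalls of a shared file pointer.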

[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088338#comment-13088338
 ] 

Robert Muir commented on LUCENE-3218:
-

None of this is reasonable.

When something goes wrong with an optimization, and multiple people ask for you 
to back it out, back it out.

Then later we can discuss how to re-implement it.



[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-21 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088344#comment-13088344
 ] 

Uwe Schindler commented on LUCENE-3218:
---

But we can still consider these as solutions to the issue later? I just don't 
want to make suggestions that involve lots of brainwork and sleepless nights if 
they are not even considered and are simply dismissed with "None of this is 
reasonable."

[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088359#comment-13088359
 ] 

Michael McCandless commented on LUCENE-3218:


I think both Simon's and Uwe's ideas are good and should be explored!  With all 
these ideas we will find a clean way to get CFS reading/writing integrated into 
Directory.

But I think that exploration should just happen outside of trunk and 3.x, e.g., 
on a branch. Once we iterate to a good point again, we can commit it back to 
trunk, let it bake/age, then merge back to 3.x if it seems stable.

[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-21 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088379#comment-13088379
 ] 

Mark Miller commented on LUCENE-3218:
-

+1 on backing out of 3.x at least; this is our stable branch. I can't imagine 
this optimization belongs in our stable branch given all of this discussion.

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Blocker
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch





[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-21 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088392#comment-13088392
 ] 

Simon Willnauer commented on LUCENE-3218:
-

It's all yours; do whatever you think needs to be done. Have fun ;)

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Priority: Blocker
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch





[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-21 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088395#comment-13088395
 ] 

Simon Willnauer commented on LUCENE-3218:
-

bq. None of this is reasonable.
Your unreasonable comments here are totally counterproductive IMO. Just my 
$0.05.

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Priority: Blocker
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch





[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-20 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088166#comment-13088166
 ] 

Robert Muir commented on LUCENE-3218:
-

I think the situation here is too complicated already; we are discussing all 
kinds of complicated stuff, and I don't think appendable CFS is worth any of 
this.

I think we should back out these CFS changes for now.

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch





[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088206#comment-13088206
 ] 

Michael McCandless commented on LUCENE-3218:


bq. I think we should back out these CFS changes for now.

+1

Generally, if we add a cool optimization and it turns out that optimization 
risks even just apparent index corruption and/or adds scary traps / confusing 
complexity to the API, I think we should pull the change and iterate on the 
issue / branch until these problems are addressed?

We had a similar experience with copyBytes, but that time it was real 
corruption.

Optimizations aren't worth such risks I think, especially if it's only an 
index-time optimization?

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 3.4, 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Blocker
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch





[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087700#comment-13087700
 ] 

Michael McCandless commented on LUCENE-3218:


Maybe we can avoid making a separate _X.cfe file?

We did this because previously the CFS stored the header in the front of the 
file (I think)?

Could we, instead, put the header at the end of the file, but place a long 
pointer at the start of the file saying where the header is located (I'd rather 
not rely on file.length())? Then we could have a single (_X.cfs) file again 
and would not need the Dir impl for delegation?
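A minimal sketch of the single-file layout Mike proposes, in plain Java I/O rather than Lucene's IndexOutput/IndexInput, and with a hypothetical entry table (name, offset, length) instead of the real CFS format: the writer reserves an 8-byte slot at the start, appends the sub-files, writes the table at the end, then seeks back once to fill in the pointer. The reader never consults the file length. `HeaderPointerCompound` and its methods are made-up names for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical single-file compound layout (NOT Lucene's real CFS format):
//   [long tableOffset][sub-file bytes ...][int count][(UTF name, long offset, long length) ...]
final class HeaderPointerCompound {

  static void write(String path, Map<String, byte[]> files) throws IOException {
    try (RandomAccessFile out = new RandomAccessFile(path, "rw")) {
      out.setLength(0);                        // truncate any previous content
      out.writeLong(-1L);                      // placeholder for the table offset
      Map<String, long[]> entries = new LinkedHashMap<>();
      for (Map.Entry<String, byte[]> e : files.entrySet()) {
        long offset = out.getFilePointer();
        out.write(e.getValue());               // append the sub-file's bytes
        entries.put(e.getKey(), new long[] {offset, e.getValue().length});
      }
      long tableOffset = out.getFilePointer();
      out.writeInt(entries.size());
      for (Map.Entry<String, long[]> e : entries.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeLong(e.getValue()[0]);
        out.writeLong(e.getValue()[1]);
      }
      out.seek(0);                             // the single backwards seek at write time
      out.writeLong(tableOffset);
    }
  }

  static byte[] read(String path, String name) throws IOException {
    try (RandomAccessFile in = new RandomAccessFile(path, "r")) {
      in.seek(in.readLong());                  // jump to the table; file length is never used
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        String entryName = in.readUTF();
        long offset = in.readLong();
        long length = in.readLong();
        if (entryName.equals(name)) {
          byte[] data = new byte[(int) length];
          in.seek(offset);
          in.readFully(data);
          return data;
        }
      }
      throw new IOException("no such sub-file: " + name);
    }
  }
}
```

The price of this layout is the seek back to offset 0 when writing, which is exactly what an append-only filesystem cannot do.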

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch





[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-19 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087702#comment-13087702
 ] 

Simon Willnauer commented on LUCENE-3218:
-

bq. We did this because previously the CFS stored the header in the front of 
the file (I think)?
Is this really the problem here? I mean, this problem is in FileSwitch / NRT 
Directory. The CFS uses a directory to write files; I would expect that if we 
use, for instance, the NRT directory, it gets the NRT directory instead of either 
of its sub-directories. It's not really a CFS problem IMO, and we should rather 
fix the actual directory than revert the small optimization of having the 
header in a separate file. I think we should avoid the seek unless absolutely 
necessary.

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch





[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-19 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087704#comment-13087704
 ] 

Marvin Humphrey commented on LUCENE-3218:
-

I don't fully grok Robert's concern, but with regards to Mike's suggestion of
inlining the metadata: Why not put that file pointer at the very end of the
file?  So that the read-time sequence of actions is: seek to 8 bytes before the
end, read the file pointer, seek back to beginning of metadata.

That way you don't need to seek backwards during writing, which IIRC used to
be an issue for Hadoop.
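Marvin's variant, sketched under the same assumptions (plain Java I/O, a hypothetical `FooterPointerCompound` class, and a made-up entry table rather than the real CFS format): the writer only ever appends, so it could target an append-only stream, while the reader locates the table through the last 8 bytes and therefore does depend on the file length.

```java
import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical append-only layout (NOT Lucene's real CFS format):
//   [sub-file bytes ...][int count][(UTF name, long offset, long length) ...][long tableOffset]
final class FooterPointerCompound {

  static void write(String path, Map<String, byte[]> files) throws IOException {
    try (DataOutputStream out = new DataOutputStream(
        new BufferedOutputStream(new FileOutputStream(path)))) {
      long pos = 0;                            // tracked manually; the stream never seeks
      Map<String, long[]> entries = new LinkedHashMap<>();
      for (Map.Entry<String, byte[]> e : files.entrySet()) {
        entries.put(e.getKey(), new long[] {pos, e.getValue().length});
        out.write(e.getValue());
        pos += e.getValue().length;
      }
      long tableOffset = pos;
      out.writeInt(entries.size());
      for (Map.Entry<String, long[]> e : entries.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeLong(e.getValue()[0]);
        out.writeLong(e.getValue()[1]);
      }
      out.writeLong(tableOffset);              // last 8 bytes point back at the table
    }
  }

  static byte[] read(String path, String name) throws IOException {
    try (RandomAccessFile in = new RandomAccessFile(path, "r")) {
      in.seek(in.length() - Long.BYTES);       // relies on the file length
      in.seek(in.readLong());                  // jump back to the start of the table
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        String entryName = in.readUTF();
        long offset = in.readLong();
        long length = in.readLong();
        if (entryName.equals(name)) {
          byte[] data = new byte[(int) length];
          in.seek(offset);
          in.readFully(data);
          return data;
        }
      }
      throw new IOException("no such sub-file: " + name);
    }
  }
}
```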


 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch





[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087712#comment-13087712
 ] 

Robert Muir commented on LUCENE-3218:
-

This is definitely not a bug in the directory, and it's a serious issue (I think 
a blocker for release, myself).

I'll try to explain the issue again a little better than I did on 
https://issues.apache.org/jira/browse/LUCENE-3380?focusedCommentId=13086872page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13086872

This is just an example of the API problem, with FileSwitchDirectory.

In Lucene we have FileSwitchDirectory which is a Directory that lets you 
switch between 2 different directory implementations based on file extension.
So conceptually it looks like this:

{code}
class FileSwitchDirectory extends Directory {
  Directory a;
  Directory b;
  Set<String> extensions; // file extensions that go to a; all others are handled by b
}
{code}

Imagine you configure this directory to put all *.cfs in a, and everything 
else in b.

So when FileSwitchDirectory is asked where to put 1.cfs, it forwards the 
request to a.

But the 1.cfe file is wrongly created in a as well, causing 
FileNotFoundExceptions later when the file is to be read, because it's in the 
wrong directory. This is because of how the compound file mechanism works now: 
it calls a.createOutput("1.cfe") instead of 
fileswitchdirectory.createOutput("1.cfe").

So this is a serious problem for any Directories that delegate responsibility 
like this, not just the ones in Lucene.
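To make the failure mode concrete, here is a toy model. `Dir`, `MemDir`, `SwitchDir` and `CompoundWriterSketch` are hypothetical stand-ins, not Lucene's `Directory` API; they only record which file names were created where. Creating the `.cfe` through the sub-directory that received the `.cfs` leaves it invisible to the switching directory, while routing every file through the outer directory puts each one where later lookups will search.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, heavily simplified stand-in for a Directory.
interface Dir {
  void createOutput(String name);
  boolean fileExists(String name);
}

// Only remembers which file names were created in it.
final class MemDir implements Dir {
  final Set<String> files = new HashSet<>();
  public void createOutput(String name) { files.add(name); }
  public boolean fileExists(String name) { return files.contains(name); }
}

// Routes each file to a or b by extension, in the spirit of FileSwitchDirectory.
final class SwitchDir implements Dir {
  final Dir a, b;
  final Set<String> extensions;   // extensions that go to a; all others go to b

  SwitchDir(Dir a, Dir b, Set<String> extensions) {
    this.a = a;
    this.b = b;
    this.extensions = extensions;
  }

  Dir route(String name) {
    int dot = name.lastIndexOf('.');
    String ext = dot < 0 ? "" : name.substring(dot + 1);
    return extensions.contains(ext) ? a : b;
  }

  public void createOutput(String name) { route(name).createOutput(name); }
  public boolean fileExists(String name) { return route(name).fileExists(name); }
}

final class CompoundWriterSketch {
  // Buggy pattern: the sub-directory chosen for the .cfs also creates the .cfe.
  static void writeViaSubDirectory(SwitchDir dir, String segment) {
    Dir target = dir.route(segment + ".cfs");
    target.createOutput(segment + ".cfs");
    target.createOutput(segment + ".cfe");   // bypasses the extension switch
  }

  // Delegating pattern: every file goes through the outer directory.
  static void writeViaOuterDirectory(Dir dir, String segment) {
    dir.createOutput(segment + ".cfs");
    dir.createOutput(segment + ".cfe");
  }
}
```

With `cfs` as the only extension sent to `a`, the first writer leaves `_1.cfe` in `a`, so a lookup through the switch (which sends `.cfe` to `b`) misses it.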


 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch





[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087716#comment-13087716
 ] 

Uwe Schindler commented on LUCENE-3218:
---

Thanks Robert for explaining this again. I agree 100% with you: the current 
.cfe/.cfs issue under discussion is really serious and badly broken.

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch





[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087719#comment-13087719
 ] 

Robert Muir commented on LUCENE-3218:
-

{quote}
Maybe we can avoid making a separate _X.cfe file?
{quote}

+1, this sounds great (however it can be done, ideally with Marvin's idea to 
also support append-only filesystems), and it would end the confusion here.

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch





[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087723#comment-13087723
 ] 

Uwe Schindler commented on LUCENE-3218:
---

bq. i think we should prevent the seek if not absolutely necessary.

You have a very small optimization here that only affects opening the CFS.

But because we need to fix the wrong behaviour in FileSwitch (and also in 
NRTCachingDir, which in my opinion is more serious), FileSwitch and 
NRTCachingDir now use the default CompoundFileImpl. If you wrap MMapDir in 
FileSwitch or NRTCaching, MMap's custom compound file implementation, which 
speeds things up even further, is no longer used (with it you can use the 
compound file with really no slowdown at all, as we can map parts of the CFS 
file into memory, need no offset calculations, and also save mapping costs).

All of this is lost now, just to avoid a one-time seek at open time.
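The zero-copy slicing Uwe refers to can be illustrated with plain NIO. This is not MMapDirectory's actual implementation, and `MappedCompoundSlices` is a hypothetical name. The compound file is mapped once, and each sub-file gets a view whose position 0 is the sub-file's first byte, so readers need no offset arithmetic and no extra mapping. The sketch assumes a file under 2 GB (one mapping, int positions); the (offset, length) pairs would come from the compound file's entry table.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: one read-only mapping of the whole compound file,
// handing out zero-copy, zero-based views of individual sub-files.
final class MappedCompoundSlices {
  private final MappedByteBuffer whole;

  MappedCompoundSlices(Path path) throws IOException {
    this.whole = mapWholeFile(path);
  }

  private static MappedByteBuffer mapWholeFile(Path path) throws IOException {
    try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
      // A mapping stays valid after the channel that created it is closed.
      return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
    }
  }

  // View of bytes [offset, offset + length); index 0 of the result is the
  // sub-file's first byte. Assumes the whole file fits in an int-indexed buffer.
  ByteBuffer slice(long offset, long length) {
    ByteBuffer dup = whole.duplicate();       // independent position/limit, shared bytes
    dup.position((int) offset);
    dup.limit((int) (offset + length));
    return dup.slice();
  }
}
```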

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch





[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087724#comment-13087724
 ] 

Robert Muir commented on LUCENE-3218:
-

Right, the fix I applied is really a hack, but I didn't want to leave our 
codebase broken while we figure this out.

It's not just a problem from a performance perspective; I think it's just bad to 
make assumptions about how the inner directory works.
In this case, with FileSwitchDirectory etc., it really should delegate this 
stuff fully and be clueless about how it's implemented by the underlying 
sub-directory.

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch





[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-19 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087760#comment-13087760
 ] 

Michael McCandless commented on LUCENE-3218:


{quote}
Why not put that file pointer at the very end of the
file? So that the read-time sequence of actions is: seek to 8 bytes before the
end, read the file pointer, seek back to beginning of metadata.
{quote}

I would rather not rely on metadata (file length) when reading, only on the 
contents of the file.

I think append-only filesystems (e.g. HDFS) can make their own impl that uses 
the file length instead (like AppendingCodec).

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch





[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-19 Thread Andrzej Bialecki (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087963#comment-13087963
 ] 

Andrzej Bialecki  commented on LUCENE-3218:
---

bq. I think append-only filesystems (e.g. HDFS) can make their own impl that uses 
the file length instead (like AppendingCodec).

AppendingCodec solves only one issue, that of postings and SegmentInfos. I'm 
worried that adding seek+rewrite tricks in other places that are not under the 
control of Codec or any other configurable implementation (such as CFS) will 
ultimately prevent the efficient use of Lucene on Hadoop, unless we put those 
places under the control of a Codec (or some other configurable interface).

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch





[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087972#comment-13087972
 ] 

Uwe Schindler commented on LUCENE-3218:
---

The trick with the latest updates to compound files is that the 
CompoundFileWriter/Reader is returned by the directory implementation, and 
that is the part which is broken and under discussion here.
So this would be the place where you could, in theory, implement a completely 
different CFS on-disk format, or e.g. write the files into a ZIP file :-)
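To make the extension point concrete: a minimal sketch, assuming a simplified, hypothetical storage abstraction (`CompoundStore`, `CompoundWriter`, and `ZipCompoundStore` are illustrative names, not Lucene's real Directory API). Because the store itself hands out the compound writer, a subclass can swap in a different on-disk format, here an ordinary ZIP archive built with `java.util.zip`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical writer contract: one call per sub-file, then finish().
interface CompoundWriter {
  void addFile(String name, byte[] data);
  byte[] finish();
}

// Hypothetical store: the factory method is where the on-disk format is chosen.
abstract class CompoundStore {
  abstract CompoundWriter createCompoundOutput();
}

// Variant that packs sub-files into a ZIP archive instead of a CFS file.
class ZipCompoundStore extends CompoundStore {
  @Override
  CompoundWriter createCompoundOutput() {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    ZipOutputStream zip = new ZipOutputStream(bytes);
    return new CompoundWriter() {
      @Override
      public void addFile(String name, byte[] data) {
        try {
          zip.putNextEntry(new ZipEntry(name));
          zip.write(data);
          zip.closeEntry();
        } catch (IOException ex) {
          throw new UncheckedIOException(ex);
        }
      }

      @Override
      public byte[] finish() {
        try {
          zip.close(); // writes the ZIP central directory
        } catch (IOException ex) {
          throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
      }
    };
  }

  // Lists entry names, showing the result is a plain ZIP archive.
  static List<String> entryNames(byte[] zipBytes) {
    List<String> names = new ArrayList<>();
    try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
      for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
        names.add(e.getName());
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return names;
  }
}
```

Callers only see `CompoundWriter`, so changing the format does not touch indexing code.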

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch


[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-19 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087987#comment-13087987
 ] 

Simon Willnauer commented on LUCENE-3218:
-

Personally, I think we should try to be append-only in general. So eventually 
this is about creating the cfe and cfs files from the right directory. To use 
the parent (i.e. FileSwitchDir etc.), we could add a protected method that 
allows passing the parent dir to createCompoundOutput / openCompoundInput, 
which then in turn uses it to create the actual files. We can call this method 
from the public createCompoundOutput / openCompoundInput versions, passing 
this as the directory in which to create the files. How does that sound? 
Lemme know if I'm missing something...
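A rough sketch of this proposal, assuming toy types: `Dir` and `SwitchDir` are hypothetical stand-ins (the latter in the spirit of FileSwitchDirectory), not Lucene classes, and they only record file names. The public method passes `this`; the wrapper reuses the primary directory's compound logic but keeps itself as `parent`, so the .cfs/.cfe files still go through its routing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a Directory: it only records the names it creates.
class Dir {
  final List<String> files = new ArrayList<>();

  void createOutput(String name) {
    files.add(name);
  }

  // Public API: the directory creating the .cfs/.cfe files is `this`.
  public final void createCompoundOutput(String cfsName) {
    createCompoundOutput(cfsName, this);
  }

  // Protected hook: the compound logic lives here, but the actual files are
  // created through `parent`, which may be a wrapping directory.
  protected void createCompoundOutput(String cfsName, Dir parent) {
    // Assumes cfsName ends with ".cfs"; the entries file gets ".cfe".
    String entriesName = cfsName.substring(0, cfsName.length() - 4) + ".cfe";
    parent.createOutput(cfsName);
    parent.createOutput(entriesName);
  }
}

// Hypothetical wrapper that routes files to one of two directories by extension.
class SwitchDir extends Dir {
  final Dir primary;
  final Dir secondary;
  final Set<String> primaryExtensions;

  SwitchDir(Dir primary, Dir secondary, Set<String> primaryExtensions) {
    this.primary = primary;
    this.secondary = secondary;
    this.primaryExtensions = primaryExtensions;
  }

  @Override
  void createOutput(String name) {
    String ext = name.substring(name.lastIndexOf('.') + 1);
    (primaryExtensions.contains(ext) ? primary : secondary).createOutput(name);
  }

  // Delegate the compound logic, but keep `parent` so routing still applies.
  @Override
  protected void createCompoundOutput(String cfsName, Dir parent) {
    primary.createCompoundOutput(cfsName, parent);
  }
}
```

Without the `parent` argument, the delegate would create both files in itself and bypass the wrapper's routing.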

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch


[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087993#comment-13087993
 ] 

Robert Muir commented on LUCENE-3218:
-

I disagree; we don't need to compensate for Hadoop's problems.


 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch


[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13087998#comment-13087998
 ] 

Uwe Schindler commented on LUCENE-3218:
---

If we want append-only, we should also remove the seek methods from IndexOutput... 
I DISAGREE, too!

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch


[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-19 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088005#comment-13088005
 ] 

Robert Muir commented on LUCENE-3218:
-

{quote}
So eventually this is about creating the cfe and cfs file from the right 
directory.
{quote}

That's not the only issue: while that is the primary reason I reopened this 
issue, I also have concerns about the API being complicated and non-intuitive.

Making the API even more complicated because filesystem X can only write 
WingDings or cannot seek doesn't seem like a good solution to me.

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch


[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-19 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13088067#comment-13088067
 ] 

Uwe Schindler commented on LUCENE-3218:
---

When looking into the CompoundFileDirectory code I also found a small bug in 
version handling.
readEntries() reads the first VInt and, if it is negative, uses it for version 
checking. This check has two problems:
- if the VInt is smaller than FORMAT_CURRENT, it should throw 
IndexTooNewException
- the comparison should not be against FORMAT_CURRENT itself (this constant 
should only be used for writing CFS files); it should compare against real 
version numbers. Otherwise this would break when new formats are added later.
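The two fixes can be sketched as follows, assuming hypothetical constant names (`FORMAT_V1`, `FORMAT_V2`) and a locally defined exception in place of Lucene's own classes. In this sketch versions are negative numbers, so a value below the newest version the reader knows means a newer release wrote the file; each known version is matched by its explicit number, and `FORMAT_CURRENT` is reserved for the writer.

```java
// Hypothetical stand-in for an "index too new" exception.
class FormatTooNewException extends RuntimeException {
  FormatTooNewException(int format) {
    super("CFS format " + format + " is newer than this reader supports");
  }
}

final class CfsHeader {
  // Hypothetical version numbers; negative values mark a versioned header.
  static final int FORMAT_V1 = -1;
  static final int FORMAT_V2 = -2;
  // Only the writer should use this; the reader checks explicit versions.
  static final int FORMAT_CURRENT = FORMAT_V2;

  static String describe(int firstInt) {
    if (firstInt >= 0) {
      // Unversioned legacy header: in this sketch the first VInt is the entry count.
      return "legacy";
    }
    if (firstInt < FORMAT_V2) {
      // More negative than the newest version this reader knows: too new.
      throw new FormatTooNewException(firstInt);
    }
    if (firstInt == FORMAT_V1) {
      return "v1";
    }
    return "v2"; // only FORMAT_V2 remains at this point
  }
}
```

When a `FORMAT_V3` is added later, bumping `FORMAT_CURRENT` alone cannot make old files misparse, because the reader never branches on it.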

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch


[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-08-18 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13086896#comment-13086896
 ] 

Robert Muir commented on LUCENE-3218:
-

Can CFS reading/writing not take a parent directory? Instead of:

CompoundFileDirectory(Directory parent, )

I think it should be:
CompoundFileDirectory(IndexInput cfs, IndexInput cfe)

And directory.createOutput etc. should take *both* filenames; this would remove 
this backdooring completely.
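To make the suggested constructor shape concrete, here is a small sketch that builds a read-only compound view from two already-opened inputs (the .cfs data and the .cfe entry table) instead of from a parent directory. It assumes plain byte arrays in place of IndexInput and a made-up entry-table layout (count, then name/offset/length triples); `CompoundView` is a hypothetical name, not the patch's class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

final class CompoundView {
  private final byte[] cfs;
  private final Map<String, int[]> entries = new HashMap<>(); // name -> {offset, length}

  // Takes both inputs directly: no Directory, no guessing of file names.
  CompoundView(byte[] cfs, byte[] cfe) {
    this.cfs = cfs;
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(cfe))) {
      int count = in.readInt();
      for (int i = 0; i < count; i++) {
        String name = in.readUTF();
        int offset = in.readInt();
        int length = in.readInt();
        entries.put(name, new int[] {offset, length});
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  Set<String> listAll() {
    return entries.keySet();
  }

  // Returns the slice of the .cfs data that belongs to one sub-file.
  byte[] open(String name) {
    int[] e = entries.get(name);
    if (e == null) throw new IllegalArgumentException("no such entry: " + name);
    return Arrays.copyOfRange(cfs, e[0], e[0] + e[1]);
  }

  // Helper that encodes an entry table in the toy layout read above.
  static byte[] entryTable(Map<String, int[]> table) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeInt(table.size());
      for (Map.Entry<String, int[]> e : table.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeInt(e.getValue()[0]);
        out.writeInt(e.getValue()[1]);
      }
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    return bytes.toByteArray();
  }
}
```

Whoever opens the two inputs decides where they live, so the view never has to reach back into a directory.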

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch


[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-06-22 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13053181#comment-13053181
 ] 

Robert Muir commented on LUCENE-3218:
-

Thanks Simon, I feel better now that we get our open-files-for-write tracking 
back.

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch, LUCENE-3218_3x.patch, LUCENE-3218_test_fix.patch, 
 LUCENE-3218_tests.patch


[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-06-21 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052597#comment-13052597
 ] 

Michael McCandless commented on LUCENE-3218:


Patch looks great!

Can we name it createCompoundOutput? That emphasizes that we are 
write-once (this file shouldn't exist), and it matches createOutput.

On checkAbort... could we not send that to the CFW and instead call 
checkAbort in the outer loops (i.e., where we .copy the files in)? 
The existing CFW already only checks once per file anyway...

Maybe instead of asserts for misuse of the CFD API (e.g. no 
entries, something is still open), we should make these real 
exceptions (i.e., thrown even when assertions are off)?
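What "real exceptions" could look like, as a sketch assuming a toy writer that tracks its open entries (`ToyCompoundWriter` and its methods are hypothetical, not the patch's CFW/CFD code). The checks run whether or not the JVM was started with `-ea`, unlike `assert` statements.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ToyCompoundWriter {
  private final List<String> entries = new ArrayList<>();
  private final Set<String> open = new LinkedHashSet<>();
  private boolean closed;

  void openEntry(String name) {
    ensureOpen();
    if (entries.contains(name)) {
      throw new IllegalArgumentException("entry already exists: " + name);
    }
    entries.add(name);
    open.add(name);
  }

  void closeEntry(String name) {
    if (!open.remove(name)) {
      throw new IllegalStateException("entry not open: " + name);
    }
  }

  void close() {
    ensureOpen();
    // Checked unconditionally, unlike `assert`, which is skipped without -ea.
    if (entries.isEmpty()) {
      throw new IllegalStateException("no entries were added");
    }
    if (!open.isEmpty()) {
      throw new IllegalStateException("entries still open: " + open);
    }
    closed = true;
  }

  private void ensureOpen() {
    if (closed) {
      throw new IllegalStateException("writer already closed");
    }
  }
}
```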

This comment looks stale (in CFW.java)?:
{noformat}
  // Close the output stream. Set the os to null before trying to
  // close so that if an exception occurs during the close, the
  // finally clause below will not attempt to close the stream
  // the second time.
{noformat}

openCompoundOutput needs javadoc.

CFD.createOutput's jdoc says Not Implememented but it is.

The new test cases in TestCompoundFile name their file d.csf ;) Column 
stride fields live on!!  Too many TLAs...


 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch


[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-06-21 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052633#comment-13052633
 ] 

Simon Willnauer commented on LUCENE-3218:
-

Committed in revision 1138063.
I will try to backport this to 3.x if possible.

 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3218.patch, LUCENE-3218.patch, LUCENE-3218.patch, 
 LUCENE-3218.patch


[jira] [Commented] (LUCENE-3218) Make CFS appendable

2011-06-20 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1305#comment-1305
 ] 

Michael McCandless commented on LUCENE-3218:


Patch looks cool!

So the CFW will take the first output opened against it and let it write 
directly into the actual CFS file; then, if another file is opened while that 
first one is still open, the 2nd file will write to a separate file and will 
then be copied in on close. We may want to delegate the separate files too, 
so that on close they copy themselves into the CFS and remove the original? 
This way IW won't have to separately create the CFS at the end.
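The scheme described above, including the suggestion that side files copy themselves in on close, can be sketched with in-memory buffers. This assumes hypothetical names (`DirectFirstCfsWriter`, `EntryOutput`) and is not the patch's actual code. The first entry opened while none is active writes straight into the compound data. An entry opened concurrently writes to its own buffer; on close it is appended immediately if the compound data is free, or queued until the direct entry closes.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;

final class DirectFirstCfsWriter {
  private final ByteArrayOutputStream data = new ByteArrayOutputStream(); // stands in for the .cfs file
  private final Map<String, int[]> entries = new LinkedHashMap<>();      // name -> {offset, length}
  private final Queue<EntryOutput> finishedBuffered = new ArrayDeque<>();
  private EntryOutput currentDirect; // entry currently writing straight into `data`

  final class EntryOutput {
    final String name;
    private final ByteArrayOutputStream buffer; // null when writing directly
    private final int start;                    // offset in `data` for direct entries

    private EntryOutput(String name, boolean direct) {
      this.name = name;
      this.buffer = direct ? null : new ByteArrayOutputStream();
      this.start = direct ? data.size() : -1;
    }

    void write(byte[] bytes) {
      (buffer == null ? data : buffer).write(bytes, 0, bytes.length);
    }

    void close() {
      if (buffer == null) {
        entries.put(name, new int[] {start, data.size() - start});
        currentDirect = null;
        // The compound data is free again: copy in side buffers that finished meanwhile.
        while (!finishedBuffered.isEmpty()) {
          finishedBuffered.poll().copyIntoCompound();
        }
      } else if (currentDirect == null) {
        copyIntoCompound();
      } else {
        finishedBuffered.add(this); // wait until the direct entry is done
      }
    }

    private void copyIntoCompound() {
      int offset = data.size();
      data.write(buffer.toByteArray(), 0, buffer.size());
      entries.put(name, new int[] {offset, buffer.size()});
    }
  }

  EntryOutput createOutput(String name) {
    EntryOutput out = new EntryOutput(name, currentDirect == null);
    if (currentDirect == null) {
      currentDirect = out;
    }
    return out;
  }

  byte[] bytes() {
    return data.toByteArray();
  }

  int[] entry(String name) {
    return entries.get(name);
  }
}
```

Only one entry can stream directly at a time, which is why writing the biggest sub-file first would save the most copying.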

Somehow we need IW to add the biggest sub-file first...

s/compund/compound

CFW.close should assert currentOutput != null (and, if we delegate sep
entries, that they are also all closed)?

You might need to sync the CompoundFileWriter.this.currentOutput test
/ setting to null?  Though... Lucene is always single threaded in
writing files for the same segment, today anyway.

Can we make a separate createCompoundOutput? (I.e., instead of passing 
OpenMode to openCompoundInput.) And: I'm assuming a given compound 
output can only be opened once, appended to / have separate files copied 
into it, closed, and then never opened again for writing? (I.e., still 
write-once at the file level.)
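The write-once contract asked about here can be sketched with a hypothetical in-memory store (`WriteOnceStore` and its methods are illustrative, not Lucene classes). In this sketch, `createCompoundOutput` refuses a name that was already created, and once an output is closed it can no longer be written; only reading through `openCompoundInput` remains.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class WriteOnceStore {
  final class Output {
    private final String name;
    private final StringBuilder buf = new StringBuilder();
    private boolean closed;

    private Output(String name) {
      this.name = name;
    }

    void append(String s) {
      if (closed) throw new IllegalStateException(name + " is closed");
      buf.append(s);
    }

    void close() {
      closed = true;
      files.put(name, buf.toString()); // readable only once complete
    }
  }

  private final Map<String, String> files = new HashMap<>();
  private final Set<String> created = new HashSet<>();

  // Creating the same compound file twice is an error: files are write-once.
  Output createCompoundOutput(String name) {
    if (!created.add(name)) throw new IllegalStateException("already created: " + name);
    return new Output(name);
  }

  String openCompoundInput(String name) {
    String s = files.get(name);
    if (s == null) throw new IllegalStateException("not written or not closed yet: " + name);
    return s;
  }
}
```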


 Make CFS appendable  
 -

 Key: LUCENE-3218
 URL: https://issues.apache.org/jira/browse/LUCENE-3218
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
 Fix For: 4.0

 Attachments: LUCENE-3218.patch

