subject:"\[jira\] \[Commented\] \(LUCENE\-2793\) Directory createOutput and openInput should take an IOContext"

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-07-08 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061836#comment-13061836
 ] 

Simon Willnauer commented on LUCENE-2793:
-

I fixed the two minor things from above, created two followup issues 
(LUCENE-3292  LUCENE-3293) for the remaining TODOs and will go ahead 
reintegrating the branch now.


 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793_final.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-07-07 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13061439#comment-13061439
 ] 

Michael McCandless commented on LUCENE-2793:


Looks good!  +1 to land it!

Just a few things:

  * Shouldn't WindowsDirectory also call BII.bufferSize(context) and
do the same Math.max it used to do?

  * Should VarGapTermsIndexReader should pass READONCE context down when it
opens/reads the FST?  Hmm, though, it should just replace the ctx
passed in, ie if we are merging vs reading we want to
differentiate.  Let's open separate issue for this and address
post merge?

  * Can you open an issue for this one: // TODO: context should be
part of the key used to cache that reader in the pool.?  This is
pretty important, else you can get NRT readers with too-large
buffer sizes because the readers had been opened for merging
first.

  * Extra space in SegmentInfo.java: IOContext.READONCE );


 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793_final.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-07-06 Thread Simon Willnauer (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060537#comment-13060537
]

Simon Willnauer commented on LUCENE-2793:
-

bq. Made the necessary changes and hopefully addressed all the nocommits.
varun, I still see lots of nocommits here. Would be good if you could address
them this week. You don't need to solve them but discuss them here with us. you
can do that in a patch and add your comments to the parts where you are not
sure how to resolve.

I would like to commit the patches this week so we can merge to trunk soonish.

Simon

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
Labels: gsoc2011, lucene-gsoc-11, mentor
Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch,
LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch,
LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch,
LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch,
LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch,
LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch,
LUCENE-2793.patch

Today for merging we pass down a larger readBufferSize than for searching
because we get better performance.
I think we should generalize this to a class (IOContext), which would hold
the buffer size, but then could hold other flags like DIRECT (bypass OS's
buffer cache), SEQUENTIAL, etc.
Then, we can make the DirectIOLinuxDirectory fully usable because we would
only use DIRECT/SEQUENTIAL during merging.
This will require fixing how IW pools readers, so that a reader opened for
merging is not then used for searching, and vice/versa. Really, it's only
all the open file handles that need to be different -- we could in theory
share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-07-06 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13060538#comment-13060538
 ] 

Michael McCandless commented on LUCENE-2793:


+1

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-07-04 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13059527#comment-13059527
 ] 

Michael McCandless commented on LUCENE-2793:


I think BufferedIndexInput doesn't need a set/getMergeBufferSize?  Ie, BII only 
knows its bufferSize, regardless of the context from its parent.

Otherwise I think your patch is good: today on trunk we hardwire the 4 KB 
buffer size for merges, which is the same thing your patch is doing; the only 
difference is the constant MERGE_BUFFER_SIZE has moved from IW to BII, and each 
Dir impl now has the if.  As a future improvement we can add a 
set/getMergeBufferSize to each Dir impl...

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-30 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057903#comment-13057903
 ] 

Simon Willnauer commented on LUCENE-2793:
-

Varun this patch looks great. I am about to commit it. Can you now work through 
the nocommits, fix them or post questions here?

simon

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-30 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058116#comment-13058116
 ] 

Michael McCandless commented on LUCENE-2793:


To address the nocommits about losing the larger buffer size during merging, 
should we add set/getMergeBufferSize and set/getDefaultBufferSize to those Dir 
impls that do buffering?  (And default to what they are today on trunk, I think 
1 KB and 4 KB?)

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-28 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056449#comment-13056449
 ] 

Simon Willnauer commented on LUCENE-2793:
-

Varun, patch looks great!

here are some comments:

* in LuceneTestCase#newIOContext(Random) your switch statement misses necessary 
break; statements no matter what you will always get a flush 
context. I think you should also randominze the numbers for numDocs and 
sizeInBytes. nobody should rely on these number they are just best effort.
* in  o.a.l.i.values.Bytes#getValues, 
o.a.l.i.StoredFieldsWriter#initFieldsWriter, 
o.a.l.i.SegmentMerger#mergeVectors, MemoryCodec#fieldsProducer, 
MockSingleIntIndexOutput#MockSingleIntIndexOutput you can remove the nocommit
* the public members in FlushInfo should be final
* the MergeInfo ctors javadoc should either describe the arguments or omit 
them. In this case I think you can simply omit them since they are pretty self 
explained. I think what we need on MergeInfo as well as on FlushInfo is a 
javadoc comment that says that the values are estimates not necessarily real / 
correct values.
* on private IOContext (Context context, MergeInfo mergeInfo ) I don't 
understand the assert why is there a context != Flush?
* in MockDirectoryWrapper there is a nocommit that says randomize the 
IOContext. Maybe we should put the newIOContext to _TestUtils and delegate to 
this from LuceneTestCase?

Overall I think we are really close here. Once those comments are fixed I will 
commit the patch to the branch and we go through the remaining nocommits. Once 
this is done we should close this and create a new issue to fix all the buffer 
sizes just like LUCENE-3248. 

good job varun

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-27 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13055649#comment-13055649
 ] 

Simon Willnauer commented on LUCENE-2793:
-

bq. Should IOContext and MergeInfo be in oal.store not .index?
+1

bq. I think SegmentMerger should receive an IOCtx from its caller, and
yeah I think we should pass the IOContext in via the ctor. Yet, for 
IW#addIndexes you can simply build a best effort IOContext like:
{code}
 for (IndexReader indexReader : readers) {
   numDocs += indexReader.numDocs();
 }
 final IOContext context = new IOContext(new MergeInfo(numDocs, -1, true, 
false));
}

bq. I think on flush IOContext should include num docs and estimated
+1 I think that is good no?

bq. Somehow, lucene/contrib/demo/data is deleted on the branch. We should check 
if anything else is missing!
oh man... I will check

you use new IOContext(Context.FLUSH) and new IOContext(Context.READ) in your 
patch but we have some static like IOContext.READ maybe we need FLUSH too?

for the tests I think we should start randomizing the IOContext. I think you 
should add a newIOContext(Random random) to LuceneTestCase and get the context 
from there in a unit test. At the end of the day we should see same behavior 
whatever context you pass in right?

simon






 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-24 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13054359#comment-13054359
 ] 

Michael McCandless commented on LUCENE-2793:


I took quick look @ the branch -- it's looking good!  Some small stuff:

  


  * Should IOContext and MergeInfo be in oal.store not .index?

  * I think SegmentMerger should receive an IOCtx from its caller, and
then apss that to all the IO ops it invokes?  But the code has a
nocommit about tripping an assert -- which one?

  * I think on flush IOContext should include num docs and estimated
segment size (we can roughly pull this from RAM used for the
segment), but we should include comment that this is only approx.

  * Somehow, lucene/contrib/demo/data is deleted on the branch.  We
should check if anything else is missing!


 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793-nrt.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-20 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051857#comment-13051857
 ] 

Simon Willnauer commented on LUCENE-2793:
-

I created a branch for this issue and follow up issues here: 
https://svn.apache.org/repos/asf/lucene/dev/branches/LUCENE2793/

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-19 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13051665#comment-13051665
 ] 

Simon Willnauer commented on LUCENE-2793:
-

hey varun thanks for the new patch, some comments:

* the nocommit in IW you should maybe add a second ctor to MergeInfo that takes 
arguments and then use something like this in IW: final IOContext context = new 
IOContext(new MergeInfo(info.docCount, info.sizeInBytes(true), true, false));
* It seems kind of odd to always prefix MergeInfo with OneMerge so maybe move 
it into its own file
* regarding the read buffer problem, can you simply use 
BufferedIndexInput.BUFFER_SIZE to initialize it and put a TODO / nocommit on top
* You should look into contrib/misc there are some compile errors in 
AppendingCodec as well as in solr land.
* I think in Directory the openInput method should be abstract:   public 
abstract IndexInput openInput(String name, IOContext context) throws 
IOException; and FSDirectory should not specify this method at all. Currently 
your code would produce a stack overflow since it calls itself. 

If I fix the nocommits in your patch test-core passes without problems. Looking 
good man we are getting closer!

Simon


 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-16 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13050444#comment-13050444
 ] 

Simon Willnauer commented on LUCENE-2793:
-

hey varun

patch looks close!

here are some comments:

* the assert context == Context.MERGE should be assert context != Context.MERGE 
|| mergeInfo != null;
* can you move that assert into IOContext(Context, MergeInfo) and let other 
related constructors call this(context, mergeInfo) instead of initializing all 
members themself?
* I think there should be a public static final IOContext READONCE = new 
IOContext(true); then you can make the corresponding constructor private. I 
think the context should be Context.READ instead of default in that case right?
* IOContext(MergePolicy.OneMerge) seems to be unnecessary. I think you should 
add a method to OneMerge to get a MergeInfo from it and only have a MergeInfo 
ctor. Then you can move MergeInfo into OneMerge too.
* PerFieldCodecWrapper still seems to be deleted
* In IndexReader IOContext context=null; should be IOContext context= new 
IOContext(READ); no?
*  no commit should be nocommit - we have a script on jenkins that checks this 
:)
* I still see some whitespace problems in SegmentWriteState.java 
* I think IOContext.DEFAULT_IOCONTEXT should be IOContext.DEFAULT since 
IOContext is implicit


I am waiting for you fixing the tests before I review further. Yet, what is 
missing is still the decision what buffer size to used down in direcotries etc.

good work so far!



 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-14 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13049293#comment-13049293
 ] 

Michael McCandless commented on LUCENE-2793:


LUCENE-3203 is another example where a Dir needs the IOContext so it can 
optionally rate limit the bytes/second if it's a merge.

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-12 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13048285#comment-13048285
 ] 

Michael McCandless commented on LUCENE-2793:


bq. Can OneMerge extend MergeInfo?

I think this is risky because it then makes IOContext an unexpectedly
heavy object, since OneMerge holds SegmentReaders used for merging.

There are already some places that save away an IOContext instance as
a member (to be used later), and if something ever does this for a MERGE
then we can unexpectedly hold keep a lot of garbage alive.

I think, instead, when the context is MERGE, we should make a new
(lightweight) MergeInfo instance that pulls the necessary details from
the OneMerge, ie fully decouples from the OneMerge instance.


 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-11 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047970#comment-13047970
 ] 

Simon Willnauer commented on LUCENE-2793:
-

hi varun,

looks good, some quick comments, 

* Can OneMerge extend MergeInfo?
* Can we introduce a static IOContext instance like public static final 
IOContext DEFAULT = new IOContext() and use this everywhere where we need to 
use the default. All those object creations are unnecessary.
* Can IOContext() call this(false) instead of manually setting the values
* I think IOContext should take MergeInfo instead of OneMerge if you extend 
MergeInfo in OneMerge you can just pass OneMerge and you are done.

I will have a closer look tomorrow

simon

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-10 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13047234#comment-13047234
 ] 

Simon Willnauer commented on LUCENE-2793:
-

Varun, your patch doesn't apply cleanly to my latest trunk. i think you should 
update your local copy again!

I have trouble to understand how you use MergeInfo etc. I figure there might be 
a misunderstanding so here is what I had in mind roughly: 

{code}

Index: lucene/src/java/org/apache/lucene/index/MergePolicy.java
===
--- lucene/src/java/org/apache/lucene/index/MergePolicy.java(revision 
1134335)
+++ lucene/src/java/org/apache/lucene/index/MergePolicy.java(working copy)
@@ -64,7 +64,7 @@
*  subset of segments to be merged as well as whether the
*  new segment should use the compound file format. */
 
-  public static class OneMerge {
+  public static class OneMerge extends MergeInfo {
 
 SegmentInfo info;   // used by IndexWriter
 boolean optimize;   // used by IndexWriter
@@ -72,25 +72,26 @@
 long mergeGen;  // used by IndexWriter
 boolean isExternal; // used by IndexWriter
 int maxNumSegmentsOptimize; // used by IndexWriter
-public long estimatedMergeBytes;   // used by IndexWriter
 ListSegmentReader readers;// used by IndexWriter
 ListSegmentReader readerClones;   // used by IndexWriter
-public final ListSegmentInfo segments;
-public final int totalDocCount;
+public final ListSegmentInfo segments = new ArrayListSegmentInfo();
 boolean aborted;
 Throwable error;
 boolean paused;
 
 public OneMerge(ListSegmentInfo segments) {
+  super(getSegments(segments));
+}
+
+private static int getSegments(ListSegmentInfo segments) {
   if (0 == segments.size())
 throw new RuntimeException(segments must include at least one 
segment);
   // clone the list, as the in list may be based off original SegmentInfos 
and may be modified
-  this.segments = new ArrayListSegmentInfo(segments);
   int count = 0;
   for(SegmentInfo info : segments) {
 count += info.docCount;
   }
-  totalDocCount = count;
+  return count;
 }
 
 /** Record that an exception occurred while executing
{code}

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-09 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046403#comment-13046403
 ] 

Simon Willnauer commented on LUCENE-2793:
-

Hey varun,

here are some more comments for the latest complete patch:

* We should have a static instance for IOContext with Context.Other which you 
can use in BitVector / CheckIndex for instance Maybe IOContext#DEFAULT_CONTEXT
* It seems that we don't need to provide IOContext to FieldInfos and 
SegmentInfo since we are reading them into memory anyway. I think you can just 
use a default context here without changing the constructors. Same is true for 
SegmentInfos 
* This is unrelated to your patch but in PreFlexFields we should use 
IndexFileNames.segmentFileName(info.name, , PreFlexCodec.FREQ_EXTENSION) and 
IndexFileNames.segmentFileName(info.name, , PreFlexCodec.PROX_EXTENSION) 
instead of info.name + .frq  and info.name + .prx
* it seems that we should communicate the IOContext to the codec somehow. I 
suggest we put IOContext to SegmentWriteState and SegmentReadState that way we 
don't need to change the Codec interface and clutter it with internals. This 
would also fix mikes comment for FieldsConsumer etc.
* TermVectorsWriter is only used in Merges so maybe it should also get a 
Context.Merge for consistency?
* I really don't like OneMerge :) I think we should add an abstract class  
(maybe MergeInfo) that exposes the estimatedMergeBytes, totalDocCount for now.
* small typo in RamDirectory, there is a space missing after the second file 
here: dir.copy(this, file, file,context);
* SegmentReader should also use the static Default IOContext - make sure its 
used where needed :)

Regarding the IOContext class I think we should design for what we have right 
now and since SegementInfo is not used anywhere (as far as I can see) we should 
add it once we need it. OneMerge should not go in there but rather the 
interface / abstract class I talked about above. 

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-09 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13046635#comment-13046635
 ] 

Michael McCandless commented on LUCENE-2793:


bq. It seems that we don't need to provide IOContext to FieldInfos and 
SegmentInfo since we are reading them into memory anyway. I think you can just 
use a default context here without changing the constructors. Same is true for 
SegmentInfo

I think we should pass down readOnce=true for these cases?  EG some
kind of caching dir (or something) would know not to bother caching
such files...

Same for del docs, terms index, doc values (well, sometimes), etc.

bq. it seems that we should communicate the IOContext to the codec somehow. I 
suggest we put IOContext to SegmentWriteState and SegmentReadState that way we 
don't need to change the Codec interface and clutter it with internals. This 
would also fix mikes comment for FieldsConsumer etc.

+1 that's great.

bq. I really don't like OneMerge  I think we should add an abstract class 
(maybe MergeInfo) that exposes the estimatedMergeBytes, totalDocCount for now.

If we can't include OneMerge, and I agree it'd be nice not to, I think
we should try hard to pull stuff out of OneMerge that may be of
interest to a Dir impl?  Maybe:

  * estimatedTotalSegmentSizeBytes

  * docCount

  * optimize/expungeDeletes

  * isExternal (so Dir can know if this is addIndexes vs normal merging)

bq. Regarding the IOContext class I think we should design for what we have 
right now and since SegementInfo is not used anywhere (as far as I can see) we 
should add it once we need it. OneMerge should not go in there but rather the 
interface / abstract class I talked about above.

I agree, let's wait until we have a need.

In fact... SegmentInfo for flush won't work: we go and open all files
for flushing, write to them, close them, and only then do we make the
SegmentInfo.

So it seems like we should also have some abtracted stuff about the
to-be-flushed segment?  Maybe for starters the
estimatedSegmentSizeBytes?  EG, NRTCachingDir could use this to decide
whether to cache the new segment (today it fragile-ly relies on the
app to open new NRT reader frequently enough).


 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-07 Thread Simon Willnauer (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045322#comment-13045322
]

Simon Willnauer commented on LUCENE-2793:
-

Very very quick review:

* I think OneMerge should not be required for createing a IOContext we maybe
should add a default ctor.
* IOContext.Other is confusing I think. If a IOContext doesn't make lots of
sense somewhere we should not need to pass it in. Can't we simply have
overloaded methods? maybe I just don't like the name, maybe use DEFAULT?
* IOContext seem to pretty straight forward you either read or write but it
seems to be confused with high level operations like Merge and Flush. Either
with go on a high level or we only have read and write here. Since read / write
is implicit (you either pull input or output) we should make this high-level
only. So maybe we have Query or Search instead of Read here? Maybe it makes
sense to specify stuff like Consume or Sequential here too some high level APIs
define sequential access so I think it does not conflict?

Directory createOutput and openInput should take an IOContext
-

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-07 Thread Michael McCandless (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045354#comment-13045354
]

Michael McCandless commented on LUCENE-2793:

Patch looks great! Comments:

* I think IOCtx should have ctor taking only OneMerge, which would
set the OneMerge and set the context as Merge?

* Likewise, a ctor taking a SegmentInfo to mean context = Flush?

* And finally a default ctor that maps to Other (or Default or
Unknown or Unspecified or something)

* You don't need to create the IOCtx in IW.maybeMerge?

* Maybe we want a readOnce boolean in IOCtx? When we read del
docs, norms, terms index, doc values, segments file /
segments.gen, we would set this? (And UnixDir would send eg
NO_REUSE down to the OS).

* I think we'll need a NativeMMapDir as well as NativeDir (or
NativeUnix/WindowsDir), because mmap can also take flags giving
hints about access patterns. I'll open a new issue...

* Why does SegmentCoreReaders hang onto the IOCtx? Seems like
classes shouldn't hang onto it... (also: PreFlexFields).

* Hmm does createOutput even need an IOCtx...? What would a dir
do with this? I suppose if it's a merge and we had io
prioritization (someday) we could set lower prio... OK let's keep
it.

* I think Codec.fieldsProducer/Consumer should take an IOCtx?

* Still need to fix IW's ReaderPool to key off of IOCtx.Context plus
the info. Maybe put a // nocommit in there so we remember...?
Eg, where you commented out the readBufferSize = ... inside
ReaderPool.get is a good place.

Directory createOutput and openInput should take an IOContext
-

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-07 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045455#comment-13045455
 ] 

Michael McCandless commented on LUCENE-2793:


{quote}
I think we'll need a NativeMMapDir as well as NativeDir (or
NativeUnix/WindowsDir), because mmap can also take flags giving
hints about access patterns. I'll open a new issue...
{quote}

I opened LUCENE-3178.

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Varun Thacker
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-06-02 Thread Michael McCandless (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042824#comment-13042824
]

Michael McCandless commented on LUCENE-2793:

Patch is looking good! Some feedback:

* I think CompoundFileReader should not store the IOCtx passed to
the ctor? It should just fwd to the dir.openInput...? It used to
have to store the readBufferSize in case openInput was called w/o
a readBufferSize, but we now require IOCtx to openInput.

* Is IOCtx never allowed to be null? (I think so?). In which case
we should jdoc this, and Dir impls can rely on it.

* I don't think NRTCachingDir.unCache should take an IOCtx?

* I think NRTCachineDir.doCacheWrite should take the IOCtx, and, we
should remove all the merge thread hacks that it now does.

* The super(bufferSize) call is needed in SimpleFSDir (and others).
I think what must happen here is SimpleFSDir look @ the IOCtx and
decides what buffer size to use (the logic in IW should be moved
down into here), with getters/setters to be able to change
read/write buffer size for merging vs reading/flushing.

* IOContext.Context should also have Flush? And maybe rename Search
- Read?

* IndexWriter ctor shouldn't take an incoming IOCtx; instead,
IndexWriter should create a new IOCtx each time it opens reader /
flushes segments / does merging.

* The ReaderPool.get* methods (inside IndexWriter) should take an
IOCtx, and the .context from that IOCtx should be part of the key
used to cache that reader in the pool.

Directory createOutput and openInput should take an IOContext
-

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-05-31 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041560#comment-13041560
 ] 

Simon Willnauer commented on LUCENE-2793:
-

bq. Why not use a normal struct/bean like class here?

What I had in mind here is a more sophisticated IOContext. I'd like to see some 
default IOCtx for Merge, Search, Write etc. that folks can simply modify if 
needed. For that to work it should be immutable. The question here is if we go 
the high level way like you suggested and make the most of the properties 
private to the dir (like buffersize etc) or is we should make it available to 
the user of the API. Some Dirs might have different defaults than others 
though. 

I can see why this should be private to the directory and it might make more 
sense to simply indicate what the input / output is used for and let the 
directory figure out the details. So yeah +1 to keep it high level. If bean or 
copy-on-write builder I need to see what properties should be on it. so lets 
keep that for later.


{quote}Ideally, I would also like to see details about the
merge/reader/writer context, eg for merging I'd like to see the
OneMerge instance, for Reader/Writer maybe a SegmentInfo instance?
{quote}

I really don't like the idea of exploiting OneMerge to the Directory. We might 
introduce some other internface that holds enough metadata to get the info and 
let OneMerge implement this but we should not pass it down to the dir. Same is 
true for SI I don't want directory mess around with those classes. 


 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Simon Willnauer
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-05-31 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041617#comment-13041617
 ] 

Michael McCandless commented on LUCENE-2793:


bq. If bean or copy-on-write builder I need to see what properties should be on 
it. so lets keep that for later.

OK let's defer this decision for now.

I just see builder pattern being too much of a hammer in search of
nails...

{quote}
I really don't like the idea of exploiting OneMerge to the Directory. We might 
introduce some other internface that holds enough metadata to get the info and 
let OneMerge implement this but we should not pass it down to the dir. Same is 
true for SI I don't want directory mess around with those classes.
{quote}

I agree, it's spooky letting such a low level class (Dir) have access
to high level stuff (OneMerge, SegmentInfo)... but it's really the
only way to ensure we don't miss useful context about this file?

Ie these high level classes already encompass the full context for a
given file.

So, if we really don't want to include these classes, and instead pick 
choose what digested properties to include... then we risk some Dir
impl not having access to something interesting because we forgot to
include it, which means they'll have to hack something up, like
NRTCachingDir does now in coupling threads to OneMerge instances.

But, maybe the restrictions won't be so bad in practice... if we can
try to brainstorm what Dirs may want to know about the full context
and include that in IOCtx.

For example, just by adding estimatedFullSegmentSizeBytes, for both
Merge and Write (hmm: maybe Flush?), NRTCachingDir can cutover to
IOCtx.


 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Simon Willnauer
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-05-31 Thread Chris Male (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041642#comment-13041642
]

Chris Male commented on LUCENE-2793:

{quote}
So, if we really don't want to include these classes, and instead pick
choose what digested properties to include... then we risk some Dir
impl not having access to something interesting because we forgot to
include it, which means they'll have to hack something up, like
NRTCachingDir does now in coupling threads to OneMerge instances.

But, maybe the restrictions won't be so bad in practice... if we can
try to brainstorm what Dirs may want to know about the full context
and include that in IOCtx.
{quote}

I think the alternatives to this are definitely worse. I've been trying to
think of some sort of intermediary component that could decouple the interests
of Directories from the sources of that info, but it just seems to make
everything way more complicated. I agree it seems best to stick to what Dirs
want to know, present that as the IOCtx, and then work out the sources of that
information.

Directory createOutput and openInput should take an IOContext
-

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-05-30 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041224#comment-13041224
 ] 

Michael McCandless commented on LUCENE-2793:


Great to see your patch here Varun!

I think we should start with only high level details in the IOContext?
Ie, the Merge/Search(Reader?)/Writer is great, but I think low level
stuff (bufferSize, sequential/direct) should stay private to the Dir
impl?

Ideally, I would also like to see details about the
merge/reader/writer context, eg for merging I'd like to see the
OneMerge instance, for Reader/Writer maybe a SegmentInfo instance?

This would then make Dir impls like NRTCachingDirectory (LUCENE-3092)
clean (vs the sneaky ConcurrentMergeScheduler entangling it now must
do), though, in that particular case we could accomplish this by only
adding an estimatedSegmentSizeBytes to the IOCtx.

We should remove the bufferSize that now plumbs all up and down
the APIs, and replace it with IOCtx?


 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Simon Willnauer
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-05-30 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041225#comment-13041225
 ] 

Michael McCandless commented on LUCENE-2793:


bq. Some kind of a copy-on-write builder pattern

Why not use a normal struct/bean like class here?

bq. I wonder if it makes sense to bind the IO Context instance to the directory 
and add a factory to the directory like 
Directory#getIOContext(Merge|Search|Index) to enable the directory impl to set 
platform specific defaults

Hmm, but, if the IOCtxt is 'high level', and then details (bufferSize, 
seq/direct flags, etc.) are private to the dir impl, I think we (Lucene) can 
create the IOCtx and pass it down?  Ie this is a one-way communication, I think 
(Lucene - Dir impl).

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Simon Willnauer
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-05-30 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13041248#comment-13041248
 ] 

Michael McCandless commented on LUCENE-2793:


bq. I removed bufferSize from all the Directory implementations and made it 
part of IOContext.

I think it shouldn't even be part of IOCtx?  Ie this is private to the Dir impl?

Another thing we have to do is add IOCtx into the cache key used in 
IndexWriter's ReaderPool inner class.  Ie, a reader opened for merging cannot 
be later shared with a reader opened for searching.

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Simon Willnauer
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch, LUCENE-2793.patch, 
 LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-05-29 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13040845#comment-13040845
 ] 

Jason Rutherglen commented on LUCENE-2793:
--

I already posted a patch to this issue a while back, 
https://issues.apache.org/jira/secure/attachment/12468030/LUCENE-2793.patch  It 
seems we're looping here.

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Simon Willnauer
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-05-29 Thread Varun Thacker (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13040850#comment-13040850
]

Varun Thacker commented on LUCENE-2793:
---

I should have seen the patch! I have taken this up as a GSoC project. I'll try
to use that patch too if it's ok.

Directory createOutput and openInput should take an IOContext
-

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-05-29 Thread Simon Willnauer (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13040906#comment-13040906
 ] 

Simon Willnauer commented on LUCENE-2793:
-

bq. I already posted a patch to this issue a while back...

Jason I appreciate that you are still around on this issue. Varun is doing his 
GSoC project on this issue and others so there might be some duplication here 
and there or similarities with you patch but in this case this is ok though. He 
needs to get started as well as getting a feeling how things work here. So I 
hope you don't mind. I can only encourage you to help reviewing and commenting!


bq. If this is the right way to go ?

Thanks for the patch man! Good to get started eventually. here are some 
comments for you patch:

* When I look at IOContext I think it should be a little more sophisticated. 
What I have in mind is something similar to ScorerContext in 
org.apache.lucene.search.Weight.java. Some kind of a copy-on-write builder 
pattern that lets us provide some defaults for Merging, Searching and Indexing 
so by default we can reuse the same instance in many places. If we follow a 
this copy on write model we could also make this class non-final to let people 
customize the context if they needs to.

* I am not sure if we really need the enumeration in IO Context unless me make 
decisions based on what an input / output is used for. IMO it might make more 
sense to have default instances for Searching Indexing and Merging and set the 
flags like SEQUENTIAL and the buffers to good defaults and only tweak them when 
we are aware of the context ie. when we pull the input / output. The most of 
the usecases need good defaults and only some need tweaks but if we hold the 
use-case information on IOContext some directories might want to be very smart 
and this might duplicate code. 
* I wonder if it makes sense to bind the IO Context instance to the directory 
and add a factory to the directory like 
Directory#getIOContext(Merge|Search|Index) to enable the directory impl to set 
platform specific defaults. I think that would make things easier and 
customization would be straight forward.
* You should add a space after a comma in the arguments list like here: int 
bufferSize,IOContext context) 
* Enum elements should be CamelCase and start with a leading Uppercase character




 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Reporter: Michael McCandless
Assignee: Simon Willnauer
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch, LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-05-13 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032992#comment-13032992
 ] 

Michael McCandless commented on LUCENE-2793:


For LUCENE-3092, it would be nice if the IOContext would optionally include the 
OneMerge if in fact the file is being opened for merging... this lets the Dir 
impl be smart if it wants to customize its behavior according to overall 
properties of the merge (eg total merged bytes).

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless
Assignee: Simon Willnauer
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Attachments: LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-04-05 Thread Varun Thacker (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13015785#comment-13015785
]

Varun Thacker commented on LUCENE-2793:
---

Hi. I would be interested in taking this up as a GSOC project . Are there any
resource that I can read to understand the problem in depth ?

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless
Attachments: LUCENE-2793.patch

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-04-05 Thread Simon Willnauer (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13015790#comment-13015790
]

Simon Willnauer commented on LUCENE-2793:
-

bq. Hi. I would be interested in taking this up as a GSOC project . Are there
any resource that I can read to understand the problem in depth ?
Hey, I just marked it as GSoC-able :) so you are free to take it. I would also
volunteer to mentor on this issue if mike doesn't want to take it though. Let
me try to explain you quickly what this issue is about:

Lucene uses Directory as an abstraction on top of the filesystem or RAM or any
other storage to write the index data. Yet, Lucene has different stages where
we have different requirements to the underlying storage / directory. When you
index documents you eventually flush the index to disk and continue indexing
until you created enough segments on disk that you need to merged them. This
operations should if possible not pollute the FS cache since its really just
housekeeping. With Java such its currently not possible to use some flags like
DIRECT / SEQUENTIAL, this is what we have the native DirectIODirectory
implementations in contrib for. Yet, for reading stuff from the index while
searching we want to have the FS cache helping us as much as possible so this
has again different requriements. What we currently do is that we pass
different read buffer sizes to the directory to improve performance. All those
kinds of information should be passed to the directory on a IndexInput /
IndexOutput (similar to Input and OutputStream just tailored enhanced for
Lucene) basis and this is what this issue is about.

hope that helps.

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless
Labels: gsoc2011, lucene-gsoc-11, mentor
Attachments: LUCENE-2793.patch

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-12 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980625#action_12980625
 ] 

Shai Erera commented on LUCENE-2793:


bq. No you dont... you can also call the .setBufferSize like the skiplist 
reader does.

I'd still need to impl my Directory though right? That's the overkill I'm 
trying to avoid.

But, I've been thinking about how would a merge/search code, which can only 
tell the Directory it's in SEARCH / MERGE context, get the buffer size the 
application wanted to use in that context. I don't think it has a way to do so 
without either using a static class, something we try to avoid, or propagating 
those settings down everywhere, which does not make sense either.

So, Robert, I think you're right -- bufferSize should not exist on IOContext. A 
custom Directory impl seems unavoided. So I think it'd be good if we can create 
a BufferedDirectoryWrapper which wraps a Directory and offers a 
BufferedIndexInput/OutputWrapper which delegate the necessary calls to the 
wrapped Directory. We've found ourselves needing to implement that kind of 
Directory several times already, and it'd be good if Lucene can offer one.

That Directory would then take IOContext - bufferSize map/properties/whatever 
and can take that into account in openInput/createOutput.

If users will need to impl Directory wrapping, if they want to control buffer 
sizes, I suggest we make that as painless as possible.

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless
 Attachments: LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-12 Thread Earwin Burrfoot (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980649#action_12980649
]

Earwin Burrfoot commented on LUCENE-2793:
-

What's with ongoing crazyness? :)

bq. DirectIOLinuxDirectory
First you introduce a kind of directory that is utterly useless except certain
special situations. Then, instead of fixing the directory/folding its code
somewhere normal, you try to workaround by switching between directories.
What's the point of using abstract classes or interfaces, if you leak their
implementation's logic all over the place?
Or making DIOLD wrap something. Yeah! Wrap my RAMDir!

bq. bufferSize
This value is only meaningful to a certain subset of Directory implementations.
So the only logical place we want to see this value set - is these very impls.
Sample code:
{code}
Directory ramDir = new RAMDirectory();
ramDir.createIndexInput(name, context);
// See, ma? No bufferSizes, they are pointless for RAMDir

Directory fsDir = new NIOFSDirectory();
fsDir.setBufferSize(IOContext.NORMAL_READ, 1024);
fsDir.setBufferSize(IOContext.MERGE, 4096);
fsDir.createIndexInput(name, context)
// See, ma? The only one who's really concerned with 'actual' buffer size is
this concrete Directory impl
// All client code is only concerned with the context.
// It's NIOFSDirectory's business to give meaningful interpretation for
IOContext and assign the buffer sizes.
{code}

You don't need custom Directory impls to make DIOLD work, you should freakin'
fix it.
The proper way is to test out the things, and then move DirectIO code to the
only place it makes sense in - FSDir? Probably make it switch on/off-able,
maybe not.

You don't need custom Directory impls to set buffer sizes (neither cast to
BufferedIndexInput!), you should add the setting to these Directories, which
make sense of it.

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless
Attachments: LUCENE-2793.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-12 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980727#action_12980727
]

Shai Erera commented on LUCENE-2793:

I assume you mean setBufferSize(IOContext, size) should be added to specific
Directory impls, and not Directory? Because in your example code above, it
looks like it's added to Directory itself. Though we can add it to Directory as
well, and do nothing there. It simplifies matters as you don't need to check
whether the Dir you receive supports setting buffer size (in case you're not
the one creating it).

At any rate, this looks like it can work too.

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless
Attachments: LUCENE-2793.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-12 Thread Robert Muir (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980731#action_12980731
]

Robert Muir commented on LUCENE-2793:
-

bq. The proper way is to test out the things, and then move DirectIO code to
the only place it makes sense in - FSDir? Probably make it switch on/off-able,
maybe not.

I'm not sure it should be there... at least not soon. its not even something
you can implement in pure java?
we definitely have to keep it still simple and possible for people to use the
java library in a platform-indepedent way.
its also a bit dangerous, whenever JNI is involvedeven if its working.

So I think its craziness, to put this direct-io stuff in fsdirectory itself.

As I said before though, i wouldn't mind if we had something more like a
'modules/native' and FSDirectory checked, if this was available and
automagically used it...
but I can't see myself thinking that we should put this logic into fsdir
itself, sorry.

bq. Sample code

My problem with your sample code is that it appears that the .setBufferSize
method is on Directory itself.
Again i disagree with this because:
* its useless to certain directories like MMapDirectory
* its dangerous in the direct-io case (different platforms have strict
requirements that things be sector-aligned etc, see the mac case where it
actually 'works' if the buffer isnt, but is just slow).

I definitely don't like the confusion regarding buffersizes now. A very small %
of the time its actually meaningful and should be respected,
but most of the time the value is completely bogus.

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless
Attachments: LUCENE-2793.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-12 Thread Earwin Burrfoot (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980732#action_12980732
 ] 

Earwin Burrfoot commented on LUCENE-2793:
-

bq. Because in your example code above, it looks like it's added to Directory 
itself.
bq. My problem with your sample code is that it appears that the .setBufferSize 
method is on Directory itself. 

Ohoho. My fault, sorry. It should look like:
{code}
RAMDirectory ramDir = new RAMDirectory();
ramDir.setBufferSize(whatever) // Compilation error!
ramDir.createIndexInput(name, context);

NIOFSDirectory fsDir = new NIOFSDirectory();
fsDir.setBufferSize(IOContext.NORMAL_READ, 1024);
fsDir.setBufferSize(IOContext.MERGE, 4096);
fsDir.createIndexInput(name, context)
{code}

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless
 Attachments: LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-12 Thread Earwin Burrfoot (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980736#action_12980736
]

Earwin Burrfoot commented on LUCENE-2793:
-

{quote}
As I said before though, i wouldn't mind if we had something more like a
'modules/native' and FSDirectory checked, if this was available and
automagically used it...
but I can't see myself thinking that we should put this logic into fsdir
itself, sorry.
{quote}
I'm perfectly OK with that approach (having some module FSDir checks). I also
feel uneasy having JNI in core.
What I don't want to see, is Directory impls that you can't use on their own.
If you can only use it for merging, then it's not a Directory, it breaks the
contract! - move the code elsewhere.

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless
Attachments: LUCENE-2793.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-12 Thread Robert Muir (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980740#action_12980740
]

Robert Muir commented on LUCENE-2793:
-

{quote}
I'm perfectly OK with that approach (having some module FSDir checks). I also
feel uneasy having JNI in core.
What I don't want to see, is Directory impls that you can't use on their own.
If you can only use it for merging, then it's not a Directory, it breaks the
contract! - move the code elsewhere.
{quote}

Right, i think we all agree we want to fix the DirectIOLinuxDirectory into
being a 'real' directory?

As i said before, from a practical perspective, it could be named
LinuxDirectory, extend NIOFS,
and when openInput(IOContext=Merge) it opens its special input. but personally
i don't care how
we actually implement it 'becoming a real directory'. this is another issue,
unrelated to this one really.

this issue is enough and should stand on its own... we should be able to do
enough nice things
here without dealing with JNI: improving our existing directory impls to use
larger buffer sizes by
default when merging, etc (like in your example).

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless
Attachments: LUCENE-2793.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-11 Thread Earwin Burrfoot (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980400#action_12980400
]

Earwin Burrfoot commented on LUCENE-2793:
-

Looks crazy. In a -bad- tangled way.
You get IOFactory from Directory, put into IOContext, and then invoke it,
passing it (wow!) an IOContext and a Directory. What if you pass totally
different Directory? Different IOContext? It blows up eerily.

And there's no justification for this - we already have an IOFactory, it's
called Directory! It just needs an extra parameter on its factory methods
(createInput/Output), that's all.

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless
Attachments: LUCENE-2793.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-11 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980435#action_12980435
 ] 

Jason Rutherglen commented on LUCENE-2793:
--

bq. You get IOFactory from Directory

That's for the default, the main use is the static IOFactory class.

bq. we already have an IOFactory, it's called Directory

Right, however we're basically trying to intermix Directory's, which doesn't 
work when pointed at the same underlying File.  I thought about a 
meta-Directory that routes based on the IOContext, however we'd still need a 
way to create an IndexInput and IndexOutput, from different Directory 
implementations.  

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless
 Attachments: LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-11 Thread Earwin Burrfoot (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980454#action_12980454
]

Earwin Burrfoot commented on LUCENE-2793:
-

{quote}
bq. You get IOFactory from Directory
That's for the default, the main use is the static IOFactory class.
{quote}
You lost me here. If you got A from B, you don't have to pass B again to invoke
A, if you do - that's 99% a design mistake.
But still, my point was that you don't need IOFactory at all.

bq. Right, however we're basically trying to intermix Directory's, which
doesn't work when pointed at the same underlying File. I thought about a
meta-Directory that routes based on the IOContext, however we'd still need a
way to create an IndexInput and IndexOutput, from different Directory
implementations.
What Directories are you trying to intermix? What for?

I thought the only thing done in that issue is an attempt to give Directory
hints as to why we're going to open its streams.
A simple enum IOContext and extra parameter on createOutput/Input would
suffice. But with Lucene's micromanagement attitude, an enum turns into
slightly more complex thing, with bufferSizes and whatnot.
Still - no need for mixing Directories.

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless
Attachments: LUCENE-2793.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-11 Thread Earwin Burrfoot (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980458#action_12980458
]

Earwin Burrfoot commented on LUCENE-2793:
-

In fact, I suggest dropping bufferSize altogether. As far as I can recall, it
was introduced as a precursor to IOContext and can now be safely replaced.

Even if we want to give user control over buffer size for all streams, or only
those opened in specific IOContext, he can pass these numbers as config
parameters to his Directory impl.
That makes total sense, as:
1. IndexWriter/IndexReader couldn't care less about buffer sizes, it just
passes them to the Directory. It's not their concern.
2. A bunch of Directories doesn't use said bufferSize at all, making this
parameter not only private Directory affairs, but even further -
implementation-specific.

So my bet is - introduce IOContext as a simple Enum, change bufferSize
parameter on createInput/Output to IOContext, done.

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless
Attachments: LUCENE-2793.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-11 Thread Robert Muir (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980470#action_12980470
]

Robert Muir commented on LUCENE-2793:
-

bq. In fact, I suggest dropping bufferSize altogether.

https://issues.apache.org/jira/browse/LUCENE-2793?focusedCommentId=12966963page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12966963

bq. So my bet is - introduce IOContext as a simple Enum, change bufferSize
parameter on createInput/Output to IOContext, done.

I agree, its the directories job to then take that IOContext and create an
appropriate IndexOutput.

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless
Attachments: LUCENE-2793.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-11 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980533#action_12980533
]

Jason Rutherglen commented on LUCENE-2793:
--

Ok, lets remove bufferSize.

bq. its the directories job to then take that IOContext and create an
appropriate IndexOutput.

That's how the patch works. If that doesn't look good, the other option is
placing the logic in FSDirectory and removing the FSDirectory subclasses?

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless
Attachments: LUCENE-2793.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-11 Thread Robert Muir (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980543#action_12980543
]

Robert Muir commented on LUCENE-2793:
-

bq. That's how the patch works. If that doesn't look good, the other option is
placing the logic in FSDirectory and removing the FSDirectory subclasses?

We don't need to make any changes to Directory itself here, other than
openInput/createOutput taking IOContext instead of bufferSize.
The directory is the factory for opening the correct openInput/createOutput
based on its parameters, this just needs to be one.

As Mike said, the important thing is to not re-use an input that was opened
with IOContext=Merging for search.
If you ask the directory to openInput with IOContext=merging, thats all it
should be used for, because that Directory
could be implementing that with O_DIRECT or similar, which is slow for
searching.

we can muck with DirectIOLinuxDirectory later on another issue (e.g. make it a
directory wrapper that just overrides
openInput and only does something interesting when IOContext=Merging, or
NIOFSDirectory subclass, or whatever
we end up deciding).

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless
Attachments: LUCENE-2793.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-11 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980546#action_12980546
]

Jason Rutherglen commented on LUCENE-2793:
--

bq. The directory is the factory for opening the correct openInput/createOutput
based on its parameters, this just needs to be one.

Right, however which Directory impl is going to do this? Today they're tied to
their specific implementations of IndexInputs and IndexOutputs, eg,
NIOFSIndexInput, DirectIOLinuxIndexInput, etc. In order to get an
NIOFSIndexInput we need to instantiate NIOFSDirectory, however if we also want
to use DirectIOLinuxDirectory, then (in today's model) we need to instantiate
it as well. To create an index or output, all we need is a parent FSDIrectory
and the rest is encapsulated in the underlying input/output implementation.

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless
Attachments: LUCENE-2793.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-11 Thread Shai Erera (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980551#action_12980551
 ] 

Shai Erera commented on LUCENE-2793:


If we drop bufferSize, how will I be able to tell Lucene I'm willing to spare, 
say, 1 MB buffer for this IndexInput/Output? Am I supposed to create my own 
Directory and open II/IO, depending on the IOContext.Mode and decide on the 
buffer size then?

If so, then I don't understand how this would work (think that's what Jason is 
asking) - if I impl MyDirectory (extending Directory? FSDirectory?) which is 
probably going to be a wrapper Directory, what II/IO impl should I invoke? I'll 
need to extend BufferedIndexInput/Output, impl its abstract methods, just for 
delegating to the wrapped Directory's II/IO?

If I misunderstood the intentions, I'd appreciate if you can clarify them. But 
if not, I think IOContext should include a bufferSize hint. If the Directory 
does not intend to do anything special for 'merging', then it can take 
bufferSize into account. If however it's a Linux Directory that wants to set 
the O_DIRECT, then it can ignore bufferSize. But I think it's a useful hint.

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless
 Attachments: LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-11 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980555#action_12980555
 ] 

Jason Rutherglen commented on LUCENE-2793:
--

In the patch the bufferSize is an int member of IOContext.  We could feasibly 
go to a map of attributes model if we think there'll be a lot more options in 
the future?  Or the IOContext class can be subclassed.

Also, in the patch the IOContext for searching and merging is set in IWC, then 
should be reused.  This way the bufferSize is settable by the user, which I 
don't think is easily possible today (perhaps the main motivation for this 
issue?).

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless
 Attachments: LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-11 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980557#action_12980557
 ] 

Robert Muir commented on LUCENE-2793:
-

bq. If we drop bufferSize, how will I be able to tell Lucene I'm willing to 
spare, say, 1 MB buffer for this IndexInput/Output? Am I supposed to create my 
own Directory and open II/IO, depending on the IOContext.Mode and decide on the 
buffer size then?

with .setBufferSize. The only place in lucene that intelligently uses 
buffersize (Skiplist reading) sets it this way, not with the ctor param.

otherwise, i think you are confused if you think lucene actually passes any 
intelligent value for this ctor parameter.
it doesn't... it usually passes the static BufferedIndexInput.BUFFER_SIZE

bq. If so, then I don't understand how this would work (think that's what Jason 
is asking) - if I impl MyDirectory (extending Directory? FSDirectory?) which is 
probably going to be a wrapper Directory, what II/IO impl should I invoke? I'll 
need to extend BufferedIndexInput/Output, impl its abstract methods, just for 
delegating to the wrapped Directory's II/IO?

I don't understand whats so terribly difficult about:
{noformat}
if (context == MERGING)
 return mySpecialmergingInput();
else
 return super.openInput(...);
{noformat}


 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless
 Attachments: LUCENE-2793.patch


 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-11 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980559#action_12980559
]

Shai Erera commented on LUCENE-2793:

bq. This way the bufferSize is settable by the user, which I don't think is
easily possible today (perhaps the main motivation for this issue?)

Exactly ! That's what I thought of this issue in the first place - allow the
app to more easily control the size of the buffers used by Lucene. That that
IOContext will allow me to set different bufferSizes per the work that Lucene's
doing (i.e. merge, search) is a bonus for me :).

I still think bufferSize can be used as a hint by the specific Directory impl,
but it should be there.

If we want to allow control of different buffer sizes per operation, then maybe
we can do the following:
* Set a default buffer size on IOContext.
* Add a setBufferSize(OpType type, int/long bufferSize)
* Add getBufferSize(OpType) -- returns the set buffer size, or the default if
not set.

I don't think it's too complicated and gives enough control to the application?

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless
Attachments: LUCENE-2793.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-11 Thread Shai Erera (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980561#action_12980561
]

Shai Erera commented on LUCENE-2793:

Robert, nothing is too difficult to understand. But your code example tells me
that if I want to control the buffer size used by Lucene (whether it's done
intelligently or not today is what we try to fix here), I need to create my own
Dir impl. That seems an overkill to me, for just controlling the bufferSize !?

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless
Attachments: LUCENE-2793.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-11 Thread Robert Muir (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12980564#action_12980564
]

Robert Muir commented on LUCENE-2793:
-

bq. But your code example tells me that if I want to control the buffer size
used by Lucene (whether it's done intelligently or not today is what we try to
fix here), I need to create my own Dir impl. That seems an overkill to me, for
just controlling the bufferSize !?

No you dont... you can also call the .setBufferSize like the skiplist reader
does.

{noformat}
IndexInput input = dir.openInput()
if (input instanceof BufferedIndexInput)
...setBufferSize(whatever_you_want);
{noformat}

Because after all, buffersize only makes sense on Buffered*.

It makes no sense to be on Directory's openInput, e.g. it makes no sense for
directories like MMapDirectory.

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless
Attachments: LUCENE-2793.patch

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-10 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979728#action_12979728
]

Jason Rutherglen commented on LUCENE-2793:
--

Shall I take this one? With the plan being to add config options to IWC so
that IW uses the DirectIOLinuxDirectory (and it's variants) only for merging?

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-10 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979731#action_12979731
 ] 

Michael McCandless commented on LUCENE-2793:


Yes please!

But, this issue only adds the IOContext, threading it down to when you open an 
input / create an output.  That context should hold enough information to 
allow the Dir impl to make decisions like buffer sizes and avoiding buffer 
cache, etc.


 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless

 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-10 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979736#action_12979736
 ] 

Jason Rutherglen commented on LUCENE-2793:
--

{quote}this issue only adds the IOContext, threading it down to when you open 
an input / create an output{quote}

Does this mean we're not implementing this part?

{quote}This will require fixing how IW pools readers, so that a reader opened 
for merging is not then used for searching, and vice/versa. Really, it's only 
all the open file handles that need to be different - we could in theory share 
del docs, norms, etc, if that were somehow possible.{quote}

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless

 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-10 Thread Michael McCandless (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979738#action_12979738
]

Michael McCandless commented on LUCENE-2793:

Sorry, I think we should also do that as part of this issue. Basically the
IOContext needs to become part of the cache key uses in IW's ReaderPool?

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2011-01-10 Thread Jason Rutherglen (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12979740#action_12979740
]

Jason Rutherglen commented on LUCENE-2793:
--

bq. Basically the IOContext needs to become part of the cache key uses in IW's
ReaderPool?

Great, I'll implement this.

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2010-12-11 Thread Jason Rutherglen (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12970530#action_12970530
 ] 

Jason Rutherglen commented on LUCENE-2793:
--

Perhaps we can add the ability to throttle Directory level IO to this issue.   

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless

 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2010-12-05 Thread Robert Muir (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966963#action_12966963
 ] 

Robert Muir commented on LUCENE-2793:
-

There is another problem we should solve here, and that is the buffersize 
problem.

This is totally broken at the moment for custom directories, here's an example.
I wanted to set the buffersize by default to 4096 (since i measured this is 
like a 20% improvement for my directory impl).

looking at the apis you would think that you simply override the openInput that 
takes no buffer size like this:
{noformat}
  @Override
  public IndexInput openInput(String name) throws IOException {
return openInput(name, 4096);
  }
{noformat}

unfortunately this doesnt work at all! instead you have to do something like 
this for it to actually work:
{noformat}
   @Override
   public IndexInput openInput(String name, int bufferSize) throws IOException {
  ensureOpen();
  return new IndexInput(name, Math.max(bufferSize, 4096));
   }
{noformat}

The problem is, throughout lucene's APIs, the directory's default is never 
used, instead the static BufferedIndexInput.BUFFER_SIZE is used everywhere... 
eg SegmentReader.get:

{noformat}
  public static SegmentReader get(boolean readOnly, SegmentInfo si, int 
termInfosIndexDivisor) throws CorruptIndexException, IOException {
return get(readOnly, si.dir, si, BufferedIndexInput.BUFFER_SIZE, true, 
termInfosIndexDivisor);
  }
{noformat}

So I think lucene's apis should never specify buffersize, we should remove it 
completely from the codecs api, and it should be *replaced* with IOContext.


 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless

 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2010-12-03 Thread Robert Muir (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966564#action_12966564
]

Robert Muir commented on LUCENE-2793:
-

Sounds great, at least in theory we could do the DIRECT thing on windows too
for this case.
and we can certainly give the sequential hint.

in general hints like this are probably good for custom dir impls too, we never
know what they are doing
but its good to somehow inform them of our intentions so they have a chance to
be optimal.

Directory createOutput and openInput should take an IOContext
-

Key: LUCENE-2793
URL: https://issues.apache.org/jira/browse/LUCENE-2793
Project: Lucene - Java
Issue Type: Improvement
Components: Store
Reporter: Michael McCandless

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] Commented: (LUCENE-2793) Directory createOutput and openInput should take an IOContext

2010-12-03 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12966566#action_12966566
 ] 

Uwe Schindler commented on LUCENE-2793:
---

And MMapDir should return a simple/optimized direct FSIndexInput also. Because 
MMapping for merging is just wasted resources/address space.

 Directory createOutput and openInput should take an IOContext
 -

 Key: LUCENE-2793
 URL: https://issues.apache.org/jira/browse/LUCENE-2793
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Michael McCandless

 Today for merging we pass down a larger readBufferSize than for searching 
 because we get better performance.
 I think we should generalize this to a class (IOContext), which would hold 
 the buffer size, but then could hold other flags like DIRECT (bypass OS's 
 buffer cache), SEQUENTIAL, etc.
 Then, we can make the DirectIOLinuxDirectory fully usable because we would 
 only use DIRECT/SEQUENTIAL during merging.
 This will require fixing how IW pools readers, so that a reader opened for 
 merging is not then used for searching, and vice/versa.  Really, it's only 
 all the open file handles that need to be different -- we could in theory 
 share del docs, norms, etc, if that were somehow possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

65 matches

Mail list logo