[jira] [Updated] (LUCENE-3201) improved compound file handling

2011-10-07 Thread Robert Muir (Updated) (JIRA)

 [ https://issues.apache.org/jira/browse/LUCENE-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3201:


Priority: Minor  (was: Blocker)

Not a blocker: it was pulled from 3.x (and fixed in trunk).

 improved compound file handling
 ---

 Key: LUCENE-3201
 URL: https://issues.apache.org/jira/browse/LUCENE-3201
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
Assignee: Simon Willnauer
Priority: Minor
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3201.patch, LUCENE-3201.patch


 Currently CompoundFileReader could use some improvements. I see the following 
 problems:
 * Its CSIndexInput extends BufferedIndexInput, which makes no sense for 
 directories like MMapDirectory.
 * It seeks on every readInternal.
 * It's not possible for a directory to override or improve the handling of 
 compound files.
 For example, if you were implementing this from scratch, it seems you would 
 just wrap the IndexInput directly (not extend BufferedIndexInput), 
 add the compound file offset X to seek() calls, and override length(). But of 
 course, then you couldn't always throw "read past EOF" when you should, 
 as a user could read into the next file and be left unaware.
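A minimal sketch of that "wrap it directly" idea, to show where the offset and the EOF check have to live. `SimpleInput`, `ByteArrayInput` and `SlicedInput` are hypothetical stand-ins invented for this sketch (not Lucene's IndexInput API), and it assumes the slice is the only reader of the base input. Without the explicit bounds check in `readByte()`, reading past the slice would silently return bytes of the next sub-file.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical minimal input abstraction for this sketch; NOT Lucene's IndexInput.
interface SimpleInput {
  byte readByte() throws IOException;
  void seek(long pos) throws IOException;
  long getFilePointer();
  long length();
}

// In-memory stand-in for the whole compound (.cfs) file.
class ByteArrayInput implements SimpleInput {
  private final byte[] data;
  private int pos;

  ByteArrayInput(byte[] data) { this.data = data; }

  public byte readByte() throws IOException {
    if (pos >= data.length) throw new EOFException("read past EOF");
    return data[pos++];
  }

  public void seek(long pos) { this.pos = (int) pos; }

  public long getFilePointer() { return pos; }

  public long length() { return data.length; }
}

// Wraps the compound-file input directly (no buffering): adds the sub-file's
// offset to every seek, reports the sub-file's length, and checks bounds itself,
// because the base input would happily read on into the next sub-file.
// Assumption: this slice is the only reader of 'base'.
class SlicedInput implements SimpleInput {
  private final SimpleInput base;
  private final long offset;
  private final long length;

  SlicedInput(SimpleInput base, long offset, long length) throws IOException {
    this.base = base;
    this.offset = offset;
    this.length = length;
    base.seek(offset); // position 0 of the slice == 'offset' in the .cfs
  }

  public byte readByte() throws IOException {
    if (getFilePointer() >= length) {
      throw new EOFException("read past EOF of slice");
    }
    return base.readByte();
  }

  public void seek(long pos) throws IOException { base.seek(offset + pos); }

  public long getFilePointer() { return base.getFilePointer() - offset; }

  public long length() { return length; }
}
```

For a 6-byte "compound file" `{1, 2, 3, 4, 5, 6}`, `new SlicedInput(new ByteArrayInput(cfs), 2, 3)` exposes only `{3, 4, 5}`; the fourth read throws instead of returning `6`.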
 However, some directories could handle this better. For example, MMapDirectory 
 could return an IndexInput that simply mmaps the 'slice' of the CFS file. 
 Its underlying ByteBuffer already does bounds checks naturally, so it 
 wouldn't need to be buffered, or even need to add any offsets to seek(), 
 as its position would just work.
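A sketch of the MMapDirectory idea using only plain `java.nio`, assuming the whole .cfs is already available as one `ByteBuffer` (in a real directory it would come from `FileChannel.map(...)`; here any buffer stands in). `ByteBufferSlice` is a hypothetical name for illustration. `duplicate()` plus `slice()` gives a view whose position 0 is the start of the sub-file, so `seek()` needs no offset arithmetic, and reading past the end throws `BufferUnderflowException` on its own.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration class, not part of Lucene.
class ByteBufferSlice {
  private final ByteBuffer slice;

  // Carve out [offset, offset + length) of the compound file's buffer.
  ByteBufferSlice(ByteBuffer whole, int offset, int length) {
    ByteBuffer dup = whole.duplicate(); // own position/limit, shared content
    dup.position(offset);
    dup.limit(offset + length);
    this.slice = dup.slice();           // position 0 == start of the sub-file
  }

  // Relative get() throws BufferUnderflowException at the end of the slice,
  // so no explicit bounds check is needed.
  byte readByte() { return slice.get(); }

  // Positions are already slice-relative: no offset is added.
  // Seeking beyond the slice throws IllegalArgumentException.
  void seek(int pos) { slice.position(pos); }

  long getFilePointer() { return slice.position(); }

  long length() { return slice.capacity(); }
}
```

Because the view shares content with the mapped buffer, no bytes are copied and no extra buffering layer sits in front of the reads.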
 So I think we should refactor this so that a Directory can customize 
 how compound files are handled. The simplest case, with the least code 
 change, would be to add this to Directory.java:
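 {code}
   public Directory openCompoundInput(String filename) throws IOException {
     // default: read compound files through the generic CompoundFileReader;
     // a subclass (e.g. MMapDirectory) could override this
     return new CompoundFileReader(this, filename);
   }
 {code}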
 Most code depends on compound files being implemented as a Directory and 
 being transparent, so at least then a subclass could override it. 
 The 'recursion' is a little ugly, but we could still label it 
 expert+internal+experimental or whatever.
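A toy model of the proposed hook, to show how the default and an override would fit together. `SketchDirectory`, `GenericCompoundReader`, `PlainDirectory` and `MappedDirectory` are hypothetical, heavily simplified classes for this sketch only (not Lucene's Directory, CompoundFileReader or MMapDirectory); `openInput` just returns a description string instead of an input.

```java
// Hypothetical, heavily simplified stand-ins; NOT Lucene's classes.
abstract class SketchDirectory {
  // Describes how a file in this directory would be read.
  abstract String openInput(String name);

  // The proposed hook: by default compound files go through the generic
  // reader, which is itself a directory (the "recursion").
  SketchDirectory openCompoundInput(String filename) {
    return new GenericCompoundReader(this, filename);
  }
}

// Generic path: buffered slices read through the parent's .cfs input.
class GenericCompoundReader extends SketchDirectory {
  private final SketchDirectory parent;
  private final String cfsName;

  GenericCompoundReader(SketchDirectory parent, String cfsName) {
    this.parent = parent;
    this.cfsName = cfsName;
  }

  String openInput(String name) {
    return "buffered slice '" + name + "' of " + parent.openInput(cfsName);
  }
}

class PlainDirectory extends SketchDirectory {
  String openInput(String name) { return "file " + name; }
}

// A directory that can do better overrides the hook, e.g. by mapping slices.
class MappedDirectory extends SketchDirectory {
  String openInput(String name) { return "mapped file " + name; }

  @Override
  SketchDirectory openCompoundInput(String filename) {
    return new SketchDirectory() {
      String openInput(String name) {
        // MappedDirectory.this avoids calling this anonymous class's own openInput
        return "mapped slice '" + name + "' of "
            + MappedDirectory.this.openInput(filename);
      }
    };
  }
}
```

Callers only ever see a `SketchDirectory`, so code that treats the compound file as a transparent directory keeps working whichever implementation is returned.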

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3201) improved compound file handling

2011-09-14 Thread Michael McCandless (JIRA)


Michael McCandless updated LUCENE-3201:
---

Fix Version/s: (was: 3.4)
   3.5

 improved compound file handling
 ---

 Key: LUCENE-3201
 URL: https://issues.apache.org/jira/browse/LUCENE-3201
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
Assignee: Simon Willnauer
Priority: Blocker
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3201.patch, LUCENE-3201.patch



[jira] [Updated] (LUCENE-3201) improved compound file handling

2011-06-26 Thread Robert Muir (JIRA)


Robert Muir updated LUCENE-3201:


Fix Version/s: (was: 3.3)
   3.4

 improved compound file handling
 ---

 Key: LUCENE-3201
 URL: https://issues.apache.org/jira/browse/LUCENE-3201
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
Assignee: Simon Willnauer
 Fix For: 3.4, 4.0

 Attachments: LUCENE-3201.patch, LUCENE-3201.patch



[jira] [Updated] (LUCENE-3201) improved compound file handling

2011-06-14 Thread Robert Muir (JIRA)


Robert Muir updated LUCENE-3201:


Attachment: LUCENE-3201.patch

Initial patch for review. In this patch I only cut over MMapDirectory to use 
a special CompoundFileDirectory; all others use the default as before (but I 
cleaned up some things in it).

I'm pretty sure I can easily improve SimpleFS and NIOFS. I'll take a look at that 
now, but I wanted to get this up for review.


 improved compound file handling
 ---

 Key: LUCENE-3201
 URL: https://issues.apache.org/jira/browse/LUCENE-3201
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Attachments: LUCENE-3201.patch



[jira] [Updated] (LUCENE-3201) improved compound file handling

2011-06-14 Thread Robert Muir (JIRA)


Robert Muir updated LUCENE-3201:


Fix Version/s: 4.0
   3.3

Setting 3.3/4.0 as fix version, as the changes are backwards compatible 
(CompoundFileReader is still package-private in 3.x).


 improved compound file handling
 ---

 Key: LUCENE-3201
 URL: https://issues.apache.org/jira/browse/LUCENE-3201
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3201.patch



[jira] [Updated] (LUCENE-3201) improved compound file handling

2011-06-14 Thread Robert Muir (JIRA)


Robert Muir updated LUCENE-3201:


Attachment: LUCENE-3201.patch

Here is an updated patch, including impls for SimpleFS and NIOFS, a fix for the 
FileSwitchDirectory issue Uwe mentioned, and also MockDirectoryWrapper and 
NRTCachingDirectory.

All the tests pass with Simple/NIO/MMap, but we need to benchmark. I haven't had 
good luck today with luceneutil.

 improved compound file handling
 ---

 Key: LUCENE-3201
 URL: https://issues.apache.org/jira/browse/LUCENE-3201
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Robert Muir
 Fix For: 3.3, 4.0

 Attachments: LUCENE-3201.patch, LUCENE-3201.patch

