[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

2012-09-06 Thread Michael McCandless (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449931#comment-13449931 ]

Michael McCandless commented on LUCENE-4123:


bq. I am not sure if we really need that directory. With my changes in 
LUCENE-3659 we can handle that easily (also for files > 2 GiB). LUCENE-3659 
makes the buf size of RAMDir configurable (depending on IOContext while 
writing) and when you do new RAMDirectory(otherDir) - to cache the whole dir in 
RAM - it will use the maximum possible buffer size for the underlying file (2 
GiB) - as we don't write and need no smaller buf size.

Actually I think the two dirs have different use cases.

So I think we should do both: 1) fix RAMDir to do better buffering
(LUCENE-3659) and 2) add this new dir.

RAMDir is good for pure in-memory indices (for testing, or transient
usage, etc.) or for pulling in a read-only index from disk, while
CachingRAMDir (I think we should rename it to CachingDirWrapper) is
good if you want to write to the index but also want persistence,
since all writes go straight to the wrapped directory.

I don't think the limitations of this dir (max 2.1 GB file size) need
to block committing ... the javadocs call this out, and we can improve
it later.  It could be that wrapping the byte[] in a ByteBuffer and using
ByteBufferII doesn't lose any perf; that would be great. But we can
explore that after committing.

But definitely +1 to get LUCENE-3659 in...


 Add CachingRAMDirectory
 ---

 Key: LUCENE-4123
 URL: https://issues.apache.org/jira/browse/LUCENE-4123
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/store
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4123.patch, LUCENE-4123.patch, LUCENE-4123.patch, 
 LUCENE-4123.patch


 The directory is very simple and useful if you have an index that you
 know fully fits into available RAM.  You could also use FileSwitchDir if
 you want to leave some files (eg stored fields or term vectors) on disk.
 It wraps any other Directory and delegates all writing (IndexOutput) to
 it, but for reading (IndexInput), it allocates a single byte[] and fully
 reads the file in and then serves requests off that single byte[].  It's
 more GC friendly than RAMDir since it only allocates a single array per
 file.
 It has a few nocommits still, but all tests pass if I wrap the delegate
 inside MockDirectoryWrapper using this.
 I tested with a 1M-doc English Wikipedia index (would like to test w/ 10M docs
 but I don't have enough RAM...); it seems to give a nice speedup:
 {noformat}
 Task              QPS base  StdDev base  QPS cached  StdDev cached  Pct diff
 Respell             197.00         7.27      203.19           8.17   -4% - 11%
 PKLookup            121.12         2.80      125.46           3.20   -1% -  8%
 Fuzzy2               66.62         2.62       69.91           2.85   -3% - 13%
 Fuzzy1              206.20         6.47      222.21           6.52    1% - 14%
 TermGroup100K       160.14         6.62      175.71           3.79    3% - 16%
 Phrase               34.85         0.40       38.75           0.61    8% - 14%
 TermBGroup100K      363.75        15.74      406.98          13.23    3% - 20%
 SpanNear             53.08         1.11       59.53           2.94    4% - 20%
 TermBGroup100K1P    222.53         9.78      252.86           5.96    6% - 21%
 SloppyPhrase         70.36         2.05       79.95           4.48    4% - 23%
 Wildcard            238.10         4.29      272.78           4.97   10% - 18%
 OrHighMed           123.49         4.85      149.32           4.66   12% - 29%
 Prefix3             288.46         8.10      350.40           5.38   16% - 26%
 OrHighHigh           76.46         3.27       93.13           2.96   13% - 31%
 IntNRQ               92.25         2.12      113.47           5.74   14% - 32%
 Term                757.12        39.03      958.62          22.68   17% - 36%
 AndHighHigh         103.03         4.48      133.89           3.76   21% - 39%
 AndHighMed          376.36        16.58      493.99          10.00   23% - 40%
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

2012-09-05 Thread Shai Erera (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448829#comment-13448829 ]

Shai Erera commented on LUCENE-4123:


Besides Uwe's ideas for improvements, is this Directory operable? I.e., if you 
chose to commit what you have accomplished so far, do tests fail? Is it safe to 
use?

I'm thinking progress, not perfection -- we can always introduce improvements 
later.



[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

2012-09-05 Thread Michael McCandless (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448842#comment-13448842 ]

Michael McCandless commented on LUCENE-4123:


I believe it is safe ... eg all tests pass if I wrap MDW's delegate w/ this in 
newDirectory ...

I'll update the patch w/ Uwe and Robert's suggestions ...



[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

2012-09-05 Thread Robert Muir (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448870#comment-13448870 ]

Robert Muir commented on LUCENE-4123:
-

Looks good... I don't really like that close() is a no-op and that seek() has no 
checks (since it's deferred, if you seek somewhere negative you won't know until 
later).

You could probably fix both of these, e.g. keep the byte[] final but let 
close() set the position negative, catch the resulting out-of-bounds exception 
on the next read, and throw ACE (AlreadyClosedException). Then just throw IAE 
(IllegalArgumentException) on seek if the incoming long is negative at least, 
since you reserve it to mean closed.
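
One way to read that suggestion, sketched in self-contained Java. ClosableArrayInput is a hypothetical class, and IllegalStateException stands in for Lucene's AlreadyClosedException so that the example needs no Lucene dependency.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical single-array input where a negative position means "closed".
final class ClosableArrayInput {
  private static final int CLOSED = -1; // reserved marker, never a valid position
  private final byte[] bytes;           // stays final; close() does not null it
  private int pos;

  ClosableArrayInput(byte[] bytes) {
    this.bytes = bytes;
  }

  byte readByte() throws IOException {
    final int p = pos;
    try {
      byte b = bytes[p]; // no explicit bounds or closed check on the normal path
      pos = p + 1;
      return b;
    } catch (IndexOutOfBoundsException e) {
      if (p < 0) {
        // Stand-in for AlreadyClosedException.
        throw new IllegalStateException("already closed");
      }
      throw new EOFException("read past EOF at position " + p);
    }
  }

  void seek(long newPos) {
    if (newPos < 0) {
      throw new IllegalArgumentException("negative seek position: " + newPos);
    }
    if (pos < 0) {
      throw new IllegalStateException("already closed");
    }
    // Seeking past the end is still deferred: the next read reports EOF.
    pos = (int) Math.min(newPos, bytes.length);
  }

  void close() {
    pos = CLOSED;
  }
}
```

Reading the position into a local before indexing keeps a failed read from advancing it, so a closed input stays closed no matter how often it is read.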

I also don't like that it's a delegator.

should the underlying read check for BufferedII and pass useBuffer=false?



[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

2012-09-05 Thread Robert Muir (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448884#comment-13448884 ]

Robert Muir commented on LUCENE-4123:
-

Also, readBytes should not catch ArrayIndexOutOfBoundsException; it must be the 
more general IndexOutOfBoundsException.
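
A sketch of why the broader type matters, assuming the bulk read is done with System.arraycopy: that method is documented to throw IndexOutOfBoundsException, of which ArrayIndexOutOfBoundsException is only a subclass. BulkArrayInput is a hypothetical class for illustration.

```java
import java.io.EOFException;

// Hypothetical bulk reader over a single byte[].
final class BulkArrayInput {
  private final byte[] bytes;
  private int pos;

  BulkArrayInput(byte[] bytes) {
    this.bytes = bytes;
  }

  void readBytes(byte[] dst, int offset, int len) throws EOFException {
    try {
      // arraycopy validates all ranges before copying anything.
      System.arraycopy(bytes, pos, dst, offset, len);
      pos += len;
    } catch (IndexOutOfBoundsException e) {
      // Catching the supertype covers whichever subclass the JVM throws.
      // Caveat: a too-small dst ends up here as well.
      throw new EOFException("read past EOF");
    }
  }

  int position() {
    return pos;
  }
}
```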



[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

2012-09-05 Thread Uwe Schindler (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449225#comment-13449225 ]

Uwe Schindler commented on LUCENE-4123:
---

Mike,
I am not sure if we really need that directory. With my changes in LUCENE-3659 
we can handle that easily (also for files > 2 GiB). LUCENE-3659 makes the buf 
size of RAMDir configurable (depending on IOContext while writing) and when 
you do new RAMDirectory(otherDir) - to cache the whole dir in RAM - it will use 
the maximum possible buffer size for the underlying file (2 GiB) - as we don't 
write and need no smaller buf size.
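
The effect of buffer size on the number of arrays per file is simple ceiling division. The sketch below is only arithmetic; BufferMath is a hypothetical helper, and the 1 KiB buffer in the usage note is an illustrative assumption, not a figure taken from this thread.

```java
// Hypothetical helper: how many fixed-size buffers a file of a given length needs.
final class BufferMath {
  static long buffersNeeded(long fileLength, long bufferSize) {
    if (fileLength < 0 || bufferSize <= 0) {
      throw new IllegalArgumentException("fileLength must be >= 0 and bufferSize > 0");
    }
    // Ceiling division written so that it cannot overflow.
    return fileLength / bufferSize + (fileLength % bufferSize == 0 ? 0 : 1);
  }
}
```

For example, a 1 GiB file held in 1 KiB buffers needs 1,048,576 arrays, a 1 GiB buffer needs exactly one, and a 3 GiB file with buffers capped at 2 GiB needs two.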

We should really get LUCENE-3659 in. The only missing parts are:
- make RAMFile visible to ConcurrentMap after IndexOutput is closed, so we don't 
need synchronization on RAMFile
- maybe use Robert's cool ByteBufferIndexInput from LUCENE-4364



[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

2012-07-01 Thread Michael McCandless (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404743#comment-13404743 ]

Michael McCandless commented on LUCENE-4123:


bq. You should make the II correctly throw IOExceptions like MMap does, so 
catch the AIOOBE and rethrow as EOFException (just copy the code).

+1.  Are we sure the catch + rethrow adds no cost?

Though, I think tests don't actually fail as is, because I intentionally skip 
caching segments_N.  Probably we should improve that to skip any file that's 
opened with readOnce=true.
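
The two skip rules mentioned here can be written as small predicates. This is a sketch under assumptions: CachePolicy and both method names are hypothetical, and the readOnce flag is passed as a plain boolean instead of being read from a Lucene IOContext.

```java
// Hypothetical cache-admission rules for a caching directory wrapper.
final class CachePolicy {
  // Rule as described for the current patch: never cache segments_N files.
  static boolean shouldCacheByName(String fileName) {
    return !fileName.startsWith("segments");
  }

  // Proposed refinement: skip anything the caller says it will read only once.
  static boolean shouldCacheByContext(boolean readOnce) {
    return !readOnce;
  }
}
```

The second rule does not depend on file naming, so it would also cover any other file that is opened just once.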

bq. Can we make this IndexInput impl extend ByteArrayDataInput somehow?

+1

I won't have time for this any time soon so if you want to work on it Uwe feel 
free!



[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

2012-07-01 Thread Uwe Schindler (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404747#comment-13404747 ]

Uwe Schindler commented on LUCENE-4123:
---

bq. Are we sure the catch + rethrow adds no cost?

Yes! It is definitely also less work for hotspot than asserts :-)

In general, this is why exceptions are used instead of if statements. The 
exception table is in the metadata of a method and just defines the jump 
targets for the exceptional case. If you don't catch the exception, this 
table only contains the bubble-up entry; otherwise the JVM jumps to our 
bytecode that simply rethrows. This rethrowing is rare, so the overhead of 
wrapping the inner exception is negligible.

Exceptions for array indexes are implemented by traps on most processors, so 
as long as the exceptional case (AIOOBE) does not happen, the overhead does 
not exist.
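
The two styles being compared look roughly like this. BoundsStyles is a hypothetical class for illustration; the sketch shows only that both variants behave the same from the caller's point of view and makes no measurement of their relative cost.

```java
import java.io.EOFException;

// Hypothetical side-by-side of the two bounds-handling styles.
final class BoundsStyles {
  // Style 1: explicit if statement on every call.
  static byte readChecked(byte[] bytes, int pos) throws EOFException {
    if (pos < 0 || pos >= bytes.length) {
      throw new EOFException("read past EOF");
    }
    return bytes[pos];
  }

  // Style 2: rely on the JVM's own array bounds check and translate its exception.
  static byte readCatching(byte[] bytes, int pos) throws EOFException {
    try {
      return bytes[pos];
    } catch (IndexOutOfBoundsException e) {
      throw new EOFException("read past EOF");
    }
  }
}
```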

bq. I won't have time for this any time soon so if you want to work on it Uwe 
feel free!

I have some time. The RAMDir issue is also open (it should also use index 
exceptions instead of if statements), so I can look into it next week.



[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

2012-07-01 Thread Michael McCandless (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404856#comment-13404856 ]

Michael McCandless commented on LUCENE-4123:


OK thanks Uwe!

 Add CachingRAMDirectory
 ---

 Key: LUCENE-4123
 URL: https://issues.apache.org/jira/browse/LUCENE-4123
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/store
Reporter: Michael McCandless
Assignee: Michael McCandless
 Attachments: LUCENE-4123.patch


 The directory is very simple and useful if you have an index that you
 know fully fits into available RAM.  You could also use FileSwitchDir if
 you want to leave some files (eg stored fields or term vectors) on disk.
 It wraps any other Directory and delegates all writing (IndexOutput) to
 it, but for reading (IndexInput), it allocates a single byte[] and fully
 reads the file in and then serves requests off that single byte[].  It's
 more GC friendly than RAMDir since it only allocates a single array per
 file.
 It has a few nocommits still, but all tests pass if I wrap the delegate
 inside MockDirectoryWrapper using this.
 I tested with 1M Wikipedia english index (would like to test w/ 10M docs
 but I don't have enough RAM...); it seems to give a nice speedup:
 {noformat}
 Task              QPS base  StdDev base  QPS cached  StdDev cached    Pct diff
 Respell             197.00         7.27      203.19           8.17   -4% - 11%
 PKLookup            121.12         2.80      125.46           3.20   -1% -  8%
 Fuzzy2               66.62         2.62       69.91           2.85   -3% - 13%
 Fuzzy1              206.20         6.47      222.21           6.52    1% - 14%
 TermGroup100K       160.14         6.62      175.71           3.79    3% - 16%
 Phrase               34.85         0.40       38.75           0.61    8% - 14%
 TermBGroup100K      363.75        15.74      406.98          13.23    3% - 20%
 SpanNear             53.08         1.11       59.53           2.94    4% - 20%
 TermBGroup100K1P    222.53         9.78      252.86           5.96    6% - 21%
 SloppyPhrase         70.36         2.05       79.95           4.48    4% - 23%
 Wildcard            238.10         4.29      272.78           4.97   10% - 18%
 OrHighMed           123.49         4.85      149.32           4.66   12% - 29%
 Prefix3             288.46         8.10      350.40           5.38   16% - 26%
 OrHighHigh           76.46         3.27       93.13           2.96   13% - 31%
 IntNRQ               92.25         2.12      113.47           5.74   14% - 32%
 Term                757.12        39.03      958.62          22.68   17% - 36%
 AndHighHigh         103.03         4.48      133.89           3.76   21% - 39%
 AndHighMed          376.36        16.58      493.99          10.00   23% - 40%
 {noformat}
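
The read path outlined in the issue description can be pictured with a minimal plain-Java sketch. This is not the attached patch and uses no Lucene types: `WholeFileInput` and its methods are hypothetical names chosen for illustration, and the write path (which the description says is delegated to the wrapped Directory) is left out.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical stand-in for the cached input: one byte[] per file. */
final class WholeFileInput {
  private final byte[] bytes; // the single array allocated for this file
  private int pos;            // current read position

  /** Reads the whole file into memory once; later reads never touch the disk. */
  WholeFileInput(Path file) throws IOException {
    // A Java array holds at most about 2^31 bytes, which is where the
    // roughly 2 GB per-file ceiling of this approach comes from.
    this.bytes = Files.readAllBytes(file);
  }

  byte readByte() throws IOException {
    if (pos >= bytes.length) {
      throw new EOFException("read past EOF");
    }
    return bytes[pos++];
  }

  void seek(int newPos) {
    if (newPos < 0 || newPos > bytes.length) {
      throw new IllegalArgumentException("seek out of range: " + newPos);
    }
    pos = newPos;
  }

  int length() {
    return bytes.length;
  }
}
```

Because each file is backed by exactly one array, the garbage collector tracks one large object per file rather than many small buffers, which is the GC argument the description makes.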

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

2012-06-30 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404429#comment-13404429
 ] 

Uwe Schindler commented on LUCENE-4123:
---

You should make the II correctly throw IOExceptions like MMap does, so catch 
the AIOOBE and rethrow as EOFException (just copy the code). This does not have 
any speed effect. Otherwise some tests will definitely fail.
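
A minimal sketch of the pattern suggested here, in plain Java with hypothetical names (`EofCheckedInput` is not a Lucene class and this is not the code from MMapDirectory): the hot path does no explicit bounds check of its own and relies on the check the JVM already performs on every array access, translating the resulting exception into an `EOFException`.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical byte[]-backed input that reports overruns as EOFException. */
final class EofCheckedInput {
  private final byte[] bytes;
  private int pos;

  EofCheckedInput(byte[] bytes) {
    this.bytes = bytes;
  }

  byte readByte() throws IOException {
    try {
      // No explicit "pos < length" test here: the JVM's own array bounds
      // check is the only check on the normal path.
      return bytes[pos++];
    } catch (ArrayIndexOutOfBoundsException e) {
      // Callers expect an IOException subclass, not a runtime exception.
      throw new EOFException("read past EOF (length=" + bytes.length + ")");
    }
  }
}
```

The try/catch only costs anything when the exception is actually thrown, which would be consistent with the remark that the change has no speed effect.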




[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

2012-06-30 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13404452#comment-13404452
 ] 

Uwe Schindler commented on LUCENE-4123:
---

When thinking more about the patch:
Can we make this IndexInput impl extend ByteArrayDataInput somehow? I would 
also like to fix ByteArrayDataInput to correctly rethrow AIOOBE and remove the 
vint methods. We already did tests with FSTs that showed that the code 
duplication is not helpful.




[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

2012-06-08 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291857#comment-13291857
 ] 

Simon Willnauer commented on LUCENE-4123:
-

bq. I tested with 1M Wikipedia English index (would like to test w/ 10M docs
but I don't have enough RAM...); it seems to give a nice speedup:

#fail! :)




[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

2012-06-08 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291860#comment-13291860
 ] 

Robert Muir commented on LUCENE-4123:
-

I don't think it buys anything to code dup the readVInt/readVLong here. It 
should be compiled to the same code. E.g. MMapDir doesn't do this.
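
To illustrate the point, here is a hedged sketch in plain Java (hypothetical classes `VIntInputBase` and `ArrayVIntInput`, not Lucene's own): the variable-length decoder lives once in the base class, written against `readByte()`, and the array-backed subclass overrides only `readByte()`. The expectation voiced above is that the JIT inlines the call, so a hand-copied decoder in the subclass would gain nothing.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical base class: the vInt decoder is written once, here. */
abstract class VIntInputBase {
  abstract byte readByte() throws IOException;

  /** Decodes 7 data bits per byte, low bits first; a set high bit means "more follows". */
  int readVInt() throws IOException {
    byte b = readByte();
    int value = b & 0x7F;
    // Sketch only: does not reject malformed input longer than 5 bytes.
    for (int shift = 7; (b & 0x80) != 0; shift += 7) {
      b = readByte();
      value |= (b & 0x7F) << shift;
    }
    return value;
  }
}

/** Hypothetical byte[]-backed input: overrides readByte() and nothing else. */
final class ArrayVIntInput extends VIntInputBase {
  private final byte[] bytes;
  private int pos;

  ArrayVIntInput(byte[] bytes) {
    this.bytes = bytes;
  }

  @Override
  byte readByte() throws IOException {
    if (pos >= bytes.length) {
      throw new EOFException("read past EOF");
    }
    return bytes[pos++];
  }
}
```

For example, the two bytes `0xAC 0x02` decode to 300 through the inherited `readVInt()`.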




[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

2012-06-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291884#comment-13291884
 ] 

Michael McCandless commented on LUCENE-4123:


Results for 5M doc index:

{noformat}
Task              QPS base  StdDev base  QPS cached  StdDev cached    Pct diff
Respell             104.06         7.63      108.59           7.55   -9% - 20%
TermGroup1M          57.94         1.59       60.70           0.30    1% -  8%
TermBGroup1M        103.28         2.54      108.51           2.54    0% - 10%
Fuzzy2               43.07         2.96       45.32           3.06   -8% - 20%
Fuzzy1               72.64         4.73       76.92           4.38   -6% - 19%
TermBGroup1M1P       90.14         3.03       95.95           3.81   -1% - 14%
IntNRQ               16.01         0.95       17.17           0.33    0% - 16%
PKLookup             86.21         2.51       92.55           2.59    1% - 13%
Wildcard             65.51         3.13       71.00           1.45    1% - 16%
OrHighMed            21.64         1.83       23.56           1.24   -4% - 25%
Prefix3             105.33         4.94      114.75           2.46    1% - 16%
OrHighHigh           17.39         1.45       18.97           0.95   -4% - 24%
AndHighHigh          30.05         1.14       33.42           0.88    4% - 18%
Term                243.13         9.03      273.92           8.26    5% - 20%
SloppyPhrase         15.80         0.28       17.84           0.78    6% - 19%
SpanNear             10.52         0.14       11.97           0.25    9% - 17%
AndHighMed          117.60         3.54      135.91           2.49   10% - 21%
Phrase               20.15         0.78       24.22           0.26   14% - 26%
{noformat}
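
A note on reading the "Pct diff" column: the reported ranges are consistent with comparing the two runs one standard deviation apart in each direction and truncating to whole percents. That reading is inferred from the numbers in this thread, not taken from the benchmark tool's source, and `PctDiffRange` below is a hypothetical helper written only to check the arithmetic.

```java
/** Hypothetical helper reproducing the apparent "Pct diff" arithmetic. */
final class PctDiffRange {
  /**
   * Returns {worst, best}: the percent change when the cached run is one
   * standard deviation low and the base run one high, and vice versa.
   * The cast to int truncates toward zero.
   */
  static int[] range(double qpsBase, double sdBase, double qpsCached, double sdCached) {
    double worst = (qpsCached - sdCached) / (qpsBase + sdBase) - 1.0;
    double best = (qpsCached + sdCached) / (qpsBase - sdBase) - 1.0;
    return new int[] {(int) (100.0 * worst), (int) (100.0 * best)};
  }
}
```

For the Respell row above (104.06 ± 7.63 base, 108.59 ± 7.55 cached), `PctDiffRange.range(104.06, 7.63, 108.59, 7.55)` yields -9 and 20, matching the reported "-9% - 20%". Since the table values are already rounded to two decimals, borderline rows could differ by one percent.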



[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory

2012-06-08 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13291886#comment-13291886
 ] 

Michael McCandless commented on LUCENE-4123:


bq. I don't think it buys anything to code dup the readVInt/readVLong here. It 
should be compiled to the same code. E.g. MMapDir doesn't do this.

I think you're right!  Here are the results w/ the code dup removed (same 
static seed as previous 5M doc results):

{noformat}
Task              QPS base  StdDev base  QPS cached  StdDev cached    Pct diff
IntNRQ               16.36         0.86       16.92           0.75   -6% - 14%
TermBGroup1M1P       91.71         3.03       95.07           3.94   -3% - 11%
TermGroup1M          58.14         1.00       60.38           1.53    0% -  8%
TermBGroup1M        103.11         1.76      108.14           2.63    0% -  9%
Prefix3             108.83         0.97      115.05           2.89    2% -  9%
Wildcard             67.27         0.72       71.22           1.71    2% -  9%
Respell             102.29         7.78      109.08           7.22   -7% - 23%
Fuzzy2               42.46         2.95       45.51           3.31   -7% - 23%
Fuzzy1               72.46         3.55       77.96           4.51   -3% - 19%
Term                247.45        17.73      268.17          12.28   -3% - 22%
OrHighMed            22.38         1.19       24.47           1.64   -3% - 23%
OrHighHigh           18.01         0.92       19.71           1.20   -2% - 22%
AndHighHigh          30.79         0.35       33.80           0.37    7% - 12%
PKLookup             84.71         2.40       93.95           2.32    5% - 16%
SpanNear             10.54         0.13       12.02           0.13   11% - 16%
AndHighMed          119.18         1.05      136.64           1.80   12% - 17%
SloppyPhrase         15.50         0.15       18.26           0.30   14% - 20%
Phrase               20.64         0.12       24.94           0.48   17% - 23%
{noformat}

So I'll remove the code dup.
