[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449931#comment-13449931 ]

Michael McCandless commented on LUCENE-4123:

bq. I am not sure if we really need that directory. With my changes in LUCENE-3659 we can handle that easily (also for files > 2 GiB). LUCENE-3659 makes the buffer size of RAMDir configurable (depending on the IOContext while writing), and when you do new RAMDirectory(otherDir) to cache the whole dir in RAM, it will use the maximum possible buffer size for the underlying file (2 GiB), since we don't write and need no smaller buffer size.

Actually, I think the two dirs have different use cases, so I think we should do both: 1) fix RAMDir to do better buffering (LUCENE-3659), and 2) add this new dir.

RAMDir is good for pure in-memory indices (for testing, transient usage, etc.) or for pulling in a read-only index from disk, while CachingRAMDir (I think we should rename it to CachingDirWrapper) is good if you want to write to the index but also want persistence, since all writes go straight to the wrapped directory.

I don't think the limitations of this dir (max 2.1 GB file size) need to block committing ... the javadocs call this out, and we can improve it later. If wrapping the byte[] in a ByteBuffer and using ByteBufferII doesn't lose any perf, that would be great, but we can explore that after committing.

But definitely +1 to get LUCENE-3659 in...

Add CachingRAMDirectory
-----------------------

Key: LUCENE-4123
URL: https://issues.apache.org/jira/browse/LUCENE-4123
Project: Lucene - Core
Issue Type: Bug
Components: core/store
Reporter: Michael McCandless
Assignee: Michael McCandless
Attachments: LUCENE-4123.patch, LUCENE-4123.patch, LUCENE-4123.patch, LUCENE-4123.patch

The directory is very simple and useful if you have an index that you know fully fits into available RAM. You could also use FileSwitchDir if you want to leave some files (e.g. stored fields or term vectors) on disk.
It wraps any other Directory and delegates all writing (IndexOutput) to it, but for reading (IndexInput) it allocates a single byte[], fully reads the file in, and then serves requests off that single byte[]. It's more GC-friendly than RAMDir since it only allocates a single array per file.

It has a few nocommits still, but all tests pass if I wrap the delegate inside MockDirectoryWrapper using this.

I tested with a 1M-doc English Wikipedia index (would like to test w/ 10M docs but I don't have enough RAM...); it seems to give a nice speedup:

{noformat}
Task               QPS base  StdDev base  QPS cached  StdDev cached   Pct diff
Respell              197.00         7.27      203.19           8.17  -4% - 11%
PKLookup             121.12         2.80      125.46           3.20   -1% - 8%
Fuzzy2                66.62         2.62       69.91           2.85  -3% - 13%
Fuzzy1               206.20         6.47      222.21           6.52   1% - 14%
TermGroup100K        160.14         6.62      175.71           3.79   3% - 16%
Phrase                34.85         0.40       38.75           0.61   8% - 14%
TermBGroup100K       363.75        15.74      406.98          13.23   3% - 20%
SpanNear              53.08         1.11       59.53           2.94   4% - 20%
TermBGroup100K1P     222.53         9.78      252.86           5.96   6% - 21%
SloppyPhrase          70.36         2.05       79.95           4.48   4% - 23%
Wildcard             238.10         4.29      272.78           4.97  10% - 18%
OrHighMed            123.49         4.85      149.32           4.66  12% - 29%
Prefix3              288.46         8.10      350.40           5.38  16% - 26%
OrHighHigh            76.46         3.27       93.13           2.96  13% - 31%
IntNRQ                92.25         2.12      113.47           5.74  14% - 32%
Term                 757.12        39.03      958.62          22.68  17% - 36%
AndHighHigh          103.03         4.48      133.89           3.76  21% - 39%
AndHighMed           376.36        16.58      493.99          10.00  23% - 40%
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
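The read path described in the issue (writes go to the wrapped directory; each file is read once into a single byte[] and then served from memory) can be sketched in plain Java. This is a minimal, Lucene-free illustration under stated assumptions: CachedFileInput and open() are hypothetical names, a file on disk stands in for the wrapped Directory, and Lucene's real Directory/IndexInput APIs are not reproduced. The size check shows where the roughly 2.1 GB per-file limit comes from: Java arrays are indexed by int.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch (not the patch's code): a read-only view of one file that is
 * loaded completely into a single byte[] and then served from memory.
 */
final class CachedFileInput {
  private final byte[] bytes; // one array per file, however large the file is
  private int pos;            // index of the next byte to read

  private CachedFileInput(byte[] bytes) {
    this.bytes = bytes;
  }

  /** Reads the whole file up front; writing is assumed to go to the wrapped (on-disk) store. */
  static CachedFileInput open(Path file) throws IOException {
    long size = Files.size(file);
    if (size > Integer.MAX_VALUE) {
      // Java arrays are int-indexed, hence the ~2.1 GB per-file limit mentioned in the comments.
      throw new IOException("cannot cache " + file + " in one byte[]: " + size + " bytes");
    }
    return new CachedFileInput(Files.readAllBytes(file));
  }

  long length() {
    return bytes.length;
  }

  long getFilePointer() {
    return pos;
  }

  byte readByte() throws EOFException {
    if (pos >= bytes.length) {
      throw new EOFException("read past EOF");
    }
    return bytes[pos++];
  }
}
```

Allocating exactly one array per file is what the description means by "more GC friendly": the collector sees one large object per file instead of many fixed-size buffers.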
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448829#comment-13448829 ]

Shai Erera commented on LUCENE-4123:

Besides Uwe's ideas for improvements, is this Directory operable? I.e., if you choose to commit what you have accomplished so far, do tests fail? Is it safe to use? I'm thinking progress, not perfection -- we can always introduce improvements later.
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448842#comment-13448842 ]

Michael McCandless commented on LUCENE-4123:

I believe it is safe ... e.g. all tests pass if I wrap MDW's delegate w/ this in newDirectory ... I'll update the patch w/ Uwe's and Robert's suggestions ...
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448870#comment-13448870 ]

Robert Muir commented on LUCENE-4123:

Looks good... I don't really like that close() is a no-op and that seek() has no checks (since it's deferred, if you seek somewhere negative you won't know until later).

You could probably fix both of these, e.g. keep the byte[] final but let close() set the position negative, catch NegativeArray and throw ACE (AlreadyClosedException). Then just throw IAE (IllegalArgumentException) on seek if the incoming long is negative, at least, since you reserve negative positions to mean closed.

I also don't like that it's a delegator. Should the underlying read check for BufferedII and pass useBuffer=false?
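Robert's suggestion (keep the array final, let close() mark the input closed with a negative position, and reject negative seeks with IAE) might look roughly like this. It is a sketch, not the patch: ClosableByteArrayInput is a hypothetical name, and AlreadyClosedException here is a local stand-in that mirrors Lucene's unchecked exception of the same name.

```java
import java.io.EOFException;
import java.io.IOException;

/** Local stand-in for Lucene's AlreadyClosedException (unchecked, like the real one). */
class AlreadyClosedException extends IllegalStateException {
  AlreadyClosedException(String message) {
    super(message);
  }
}

/** Hypothetical sketch: the array stays final and a negative position means "closed". */
final class ClosableByteArrayInput {
  private final byte[] bytes;
  private int pos; // >= 0 while open, -1 once closed

  ClosableByteArrayInput(byte[] bytes) {
    this.bytes = bytes;
  }

  byte readByte() throws IOException {
    try {
      byte b = bytes[pos];
      pos++; // advance only after a successful read, so a closed input stays closed
      return b;
    } catch (ArrayIndexOutOfBoundsException e) {
      if (pos < 0) {
        throw new AlreadyClosedException("this input is closed");
      }
      EOFException eof = new EOFException("read past EOF");
      eof.initCause(e);
      throw eof;
    }
  }

  void seek(long newPos) {
    if (pos < 0) {
      throw new AlreadyClosedException("this input is closed");
    }
    if (newPos < 0) {
      // Negative positions are reserved to mean "closed", so reject them eagerly.
      throw new IllegalArgumentException("negative seek position: " + newPos);
    }
    // Clamp so the int cast cannot overflow; a later read at bytes.length reports EOF.
    pos = (int) Math.min(newPos, (long) bytes.length);
  }

  void close() {
    pos = -1;
  }
}
```

One detail matters here: writing bytes[pos++] instead would bump a closed position from -1 to 0 before the exception is thrown (Java evaluates the index expression before the bounds check), which would report EOF instead of "closed" and silently reopen the input.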
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448884#comment-13448884 ]

Robert Muir commented on LUCENE-4123:

Also, readBytes should not catch ArrayIndexOutOfBoundsException; it must catch the more general IndexOutOfBoundsException.
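Robert's point about the catch clause can be illustrated with System.arraycopy: its Javadoc only guarantees an IndexOutOfBoundsException, so catching the subclass ArrayIndexOutOfBoundsException relies on what a particular JVM happens to throw. A minimal sketch, with ArrayBackedInput as a hypothetical name, that translates any bounds failure into the EOFException callers expect:

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical sketch of a byte[]-backed input's bulk read. */
final class ArrayBackedInput {
  private final byte[] bytes;
  private int pos; // index of the next byte to read

  ArrayBackedInput(byte[] bytes) {
    this.bytes = bytes;
  }

  /** Copies len bytes into dst[offset .. offset + len), or throws EOFException. */
  void readBytes(byte[] dst, int offset, int len) throws IOException {
    try {
      // On a bounds failure arraycopy modifies nothing, so pos stays consistent.
      System.arraycopy(bytes, pos, dst, offset, len);
      pos += len;
    } catch (IndexOutOfBoundsException e) {
      // Catch the general type: arraycopy is only specified to throw IndexOutOfBoundsException.
      EOFException eof = new EOFException(
          "read past EOF: pos=" + pos + " len=" + len + " length=" + bytes.length);
      eof.initCause(e);
      throw eof;
    }
  }

  long getFilePointer() {
    return pos;
  }
}
```

One simplification: an out-of-range dst/offset (a caller bug rather than a short file) is also reported as EOF here; a fuller version might validate the destination range separately.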
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13449225#comment-13449225 ]

Uwe Schindler commented on LUCENE-4123:

Mike, I am not sure if we really need that directory. With my changes in LUCENE-3659 we can handle that easily (also for files > 2 GiB). LUCENE-3659 makes the buffer size of RAMDir configurable (depending on the IOContext while writing), and when you do new RAMDirectory(otherDir) to cache the whole dir in RAM, it will use the maximum possible buffer size for the underlying file (2 GiB), since we don't write and need no smaller buffer size.

We should really get LUCENE-3659 in. The only missing parts are:
- make RAMFile visible in the ConcurrentMap after the IndexOutput is closed, so we don't need synchronization on RAMFile
- maybe use Robert's cool ByteBufferIndexInput from LUCENE-4364
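Uwe's first missing part, making a file visible in the map only once its output is closed so readers need no lock on it, is essentially safe publication through a concurrent map. A minimal sketch under assumptions: PublishOnCloseStore is a hypothetical name, a plain byte[] and OutputStream stand in for RAMFile and IndexOutput, and LUCENE-3659's actual code may differ. The sketch leans on the ConcurrentHashMap guarantee that an update for a key happens-before any retrieval that returns the updated value.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.OutputStream;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory store where files become visible only when their writer is closed. */
final class PublishOnCloseStore {
  private final ConcurrentMap<String, byte[]> files = new ConcurrentHashMap<>();

  /** Writer side: bytes are buffered privately and published once, on close(). */
  OutputStream createOutput(String name) {
    return new ByteArrayOutputStream() {
      private boolean closed;

      @Override
      public void close() {
        if (!closed) {
          closed = true;
          // Safe publication: the put happens-before any get that returns this array,
          // and the array is never modified afterwards, so readers need no lock.
          files.put(name, toByteArray());
        }
      }
    };
  }

  /** Reader side: only fully written files are ever visible; treat the result as read-only. */
  byte[] openInput(String name) throws FileNotFoundException {
    byte[] data = files.get(name);
    if (data == null) {
      throw new FileNotFoundException(name);
    }
    return data;
  }

  boolean fileExists(String name) {
    return files.containsKey(name);
  }
}
```

Because a reader can only ever observe a finished, immutable array, the per-file synchronization that RAMFile otherwise needs during concurrent writes and reads disappears.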
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404743#comment-13404743 ]

Michael McCandless commented on LUCENE-4123:

bq. You should make the II correctly throw IOExceptions like MMap does, so catch the AIOOBE and rethrow as EOFException (just copy the code).

+1. Are we sure the catch + rethrow adds no cost?

Though, I think tests don't actually fail as is, because I intentionally skip caching segments_N. Probably we should improve that to skip any file that's opened with readOnce=true.

bq. Can we make this IndexInput impl extend ByteArrayDataInput somehow?

+1

I won't have time for this any time soon, so if you want to work on it, Uwe, feel free!
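Mike's idea of generalizing the segments_N special case to every file opened with readOnce=true amounts to a small caching policy. A hypothetical sketch: CachePolicy and shouldCache are illustrative names, and in a real wrapper the readOnce flag would presumably come from the IOContext passed when the file is opened.

```java
/** Hypothetical caching policy for a caching directory wrapper. */
final class CachePolicy {
  private CachePolicy() {}

  /**
   * Files read only once gain nothing from being pinned in RAM, so they
   * would be read straight from the wrapped directory instead.
   */
  static boolean shouldCache(String fileName, boolean readOnce) {
    if (readOnce) {
      return false; // proposed general rule
    }
    // Special case the current patch already applies.
    return !fileName.startsWith("segments_");
  }
}
```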
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404747#comment-13404747 ]

Uwe Schindler commented on LUCENE-4123:

bq. Are we sure the catch + rethrow adds no cost?

Yes! It is definitely also less work for HotSpot than asserts :-) In general, this is why exceptions are used instead of if statements. The exception table is part of a method's metadata and just defines the jump targets for the exceptional case. If you don't catch the exception, it simply bubbles up; otherwise the JVM jumps to our bytecode, which simply rethrows. This rethrowing is rare, so the overhead of wrapping the inner exception is negligible. Exceptions for array indexes are implemented by traps on most processors, so as long as the exceptional case (AIOOBE) does not happen, the check costs nothing.

bq. I won't have time for this any time soon so if you want to work on it Uwe feel free!

I have some time. The RAMDir issue is also open (it should also use index exceptions instead of if statements), so I can look into it next week.
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404856#comment-13404856 ]

Michael McCandless commented on LUCENE-4123:

OK thanks Uwe!
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404429#comment-13404429 ]

Uwe Schindler commented on LUCENE-4123:

You should make the II correctly throw IOExceptions like MMap does, so catch the AIOOBE and rethrow as EOFException (just copy the code). This does not have any speed effect. Otherwise some tests will definitely fail.
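Uwe's suggestion to catch the AIOOBE and rethrow it as EOFException, as the MMap inputs do, is sketched below next to the equivalent explicit-check version for comparison. TrappingByteInput is a hypothetical name. The sketch makes no performance claim: whether the try/catch variant is truly free, as argued in the thread, depends on the JIT.

```java
import java.io.EOFException;

/** Hypothetical byte[]-backed input showing two ways to report EOF. */
final class TrappingByteInput {
  private final byte[] bytes;
  private int pos; // index of the next byte to read

  TrappingByteInput(byte[] bytes) {
    this.bytes = bytes;
  }

  /** Style proposed in the thread: no explicit test, rely on the JVM's array bounds check. */
  byte readByte() throws EOFException {
    try {
      byte b = bytes[pos];
      pos++; // only advance after a successful read
      return b;
    } catch (ArrayIndexOutOfBoundsException e) {
      EOFException eof = new EOFException("read past EOF");
      eof.initCause(e);
      throw eof;
    }
  }

  /** Equivalent explicit-check style, for comparison. */
  byte readByteChecked() throws EOFException {
    if (pos >= bytes.length) {
      throw new EOFException("read past EOF");
    }
    return bytes[pos++];
  }
}
```

Both methods give callers the same contract (an IOException subtype at end of file, never a raw AIOOBE), which is what Uwe says tests rely on.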
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404452#comment-13404452 ]

Uwe Schindler commented on LUCENE-4123:
---

Thinking more about the patch: can we make this IndexInput impl extend ByteArrayDataInput somehow? I would also like to fix ByteArrayDataInput to correctly rethrow AIOOBE and remove its duplicated vInt methods. We already ran tests with FSTs showing that the code duplication does not help.
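A minimal sketch of the direction suggested here, using hypothetical class names rather than Lucene's `DataInput`/`ByteArrayDataInput`. The variable-length decoding is written once in the base class on top of `readByte()`, and the array-backed subclass only implements `readByte()`, rethrowing the AIOOBE as an EOFException. The vInt layout (7 data bits per byte, low-order group first, high bit set when more bytes follow) is Lucene's documented format. Whether the JIT really produces the same code as a hand-duplicated loop is the claim under discussion in this thread, not something the sketch proves.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical base class: vInt/vLong decoding exists once, built on readByte().
abstract class SketchDataInput {
    public abstract byte readByte() throws IOException;

    // 7 data bits per byte, least significant group first; high bit = more bytes follow.
    public int readVInt() throws IOException {
        byte b = readByte();
        int i = b & 0x7F;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = readByte();
            i |= (b & 0x7F) << shift;
        }
        return i;
    }

    // Same loop with a long accumulator, so shifts beyond 31 bits do not overflow.
    public long readVLong() throws IOException {
        byte b = readByte();
        long i = b & 0x7FL;
        for (int shift = 7; (b & 0x80) != 0; shift += 7) {
            b = readByte();
            i |= (b & 0x7FL) << shift;
        }
        return i;
    }
}

// Hypothetical array-backed reader: implements only readByte(), no copied vInt code.
final class SketchByteArrayInput extends SketchDataInput {
    private final byte[] bytes;
    private int pos;

    SketchByteArrayInput(byte[] bytes) {
        this.bytes = bytes;
    }

    @Override
    public byte readByte() throws IOException {
        try {
            byte b = bytes[pos];
            pos++;
            return b;
        } catch (ArrayIndexOutOfBoundsException e) {
            // Translate the unchecked AIOOBE into the checked EOFException callers expect.
            EOFException eof = new EOFException("read past EOF");
            eof.initCause(e);
            throw eof;
        }
    }
}
```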
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291857#comment-13291857 ]

Simon Willnauer commented on LUCENE-4123:
---

bq. I tested with 1M Wikipedia English index (would like to test w/ 10M docs but I don't have enough RAM...); it seems to give a nice speedup:

#fail! :)
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291860#comment-13291860 ]

Robert Muir commented on LUCENE-4123:
---

I don't think it buys anything to duplicate the readVInt/readVLong code here; it should be compiled to the same code. E.g., MMapDir doesn't do this.
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291884#comment-13291884 ]

Michael McCandless commented on LUCENE-4123:
---

Results for 5M doc index:

{noformat}
Task              QPS base  StdDev base  QPS cached  StdDev cached  Pct diff
Respell             104.06         7.63      108.59           7.55  -9% - 20%
TermGroup1M          57.94         1.59       60.70           0.30  1% - 8%
TermBGroup1M        103.28         2.54      108.51           2.54  0% - 10%
Fuzzy2               43.07         2.96       45.32           3.06  -8% - 20%
Fuzzy1               72.64         4.73       76.92           4.38  -6% - 19%
TermBGroup1M1P       90.14         3.03       95.95           3.81  -1% - 14%
IntNRQ               16.01         0.95       17.17           0.33  0% - 16%
PKLookup             86.21         2.51       92.55           2.59  1% - 13%
Wildcard             65.51         3.13       71.00           1.45  1% - 16%
OrHighMed            21.64         1.83       23.56           1.24  -4% - 25%
Prefix3             105.33         4.94      114.75           2.46  1% - 16%
OrHighHigh           17.39         1.45       18.97           0.95  -4% - 24%
AndHighHigh          30.05         1.14       33.42           0.88  4% - 18%
Term                243.13         9.03      273.92           8.26  5% - 20%
SloppyPhrase         15.80         0.28       17.84           0.78  6% - 19%
SpanNear             10.52         0.14       11.97           0.25  9% - 17%
AndHighMed          117.60         3.54      135.91           2.49  10% - 21%
Phrase               20.15         0.78       24.22           0.26  14% - 26%
{noformat}
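As a rough cross-check of the 5M-doc numbers, the point estimate of each speedup is the relative change in mean QPS. `QpsDiff` is a hypothetical helper, not part of the benchmark tool. The Pct diff column is a range that presumably also accounts for the StdDev columns, and how the tool derives it is not stated here, so this sketch only reproduces the midpoint-style estimate.

```java
// Hypothetical helper (not from the benchmark tool): relative change in mean QPS.
final class QpsDiff {
    // Percent change from the base QPS to the cached QPS.
    static double pctChange(double baseQps, double cachedQps) {
        return (cachedQps - baseQps) / baseQps * 100.0;
    }

    public static void main(String[] args) {
        // Mean QPS (base, cached) copied from the 5M-doc results.
        String[] tasks = {"Term", "AndHighMed", "Phrase"};
        double[][] qps = {{243.13, 273.92}, {117.60, 135.91}, {20.15, 24.22}};
        for (int i = 0; i < tasks.length; i++) {
            System.out.printf("%-12s %+.1f%%%n", tasks[i], pctChange(qps[i][0], qps[i][1]));
        }
    }
}
```

For these three tasks it gives about +12.7%, +15.6% and +20.2%, each inside the corresponding reported range (5% - 20%, 10% - 21%, 14% - 26%).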
[jira] [Commented] (LUCENE-4123) Add CachingRAMDirectory
[ https://issues.apache.org/jira/browse/LUCENE-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13291886#comment-13291886 ]

Michael McCandless commented on LUCENE-4123:
---

bq. I don't think it buys anything to duplicate the readVInt/readVLong code here; it should be compiled to the same code. E.g., MMapDir doesn't do this.

I think you're right! Here are the results with the code dup removed (same static seed as the previous 5M doc results):

{noformat}
Task              QPS base  StdDev base  QPS cached  StdDev cached  Pct diff
IntNRQ               16.36         0.86       16.92           0.75  -6% - 14%
TermBGroup1M1P       91.71         3.03       95.07           3.94  -3% - 11%
TermGroup1M          58.14         1.00       60.38           1.53  0% - 8%
TermBGroup1M        103.11         1.76      108.14           2.63  0% - 9%
Prefix3             108.83         0.97      115.05           2.89  2% - 9%
Wildcard             67.27         0.72       71.22           1.71  2% - 9%
Respell             102.29         7.78      109.08           7.22  -7% - 23%
Fuzzy2               42.46         2.95       45.51           3.31  -7% - 23%
Fuzzy1               72.46         3.55       77.96           4.51  -3% - 19%
Term                247.45        17.73      268.17          12.28  -3% - 22%
OrHighMed            22.38         1.19       24.47           1.64  -3% - 23%
OrHighHigh           18.01         0.92       19.71           1.20  -2% - 22%
AndHighHigh          30.79         0.35       33.80           0.37  7% - 12%
PKLookup             84.71         2.40       93.95           2.32  5% - 16%
SpanNear             10.54         0.13       12.02           0.13  11% - 16%
AndHighMed          119.18         1.05      136.64           1.80  12% - 17%
SloppyPhrase         15.50         0.15       18.26           0.30  14% - 20%
Phrase               20.64         0.12       24.94           0.48  17% - 23%
{noformat}

So I'll remove the code dup.