[jira] [Comment Edited] (LUCENE-8438) RAMDirectory speed improvements and cleanup

2018-08-08 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572907#comment-16572907
 ] 

Uwe Schindler edited comment on LUCENE-8438 at 8/8/18 9:07 AM:
---

And yes, I agree with you about ByteBuffers: let's use ByteBuffers and not
byte[]. I know the problems of misuse/incorrect use of the ByteBuffer APIs by
beginners, but in general we have an abstraction, so Lucene's codecs or users
of the new directory implementations do not need to worry about that. We just
have to make the implementations in this issue correct. To me it looks OK, no
major problems.

In addition, with ByteBuffers we can (possibly) add a small knob in
ByteBufferDirectory to use ByteBuffer.allocateDirect() instead of allocate().
But then we have to add the usual unmapping code, so that when a file is
deleted in the directory, we can unmap all of its buffers. Ah, and in Java 12
we may get non-volatile ByteBuffers!
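
A minimal sketch of what such a knob could look like, assuming the directory takes its buffers from a pluggable allocator function. The class and method names (`BufferAllocatorKnob`, `allocator`) are hypothetical and not taken from the patch; only `ByteBuffer.allocate()` and `ByteBuffer.allocateDirect()` are real JDK calls. The eager release ("unmapping") of direct buffers is omitted here.

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

// Hypothetical illustration; not the API proposed in LUCENE-8438.
class BufferAllocatorKnob {
    // Returns a function that allocates a buffer of the requested size,
    // either on the Java heap or off-heap (direct), depending on the knob.
    static IntFunction<ByteBuffer> allocator(boolean direct) {
        return direct ? ByteBuffer::allocateDirect : ByteBuffer::allocate;
    }

    public static void main(String[] args) {
        ByteBuffer heap = allocator(false).apply(1024);
        ByteBuffer offHeap = allocator(true).apply(1024);
        System.out.println(heap.isDirect());    // false
        System.out.println(offHeap.isDirect()); // true
    }
}
```

Code that only reads and writes through the ByteBuffer API does not need to know which allocator was used, which is why a single switch would be enough.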


> RAMDirectory speed improvements and cleanup
> ---
>
> Key: LUCENE-8438
> URL: https://issues.apache.org/jira/browse/LUCENE-8438
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Minor
> Attachments: capture-1.png, capture-4.png
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> RAMDirectory screams for a cleanup. It is used and abused in many places and
> even if we discourage its use in favor of native (mmapped) buffers, there
> seem to be benefits of keeping RAMDirectory available (quick throw-away
> indexes without the need to set up an external tmpfs, for example).
> Currently RAMDirectory performs very poorly under concurrent loads. The
> implementation is also open to all sorts of abuse – the streams can be
> reset and are used all over the place as temporary buffers, even without
> the presence of RAMDirectory itself. This complicates the implementation and
> is pretty confusing.
> As an example of how dramatically slow RAMDirectory is under concurrent load,
> consider this PoC pseudo-benchmark. It creates a single monolithic segment
> with 500K very short documents (single field, with norms). The index is ~60MB
> once created. We then run semi-complex Boolean queries on top of that index
> from N concurrent threads. The attached capture-4 shows the result (queries
> per second over 5-second spans) for a varying number of concurrent threads on
> an AWS machine with 32 CPUs available (of which 16 seem to be real and
> 16 hyper-threaded). That red line at the bottom (which drops compared to
> single-threaded performance) is the current RAMDirectory. RAMDirectory2 is an
> alternative implementation I wrote that uses ByteBuffers. Yes, it's slower
> than the native mmapped implementation, but a *lot* faster than the current
> RAMDirectory (and more GC-friendly because it uses dynamic progressive block
> scaling internally).
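
The measurement described above (queries per second from N concurrent threads over a fixed time span) can be sketched roughly as follows. This is not the benchmark attached to the issue: the class `QpsSketch`, the method `measureQps`, and the stand-in workload are hypothetical, and a real run would execute Lucene queries against the index instead.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical harness; the workload is a stand-in, not a Lucene query.
class QpsSketch {
    // Runs `query` in a loop from `threads` threads for about `millis` ms
    // and returns the number of completed runs per second.
    static double measureQps(int threads, long millis, Runnable query) {
        LongAdder completed = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(millis);
        for (int i = 0; i < threads; i++) {
            pool.execute(() -> {
                // Subtraction keeps the comparison safe against nanoTime wrap-around.
                while (System.nanoTime() - deadline < 0) {
                    query.run();
                    completed.increment();
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(millis + 5000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        }
        return completed.sum() * 1000.0 / millis;
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption): a little CPU work per "query".
        Runnable fakeQuery = () -> Math.sqrt(System.nanoTime());
        for (int threads : new int[] {1, 2, 4}) {
            System.out.printf("threads=%d qps=%.0f%n", threads, measureQps(threads, 500, fakeQuery));
        }
    }
}
```

A directory implementation that scales should show QPS growing with the thread count up to the number of physical cores; a curve that drops below the single-threaded figure points at contention.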



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-8438) RAMDirectory speed improvements and cleanup

2018-08-08 Thread Uwe Schindler (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16572907#comment-16572907
 ] 

Uwe Schindler edited comment on LUCENE-8438 at 8/8/18 9:06 AM:
---

And yes, I agree with you about bytebuffers: Let's use ByteBuffers and not 
byte[]. I know the problems of misuse/incorrect use of the bytebuffer APIs by 
beginners, but in general we have an abstraction, so Lucene's codecs or users 
of the new directory implementations do not need to take care. We just have to 
make the implementations in this issue correct. To me it looks ok, no major 
problems.

In addition, with ByteBuffers we can (possibly) add a small knob in
ByteBufferDirectory to use ByteBuffer.allocateDirect() instead of allocate().
But then we have to add the usual unmapping code, so that when a file is
deleted in the directory, we can unmap all of its buffers.





[jira] [Comment Edited] (LUCENE-8438) RAMDirectory speed improvements and cleanup

2018-08-07 Thread Dawid Weiss (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16571426#comment-16571426
 ] 

Dawid Weiss edited comment on LUCENE-8438 at 8/7/18 10:33 AM:
--

This shows the QPS performance on a 36-core AWS machine (18 physical cores)
with increasing thread count and various directory implementations – BBDIR is
ByteBuffersDirectory, FSDir is Lucene's native FSDirectory, RAMDIR is the
current RAMDirectory. The variations of BBDIR relate to which IndexInput is
returned: MANY_BUFS is multiple ByteBuffers (exactly the buffers written to
IndexOutput), ONE_BUF is the same implementation, but with the buffers
rewritten into a single ByteBuffer (which results in contiguous access and
fewer block-boundary hits), BYTE_ARRAY is rewritten into a contiguous array and
wrapped in ByteArrayIndexInput, LUCENE_BUFS is the original ByteBuffers wrapped
in Lucene's ByteBuffer handling code.
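
The difference between the MANY_BUFS and ONE_BUF variants can be sketched as below. This is an illustration only, assuming equal-sized power-of-two blocks; the names `BlockedReadSketch`, `readBlocked` and `consolidate` are hypothetical and do not come from the patch.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of blocked vs. contiguous reads.
class BlockedReadSketch {
    // MANY_BUFS style: an absolute position is split into a block index and
    // an offset within that block (all blocks hold 1 << blockBits bytes).
    static byte readBlocked(List<ByteBuffer> blocks, int blockBits, long pos) {
        int block = (int) (pos >>> blockBits);
        int offset = (int) (pos & ((1L << blockBits) - 1));
        return blocks.get(block).get(offset);
    }

    // ONE_BUF style: copy all blocks into one contiguous buffer so that reads
    // never cross a block boundary. Needs one large contiguous allocation.
    static ByteBuffer consolidate(List<ByteBuffer> blocks) {
        int total = 0;
        for (ByteBuffer b : blocks) {
            total += b.remaining();
        }
        ByteBuffer one = ByteBuffer.allocate(total);
        for (ByteBuffer b : blocks) {
            one.put(b.duplicate()); // duplicate() leaves the source position untouched
        }
        one.flip();
        return one;
    }

    public static void main(String[] args) {
        // Four blocks of 4 bytes each, holding the values 0..15.
        List<ByteBuffer> blocks = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            ByteBuffer b = ByteBuffer.allocate(4);
            for (int j = 0; j < 4; j++) {
                b.put((byte) (i * 4 + j));
            }
            b.flip();
            blocks.add(b);
        }
        System.out.println(readBlocked(blocks, 2, 9)); // 9
        System.out.println(consolidate(blocks).get(9)); // 9
    }
}
```

The blocked variant pays for the index arithmetic on every access but never needs a single large allocation, which is the trade-off behind preferring a multi-buffer default.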

My opinion is to leave LUCENE_BUFS as the default since it exhibits high 
performance, doesn't require contiguous memory allocation, etc.

!capture-1.png|width=600!

