[jira] [Updated] (LUCENE-8783) Add FST Offheap for non-default Codecs

2019-04-29 Thread Ankit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Jain updated LUCENE-8783:
---
Description: 
Even though LUCENE-8635 and LUCENE-8671 add support for keeping FSTs off-heap for the 
default codec, many other codecs do not support off-heap FSTs. 
A few examples are below:

* CompletionPostingsFormat
* BlockTreeOrdsPostingsFormat
* IDVersionPostingsFormat

  was:
Even though, LUCENE-8635 and LUCENE-8671 adds support to keep FST offheap for 
default codec, there are many other codecs which do not support this. Few 
examples are below:

* CompletionPostingsFormat
* BlockTreeOrdsPostingsFormat
* IDVersionPostingsFormat


> Add FST Offheap for non-default Codecs
> --
>
> Key: LUCENE-8783
> URL: https://issues.apache.org/jira/browse/LUCENE-8783
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
>Reporter: Ankit Jain
>Priority: Major
> Fix For: 8.0, 8.x, master (9.0)
>
>
> Even though LUCENE-8635 and LUCENE-8671 add support for keeping FSTs off-heap for the 
> default codec, many other codecs do not support off-heap FSTs. 
> A few examples are below:
> * CompletionPostingsFormat
> * BlockTreeOrdsPostingsFormat
> * IDVersionPostingsFormat



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-8783) Add FST Offheap for non-default Codecs

2019-04-29 Thread Ankit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Jain updated LUCENE-8783:
---
Description: 
Even though, LUCENE-8635 and LUCENE-8671 adds support to keep FST offheap for 
default codec, there are many other codecs which do not support this. Few 
examples are below:

* CompletionPostingsFormat
* BlockTreeOrdsPostingsFormat
* IDVersionPostingsFormat

  was:
Even though, [~LUCENE-8635] and [~LUCENE-8671]adds support to keep FST offheap 
for default codec, there are many other codecs which do not support this. Few 
examples are below:

* CompletionPostingsFormat
* BlockTreeOrdsPostingsFormat
* IDVersionPostingsFormat


> Add FST Offheap for non-default Codecs
> --
>
> Key: LUCENE-8783
> URL: https://issues.apache.org/jira/browse/LUCENE-8783
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
>Reporter: Ankit Jain
>Priority: Major
> Fix For: 8.0, 8.x, master (9.0)
>
>
> Even though, LUCENE-8635 and LUCENE-8671 adds support to keep FST offheap for 
> default codec, there are many other codecs which do not support this. Few 
> examples are below:
> * CompletionPostingsFormat
> * BlockTreeOrdsPostingsFormat
> * IDVersionPostingsFormat






[jira] [Updated] (LUCENE-8783) Add FST Offheap for non-default Codecs

2019-04-29 Thread Ankit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Jain updated LUCENE-8783:
---
Description: 
Even though, [~LUCENE-8635] and [~LUCENE-8671]adds support to keep FST offheap 
for default codec, there are many other codecs which do not support this. Few 
examples are below:

* CompletionPostingsFormat
* BlockTreeOrdsPostingsFormat
* IDVersionPostingsFormat

  was:
Even though, [^LUCENE-8635] and [^LUCENE-8671]adds support to keep FST offheap 
for default codec, there are many other codecs which do not support this. Few 
examples are below:

* CompletionPostingsFormat
* BlockTreeOrdsPostingsFormat
* IDVersionPostingsFormat


> Add FST Offheap for non-default Codecs
> --
>
> Key: LUCENE-8783
> URL: https://issues.apache.org/jira/browse/LUCENE-8783
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
>Reporter: Ankit Jain
>Priority: Major
> Fix For: 8.0, 8.x, master (9.0)
>
>
> Even though, [~LUCENE-8635] and [~LUCENE-8671]adds support to keep FST 
> offheap for default codec, there are many other codecs which do not support 
> this. Few examples are below:
> * CompletionPostingsFormat
> * BlockTreeOrdsPostingsFormat
> * IDVersionPostingsFormat






[jira] [Updated] (LUCENE-8783) Add FST Offheap for non-default Codecs

2019-04-29 Thread Ankit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Jain updated LUCENE-8783:
---
Description: 
Even though, [^LUCENE-8635] and [^LUCENE-8671]adds support to keep FST offheap 
for default codec, there are many other codecs which do not support this. Few 
examples are below:

* CompletionPostingsFormat
* BlockTreeOrdsPostingsFormat
* IDVersionPostingsFormat

  was:
Even though, [LUCENE-8635](https://issues.apache.org/jira/browse/LUCENE-8635) 
and [LUCENE-8671](https://issues.apache.org/jira/browse/LUCENE-8671) adds 
support to keep FST offheap for default codec, there are many other codecs 
which do not support this. Few examples are below:

* CompletionPostingsFormat
* BlockTreeOrdsPostingsFormat
* IDVersionPostingsFormat


> Add FST Offheap for non-default Codecs
> --
>
> Key: LUCENE-8783
> URL: https://issues.apache.org/jira/browse/LUCENE-8783
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
>Reporter: Ankit Jain
>Priority: Major
> Fix For: 8.0, 8.x, master (9.0)
>
>
> Even though, [^LUCENE-8635] and [^LUCENE-8671]adds support to keep FST 
> offheap for default codec, there are many other codecs which do not support 
> this. Few examples are below:
> * CompletionPostingsFormat
> * BlockTreeOrdsPostingsFormat
> * IDVersionPostingsFormat






[jira] [Updated] (LUCENE-8783) Add FST Offheap for non-default Codecs

2019-04-29 Thread Ankit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Jain updated LUCENE-8783:
---
Description: 
Even though, [LUCENE-8635](https://issues.apache.org/jira/browse/LUCENE-8635) 
and [LUCENE-8671](https://issues.apache.org/jira/browse/LUCENE-8671) adds 
support to keep FST offheap for default codec, there are many other codecs 
which do not support this. Few examples are below:

* CompletionPostingsFormat
* BlockTreeOrdsPostingsFormat
* IDVersionPostingsFormat

  was:Even though, 
[LUCENE-8635](https://issues.apache.org/jira/browse/LUCENE-8635) and 
[LUCENE-8671](https://issues.apache.org/jira/browse/LUCENE-8671) adds sup


> Add FST Offheap for non-default Codecs
> --
>
> Key: LUCENE-8783
> URL: https://issues.apache.org/jira/browse/LUCENE-8783
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
>Reporter: Ankit Jain
>Priority: Major
> Fix For: 8.0, 8.x, master (9.0)
>
>
> Even though, [LUCENE-8635](https://issues.apache.org/jira/browse/LUCENE-8635) 
> and [LUCENE-8671](https://issues.apache.org/jira/browse/LUCENE-8671) adds 
> support to keep FST offheap for default codec, there are many other codecs 
> which do not support this. Few examples are below:
> * CompletionPostingsFormat
> * BlockTreeOrdsPostingsFormat
> * IDVersionPostingsFormat






[jira] [Updated] (LUCENE-8783) Support FST lazy loading for non-default Codecs

2019-04-29 Thread Ankit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Jain updated LUCENE-8783:
---
Description: Even though, 
[LUCENE-8635](https://issues.apache.org/jira/browse/LUCENE-8635) and 
[LUCENE-8671](https://issues.apache.org/jira/browse/LUCENE-8671) adds sup  
(was: Currently, FST loads all the terms into heap memory during index open. 
This causes frequent JVM OOM issues if the term size gets big. A better way of 
doing this will be to lazily load FST using mmap. That ensures only the 
required terms get loaded into memory.

 
Lucene can expose API for providing list of fields to load terms offheap. I'm 
planning to take following approach for this:
 # Add a boolean property fstOffHeap in FieldInfo
 # Pass list of offheap fields to lucene during index open (ALL can be special 
keyword for loading ALL fields offheap)
 # Initialize the fstOffHeap property during lucene index open
 # FieldReader invokes default FST constructor or OffHeap constructor based on 
fstOffHeap field

 
I created a patch (that loads all fields offheap), did some benchmarks using 
es_rally and results look good.)

> Support FST lazy loading for non-default Codecs
> ---
>
> Key: LUCENE-8783
> URL: https://issues.apache.org/jira/browse/LUCENE-8783
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
>Reporter: Ankit Jain
>Priority: Major
> Fix For: 8.0, 8.x, master (9.0)
>
>
> Even though, [LUCENE-8635](https://issues.apache.org/jira/browse/LUCENE-8635) 
> and [LUCENE-8671](https://issues.apache.org/jira/browse/LUCENE-8671) adds sup






[jira] [Updated] (LUCENE-8783) Add FST Offheap for non-default Codecs

2019-04-29 Thread Ankit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Jain updated LUCENE-8783:
---
Summary: Add FST Offheap for non-default Codecs  (was: Support FST lazy 
loading for non-default Codecs)

> Add FST Offheap for non-default Codecs
> --
>
> Key: LUCENE-8783
> URL: https://issues.apache.org/jira/browse/LUCENE-8783
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
>Reporter: Ankit Jain
>Priority: Major
> Fix For: 8.0, 8.x, master (9.0)
>
>
> Even though, [LUCENE-8635](https://issues.apache.org/jira/browse/LUCENE-8635) 
> and [LUCENE-8671](https://issues.apache.org/jira/browse/LUCENE-8671) adds sup






[jira] [Updated] (LUCENE-8783) Support FST lazy loading for non-default Codecs

2019-04-29 Thread Ankit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Jain updated LUCENE-8783:
---
Review Patch?:   (was: Yes)

> Support FST lazy loading for non-default Codecs
> ---
>
> Key: LUCENE-8783
> URL: https://issues.apache.org/jira/browse/LUCENE-8783
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
>Reporter: Ankit Jain
>Priority: Major
> Fix For: 8.0, 8.x, master (9.0)
>
>
> Currently, FST loads all the terms into heap memory during index open. This 
> causes frequent JVM OOM issues if the term size gets big. A better way of 
> doing this will be to lazily load FST using mmap. That ensures only the 
> required terms get loaded into memory.
>  
> Lucene can expose API for providing list of fields to load terms offheap. I'm 
> planning to take following approach for this:
>  # Add a boolean property fstOffHeap in FieldInfo
>  # Pass list of offheap fields to lucene during index open (ALL can be 
> special keyword for loading ALL fields offheap)
>  # Initialize the fstOffHeap property during lucene index open
>  # FieldReader invokes default FST constructor or OffHeap constructor based 
> on fstOffHeap field
>  
> I created a patch (that loads all fields offheap), did some benchmarks using 
> es_rally and results look good.






[jira] [Updated] (LUCENE-8783) Support FST lazy loading for non-default Codecs

2019-04-29 Thread Ankit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Jain updated LUCENE-8783:
---
Lucene Fields: New  (was: New,Patch Available)

> Support FST lazy loading for non-default Codecs
> ---
>
> Key: LUCENE-8783
> URL: https://issues.apache.org/jira/browse/LUCENE-8783
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
>Reporter: Ankit Jain
>Priority: Major
> Fix For: 8.0, 8.x, master (9.0)
>
>
> Currently, FST loads all the terms into heap memory during index open. This 
> causes frequent JVM OOM issues if the term size gets big. A better way of 
> doing this will be to lazily load FST using mmap. That ensures only the 
> required terms get loaded into memory.
>  
> Lucene can expose API for providing list of fields to load terms offheap. I'm 
> planning to take following approach for this:
>  # Add a boolean property fstOffHeap in FieldInfo
>  # Pass list of offheap fields to lucene during index open (ALL can be 
> special keyword for loading ALL fields offheap)
>  # Initialize the fstOffHeap property during lucene index open
>  # FieldReader invokes default FST constructor or OffHeap constructor based 
> on fstOffHeap field
>  
> I created a patch (that loads all fields offheap), did some benchmarks using 
> es_rally and results look good.






[jira] [Created] (LUCENE-8783) Support FST lazy loading for non-default Codecs

2019-04-29 Thread Ankit Jain (JIRA)
Ankit Jain created LUCENE-8783:
--

 Summary: Support FST lazy loading for non-default Codecs
 Key: LUCENE-8783
 URL: https://issues.apache.org/jira/browse/LUCENE-8783
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/FSTs
 Environment: I used below setup for es_rally tests:

single node i3.xlarge running ES 6.5

es_rally was running on another i3.xlarge instance
Reporter: Ankit Jain
 Fix For: 8.0, 8.x, master (9.0)


Currently, the FST loads all terms into heap memory when the index is opened. 
This causes frequent JVM OOM issues when the term data grows large. A better 
approach is to lazily load the FST using mmap, which ensures that only the 
required terms are loaded into memory.

 
Lucene can expose an API for specifying the list of fields whose terms should 
be loaded off-heap. I plan to take the following approach:
 # Add a boolean property fstOffHeap to FieldInfo
 # Pass the list of off-heap fields to Lucene during index open (ALL can be a 
special keyword for loading all fields off-heap)
 # Initialize the fstOffHeap property during Lucene index open
 # FieldReader invokes the default FST constructor or the off-heap constructor 
based on the fstOffHeap property

 
I created a patch (that loads all fields offheap), did some benchmarks using 
es_rally and results look good.
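
The four steps in the description could be sketched roughly as follows. This is a minimal illustration only, not the attached patch: FieldFlags, initField and chooseFstStore are hypothetical names, and Lucene's real FieldInfo and FieldReader classes are deliberately not used.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class FstOffHeapSketch {
    /** Special keyword meaning "load every field off-heap" (step 2). */
    static final String ALL = "ALL";

    /** Hypothetical stand-in for a per-field flag holder; not Lucene's FieldInfo. */
    static final class FieldFlags {
        final String name;
        final boolean fstOffHeap; // step 1: boolean property per field

        FieldFlags(String name, boolean fstOffHeap) {
            this.name = name;
            this.fstOffHeap = fstOffHeap;
        }
    }

    /** Step 3: initialize the flag at index-open time from the configured field list. */
    static FieldFlags initField(String fieldName, Set<String> offHeapFields) {
        boolean offHeap = offHeapFields.contains(ALL) || offHeapFields.contains(fieldName);
        return new FieldFlags(fieldName, offHeap);
    }

    /** Step 4: the reader picks which FST loading path to use based on the flag. */
    static String chooseFstStore(FieldFlags field) {
        return field.fstOffHeap ? "offheap" : "onheap";
    }

    public static void main(String[] args) {
        // Step 2: the caller passes the list of off-heap fields when opening the index.
        Set<String> offHeapFields = new HashSet<>(Arrays.asList("body"));
        for (String f : new String[] {"body", "id"}) {
            System.out.println(f + " -> " + chooseFstStore(initField(f, offHeapFields)));
        }
    }
}
```

With the field list above, "body" resolves to the off-heap path and "id" stays on-heap; passing a set containing ALL would move every field off-heap.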






[jira] [Updated] (LUCENE-8783) Support FST lazy loading for non-default Codecs

2019-04-29 Thread Ankit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Jain updated LUCENE-8783:
---
Environment: (was: I used below setup for es_rally tests:

single node i3.xlarge running ES 6.5

es_rally was running on another i3.xlarge instance)

> Support FST lazy loading for non-default Codecs
> ---
>
> Key: LUCENE-8783
> URL: https://issues.apache.org/jira/browse/LUCENE-8783
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
>Reporter: Ankit Jain
>Priority: Major
> Fix For: 8.0, 8.x, master (9.0)
>
>
> Currently, FST loads all the terms into heap memory during index open. This 
> causes frequent JVM OOM issues if the term size gets big. A better way of 
> doing this will be to lazily load FST using mmap. That ensures only the 
> required terms get loaded into memory.
>  
> Lucene can expose API for providing list of fields to load terms offheap. I'm 
> planning to take following approach for this:
>  # Add a boolean property fstOffHeap in FieldInfo
>  # Pass list of offheap fields to lucene during index open (ALL can be 
> special keyword for loading ALL fields offheap)
>  # Initialize the fstOffHeap property during lucene index open
>  # FieldReader invokes default FST constructor or OffHeap constructor based 
> on fstOffHeap field
>  
> I created a patch (that loads all fields offheap), did some benchmarks using 
> es_rally and results look good.






[jira] [Comment Edited] (LUCENE-8671) Add setting for moving FST offheap/onheap

2019-03-09 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788340#comment-16788340
 ] 

Ankit Jain edited comment on LUCENE-8671 at 3/10/19 1:01 AM:
-

[~simonw] [~mikemccand] I have created PR - 
https://github.com/apache/lucene-solr/pull/601 with the code change to keep 
reader attributes in the index writer config. Please take a look and give 
feedback.


was (Author: akjain):
[~simonw] [~mikemccand] I have created PR - 
https://github.com/apache/lucene-solr/pull/601 with the code change for having 
reader attributes in the index writer config. Please take a look.

> Add setting for moving FST offheap/onheap
> -
>
> Key: LUCENE-8671
> URL: https://issues.apache.org/jira/browse/LUCENE-8671
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs, core/store
>Reporter: Ankit Jain
>Priority: Minor
> Attachments: offheap_generic_settings.patch, offheap_settings.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> While LUCENE-8635 adds support for loading FSTs off-heap using mmap, users do 
> not have the flexibility to specify the fields for which the FST should be 
> off-heap. Such a setting would allow users to tune heap usage to their workload.
> The ideal way would be to add an attribute to FieldInfo, where we have 
> put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the 
> appropriate On/OffHeapStore when creating its FST. It can support special 
> keywords like ALL/NONE.
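
One way to picture the attribute-based setting described above is the sketch below. It is an assumption-laden illustration: FieldAttrs only mimics the put/getAttribute shape mentioned in the description, and the attribute key "fst.offheap", the Mode enum and useOffHeap are hypothetical names that do not come from Lucene or from the attached patches.

```java
import java.util.HashMap;
import java.util.Map;

public class OffHeapSettingSketch {
    /** Index-level keyword: ALL and NONE override any per-field attribute. */
    enum Mode { ALL, NONE, PER_FIELD }

    /** Hypothetical attribute key; not a real Lucene constant. */
    static final String ATTR_KEY = "fst.offheap";

    /** Hypothetical per-field attribute bag with put/getAttribute-style accessors. */
    static final class FieldAttrs {
        private final Map<String, String> attributes = new HashMap<>();

        String putAttribute(String key, String value) {
            return attributes.put(key, value);
        }

        String getAttribute(String key) {
            return attributes.get(key);
        }
    }

    /** Decide whether a field's FST should be kept off-heap. */
    static boolean useOffHeap(Mode mode, FieldAttrs field) {
        switch (mode) {
            case ALL:
                return true;
            case NONE:
                return false;
            default:
                // A missing attribute parses as false, i.e. the field stays on-heap.
                return Boolean.parseBoolean(field.getAttribute(ATTR_KEY));
        }
    }

    public static void main(String[] args) {
        FieldAttrs body = new FieldAttrs();
        body.putAttribute(ATTR_KEY, "true");
        FieldAttrs id = new FieldAttrs();

        System.out.println("body off-heap: " + useOffHeap(Mode.PER_FIELD, body));
        System.out.println("id off-heap: " + useOffHeap(Mode.PER_FIELD, id));
        System.out.println("id off-heap under ALL: " + useOffHeap(Mode.ALL, id));
    }
}
```

The reader-side code would then hand the on-heap or off-heap store to the FST based on this boolean; how the setting reaches the reader (codec, writer config, or reader open) is exactly what the discussion on this issue is about, so the sketch leaves it open.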






[jira] [Commented] (LUCENE-8671) Add setting for moving FST offheap/onheap

2019-03-08 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16788340#comment-16788340
 ] 

Ankit Jain commented on LUCENE-8671:


[~simonw] [~mikemccand] I have created PR - 
https://github.com/apache/lucene-solr/pull/601 with the code change for having 
reader attributes in the index writer config. Please take a look.

> Add setting for moving FST offheap/onheap
> -
>
> Key: LUCENE-8671
> URL: https://issues.apache.org/jira/browse/LUCENE-8671
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs, core/store
>Reporter: Ankit Jain
>Priority: Minor
> Attachments: offheap_generic_settings.patch, offheap_settings.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> While LUCENE-8635, adds support for loading FST offheap using mmap, users do 
> not have the  flexibility to specify fields for which FST needs to be 
> offheap. This allows users to tune heap usage as per their workload.
> Ideal way will be to add an attribute to FieldInfo, where we have 
> put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the 
> appropriate On/OffHeapStore when creating its FST. It can support special 
> keywords like ALL/NONE.






[jira] [Commented] (LUCENE-8671) Add setting for moving FST offheap/onheap

2019-03-06 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16786133#comment-16786133
 ] 

Ankit Jain commented on LUCENE-8671:


bq. We can then pass it down to the relevant parts and make it part of 
`SegmentReaderState`? This map can also be passed via IndexWriterConfig for the 
NRT case. That way we can pass stuff per DirectoryReader open which is what we 
want I guess.

[~simonw] you're spot on with what we want here. Let me try it out and see how 
the code change looks.

> Add setting for moving FST offheap/onheap
> -
>
> Key: LUCENE-8671
> URL: https://issues.apache.org/jira/browse/LUCENE-8671
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs, core/store
>Reporter: Ankit Jain
>Priority: Minor
> Attachments: offheap_generic_settings.patch, offheap_settings.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> While LUCENE-8635, adds support for loading FST offheap using mmap, users do 
> not have the  flexibility to specify fields for which FST needs to be 
> offheap. This allows users to tune heap usage as per their workload.
> Ideal way will be to add an attribute to FieldInfo, where we have 
> put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the 
> appropriate On/OffHeapStore when creating its FST. It can support special 
> keywords like ALL/NONE.






[jira] [Commented] (LUCENE-8671) Add setting for moving FST offheap/onheap

2019-02-22 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775680#comment-16775680
 ] 

Ankit Jain commented on LUCENE-8671:


bq. Ankit Jain We could maybe add a setter on BlockTreeTermsWriter?  And it'd 
write that setting into the index, and BlockTreeTermsReader would read that and 
then load FSTs on or off heap.

[~mikemccand] This sounds pretty good, except that the setting is applied at 
write time. Is there a way to make this a read-time setting? If not, wouldn't 
a system property be a better option?
That said, I'm happy to go with the BlockTreeTermsWriter approach if nobody has 
a better suggestion. Maybe [~jpountz] has some ideas.

> Add setting for moving FST offheap/onheap
> -
>
> Key: LUCENE-8671
> URL: https://issues.apache.org/jira/browse/LUCENE-8671
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs, core/store
>Reporter: Ankit Jain
>Priority: Minor
> Attachments: offheap_generic_settings.patch, offheap_settings.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> While LUCENE-8635, adds support for loading FST offheap using mmap, users do 
> not have the  flexibility to specify fields for which FST needs to be 
> offheap. This allows users to tune heap usage as per their workload.
> Ideal way will be to add an attribute to FieldInfo, where we have 
> put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the 
> appropriate On/OffHeapStore when creating its FST. It can support special 
> keywords like ALL/NONE.






[jira] [Commented] (LUCENE-8671) Add setting for moving FST offheap/onheap

2019-02-19 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772641#comment-16772641
 ] 

Ankit Jain commented on LUCENE-8671:


[~mikemccand] Also, I'm wondering whether we still need a per-field setting. 
Since off-heap is now the default for non-PK-eligible fields as part of 
LUCENE-8635, could we keep it simple with an index-level setting which, if set 
to true, puts PK-eligible fields off-heap as well?

> Add setting for moving FST offheap/onheap
> -
>
> Key: LUCENE-8671
> URL: https://issues.apache.org/jira/browse/LUCENE-8671
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs, core/store
>Reporter: Ankit Jain
>Priority: Minor
> Attachments: offheap_generic_settings.patch, offheap_settings.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> While LUCENE-8635, adds support for loading FST offheap using mmap, users do 
> not have the  flexibility to specify fields for which FST needs to be 
> offheap. This allows users to tune heap usage as per their workload.
> Ideal way will be to add an attribute to FieldInfo, where we have 
> put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the 
> appropriate On/OffHeapStore when creating its FST. It can support special 
> keywords like ALL/NONE.






[jira] [Commented] (LUCENE-8671) Add setting for moving FST offheap/onheap

2019-02-19 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16772384#comment-16772384
 ] 

Ankit Jain commented on LUCENE-8671:


bq. But can you use the existing attributes instead of adding a new 
readerAttributes?  And could we make this something a custom Codec impl would 
set?  Then we shouldn't need any changes to FieldInfo.java, IndexWriter.java, 
LiveIndexWriterConfig.java, etc.  We'd just make a custom codec setting this 
attribute for fields where we want to override Lucene's (BlockTreeTermReader's) 
default behavior.  Yes, it'd mean one must commit at indexing time as to which 
fields will be on vs off heap at search time, but I think that's an OK tradeoff?

I like this idea; I just did not want it to be an indexing-time decision. Given 
that the performance implications are not significant and we were earlier 
discussing making off-heap the default, most users will eventually have it on 
during indexing as well.

> Add setting for moving FST offheap/onheap
> -
>
> Key: LUCENE-8671
> URL: https://issues.apache.org/jira/browse/LUCENE-8671
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs, core/store
>Reporter: Ankit Jain
>Priority: Minor
> Attachments: offheap_generic_settings.patch, offheap_settings.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> While LUCENE-8635, adds support for loading FST offheap using mmap, users do 
> not have the  flexibility to specify fields for which FST needs to be 
> offheap. This allows users to tune heap usage as per their workload.
> Ideal way will be to add an attribute to FieldInfo, where we have 
> put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the 
> appropriate On/OffHeapStore when creating its FST. It can support special 
> keywords like ALL/NONE.






[jira] [Comment Edited] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-02-10 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764370#comment-16764370
 ] 

Ankit Jain edited comment on LUCENE-8635 at 2/10/19 9:51 AM:
-

I added print statements while running the benchmarks, and the classification 
looks correct:
{code}
Initializing field offheap start=55 field=Date.taxonomy
Initializing field offheap start=76 field=DayOfYear.sortedset
Initializing field offheap start=97 field=Month.sortedset
Initializing field offheap start=118 field=body
Initializing field onheap start=267 field=date
Initializing field onheap start=289 field=groupend
Initializing field onheap start=311 field=id
Initializing field onheap start=333 field=title
{code}
Though, when I restricted tests to PKLookups only using 
comp.addTaskPattern('PKLookup') in localrun.py, results look as expected:
{code:title=wikimedium10k|borderStyle=solid}
Task       QPS baseline  StdDev  QPS candidate  StdDev  Pct diff
PKLookup         163.29  (1.6%)         164.80  (2.1%)  0.9% (-2% - 4%)
{code}
{code:title=wikimedium10m|borderStyle=solid}  
Task       QPS baseline  StdDev  QPS candidate  StdDev  Pct diff
PKLookup         114.29  (1.7%)         114.73  (1.2%)  0.4% (-2% - 3%)
{code}
It seems we are good with this change then.


was (Author: akjain):
I added print statements while running the benchmarks, and the classification 
looks correct:
```
Initializing field offheap start=55 field=Date.taxonomy
Initializing field offheap start=76 field=DayOfYear.sortedset
Initializing field offheap start=97 field=Month.sortedset
Initializing field offheap start=118 field=body
Initializing field onheap start=267 field=date
Initializing field onheap start=289 field=groupend
Initializing field onheap start=311 field=id
Initializing field onheap start=333 field=title
```
Though, when I restricted tests to PKLookups only using 
comp.addTaskPattern('PKLookup') in localrun.py, results look as expected:
```
wikimedium10k 
Task       QPS baseline  StdDev  QPS candidate  StdDev  Pct diff
PKLookup         163.29  (1.6%)         164.80  (2.1%)  0.9% (-2% - 4%)
```
```
wikimedium10m
Task       QPS baseline  StdDev  QPS candidate  StdDev  Pct diff
PKLookup         114.29  (1.7%)         114.73  (1.2%)  0.4% (-2% - 3%)
```
I guess we are good then.

> Lazy loading Lucene FST offheap using mmap
> --
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
> Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>Reporter: Ankit Jain
>Priority: Major
> Attachments: fst-offheap-ra-rev.patch, fst-offheap-rev.patch, 
> offheap.patch, optional_offheap_ra.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This 
> causes frequent JVM OOM issues when the terms data grows large. A better 
> approach is to load the FST lazily using mmap, which ensures that only the 
> required terms are loaded into memory.
>  
> Lucene can expose an API for providing the list of fields whose terms should 
> be loaded offheap. I'm planning to take the following approach:
>  # Add a boolean property fstOffHeap to FieldInfo
>  # Pass the list of offheap fields to Lucene during index open (ALL can be a 
> special keyword for loading all fields offheap)
>  # Initialize the fstOffHeap property during Lucene index open
>  # FieldReader invokes the default FST constructor or the OffHeap constructor 
> based on the fstOffHeap field
>  
> I created a patch (that loads all fields offheap), ran some benchmarks using 
> es_rally, and the results look good.
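
A minimal Java sketch of the field-selection step in the approach above. OffHeapFieldSelector and the spelling of the ALL keyword are assumptions made for illustration; they are not Lucene APIs and not the code in the attached patches.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Lucene: answers whether a field's FST
// should be loaded off-heap, given the field list passed at index open.
final class OffHeapFieldSelector {
    // Assumed spelling of the special keyword meaning "every field".
    static final String ALL = "ALL";

    private final Set<String> offHeapFields;

    OffHeapFieldSelector(Collection<String> offHeapFields) {
        // Defensive copy so later changes by the caller have no effect.
        this.offHeapFields = new HashSet<>(offHeapFields);
    }

    // True when ALL was passed or the field was listed explicitly.
    boolean isOffHeap(String fieldName) {
        return offHeapFields.contains(ALL) || offHeapFields.contains(fieldName);
    }
}
```

A reader could consult isOffHeap(fieldName) once per field and pick the matching FST constructor; how the list reaches the reader is left open here.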






[jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-02-10 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764370#comment-16764370
 ] 

Ankit Jain commented on LUCENE-8635:


I added print statements while running the benchmarks, and the classification 
looks correct:
```
Initializing field offheap start=55 field=Date.taxonomy
Initializing field offheap start=76 field=DayOfYear.sortedset
Initializing field offheap start=97 field=Month.sortedset
Initializing field offheap start=118 field=body
Initializing field onheap start=267 field=date
Initializing field onheap start=289 field=groupend
Initializing field onheap start=311 field=id
Initializing field onheap start=333 field=title
```
However, when I restricted the tests to PKLookup only, using 
comp.addTaskPattern('PKLookup') in localrun.py, the results look as expected:
```
wikimedium10k
Task                      QPS baseline   StdDev  QPS candidate   StdDev  Pct diff
PKLookup                        163.29   (1.6%)         164.80   (2.1%)    0.9% ( -2% -   4%)
```
```
wikimedium10m
Task                      QPS baseline   StdDev  QPS candidate   StdDev  Pct diff
PKLookup                        114.29   (1.7%)         114.73   (1.2%)    0.4% ( -2% -   3%)
```
I guess we are good then.




[jira] [Commented] (LUCENE-8671) Add setting for moving FST offheap/onheap

2019-02-08 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764055#comment-16764055
 ] 

Ankit Jain commented on LUCENE-8671:


Hi David,

Thanks for the feedback.
{quote}Modifying FieldInfo feels wrong to me.  This is a setting that could 
only apply to a subset of our PostingsFormat implementations.  It's not 
fundamental to the metadata FieldInfo tracks.  I'd prefer a more general 
per-field name=value setting approach{quote}
I have added a more generic reader settings map to FieldInfo in 
[^offheap_generic_settings.patch], which can be used for other purposes as well.

{quote}There are plenty of other settings to our postings formats that don't 
get such 1st class treatment. It's true that it's not "easy" to make these 
low-level settings changes but this doesn't feel like the right way. {quote}
Just for my understanding, since I'm pretty new, can you give an example of some 
of those settings?

Thanks
Ankit



> Add setting for moving FST offheap/onheap
> -
>
> Key: LUCENE-8671
> URL: https://issues.apache.org/jira/browse/LUCENE-8671
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs, core/store
>Reporter: Ankit Jain
>Priority: Minor
> Attachments: offheap_generic_settings.patch, offheap_settings.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> While LUCENE-8635 adds support for loading FST offheap using mmap, users do 
> not have the flexibility to specify the fields for which the FST needs to be 
> offheap. Such a setting would allow users to tune heap usage as per their 
> workload.
> The ideal way would be to add an attribute to FieldInfo, where we have 
> put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the 
> appropriate On/OffHeapStore when creating its FST. It can support special 
> keywords like ALL/NONE.
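
A rough Java sketch of how a reader might resolve the store from a per-field attribute plus the ALL/NONE keywords. Everything named here is an assumption for illustration: the attribute key "fst.offheap", the FstStoreChooser class, and the use of a plain Map as a stand-in for the FieldInfo attributes. None of it is taken from the attached patches.

```java
import java.util.Map;

// Hypothetical sketch, not Lucene code: picks on-heap or off-heap storage
// for a field's FST from a global keyword and the field's own attributes.
final class FstStoreChooser {
    enum Store { ON_HEAP, OFF_HEAP }

    // Assumed attribute key; the issue does not name one.
    static final String FST_OFFHEAP_KEY = "fst.offheap";

    // globalSetting is assumed to be "ALL", "NONE", or null.
    // A null value defers to the per-field attribute.
    static Store choose(String globalSetting, Map<String, String> fieldAttributes) {
        if ("ALL".equals(globalSetting)) {
            return Store.OFF_HEAP;
        }
        if ("NONE".equals(globalSetting)) {
            return Store.ON_HEAP;
        }
        // Boolean.parseBoolean(null) is false, so a missing attribute means on-heap.
        boolean offHeap = Boolean.parseBoolean(fieldAttributes.get(FST_OFFHEAP_KEY));
        return offHeap ? Store.OFF_HEAP : Store.ON_HEAP;
    }
}
```

Defaulting to on-heap when the attribute is missing keeps existing indexes on their current behavior, which is one possible design choice rather than something the issue specifies.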






[jira] [Updated] (LUCENE-8671) Add setting for moving FST offheap/onheap

2019-02-08 Thread Ankit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Jain updated LUCENE-8671:
---
Attachment: offheap_generic_settings.patch




[jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-02-08 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16764051#comment-16764051
 ] 

Ankit Jain commented on LUCENE-8635:


{quote}Ankit Jain that's strange yeah – this patch was supposed to avoid 
kicking in for PK fields right?{quote}
[~sokolov] - Yeah, not sure what's going on. It would be great if someone could 
review the changes, in case I missed something.




[jira] [Commented] (LUCENE-8671) Add setting for moving FST offheap/onheap

2019-02-06 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16762028#comment-16762028
 ] 

Ankit Jain commented on LUCENE-8671:


I have added [^offheap_settings.patch], which allows the user to pass a list of 
offheap field names through IndexWriterConfig. The interface looks clean enough 
from the user and postings format perspective. There is some passing around of 
the offheapFieldNames parameter in the Lucene readers, but the changes are small 
and internal to Lucene.
There is some minor cleanup left to do; I just want to get some feedback 
before doing that.




[jira] [Updated] (LUCENE-8671) Add setting for moving FST offheap/onheap

2019-02-06 Thread Ankit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Jain updated LUCENE-8671:
---
Attachment: offheap_settings.patch




[jira] [Comment Edited] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-02-04 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760118#comment-16760118
 ] 

Ankit Jain edited comment on LUCENE-8635 at 2/4/19 7:35 PM:


I have created a [pull request|https://github.com/apache/lucene-solr/pull/563] 
with the proposed changes. Surprisingly, though, I still see some impact on 
PKLookup performance. This does not make sense to me; it might be my perf run 
setup.

{code:title=wikimedium10m|borderStyle=solid}
Task                      QPS baseline   StdDev  QPS candidate   StdDev  Pct diff
PKLookup                        117.45   (2.2%)         108.72   (2.3%)   -7.4% (-11% -  -3%)
OrHighNotMed                   1094.23   (2.5%)        1057.88   (2.7%)   -3.3% ( -8% -   1%)
OrHighNotLow                   1047.30   (1.7%)        1012.91   (2.5%)   -3.3% ( -7% -   1%)
Fuzzy2                           44.10   (2.3%)          42.71   (2.7%)   -3.2% ( -7% -   1%)
OrNotHighLow                   1022.67   (2.5%)         992.28   (2.4%)   -3.0% ( -7% -   1%)
BrowseDayOfYearTaxoFacets      7907.19   (2.0%)        7677.99   (2.7%)   -2.9% ( -7% -   1%)
OrNotHighMed                    866.37   (1.9%)         843.10   (2.3%)   -2.7% ( -6% -   1%)
LowTerm                        2103.58   (3.5%)        2048.98   (3.6%)   -2.6% ( -9% -   4%)
BrowseMonthTaxoFacets          7883.86   (2.0%)        7692.48   (2.1%)   -2.4% ( -6% -   1%)
Fuzzy1                           64.44   (1.9%)          62.88   (2.3%)   -2.4% ( -6% -   1%)
OrNotHighHigh                   779.27   (2.0%)         761.04   (2.1%)   -2.3% ( -6% -   1%)
Respell                          55.60   (2.6%)          54.34   (2.3%)   -2.3% ( -7% -   2%)
OrHighNotHigh                   877.28   (2.2%)         858.10   (2.5%)   -2.2% ( -6% -   2%)
BrowseMonthSSDVFacets            14.85   (7.9%)          14.57  (10.7%)   -1.9% (-18% -  18%)
MedTerm                        1984.26   (3.6%)        1947.76   (2.3%)   -1.8% ( -7% -   4%)
AndHighLow                      718.71   (1.5%)         706.06   (1.6%)   -1.8% ( -4% -   1%)
OrHighLow                       523.40   (2.5%)         515.56   (2.4%)   -1.5% ( -6% -   3%)
HighTerm                       1381.10   (2.9%)        1360.80   (2.7%)   -1.5% ( -6% -   4%)
HighTermMonthSort               120.45  (12.3%)         119.00  (16.4%)   -1.2% (-26% -  31%)
BrowseDayOfYearSSDVFacets        11.55   (9.7%)          11.45  (10.0%)   -0.8% (-18% -  20%)
AndHighMed                      155.15   (2.6%)         154.25   (2.4%)   -0.6% ( -5% -   4%)
OrHighMed                        88.00   (2.5%)          87.85   (2.7%)   -0.2% ( -5% -   5%)
LowPhrase                        80.53   (1.6%)          80.40   (1.4%)   -0.2% ( -3% -   2%)
AndHighHigh                      41.91   (4.2%)          41.86   (2.9%)   -0.1% ( -6% -   7%)
MedPhrase                        46.29   (1.4%)          46.33   (1.5%)    0.1% ( -2% -   3%)
IntNRQ                          127.54   (0.4%)         127.76   (0.4%)    0.2% (  0% -   1%)
HighTermDayOfYearSort            48.59   (5.1%)          48.71   (6.0%)    0.2% (-10% -  12%)
LowSloppyPhrase                  13.04   (4.0%)          13.08   (4.3%)    0.3% ( -7% -   8%)
MedSloppyPhrase                  19.48   (2.3%)          19.54   (2.4%)    0.3% ( -4% -   5%)
OrHighHigh                       23.60   (3.0%)          23.68   (2.9%)    0.3% ( -5% -   6%)
HighPhrase                       20.25   (2.4%)          20.32   (1.8%)    0.3% ( -3% -   4%)
HighSloppyPhrase                  9.29   (3.3%)           9.32   (3.2%)    0.4% ( -5% -   7%)
LowSpanNear                      25.70   (3.8%)          25.89   (3.9%)    0.7% ( -6% -   8%)
MedSpanNear                      30.46   (4.1%)          30.69   (4.3%)    0.7% ( -7% -   9%)
HighSpanNear                     14.41   (4.3%)          14.60   (4.7%)    1.3% ( -7% -  10%)
Wildcard                         70.08  (10.3%)          71.09   (6.1%)    1.4% (-13% -  19%)
BrowseDateTaxoFacets              2.37   (0.2%)           2.41   (0.3%)    1.5% (  0% -   1%)
Prefix3                          86.71  (11.4%)          89.04   (6.8%)    2.7% (-13% -  23%)
{code}
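
For readers checking the numbers: the Pct diff column is consistent with a plain relative change of candidate QPS against baseline QPS. The small Java sketch below reproduces the PKLookup figure under that assumption. PctDiffSketch is a made-up name, and this is not the benchmark tool's own code; the bracketed range in the table is not reproduced here.

```java
import java.util.Locale;

// Illustrative only: recomputes the "Pct diff" value for one table row.
final class PctDiffSketch {
    // Relative change of candidate QPS against baseline QPS, in percent.
    static double pctDiff(double baselineQps, double candidateQps) {
        return 100.0 * (candidateQps - baselineQps) / baselineQps;
    }

    public static void main(String[] args) {
        // PKLookup row: baseline 117.45 QPS, candidate 108.72 QPS.
        double diff = pctDiff(117.45, 108.72);
        // Prints "-7.4%", matching the table.
        System.out.println(String.format(Locale.ROOT, "%.1f%%", diff));
    }
}
```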


was (Author: akjain):
I have created a [pull request|https://github.com/apache/lucene-solr/pull/563] 
with the proposed changes. Surprisingly, though, I still see some impact on 
PKLookup performance.

{code:title=wikimedium10m|borderStyle=solid}
Task                      QPS baseline   StdDev  QPS candidate   StdDev  Pct diff
PKLookup                        117.45   (2.2%)         108.72   (2.3%)   -7.4% (-11% -  -3%)
OrHighNotMed                   1094.23   (2.5%)        1057.88   (2.7%)   -3.3% ( -8% -   1%)
OrHighNotLow                   1047.30

[jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-02-04 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16760118#comment-16760118
 ] 

Ankit Jain commented on LUCENE-8635:


I have created a [pull request|https://github.com/apache/lucene-solr/pull/563] 
with the proposed changes. Surprisingly, though, I still see some impact on 
PKLookup performance.

{code:title=wikimedium10m|borderStyle=solid}
Task                      QPS baseline   StdDev  QPS candidate   StdDev  Pct diff
PKLookup                        117.45   (2.2%)         108.72   (2.3%)   -7.4% (-11% -  -3%)
OrHighNotMed                   1094.23   (2.5%)        1057.88   (2.7%)   -3.3% ( -8% -   1%)
OrHighNotLow                   1047.30   (1.7%)        1012.91   (2.5%)   -3.3% ( -7% -   1%)
Fuzzy2                           44.10   (2.3%)          42.71   (2.7%)   -3.2% ( -7% -   1%)
OrNotHighLow                   1022.67   (2.5%)         992.28   (2.4%)   -3.0% ( -7% -   1%)
BrowseDayOfYearTaxoFacets      7907.19   (2.0%)        7677.99   (2.7%)   -2.9% ( -7% -   1%)
OrNotHighMed                    866.37   (1.9%)         843.10   (2.3%)   -2.7% ( -6% -   1%)
LowTerm                        2103.58   (3.5%)        2048.98   (3.6%)   -2.6% ( -9% -   4%)
BrowseMonthTaxoFacets          7883.86   (2.0%)        7692.48   (2.1%)   -2.4% ( -6% -   1%)
Fuzzy1                           64.44   (1.9%)          62.88   (2.3%)   -2.4% ( -6% -   1%)
OrNotHighHigh                   779.27   (2.0%)         761.04   (2.1%)   -2.3% ( -6% -   1%)
Respell                          55.60   (2.6%)          54.34   (2.3%)   -2.3% ( -7% -   2%)
OrHighNotHigh                   877.28   (2.2%)         858.10   (2.5%)   -2.2% ( -6% -   2%)
BrowseMonthSSDVFacets            14.85   (7.9%)          14.57  (10.7%)   -1.9% (-18% -  18%)
MedTerm                        1984.26   (3.6%)        1947.76   (2.3%)   -1.8% ( -7% -   4%)
AndHighLow                      718.71   (1.5%)         706.06   (1.6%)   -1.8% ( -4% -   1%)
OrHighLow                       523.40   (2.5%)         515.56   (2.4%)   -1.5% ( -6% -   3%)
HighTerm                       1381.10   (2.9%)        1360.80   (2.7%)   -1.5% ( -6% -   4%)
HighTermMonthSort               120.45  (12.3%)         119.00  (16.4%)   -1.2% (-26% -  31%)
BrowseDayOfYearSSDVFacets        11.55   (9.7%)          11.45  (10.0%)   -0.8% (-18% -  20%)
AndHighMed                      155.15   (2.6%)         154.25   (2.4%)   -0.6% ( -5% -   4%)
OrHighMed                        88.00   (2.5%)          87.85   (2.7%)   -0.2% ( -5% -   5%)
LowPhrase                        80.53   (1.6%)          80.40   (1.4%)   -0.2% ( -3% -   2%)
AndHighHigh                      41.91   (4.2%)          41.86   (2.9%)   -0.1% ( -6% -   7%)
MedPhrase                        46.29   (1.4%)          46.33   (1.5%)    0.1% ( -2% -   3%)
IntNRQ                          127.54   (0.4%)         127.76   (0.4%)    0.2% (  0% -   1%)
HighTermDayOfYearSort            48.59   (5.1%)          48.71   (6.0%)    0.2% (-10% -  12%)
LowSloppyPhrase                  13.04   (4.0%)          13.08   (4.3%)    0.3% ( -7% -   8%)
MedSloppyPhrase                  19.48   (2.3%)          19.54   (2.4%)    0.3% ( -4% -   5%)
OrHighHigh                       23.60   (3.0%)          23.68   (2.9%)    0.3% ( -5% -   6%)
HighPhrase                       20.25   (2.4%)          20.32   (1.8%)    0.3% ( -3% -   4%)
HighSloppyPhrase                  9.29   (3.3%)           9.32   (3.2%)    0.4% ( -5% -   7%)
LowSpanNear                      25.70   (3.8%)          25.89   (3.9%)    0.7% ( -6% -   8%)
MedSpanNear                      30.46   (4.1%)          30.69   (4.3%)    0.7% ( -7% -   9%)
HighSpanNear                     14.41   (4.3%)          14.60   (4.7%)    1.3% ( -7% -  10%)
Wildcard                         70.08  (10.3%)          71.09   (6.1%)    1.4% (-13% -  19%)
BrowseDateTaxoFacets              2.37   (0.2%)           2.41   (0.3%)    1.5% (  0% -   1%)
Prefix3                          86.71  (11.4%)          89.04   (6.8%)    2.7% (-13% -  23%)
{code}


[jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-30 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16756963#comment-16756963
 ] 

Ankit Jain commented on LUCENE-8635:


Given that reversing the index during write to make it forward-reading didn't 
help performance (in addition to not being backward compatible), is the 
consensus to add an exception for PK fields and for directories other than mmap 
for the offheap FST in [^ra.patch]?




[jira] [Updated] (LUCENE-8671) Add setting for moving FST offheap/onheap

2019-01-30 Thread Ankit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Jain updated LUCENE-8671:
---
Description: 
While LUCENE-8635 adds support for loading FST offheap using mmap, users do 
not have the flexibility to specify the fields for which the FST needs to be 
offheap. Such a setting would allow users to tune heap usage as per their 
workload.

The ideal way would be to add an attribute to FieldInfo, where we have 
put/getAttribute. Then FieldReader can inspect the FieldInfo and pass the 
appropriate On/OffHeapStore when creating its FST. It can support special 
keywords like ALL/NONE.

  was:
In real deployments, we use Lucene to index many documents, but some machines 
do not have much memory. Once the document count reaches tens of billions, 
Lucene cannot start because there is not enough memory. Most of the memory cost 
is the FST's .tip content.
So I want to contribute my change to Lucene core that makes loading the FST's 
.tip into memory configurable.
What do you think?

Summary: Add setting for moving FST offheap/onheap  (was: Adding 
setting for moving FST offheap/onheap)




[jira] [Created] (LUCENE-8671) Adding setting for moving FST offheap/onheap

2019-01-30 Thread Ankit Jain (JIRA)
Ankit Jain created LUCENE-8671:
--

 Summary: Adding setting for moving FST offheap/onheap
 Key: LUCENE-8671
 URL: https://issues.apache.org/jira/browse/LUCENE-8671
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/FSTs, core/store
Reporter: Ankit Jain


In real deployments, we use Lucene to index many documents, but some machines 
do not have much memory. Once the document count reaches tens of billions, 
Lucene cannot start because there is not enough memory. Most of the memory cost 
is the FST's .tip content.
So I want to contribute my change to Lucene core that makes loading the FST's 
.tip into memory configurable.
What do you think?






[jira] [Comment Edited] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-29 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755389#comment-16755389
 ] 

Ankit Jain edited comment on LUCENE-8635 at 1/29/19 9:20 PM:
-

{quote}Given that the performance hit is mostly on PK lookups, maybe a starting 
point could be to always put the FST off-heap except when docCount == 
sumDocFreq, which suggests the field is an ID field.{quote}
[~jpountz] - Does that exclude autogenerated id fields that are UUIDs, resulting 
in large FSTs? Elasticsearch, for example, has the _id field, which IMO is better 
offheap.
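
A small Java sketch of the quoted docCount == sumDocFreq check, to make the question concrete. IdFieldHeuristic is a hypothetical name and this is not code from any patch; the two statistics are assumed to come from Lucene's Terms.getDocCount() and Terms.getSumDocFreq().

```java
// Hypothetical helper, not Lucene API: flags fields that look like ID fields.
final class IdFieldHeuristic {
    // docCount: number of documents with at least one term in the field.
    // sumDocFreq: total number of (term, document) postings for the field.
    // The two are equal exactly when every such document contributes one
    // posting, i.e. each document holds a single distinct term in the field.
    static boolean looksLikeIdField(long docCount, long sumDocFreq) {
        // The docCount > 0 guard also rejects empty or unavailable statistics.
        return docCount > 0 && docCount == sumDocFreq;
    }
}
```

Note that an autogenerated UUID _id field also has one term per document, so it satisfies this check and would stay on-heap under the quoted rule.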


was (Author: akjain):
{quote}Given that the performance hit is mostly on PK lookups, maybe a starting 
point could be to always put the FST off-heap except when docCount == 
sumDocFreq, which suggests the field is an ID field.{quote}
[~jpountz] - Does that exclude autogenerated id fields that are UUIDs, resulting 
in a huge FST? Elasticsearch, for example, has the _id field, which is better 
offheap.




[jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-29 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16755389#comment-16755389
 ] 

Ankit Jain commented on LUCENE-8635:


{quote}Given that the performance hit is mostly on PK lookups, maybe a starting 
point could be to always put the FST off-heap except when docCount == 
sumDocFreq, which suggests the field is an ID field.{quote}
[~jpountz] - Does that exclude autogenerated id fields that are UUIDs, resulting 
in a huge FST? Elasticsearch, for example, has the _id field, which is better 
offheap.




[jira] [Comment Edited] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-27 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753609#comment-16753609
 ] 

Ankit Jain edited comment on LUCENE-8635 at 1/27/19 10:14 PM:
--

Results for bigger data sets:

{code:title=wikimedium10m, java .. -DFST.offheap=true|borderStyle=solid}
Task                      QPS baseline   StdDev  QPS candidate   StdDev  Pct diff
PKLookup                        117.59   (3.0%)         107.48   (2.3%)   -8.6% (-13% -  -3%)
OrHighNotMed                   1085.05   (2.1%)        1056.43   (2.2%)   -2.6% ( -6% -   1%)
OrNotHighLow                    976.94   (2.4%)         955.32   (1.8%)   -2.2% ( -6% -   2%)
OrHighNotLow                   1152.58   (2.6%)        1128.25   (2.0%)   -2.1% ( -6% -   2%)
Fuzzy1                           83.10   (2.6%)          81.54   (2.5%)   -1.9% ( -6% -   3%)
IntNRQ                           88.53  (16.2%)          86.92  (14.7%)   -1.8% (-28% -  34%)
OrNotHighHigh                   886.10   (1.7%)         870.26   (1.4%)   -1.8% ( -4% -   1%)
OrHighNotHigh                   838.32   (1.8%)         824.15   (1.9%)   -1.7% ( -5% -   2%)
BrowseMonthTaxoFacets          8099.58   (2.0%)        7968.65   (1.8%)   -1.6% ( -5% -   2%)
Fuzzy2                           55.95   (2.7%)          55.08   (2.5%)   -1.6% ( -6% -   3%)
OrNotHighMed                    764.40   (2.3%)         752.56   (1.7%)   -1.5% ( -5% -   2%)
BrowseDayOfYearTaxoFacets      8081.37   (2.1%)        7957.27   (2.7%)   -1.5% ( -6% -   3%)
LowTerm                        1941.88   (5.2%)        1912.71   (4.0%)   -1.5% (-10% -   8%)
HighTermMonthSort                78.12  (10.8%)          76.99  (14.3%)   -1.4% (-23% -  26%)
Respell                          61.23   (2.7%)          60.57   (2.7%)   -1.1% ( -6% -   4%)
HighTerm                       1526.16   (3.1%)        1510.23   (1.8%)   -1.0% ( -5% -   4%)
MedTerm                        1814.44   (3.7%)        1797.69   (2.1%)   -0.9% ( -6% -   5%)
OrHighLow                       443.93   (2.4%)         439.92   (2.5%)   -0.9% ( -5% -   4%)
AndHighLow                      577.60   (2.0%)         573.43   (1.4%)   -0.7% ( -4% -   2%)
Wildcard                         62.79   (5.8%)          62.54   (6.1%)   -0.4% (-11% -  12%)
BrowseDayOfYearSSDVFacets        11.56   (8.0%)          11.55   (8.2%)   -0.0% (-15% -  17%)
Prefix3                         165.76   (8.7%)         165.70   (9.2%)   -0.0% (-16% -  19%)
MedSpanNear                      51.40   (2.3%)          51.48   (2.5%)    0.2% ( -4% -   5%)
BrowseMonthSSDVFacets            14.45  (13.6%)          14.47  (13.2%)    0.2% (-23% -  31%)
HighTermDayOfYearSort            44.98   (6.8%)          45.05   (5.3%)    0.2% (-11% -  13%)
OrHighMed                       111.81   (3.0%)         112.01   (2.8%)    0.2% ( -5% -   6%)
LowSpanNear                      47.14   (2.4%)          47.24   (2.5%)    0.2% ( -4% -   5%)
MedSloppyPhrase                  48.25   (1.9%)          48.37   (2.3%)    0.2% ( -3% -   4%)
LowSloppyPhrase                  35.36   (2.2%)          35.46   (2.5%)    0.3% ( -4% -   5%)
AndHighMed                      144.05   (3.6%)         144.53   (2.7%)    0.3% ( -5% -   6%)
HighSpanNear                      6.92   (3.5%)           6.95   (3.5%)    0.5% ( -6% -   7%)
MedPhrase                        25.88   (2.4%)          26.00   (1.4%)    0.5% ( -3% -   4%)
AndHighHigh                      38.77   (4.0%)          38.98   (3.9%)    0.5% ( -7% -   8%)
OrHighHigh                       27.47   (3.2%)          27.63   (3.1%)    0.6% ( -5% -   7%)
LowPhrase                        91.71   (4.3%)          92.56   (3.5%)    0.9% ( -6% -   9%)
HighSloppyPhrase                 18.28   (3.2%)          18.45   (3.6%)    0.9% ( -5% -   8%)
HighPhrase                       20.07   (3.9%)          20.35   (1.3%)    1.4% ( -3% -   6%)
BrowseDateTaxoFacets              2.37   (0.4%)           2.41   (0.2%)    1.4% (  0% -   2%)
{code}


was (Author: akjain):
Results for bigger data sets:

{code| title=wikimedium10m, java .. -DFST.offheap=true|borderStyle=solid}
TaskQPS baseline  StdDevQPS candidate  StdDev   
 Pct diff
PKLookup  117.59  (3.0%)  107.48  (2.3%)   
-8.6% ( -13% -   -3%)
OrHighNotMed 1085.05  (2.1%) 1056.43  (2.2%)   
-2.6% (  -6% -1%)
OrNotHighLow  976.94  (2.4%)  955.32  (1.8%)   
-2.2% (  -6% -2%)
OrHighNotLow 1152.58  (2.6%) 1128.25  (2.0%)   
-2.1% (  -6% -2%)
  Fuzzy1   83.10  (2.6%)   81.54  (2.5%)   
-1.9% (  -6% -3%)
  IntNRQ   88.

[jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-27 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753609#comment-16753609
 ] 

Ankit Jain commented on LUCENE-8635:


Results for bigger data sets:

{code| title=wikimedium10m, java .. -DFST.offheap=true|borderStyle=solid}
                     Task  QPS baseline   StdDev  QPS candidate   StdDev  Pct diff
                 PKLookup        117.59   (3.0%)         107.48   (2.3%)     -8.6% ( -13% -   -3%)
             OrHighNotMed       1085.05   (2.1%)        1056.43   (2.2%)     -2.6% (  -6% -    1%)
             OrNotHighLow        976.94   (2.4%)         955.32   (1.8%)     -2.2% (  -6% -    2%)
             OrHighNotLow       1152.58   (2.6%)        1128.25   (2.0%)     -2.1% (  -6% -    2%)
                   Fuzzy1         83.10   (2.6%)          81.54   (2.5%)     -1.9% (  -6% -    3%)
                   IntNRQ         88.53  (16.2%)          86.92  (14.7%)     -1.8% ( -28% -   34%)
            OrNotHighHigh        886.10   (1.7%)         870.26   (1.4%)     -1.8% (  -4% -    1%)
            OrHighNotHigh        838.32   (1.8%)         824.15   (1.9%)     -1.7% (  -5% -    2%)
    BrowseMonthTaxoFacets       8099.58   (2.0%)        7968.65   (1.8%)     -1.6% (  -5% -    2%)
                   Fuzzy2         55.95   (2.7%)          55.08   (2.5%)     -1.6% (  -6% -    3%)
             OrNotHighMed        764.40   (2.3%)         752.56   (1.7%)     -1.5% (  -5% -    2%)
BrowseDayOfYearTaxoFacets       8081.37   (2.1%)        7957.27   (2.7%)     -1.5% (  -6% -    3%)
                  LowTerm       1941.88   (5.2%)        1912.71   (4.0%)     -1.5% ( -10% -    8%)
        HighTermMonthSort         78.12  (10.8%)          76.99  (14.3%)     -1.4% ( -23% -   26%)
                  Respell         61.23   (2.7%)          60.57   (2.7%)     -1.1% (  -6% -    4%)
                 HighTerm       1526.16   (3.1%)        1510.23   (1.8%)     -1.0% (  -5% -    4%)
                  MedTerm       1814.44   (3.7%)        1797.69   (2.1%)     -0.9% (  -6% -    5%)
                OrHighLow        443.93   (2.4%)         439.92   (2.5%)     -0.9% (  -5% -    4%)
               AndHighLow        577.60   (2.0%)         573.43   (1.4%)     -0.7% (  -4% -    2%)
                 Wildcard         62.79   (5.8%)          62.54   (6.1%)     -0.4% ( -11% -   12%)
BrowseDayOfYearSSDVFacets         11.56   (8.0%)          11.55   (8.2%)     -0.0% ( -15% -   17%)
                  Prefix3        165.76   (8.7%)         165.70   (9.2%)     -0.0% ( -16% -   19%)
              MedSpanNear         51.40   (2.3%)          51.48   (2.5%)      0.2% (  -4% -    5%)
    BrowseMonthSSDVFacets         14.45  (13.6%)          14.47  (13.2%)      0.2% ( -23% -   31%)
    HighTermDayOfYearSort         44.98   (6.8%)          45.05   (5.3%)      0.2% ( -11% -   13%)
                OrHighMed        111.81   (3.0%)         112.01   (2.8%)      0.2% (  -5% -    6%)
              LowSpanNear         47.14   (2.4%)          47.24   (2.5%)      0.2% (  -4% -    5%)
          MedSloppyPhrase         48.25   (1.9%)          48.37   (2.3%)      0.2% (  -3% -    4%)
          LowSloppyPhrase         35.36   (2.2%)          35.46   (2.5%)      0.3% (  -4% -    5%)
               AndHighMed        144.05   (3.6%)         144.53   (2.7%)      0.3% (  -5% -    6%)
             HighSpanNear          6.92   (3.5%)           6.95   (3.5%)      0.5% (  -6% -    7%)
                MedPhrase         25.88   (2.4%)          26.00   (1.4%)      0.5% (  -3% -    4%)
              AndHighHigh         38.77   (4.0%)          38.98   (3.9%)      0.5% (  -7% -    8%)
               OrHighHigh         27.47   (3.2%)          27.63   (3.1%)      0.6% (  -5% -    7%)
                LowPhrase         91.71   (4.3%)          92.56   (3.5%)      0.9% (  -6% -    9%)
         HighSloppyPhrase         18.28   (3.2%)          18.45   (3.6%)      0.9% (  -5% -    8%)
               HighPhrase         20.07   (3.9%)          20.35   (1.3%)      1.4% (  -3% -    6%)
     BrowseDateTaxoFacets          2.37   (0.4%)           2.41   (0.2%)      1.4% (   0% -    2%)
{code}
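
The Pct diff column in the table above can be re-derived from the two QPS columns. The following is a minimal sketch, assuming Pct diff is the relative change of the candidate QPS against the baseline QPS. The PctDiff class is hypothetical and is not part of the benchmark tooling, and the range shown in parentheses is not recomputed here.

```java
/** Hypothetical helper, not part of the benchmark tooling: recomputes the "Pct diff" column. */
final class PctDiff {
    /** Relative change of the candidate QPS versus the baseline QPS, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return 100.0 * (candidateQps - baselineQps) / baselineQps;
    }

    public static void main(String[] args) {
        // PKLookup row: 117.59 QPS baseline, 107.48 QPS candidate.
        System.out.printf("PKLookup: %.1f%%%n", pctDiff(117.59, 107.48));
    }
}
```

For the PKLookup row this gives roughly -8.6%, which matches the reported value.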

> Lazy loading Lucene FST offheap using mmap
> --
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
> Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>Reporter: Ankit Jain
>Priority: Major
> Attachments: fst-offheap-ra-rev.patch, offheap.patch, 
> optional_offheap_ra.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This 
> causes frequent JVM OOM issues if the term size gets big. A better way of

[jira] [Comment Edited] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-27 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753595#comment-16753595
 ] 

Ankit Jain edited comment on LUCENE-8635 at 1/27/19 9:23 PM:
-

I also independently tried a performance run after removing the array reversal in 
readBytes from the original patch, but the results looked similar to the earlier ones.

Since we are leaning towards keeping this optional, I created another patch, 
[^optional_offheap_ra.patch], based on the reverse random access reader patch 
[^ra.patch]. It adds FST.offheap as a system property to allow toggling between 
offheap and onheap.

The results for wikimedium10k with:

java .. -DFST.offheap=true
 
{code}
                     Task  QPS baseline   StdDev  QPS candidate   StdDev  Pct diff
                 PKLookup        172.88   (3.3%)         153.94   (3.7%)    -11.0% ( -17% -   -4%)
                  LowTerm      12229.10   (3.5%)       11032.10   (3.3%)     -9.8% ( -16% -   -3%)
               AndHighLow       4679.22   (3.2%)        4349.12   (3.3%)     -7.1% ( -13% -    0%)
                  MedTerm      10179.43   (5.4%)        9533.14   (3.4%)     -6.3% ( -14% -    2%)
                 HighTerm       5123.89   (3.1%)        4814.09   (4.7%)     -6.0% ( -13% -    1%)
                LowPhrase       3459.57   (5.3%)        3253.20   (7.5%)     -6.0% ( -17% -    7%)
                MedPhrase       2815.82   (5.1%)        2654.13   (5.6%)     -5.7% ( -15% -    5%)
              MedSpanNear       2196.98   (4.4%)        2082.39   (3.9%)     -5.2% ( -12% -    3%)
         HighSloppyPhrase       1680.32   (5.7%)        1592.91   (8.0%)     -5.2% ( -17% -    9%)
          LowSloppyPhrase       3205.99   (4.9%)        3045.94   (4.4%)     -5.0% ( -13% -    4%)
                OrHighMed       1960.52   (4.8%)        1866.03   (6.2%)     -4.8% ( -15% -    6%)
                 Wildcard       1388.45   (8.5%)        1324.82   (6.2%)     -4.6% ( -17% -   11%)
               OrHighHigh       1304.03   (7.8%)        1247.72   (5.1%)     -4.3% ( -16% -    9%)
               AndHighMed       2268.22   (2.8%)        2171.27   (2.8%)     -4.3% (  -9% -    1%)
          MedSloppyPhrase       2697.01   (6.1%)        2597.71   (5.0%)     -3.7% ( -13% -    7%)
    HighTermDayOfYearSort       1719.25   (5.3%)        1657.10   (5.8%)     -3.6% ( -13% -    7%)
             HighSpanNear       1624.69   (4.4%)        1567.35   (5.6%)     -3.5% ( -12% -    6%)
              AndHighHigh       1645.28   (3.7%)        1589.76   (2.9%)     -3.4% (  -9% -    3%)
              LowSpanNear       2319.98   (6.0%)        2246.30   (5.5%)     -3.2% ( -13% -    8%)
                OrHighLow       2264.00   (6.0%)        2200.33   (4.3%)     -2.8% ( -12% -    7%)
        HighTermMonthSort       4829.60   (3.9%)        4700.35   (2.5%)     -2.7% (  -8% -    3%)
                   Fuzzy2        172.46   (4.8%)         168.02   (5.4%)     -2.6% ( -12% -    8%)
               HighPhrase       2525.60   (6.3%)        2464.09   (5.3%)     -2.4% ( -13% -    9%)
                   Fuzzy1        585.39   (4.4%)         571.20   (4.1%)     -2.4% ( -10% -    6%)
                  Prefix3       1359.75   (8.2%)        1330.98   (5.8%)     -2.1% ( -14% -   12%)
                  Respell        501.29   (3.2%)         490.92   (4.7%)     -2.1% (  -9% -    5%)
    BrowseMonthTaxoFacets       8450.33   (4.7%)        8354.07   (4.9%)     -1.1% ( -10% -    8%)
BrowseDayOfYearSSDVFacets       2016.73   (3.4%)        2009.96   (4.0%)     -0.3% (  -7% -    7%)
BrowseDayOfYearTaxoFacets       8303.67   (6.4%)        8294.91   (5.6%)     -0.1% ( -11% -   12%)
                   IntNRQ       1380.11   (2.1%)        1380.36   (2.0%)      0.0% (  -3% -    4%)
     BrowseDateTaxoFacets       3564.47   (3.2%)        3575.88   (3.2%)      0.3% (  -5% -    7%)
    BrowseMonthSSDVFacets       2247.87   (5.4%)        2276.28   (3.5%)      1.3% (  -7% -   10%)
{code}

java .. -DFST.offheap=false

{{TaskQPS baseline  StdDevQPS candidate  StdDev 
   Pct diff
   LowPhrase 3244.01  (6.3%) 3201.30  (7.0%)   
-1.3% ( -13% -   12%)
PKLookup  171.24  (3.3%)  169.28  (5.3%)   
-1.1% (  -9% -7%)
 MedSloppyPhrase 2867.58  (6.3%) 2848.80  (6.9%)   
-0.7% ( -13% -   13%)
   BrowseMonthTaxoFacets 8565.92  (4.9%) 8514.51  (5.3%)   
-0.6% ( -10% -   10%)
 Respell  529.20  (3.6%)  526.69  (3.4%)   
-0.5% (  -7% -6%)
Wildcard 1252.25  (7.6%) 1249.97  (7.3%)   
-0.2% ( -13% -   15%)
  IntNRQ 1536.74  (1.7%) 1536.53  (2.1%)   
-0.0% (  -3% -3%)
BrowseDayOfYearTaxoFacets 8490.89  (6.3%) 8490.94  (5.5%)
0.0% ( -11% -   12%)
 LowSpanNear 2391.88  (3.0%)

[jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-27 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753595#comment-16753595
 ] 

Ankit Jain commented on LUCENE-8635:


I also independently tried a performance run after removing the array reversal in 
readBytes from the original patch, but the results looked similar to the earlier ones.

Since we are leaning towards keeping this optional, I created another patch, 
[^optional_offheap_ra.patch], based on the reverse random access reader patch 
[^ra.patch]. It adds FST.offheap as a system property to allow toggling between 
offheap and onheap.

The results for wikimedium10k with:

java .. -DFST.offheap=true
 
{{TaskQPS baseline  StdDevQPS candidate  StdDev 
   Pct diff
PKLookup  172.88  (3.3%)  153.94  (3.7%)  
-11.0% ( -17% -   -4%)
 LowTerm12229.10  (3.5%)11032.10  (3.3%)   
-9.8% ( -16% -   -3%)
  AndHighLow 4679.22  (3.2%) 4349.12  (3.3%)   
-7.1% ( -13% -0%)
 MedTerm10179.43  (5.4%) 9533.14  (3.4%)   
-6.3% ( -14% -2%)
HighTerm 5123.89  (3.1%) 4814.09  (4.7%)   
-6.0% ( -13% -1%)
   LowPhrase 3459.57  (5.3%) 3253.20  (7.5%)   
-6.0% ( -17% -7%)
   MedPhrase 2815.82  (5.1%) 2654.13  (5.6%)   
-5.7% ( -15% -5%)
 MedSpanNear 2196.98  (4.4%) 2082.39  (3.9%)   
-5.2% ( -12% -3%)
HighSloppyPhrase 1680.32  (5.7%) 1592.91  (8.0%)   
-5.2% ( -17% -9%)
 LowSloppyPhrase 3205.99  (4.9%) 3045.94  (4.4%)   
-5.0% ( -13% -4%)
   OrHighMed 1960.52  (4.8%) 1866.03  (6.2%)   
-4.8% ( -15% -6%)
Wildcard 1388.45  (8.5%) 1324.82  (6.2%)   
-4.6% ( -17% -   11%)
  OrHighHigh 1304.03  (7.8%) 1247.72  (5.1%)   
-4.3% ( -16% -9%)
  AndHighMed 2268.22  (2.8%) 2171.27  (2.8%)   
-4.3% (  -9% -1%)
 MedSloppyPhrase 2697.01  (6.1%) 2597.71  (5.0%)   
-3.7% ( -13% -7%)
   HighTermDayOfYearSort 1719.25  (5.3%) 1657.10  (5.8%)   
-3.6% ( -13% -7%)
HighSpanNear 1624.69  (4.4%) 1567.35  (5.6%)   
-3.5% ( -12% -6%)
 AndHighHigh 1645.28  (3.7%) 1589.76  (2.9%)   
-3.4% (  -9% -3%)
 LowSpanNear 2319.98  (6.0%) 2246.30  (5.5%)   
-3.2% ( -13% -8%)
   OrHighLow 2264.00  (6.0%) 2200.33  (4.3%)   
-2.8% ( -12% -7%)
   HighTermMonthSort 4829.60  (3.9%) 4700.35  (2.5%)   
-2.7% (  -8% -3%)
  Fuzzy2  172.46  (4.8%)  168.02  (5.4%)   
-2.6% ( -12% -8%)
  HighPhrase 2525.60  (6.3%) 2464.09  (5.3%)   
-2.4% ( -13% -9%)
  Fuzzy1  585.39  (4.4%)  571.20  (4.1%)   
-2.4% ( -10% -6%)
 Prefix3 1359.75  (8.2%) 1330.98  (5.8%)   
-2.1% ( -14% -   12%)
 Respell  501.29  (3.2%)  490.92  (4.7%)   
-2.1% (  -9% -5%)
   BrowseMonthTaxoFacets 8450.33  (4.7%) 8354.07  (4.9%)   
-1.1% ( -10% -8%)
BrowseDayOfYearSSDVFacets 2016.73  (3.4%) 2009.96  (4.0%)   
-0.3% (  -7% -7%)
BrowseDayOfYearTaxoFacets 8303.67  (6.4%) 8294.91  (5.6%)   
-0.1% ( -11% -   12%)
  IntNRQ 1380.11  (2.1%) 1380.36  (2.0%)
0.0% (  -3% -4%)
BrowseDateTaxoFacets 3564.47  (3.2%) 3575.88  (3.2%)
0.3% (  -5% -7%)
   BrowseMonthSSDVFacets 2247.87  (5.4%) 2276.28  (3.5%)
1.3% (  -7% -   10%)
}}

java .. -DFST.offheap=false

{{TaskQPS baseline  StdDevQPS candidate  StdDev 
   Pct diff
   LowPhrase 3244.01  (6.3%) 3201.30  (7.0%)   
-1.3% ( -13% -   12%)
PKLookup  171.24  (3.3%)  169.28  (5.3%)   
-1.1% (  -9% -7%)
 MedSloppyPhrase 2867.58  (6.3%) 2848.80  (6.9%)   
-0.7% ( -13% -   13%)
   BrowseMonthTaxoFacets 8565.92  (4.9%) 8514.51  (5.3%)   
-0.6% ( -10% -   10%)
 Respell  529.20  (3.6%)  526.69  (3.4%)   
-0.5% (  -7% -6%)
Wildcard 1252.25  (7.6%) 1249.97  (7.3%)   
-0.2% ( -13% -   15%)
  IntNRQ 1536.74  (1.7%) 1536.53  (2.1%)   
-0.0% (  -3% -3%)
BrowseDayOfYearTaxoFacets 8490.89  (6.3%) 8490.94  (5.5%)
0.0% ( -11% -   12%)
 LowSpanNear 2391.88  (3.0%) 2392.15  (4.9%)
0.0% (  -7% -8%)
  

[jira] [Comment Edited] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-27 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16753595#comment-16753595
 ] 

Ankit Jain edited comment on LUCENE-8635 at 1/27/19 9:26 PM:
-

I also independently tried a performance run after removing the array reversal in 
readBytes from the original patch, but the results looked similar to the earlier ones.

Since we are leaning towards keeping this optional, I created another patch, 
[^optional_offheap_ra.patch], based on the reverse random access reader patch 
[^ra.patch]. It adds FST.offheap as a system property to allow toggling between 
offheap and onheap.
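
A system property toggle of this kind can be read as sketched below. This is only an illustration: the FstOffHeapFlag class and its method are hypothetical names, not code from the attached patch, and defaulting to on-heap when the property is absent is an assumption.

```java
/** Sketch only: class and method names are hypothetical, not taken from the patch. */
final class FstOffHeapFlag {
    /** Name of the system property, as passed on the command line with -DFST.offheap=... */
    static final String PROPERTY = "FST.offheap";

    /** True when the JVM was started with -DFST.offheap=true; assumed to default to on-heap. */
    static boolean isOffHeapEnabled() {
        return Boolean.parseBoolean(System.getProperty(PROPERTY, "false"));
    }

    public static void main(String[] args) {
        System.out.println(isOffHeapEnabled() ? "FST off-heap" : "FST on-heap");
    }
}
```

A reader would consult such a flag once at index open and pick the off-heap or on-heap loading path accordingly.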

The results for wikimedium10k with:

java .. -DFST.offheap=true
 
{code}
                     Task  QPS baseline   StdDev  QPS candidate   StdDev  Pct diff
                 PKLookup        172.88   (3.3%)         153.94   (3.7%)    -11.0% ( -17% -   -4%)
                  LowTerm      12229.10   (3.5%)       11032.10   (3.3%)     -9.8% ( -16% -   -3%)
               AndHighLow       4679.22   (3.2%)        4349.12   (3.3%)     -7.1% ( -13% -    0%)
                  MedTerm      10179.43   (5.4%)        9533.14   (3.4%)     -6.3% ( -14% -    2%)
                 HighTerm       5123.89   (3.1%)        4814.09   (4.7%)     -6.0% ( -13% -    1%)
                LowPhrase       3459.57   (5.3%)        3253.20   (7.5%)     -6.0% ( -17% -    7%)
                MedPhrase       2815.82   (5.1%)        2654.13   (5.6%)     -5.7% ( -15% -    5%)
              MedSpanNear       2196.98   (4.4%)        2082.39   (3.9%)     -5.2% ( -12% -    3%)
         HighSloppyPhrase       1680.32   (5.7%)        1592.91   (8.0%)     -5.2% ( -17% -    9%)
          LowSloppyPhrase       3205.99   (4.9%)        3045.94   (4.4%)     -5.0% ( -13% -    4%)
                OrHighMed       1960.52   (4.8%)        1866.03   (6.2%)     -4.8% ( -15% -    6%)
                 Wildcard       1388.45   (8.5%)        1324.82   (6.2%)     -4.6% ( -17% -   11%)
               OrHighHigh       1304.03   (7.8%)        1247.72   (5.1%)     -4.3% ( -16% -    9%)
               AndHighMed       2268.22   (2.8%)        2171.27   (2.8%)     -4.3% (  -9% -    1%)
          MedSloppyPhrase       2697.01   (6.1%)        2597.71   (5.0%)     -3.7% ( -13% -    7%)
    HighTermDayOfYearSort       1719.25   (5.3%)        1657.10   (5.8%)     -3.6% ( -13% -    7%)
             HighSpanNear       1624.69   (4.4%)        1567.35   (5.6%)     -3.5% ( -12% -    6%)
              AndHighHigh       1645.28   (3.7%)        1589.76   (2.9%)     -3.4% (  -9% -    3%)
              LowSpanNear       2319.98   (6.0%)        2246.30   (5.5%)     -3.2% ( -13% -    8%)
                OrHighLow       2264.00   (6.0%)        2200.33   (4.3%)     -2.8% ( -12% -    7%)
        HighTermMonthSort       4829.60   (3.9%)        4700.35   (2.5%)     -2.7% (  -8% -    3%)
                   Fuzzy2        172.46   (4.8%)         168.02   (5.4%)     -2.6% ( -12% -    8%)
               HighPhrase       2525.60   (6.3%)        2464.09   (5.3%)     -2.4% ( -13% -    9%)
                   Fuzzy1        585.39   (4.4%)         571.20   (4.1%)     -2.4% ( -10% -    6%)
                  Prefix3       1359.75   (8.2%)        1330.98   (5.8%)     -2.1% ( -14% -   12%)
                  Respell        501.29   (3.2%)         490.92   (4.7%)     -2.1% (  -9% -    5%)
    BrowseMonthTaxoFacets       8450.33   (4.7%)        8354.07   (4.9%)     -1.1% ( -10% -    8%)
BrowseDayOfYearSSDVFacets       2016.73   (3.4%)        2009.96   (4.0%)     -0.3% (  -7% -    7%)
BrowseDayOfYearTaxoFacets       8303.67   (6.4%)        8294.91   (5.6%)     -0.1% ( -11% -   12%)
                   IntNRQ       1380.11   (2.1%)        1380.36   (2.0%)      0.0% (  -3% -    4%)
     BrowseDateTaxoFacets       3564.47   (3.2%)        3575.88   (3.2%)      0.3% (  -5% -    7%)
    BrowseMonthSSDVFacets       2247.87   (5.4%)        2276.28   (3.5%)      1.3% (  -7% -   10%)
{code}

java .. -DFST.offheap=false

{code}TaskQPS baseline  StdDevQPS candidate  StdDev 
   Pct diff
   LowPhrase 3244.01  (6.3%) 3201.30  (7.0%)   
-1.3% ( -13% -   12%)
PKLookup  171.24  (3.3%)  169.28  (5.3%)   
-1.1% (  -9% -7%)
 MedSloppyPhrase 2867.58  (6.3%) 2848.80  (6.9%)   
-0.7% ( -13% -   13%)
   BrowseMonthTaxoFacets 8565.92  (4.9%) 8514.51  (5.3%)   
-0.6% ( -10% -   10%)
 Respell  529.20  (3.6%)  526.69  (3.4%)   
-0.5% (  -7% -6%)
Wildcard 1252.25  (7.6%) 1249.97  (7.3%)   
-0.2% ( -13% -   15%)
  IntNRQ 1536.74  (1.7%) 1536.53  (2.1%)   
-0.0% (  -3% -3%)
BrowseDayOfYearTaxoFacets 8490.89  (6.3%) 8490.94  (5.5%)
0.0% ( -11% -   12%)
 LowSpanNear 2391.88  (3

[jira] [Updated] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-27 Thread Ankit Jain (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Jain updated LUCENE-8635:
---
Attachment: optional_offheap_ra.patch

> Lazy loading Lucene FST offheap using mmap
> --
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
> Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>Reporter: Ankit Jain
>Priority: Major
> Attachments: fst-offheap-ra-rev.patch, offheap.patch, 
> optional_offheap_ra.patch, ra.patch, rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This 
> causes frequent JVM OOM issues if the term size gets big. A better way of 
> doing this will be to lazily load FST using mmap. That ensures only the 
> required terms get loaded into memory.
>  
> Lucene can expose API for providing list of fields to load terms offheap. I'm 
> planning to take following approach for this:
>  # Add a boolean property fstOffHeap in FieldInfo
>  # Pass list of offheap fields to lucene during index open (ALL can be 
> special keyword for loading ALL fields offheap)
>  # Initialize the fstOffHeap property during lucene index open
>  # FieldReader invokes default FST constructor or OffHeap constructor based 
> on fstOffHeap field
>  
> I created a patch (that loads all fields offheap), did some benchmarks using 
> es_rally and results look good.






[jira] [Comment Edited] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-23 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750316#comment-16750316
 ] 

Ankit Jain edited comment on LUCENE-8635 at 1/23/19 6:47 PM:
-

{quote}Ankit Jain unfortunately RandomAccessInput doesn't offer readBytes. I'm 
looking into adding it; shouldn't be hard as there aren't that many 
implementations.{quote}
You don't need to use RandomAccessInput. You can revert to the original 
IndexInputReader and get rid of the reversal logic.
{code:title=ForwardIndexInputReader|borderStyle=solid}
import java.io.IOException;

import org.apache.lucene.store.IndexInput;
import org.apache.lucene.util.fst.FST;

/** Implements forward read for FST from an index input. */
final class ForwardIndexInputReader extends FST.BytesReader {
    private final IndexInput in;
    private final long startFP;

    public ForwardIndexInputReader(IndexInput in, long startFP) {
        this.in = in;
        this.startFP = startFP;
    }

    @Override
    public byte readByte() throws IOException {
        return this.in.readByte();
    }

    @Override
    public void readBytes(byte[] b, int offset, int len) throws IOException {
        this.in.readBytes(b, offset, len);
    }

    @Override
    public void skipBytes(long count) {
        this.setPosition(this.getPosition() + count);
    }

    @Override
    public long getPosition() {
        // Position is relative to the start of the FST within the file.
        return this.in.getFilePointer() - startFP;
    }

    @Override
    public void setPosition(long pos) {
        try {
            this.in.seek(startFP + pos);
        } catch (IOException ex) {
            // The overridden method declares no IOException, so the failure is only logged.
            System.out.println(String.format(
                "Unreported exception in set position at %d - %s", pos, ex.getMessage()));
        }
    }

    @Override
    public boolean reversed() {
        return false;
    }
}
{code}

{quote}Furthermore the NIO and Simple FS directories use buffering. I'm 
wondering how bad things would be if every seek would need to reload the 
buffer?{quote}
This can be a serious concern for NIO and Simple FS systems. Given that most 
systems today use mmap, can we limit the offheap FST to mmap-supported 
systems, i.e.
{code:title=isMMapSupported|borderStyle=solid}
Constants.JRE_IS_64BIT && MMapDirectory.UNMAP_SUPPORTED
{code}
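
To make the buffering concern concrete, here is a toy model of a buffered input that counts how often its buffer has to be reloaded. It is a sketch under simplifying assumptions: BufferedSeekModel is a hypothetical class and not Lucene's buffered input, a refill always starts at the requested position, and no real I/O takes place.

```java
/** Toy model of a buffered input, for illustration only; it is not Lucene's implementation. */
final class BufferedSeekModel {
    private final byte[] file;     // stands in for the file on disk
    private final int bufferSize;  // number of bytes covered by one refill
    private long bufferStart = -1; // file offset of the first buffered byte, -1 while empty
    private long position;         // next file offset to read
    int refills;                   // how many times the buffer had to be reloaded

    BufferedSeekModel(byte[] file, int bufferSize) {
        this.file = file;
        this.bufferSize = bufferSize;
    }

    void seek(long pos) {
        position = pos;
    }

    byte readByte() {
        boolean miss = bufferStart < 0
            || position < bufferStart
            || position >= bufferStart + bufferSize;
        if (miss) {
            // A real buffered input would copy up to bufferSize bytes from disk here.
            bufferStart = position;
            refills++;
        }
        return file[(int) position++];
    }

    public static void main(String[] args) {
        BufferedSeekModel forward = new BufferedSeekModel(new byte[4096], 1024);
        for (int i = 0; i < 4096; i++) {
            forward.readByte();
        }

        BufferedSeekModel backward = new BufferedSeekModel(new byte[4096], 1024);
        for (int i = 4095; i >= 0; i--) {
            backward.seek(i);
            backward.readByte();
        }
        System.out.println("forward refills:  " + forward.refills);
        System.out.println("backward refills: " + backward.refills);
    }
}
```

In this model a forward scan of 4096 bytes with a 1024-byte buffer needs 4 refills, while seeking backwards one byte at a time needs a refill for every byte. A memory-mapped file has no such buffer to reload.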




was (Author: akjain):
{quote}Ankit Jain unfortunately RandomAccessInput doesn't offer readBytes. I'm 
looking into adding it; shouldn't be hard as there aren't that many 
implementations.{quote}
You don't need to use RandomAccessInput. You can revert back to original 
IndexInputReader and get rid of the reversal logic.
{code:title=ForwardIndexInputReader|borderStyle=Solid}
/** Implements reverse read from an index input. */
final class ForwardIndexInputReader extends FST.BytesReader {
private final IndexInput in;
private final long startFP;

public ForwardIndexInputReader(IndexInput in, long startFP) {
this.in = in;
this.startFP = startFP;
}

@Override
public byte readByte() throws IOException {
return this.in.readByte();
}

@Override
public void readBytes(byte[] b, int offset, int len) throws IOException {
this.in.readBytes(b, offset, len);
}

@Override
public void skipBytes(long count) {
this.setPosition(this.getPosition() + count);
}

@Override
public long getPosition() {
final long position = this.in.getFilePointer() - startFP;
return position;
}

@Override
public void setPosition(long pos) {
try {
this.in.seek(startFP + pos);
} catch (IOException ex) {
System.out.println(String.format("Unreported exception in set position at %d - %s", pos, ex.getMessage()));
}
}

@Override
public boolean reversed() {
return false;
}
}
{code}

{quote}Furthermore the NIO and Simple FS directories use buffering. I'm 
wondering how bad things would be if every seek would need to reload the 
buffer?{quote}
This can be serious concern for NIO and Simple FS systems. Given that most of 
the systems today use mmap, can we limit the offheap FST to mmap supported 
systems i.e.
{code:title=isMMapSupported|borderStyle=Solid}
Constants.JRE_IS_64BIT && MMapDirectory.UNMAP_SUPPORTED
{code}



> Lazy loading Lucene FST offheap using mmap
> --
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
> Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>Reporter: Ankit Jain
>Priority: Major
> Attachments: fst-offheap-ra-rev.patch, offheap.patch, ra.patch, 
> rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This 
> causes frequ

[jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-23 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16750316#comment-16750316
 ] 

Ankit Jain commented on LUCENE-8635:


{quote}Ankit Jain unfortunately RandomAccessInput doesn't offer readBytes. I'm 
looking into adding it; shouldn't be hard as there aren't that many 
implementations.{quote}
You don't need to use RandomAccessInput. You can revert to the original 
IndexInputReader and get rid of the reversal logic.
{code:title=ForwardIndexInputReader|borderStyle=Solid}
/** Implements forward read for FST from an index input. */
final class ForwardIndexInputReader extends FST.BytesReader {
private final IndexInput in;
private final long startFP;

public ForwardIndexInputReader(IndexInput in, long startFP) {
this.in = in;
this.startFP = startFP;
}

@Override
public byte readByte() throws IOException {
return this.in.readByte();
}

@Override
public void readBytes(byte[] b, int offset, int len) throws IOException {
this.in.readBytes(b, offset, len);
}

@Override
public void skipBytes(long count) {
this.setPosition(this.getPosition() + count);
}

@Override
public long getPosition() {
final long position = this.in.getFilePointer() - startFP;
return position;
}

@Override
public void setPosition(long pos) {
try {
this.in.seek(startFP + pos);
} catch (IOException ex) {
System.out.println(String.format("Unreported exception in set position at %d - %s", pos, ex.getMessage()));
}
}

@Override
public boolean reversed() {
return false;
}
}
{code}

{quote}Furthermore the NIO and Simple FS directories use buffering. I'm 
wondering how bad things would be if every seek would need to reload the 
buffer?{quote}
This can be a serious concern for NIO and Simple FS systems. Given that most 
systems today use mmap, can we limit the offheap FST to mmap-supported 
systems, i.e.
{code:title=isMMapSupported|borderStyle=Solid}
Constants.JRE_IS_64BIT && MMapDirectory.UNMAP_SUPPORTED
{code}



> Lazy loading Lucene FST offheap using mmap
> --
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
> Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>Reporter: Ankit Jain
>Priority: Major
> Attachments: fst-offheap-ra-rev.patch, offheap.patch, ra.patch, 
> rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This 
> causes frequent JVM OOM issues if the term size gets big. A better way of 
> doing this will be to lazily load FST using mmap. That ensures only the 
> required terms get loaded into memory.
>  
> Lucene can expose API for providing list of fields to load terms offheap. I'm 
> planning to take following approach for this:
>  # Add a boolean property fstOffHeap in FieldInfo
>  # Pass list of offheap fields to lucene during index open (ALL can be 
> special keyword for loading ALL fields offheap)
>  # Initialize the fstOffHeap property during lucene index open
>  # FieldReader invokes default FST constructor or OffHeap constructor based 
> on fstOffHeap field
>  
> I created a patch (that loads all fields offheap), did some benchmarks using 
> es_rally and results look good.






[jira] [Comment Edited] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-22 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749180#comment-16749180
 ] 

Ankit Jain edited comment on LUCENE-8635 at 1/22/19 9:41 PM:
-

{quote}Technically we could make things work for existing segments since your 
patch doesn't change the file format.{quote}
[~jpountz] - I'm curious how this can be done. I looked at the code and it 
seems that all settings are passed to the segment writer, and the writer should put 
those settings in the codec for the reader to consume. Do you have any pointers on this?

{quote}I agree it's a bit unlikely that the terms index gets paged out, but you 
can still end up with a cold FS cache eg. when the host restarts?{quote}
There can be an option for preloading the terms index during index open. Lucene 
already provides an option for preloading mapped buffers 
[here|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java#L95],
 but it is applied at the directory level and not at the file level. Elasticsearch worked 
around that to provide a [file level 
setting|https://www.elastic.co/guide/en/elasticsearch/reference/master/_pre_loading_data_into_the_file_system_cache.html].
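
As an illustration of what per-file preloading amounts to, the sketch below maps a single file and asks the OS to page it in. It uses only java.nio; the PreloadSketch class, its methods, and the temporary file are hypothetical and are not Lucene's MMapDirectory code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch only: plain java.nio with hypothetical names; not Lucene's MMapDirectory code. */
final class PreloadSketch {
    /** Maps the whole file read-only and optionally asks the OS to page it in up front. */
    static MappedByteBuffer map(Path file, boolean preload) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buffer =
                channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            if (preload) {
                buffer.load(); // best effort: touches the pages so later reads avoid page faults
            }
            return buffer; // the mapping stays valid after the channel is closed
        }
    }

    /** Writes a small temporary file, maps it with preload, and returns the byte at index 2. */
    static byte demo() {
        try {
            Path tmp = Files.createTempFile("terms-index", ".bin");
            tmp.toFile().deleteOnExit();
            Files.write(tmp, new byte[] {10, 20, 30, 40});
            return map(tmp, true).get(2);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because the preload decision is made per call to map, it could be enabled for the terms index file only and left off for larger files.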

{quote}For the record, Lucene also performs implicit PK lookups when indexing 
with updateDocument. So this might have an impact on indexing speed as 
well.{quote}
If the customer workload is updateDocument heavy, the impact should be minimal, as 
the terms index will get loaded into memory after the first fault for every page and 
then there should not be any further page faults. If customers are sensitive to 
latency, they can use the preload option for the terms index.

{quote}Wondering whether avoiding 'array reversal' in the second patch is what 
helped rather than moving to random access and removing skip? May be we should 
try with reading one byte at a time with original patch.{quote}
I overlooked that earlier and attributed the performance gain to the absence of the seek 
operation. This makes a lot more sense; I will try it by changing readBytes 
to the following:
{code:title=ReverseIndexInputReader.java|borderStyle=solid}  
public byte readByte() throws IOException {
    final byte b = this.in.readByte();
    this.skipBytes(2);
    return b;
}

public void readBytes(byte[] b, int offset, int len) throws IOException {
    for (int i = offset + len - 1; i >= offset; i--) {
        b[i] = this.readByte();
    }
}
{code}

{quote}I uploaded a patch that combines these three things: off-heap FST + 
random-access reader + reversal of the FST so it is forward-read. Unit tests 
are passing; I'm running some benchmarks to see what the impact is on 
performance{quote}
That's great, Mike. If this works, we don't need the reverse reader. We don't 
even need the random-access reader, as we can simply change readBytes to the following:
{code:title=ReverseIndexInputReader.java|borderStyle=solid}  
public void readBytes(byte[] b, int offset, int len) throws IOException {
    this.in.readBytes(b, offset, len);
}
{code}


was (Author: akjain):
bq. {quote}Technically we could make things work for existing segments since 
your patch doesn't change the file format.{quote}
[~jpountz] - I'm curious on how this can be done. I looked at the code and it 
seemed that all settings are passed to the segment writer and writer should put 
those settings in codec for reader to consume. Do you have any pointers on this?

{quote}I agree it's a bit unlikely that the terms index gets paged out, but you 
can still end up with a cold FS cache eg. when the host restarts?{quote}
There can be option for preloading terms index during index open. Even though, 
lucene already provides option for preloading mapped buffer 
[here|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java#L95],
 it is done at directory level and not file level. Though, elasticsearch worked 
around that to provide [file level 
setting|https://www.elastic.co/guide/en/elasticsearch/reference/master/_pre_loading_data_into_the_file_system_cache.html]

{quote}For the record, Lucene also performs implicit PK lookups when indexing 
with updateDocument. So this might have an impact on indexing speed as 
well.{quote}
If customer workload is updateDocument heavy, the impact should be minimal, as 
terms index will get loaded into memory after first fault for every page and 
then there should not be any page faults. If customers are sensitive to 
latency, they can use the preload option for terms index.

{quote}Wondering whether avoiding 'array reversal' in the second patch is what 
helped rather than moving to random access and removing skip? May be we should 
try with reading one byte at a time with original patch.{quote}
I overlooked that earlier and attributed performance gain to absence of seek 
operation. This makes

[jira] [Comment Edited] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-22 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749180#comment-16749180
 ] 

Ankit Jain edited comment on LUCENE-8635 at 1/22/19 9:40 PM:
-

bq. {quote}Technically we could make things work for existing segments since 
your patch doesn't change the file format.{quote}
[~jpountz] - I'm curious on how this can be done. I looked at the code and it 
seemed that all settings are passed to the segment writer and writer should put 
those settings in codec for reader to consume. Do you have any pointers on this?

{quote}I agree it's a bit unlikely that the terms index gets paged out, but you 
can still end up with a cold FS cache eg. when the host restarts?{quote}
There could be an option for preloading the terms index during index open. 
Lucene already provides an option for preloading mapped buffers 
[here|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java#L95],
 but it applies at the directory level, not the file level. Elasticsearch worked 
around that to provide a [file-level 
setting|https://www.elastic.co/guide/en/elasticsearch/reference/master/_pre_loading_data_into_the_file_system_cache.html]

{quote}For the record, Lucene also performs implicit PK lookups when indexing 
with updateDocument. So this might have an impact on indexing speed as 
well.{quote}
If the customer workload is updateDocument-heavy, the impact should be minimal: 
each page of the terms index gets loaded into memory on its first fault, and 
after that there should not be any page faults. Customers who are sensitive to 
latency can use the preload option for the terms index.
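
The difference between lazy paging and preloading can be sketched with plain java.nio, outside of Lucene. This is only an illustration: the class and method names are made up, and it assumes that preloading boils down to calling MappedByteBuffer.load() on the mapping, which is a best-effort request to the OS.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Illustration only: hypothetical helper, not part of Lucene or Elasticsearch. */
final class PreloadSketch {

    /** Maps a file read-only; with preload=true, asks the OS to page it in up front. */
    static MappedByteBuffer map(Path file, boolean preload) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            if (preload) {
                // Best effort: touches the mapping so later reads are less likely to fault.
                buf.load();
            }
            // The mapping remains valid after the channel is closed.
            return buf;
        }
    }
}
```

Without preload, a page is only brought into memory the first time a read touches it, so the cost is paid by the first lookup that needs it rather than at open time.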

{quote}Wondering whether avoiding 'array reversal' in the second patch is what 
helped rather than moving to random access and removing skip? May be we should 
try with reading one byte at a time with original patch.{quote}
I overlooked that earlier and attributed the performance gain to the absence of 
the seek operation. This makes a lot more sense; I will try some runs after 
changing readBytes to the following:
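{code:title=ReverseIndexInputReader.java|borderStyle=solid}
// Read one byte, then step back so the next read returns the preceding byte.
public byte readByte() throws IOException {
  final byte b = this.in.readByte(); // advances the underlying input by one
  this.skipBytes(2); // step back past this byte and one more (assuming skipBytes moves backwards in this reader)
  return b;
}

// Fill the target range from its last slot to its first, one byte at a time.
public void readBytes(byte[] b, int offset, int len) throws IOException {
  for (int i = offset + len - 1; i >= offset; i--) {
    b[i] = this.readByte();
  }
}
{code}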
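
To make the effect of the snippet above concrete, here is a self-contained sketch over an in-memory buffer. It is not the reader from the patch: the class name, the position field, and the reading of skipBytes(2) as "move back two bytes" are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;

/** Illustration only: a hypothetical in-memory stand-in, not the reader from the patch. */
final class ReverseReadSketch {
    private final ByteBuffer in; // stands in for the underlying index input
    private int pos;             // absolute position of the next byte to read

    ReverseReadSketch(byte[] data, int startPos) {
        this.in = ByteBuffer.wrap(data);
        this.pos = startPos;
    }

    /** Reads forward by one byte, then steps back two, for a net move of one byte backwards. */
    byte readByte() {
        final byte b = in.get(pos++); // forward read, as an ordinary input would do
        pos -= 2;                     // assumed meaning of skipBytes(2) in a reverse reader
        return b;
    }

    /** Fills b[offset .. offset+len) from its last slot down to its first. */
    void readBytes(byte[] b, int offset, int len) {
        for (int i = offset + len - 1; i >= offset; i--) {
            b[i] = readByte();
        }
    }

    int position() {
        return pos;
    }
}
```

Starting at position 4 over the bytes {10, 20, 30, 40, 50}, a readBytes call with len 3 leaves {30, 40, 50} in the target array and the position at 1, so the bytes come out in file order without a separate reversal pass.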

{quote}I uploaded a patch that combines these three things: off-heap FST + 
random-access reader + reversal of the FST so it is forward-read. Unit tests 
are passing; I'm running some benchmarks to see what the impact is on 
performance{quote}
That's great Mike. If this works, we don't need the reverse reader. We don't 
even need the random-access reader, as we can simply change readBytes to below:
{code:title=ReverseIndexInputReader.java|borderStyle=solid}
// Delegate straight to the underlying input: a single forward bulk read.
public void readBytes(byte[] b, int offset, int len) throws IOException {
  this.in.readBytes(b, offset, len);
}
{code}



[jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-22 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16749180#comment-16749180
 ] 

Ankit Jain commented on LUCENE-8635:


{quote}Technically we could make things work for existing segments since 
your patch doesn't change the file format.{quote}
[~jpountz] - I'm curious how this can be done. I looked at the code, and it 
seems that all settings are passed to the segment writer, which should store 
them in the codec for the reader to consume. Do you have any pointers on this?

{quote}I agree it's a bit unlikely that the terms index gets paged out, but you 
can still end up with a cold FS cache eg. when the host restarts?{quote}
There could be an option for preloading the terms index during index open. 
Lucene already provides an option for preloading mapped buffers 
[here|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java#L95],
 but it applies at the directory level, not the file level. Elasticsearch worked 
around that to provide a [file-level 
setting|https://www.elastic.co/guide/en/elasticsearch/reference/master/_pre_loading_data_into_the_file_system_cache.html]

{quote}For the record, Lucene also performs implicit PK lookups when indexing 
with updateDocument. So this might have an impact on indexing speed as 
well.{quote}
If the customer workload is updateDocument-heavy, the impact should be minimal: 
each page of the terms index gets loaded into memory on its first fault, and 
after that there should not be any page faults. Customers who are sensitive to 
latency can use the preload option for the terms index.

{quote}Wondering whether avoiding 'array reversal' in the second patch is what 
helped rather than moving to random access and removing skip? May be we should 
try with reading one byte at a time with original patch.{quote}
I overlooked that earlier and attributed the performance gain to the absence of 
the seek operation. This makes a lot more sense; I will try some runs after 
changing readBytes to the following:
{code}
// Read one byte, then step back so the next read returns the preceding byte.
public byte readByte() throws IOException {
  final byte b = this.in.readByte(); // advances the underlying input by one
  this.skipBytes(2); // step back past this byte and one more (assuming skipBytes moves backwards in this reader)
  return b;
}

// Fill the target range from its last slot to its first, one byte at a time.
public void readBytes(byte[] b, int offset, int len) throws IOException {
  for (int i = offset + len - 1; i >= offset; i--) {
    b[i] = this.readByte();
  }
}
{code}

{quote}I uploaded a patch that combines these three things: off-heap FST + 
random-access reader + reversal of the FST so it is forward-read. Unit tests 
are passing; I'm running some benchmarks to see what the impact is on 
performance{quote}
That's great Mike. If this works, we don't need the reverse reader. We don't 
even need the random-access reader, as we can simply change readBytes to below:
{code}
// Delegate straight to the underlying input: a single forward bulk read.
public void readBytes(byte[] b, int offset, int len) throws IOException {
  this.in.readBytes(b, offset, len);
}
{code}

> Lazy loading Lucene FST offheap using mmap
> --
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
> Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>Reporter: Ankit Jain
>Priority: Major
> Attachments: fst-offheap-ra-rev.patch, offheap.patch, ra.patch, 
> rally_benchmark.xlsx
>
>
> Currently, FST loads all the terms into heap memory during index open. This 
> causes frequent JVM OOM issues if the term size gets big. A better way of 
> doing this would be to lazily load the FST using mmap. That ensures only the 
> required terms get loaded into memory.
>  
> Lucene can expose an API for providing the list of fields whose terms should 
> be loaded offheap. I'm planning to take the following approach for this:
>  # Add a boolean property fstOffHeap in FieldInfo
>  # Pass the list of offheap fields to Lucene during index open (ALL can be a 
> special keyword for loading all fields offheap)
>  # Initialize the fstOffHeap property during Lucene index open
>  # FieldReader invokes the default FST constructor or the OffHeap constructor 
> based on the fstOffHeap property
>  
> I created a patch (that loads all fields offheap), ran some benchmarks using 
> es_rally, and the results look good.
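
The four steps in the description above could be sketched roughly as follows. Everything here is hypothetical: OffHeapFieldSelector, its methods, and the strategy strings are invented for illustration and do not exist in Lucene; only the fstOffHeap property name and the ALL keyword come from the description.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper; none of these names exist in Lucene. */
final class OffHeapFieldSelector {
    static final String ALL = "ALL"; // special keyword from step 2

    private final Set<String> offHeapFields;

    /** Step 2: the list of offheap fields passed in at index open. */
    OffHeapFieldSelector(String... fields) {
        this.offHeapFields =
            Collections.unmodifiableSet(new HashSet<>(Arrays.asList(fields)));
    }

    /** Steps 1 and 3: the value that would be stored in the per-field fstOffHeap property. */
    boolean fstOffHeap(String fieldName) {
        return offHeapFields.contains(ALL) || offHeapFields.contains(fieldName);
    }

    /** Step 4: choose how the FST for a field would be loaded. */
    String loadStrategy(String fieldName) {
        return fstOffHeap(fieldName) ? "off-heap (mmap)" : "on-heap";
    }
}
```

With new OffHeapFieldSelector("id", "title"), only those two fields would be loaded offheap, while new OffHeapFieldSelector(OffHeapFieldSelector.ALL) would select every field.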



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-16 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16744419#comment-16744419
 ] 

Ankit Jain commented on LUCENE-8635:


Thanks [~sokolov] for updating the patch and doing another run. As I 
understand it, a seek operation has very little overhead (it should be on the 
order of microseconds), since it just sets the buffer to the right position. 
Maybe the number of seek operations is huge and they add up.

> Lazy loading Lucene FST offheap using mmap
> --
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
> Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>Reporter: Ankit Jain
>Priority: Major
> Attachments: offheap.patch, ra.patch, rally_benchmark.xlsx



[jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-15 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16743577#comment-16743577
 ] 

Ankit Jain commented on LUCENE-8635:


Rally tests run against an underlying Elasticsearch cluster and cover use cases 
other than search, such as log analytics. I ran one iteration for multiple data 
sets and did not notice any significant performance degradation. Rather, I 
noticed a 6% improvement in indexing throughput for all the data sets. I should 
leave it running for more iterations, though, to get more conclusive evidence.

Thanks [~sokolov] for testing the changes. I think the impact is as expected, 
maybe slightly more for the PKLookup. Do the tests use a randomized key for each 
PKLookup query, or are the keys reused across queries? That will affect the 
overall throughput, as mmap is inherently lazily loaded.
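
A rough way to see why key reuse matters: with a cold cache, each distinct page that a lookup touches costs at most one first-touch fault, so what counts is how many distinct pages the offsets fall on. The sketch below only counts pages. The class name is made up, the 4096-byte page size is an assumption (the real value depends on the OS), and the offsets are invented, not taken from the benchmark.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustration only: counts distinct pages touched by a series of byte offsets. */
final class PageTouchSketch {
    static final int PAGE_SIZE = 4096; // assumed page size; OS-dependent in practice

    /** Number of distinct pages covered by the given offsets into a mapped file. */
    static int distinctPages(long[] offsets) {
        Set<Long> pages = new HashSet<>();
        for (long off : offsets) {
            pages.add(off / PAGE_SIZE);
        }
        return pages.size();
    }
}
```

Offsets that keep landing in the same region touch a single page however many lookups are issued, while offsets scattered across the file touch a new page almost every time.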

Though I'm open to exposing a per-field setting in Lucene, I agree with 
[~dsmiley] about the 25% reduction in throughput being a tiny fraction of 
typical usage. Also, throughput should be better if the same keys get used for 
PKLookup queries. Adding a per-field setting might require a code change and 
would be effective only for data indexed using the new codec. My knowledge of 
Lucene settings is limited, and I might be wrong.

> Lazy loading Lucene FST offheap using mmap
> --
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
> Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>Reporter: Ankit Jain
>Priority: Major
> Attachments: offheap.patch, rally_benchmark.xlsx



[jira] [Comment Edited] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-11 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740855#comment-16740855
 ] 

Ankit Jain edited comment on LUCENE-8635 at 1/12/19 4:58 AM:
-

The Excel sheet is big, so pasting it here might not help. You have a good point 
about moving FSTs off-heap in the default codec, as we can always preload the 
mmap file during index open, as demonstrated 
[here|https://www.elastic.co/guide/en/elasticsearch/reference/master/_pre_loading_data_into_the_file_system_cache.html]

I ran the default Lucene test suite and a couple of tests failed, though 
they don't seem to have anything to do with my change:

   [junit4] Tests with failures [seed: 1D3ADDF6AE377902]:
   [junit4]   - org.apache.solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest.testInactiveShardCleanup
   [junit4]   - org.apache.solr.cloud.autoscaling.ScheduledTriggerTest.testTrigger
   [junit4] Execution time total: 1 hour 12 minutes 40 seconds
   [junit4] Tests summary: 833 suites (7 ignored), 4024 tests, 2 failures, 286 ignored (153 assumptions)

UPDATE: The tests passed after retrying individually. 

 



[jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-11 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740997#comment-16740997
 ] 

Ankit Jain commented on LUCENE-8635:


Thanks for the tip Erick. I ran the failing tests individually and they passed!

> Lazy loading Lucene FST offheap using mmap
> --
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
> Environment: I used below setup for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>Reporter: Ankit Jain
>Priority: Major
> Attachments: offheap.patch, rally_benchmark.xlsx



[jira] [Comment Edited] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-11 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740855#comment-16740855
 ] 

Ankit Jain edited comment on LUCENE-8635 at 1/12/19 12:08 AM:
--

The Excel sheet is big, so pasting it here might not help. You have a good point 
about moving FSTs off-heap in the default codec, as we can always preload the 
mmap file during index open, as demonstrated 
[here|https://www.elastic.co/guide/en/elasticsearch/reference/master/_pre_loading_data_into_the_file_system_cache.html]

I ran the default Lucene test suite and a couple of tests failed, though 
they don't seem to have anything to do with my change:

   [junit4] Tests with failures [seed: 1D3ADDF6AE377902]:
   [junit4]   - org.apache.solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest.testInactiveShardCleanup
   [junit4]   - org.apache.solr.cloud.autoscaling.ScheduledTriggerTest.testTrigger
   [junit4]
   [junit4]
   [junit4] JVM J0:     1.40 ..  4359.18 =  4357.78s
   [junit4] JVM J1:     1.40 ..  4359.35 =  4357.95s
   [junit4] JVM J2:     1.40 ..  4359.30 =  4357.90s
   [junit4] Execution time total: 1 hour 12 minutes 40 seconds
   [junit4] Tests summary: 833 suites (7 ignored), 4024 tests, 2 failures, 286 ignored (153 assumptions)

Details for failing tests

NOTE: reproduce with: ant test  -Dtestcase=ScheduledTriggerTest 
-Dtests.method=testTrigger -Dtests.seed=1D3ADDF6AE377902 -Dtests.slow=true 
-Dtests.badapples=true -Dtests.locale=mr-IN -Dtests.timezone=America/St_Lucia 
-Dtests.asserts=true -Dtests.file.encoding=US-ASCII

   [junit4] FAILURE 9.03s J2 | ScheduledTriggerTest.testTrigger <<<
   [junit4]    > Throwable #1: java.lang.AssertionError: expected:<3> but was:<2>
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([1D3ADDF6AE377902:7EF1EB7437F80A2F]:0)
   [junit4]    >        at org.apache.solr.cloud.autoscaling.ScheduledTriggerTest.scheduledTriggerTest(ScheduledTriggerTest.java:113)
   [junit4]    >        at org.apache.solr.cloud.autoscaling.ScheduledTriggerTest.testTrigger(ScheduledTriggerTest.java:66)
   [junit4]    >        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   [junit4]    >        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   [junit4]    >        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   [junit4]    >        at java.base/java.lang.reflect.Method.invoke(Method.java:564)
   [junit4]    >        at java.base/java.lang.Thread.run(Thread.java:844)

NOTE: reproduce with: ant test  -Dtestcase=ScheduledMaintenanceTriggerTest 
-Dtests.method=testInactiveShardCleanup -Dtests.seed=1D3ADDF6AE377902 
-Dtests.slow=true -Dtests.badapples=true -Dtests.locale=ha 
-Dtests.timezone=America/Nome -Dtests.asserts=true 
-Dtests.file.encoding=US-ASCII

   [junit4] FAILURE 2.01s J0 | ScheduledMaintenanceTriggerTest.testInactiveShardCleanup <<<
   [junit4]    >        at __randomizedtesting.SeedInfo.seed([1D3ADDF6AE377902:161D84CF745E09]:0)
   [junit4]    >        at org.apache.solr.cloud.CloudTestUtils.waitForState(CloudTestUtils.java:70)
   [junit4]    >        at org.apache.solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest.testInactiveShardCleanup(ScheduledMaintenanceTriggerTest.java:167)
   [junit4]    >        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   [junit4]    >        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   [junit4]    >        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   [junit4]    >        at java.base/java.lang.reflect.Method.invoke(Method.java:564)
   [junit4]    >        at java.base/java.lang.Thread.run(Thread.java:844)
   [junit4]    > Caused by: java.util.concurrent.TimeoutException: last state: 
DocCollection(ScheduledMaintenanceTriggerTest_collection1//clusterstate.json/6)={

 



[jira] [Comment Edited] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-11 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740855#comment-16740855
 ] 

Ankit Jain edited comment on LUCENE-8635 at 1/12/19 12:07 AM:
--

The Excel sheet is pretty big, so I'm not sure pasting it here is a good idea. 
You have a good point about moving FSTs off-heap in the default codec, as we 
can always preload the mmap file during index open, as demonstrated 
[here|https://www.elastic.co/guide/en/elasticsearch/reference/master/_pre_loading_data_into_the_file_system_cache.html]

I ran the test suite and a couple of tests failed, though they don't seem 
to have anything to do with my change:

   [junit4] Tests with failures [seed: 1D3ADDF6AE377902]:
   [junit4]   - org.apache.solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest.testInactiveShardCleanup
   [junit4]   - org.apache.solr.cloud.autoscaling.ScheduledTriggerTest.testTrigger
   [junit4]
   [junit4]
   [junit4] JVM J0:     1.40 ..  4359.18 =  4357.78s
   [junit4] JVM J1:     1.40 ..  4359.35 =  4357.95s
   [junit4] JVM J2:     1.40 ..  4359.30 =  4357.90s
   [junit4] Execution time total: 1 hour 12 minutes 40 seconds
   [junit4] Tests summary: 833 suites (7 ignored), 4024 tests, 2 failures, 286 ignored (153 assumptions)



[jira] [Commented] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-11 Thread Ankit Jain (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-8635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16740855#comment-16740855
 ] 

Ankit Jain commented on LUCENE-8635:


I ran the test suite and a couple of tests failed, though they don't seem 
to have anything to do with my change:

 

   [junit4] Tests with failures [seed: 1D3ADDF6AE377902]:

   [junit4]   - 
org.apache.solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest.testInactiveShardCleanup

   [junit4]   - 
org.apache.solr.cloud.autoscaling.ScheduledTriggerTest.testTrigger

   [junit4]

   [junit4]

   [junit4] JVM J0:     1.40 ..  4359.18 =  4357.78s

   [junit4] JVM J1:     1.40 ..  4359.35 =  4357.95s

   [junit4] JVM J2:     1.40 ..  4359.30 =  4357.90s

   [junit4] Execution time total: 1 hour 12 minutes 40 seconds

   [junit4] Tests summary: 833 suites (7 ignored), 4024 tests, 2 failures, 286 
ignored (153 assumptions)

 

Details for failing tests

 

 NOTE: reproduce with: ant test  -Dtestcase=ScheduledTriggerTest 
-Dtests.method=testTrigger -Dtests.seed=1D3ADDF6AE377902 -Dtests.slow=true 
-Dtests.badapples=true -Dtests.locale=mr-IN -Dtests.timezone=America/St_Lucia 
-Dtests.asserts=true -Dtests.file.encoding=US-ASCII

   [junit4] FAILURE 9.03s J2 | ScheduledTriggerTest.testTrigger <<<

   [junit4]    > Throwable #1: java.lang.AssertionError: expected:<3> but 
was:<2>

   [junit4]    >        at 
__randomizedtesting.SeedInfo.seed([1D3ADDF6AE377902:7EF1EB7437F80A2F]:0)

   [junit4]    >        at 
org.apache.solr.cloud.autoscaling.ScheduledTriggerTest.scheduledTriggerTest(ScheduledTriggerTest.java:113)

   [junit4]    >        at 
org.apache.solr.cloud.autoscaling.ScheduledTriggerTest.testTrigger(ScheduledTriggerTest.java:66)

   [junit4]    >        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

   [junit4]    >        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

   [junit4]    >        at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

   [junit4]    >        at 
java.base/java.lang.reflect.Method.invoke(Method.java:564)

   [junit4]    >        at java.base/java.lang.Thread.run(Thread.java:844)

 

NOTE: reproduce with: ant test  -Dtestcase=ScheduledMaintenanceTriggerTest 
-Dtests.method=testInactiveShardCleanup -Dtests.seed=1D3ADDF6AE377902 
-Dtests.slow=true -Dtests.badapples=true -Dtests.locale=ha 
-Dtests.timezone=America/Nome -Dtests.asserts=true 
-Dtests.file.encoding=US-ASCII

   [junit4] FAILURE 2.01s J0 | 
ScheduledMaintenanceTriggerTest.testInactiveShardCleanup <<<

at __randomizedtesting.SeedInfo.seed([1D3ADDF6AE377902:161D84CF745E09]:0)

   [junit4]    >        at 
org.apache.solr.cloud.CloudTestUtils.waitForState(CloudTestUtils.java:70)

   [junit4]    >        at 
org.apache.solr.cloud.autoscaling.ScheduledMaintenanceTriggerTest.testInactiveShardCleanup(ScheduledMaintenanceTriggerTest.java:167)

   [junit4]    >        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

   [junit4]    >        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

   [junit4]    >        at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

   [junit4]    >        at 
java.base/java.lang.reflect.Method.invoke(Method.java:564)

   [junit4]    >        at java.base/java.lang.Thread.run(Thread.java:844)

   [junit4]    > Caused by: java.util.concurrent.TimeoutException: last state: 
DocCollection(ScheduledMaintenanceTriggerTest_collection1//clusterstate.json/6)={

 

> Lazy loading Lucene FST offheap using mmap
> --
>
> Key: LUCENE-8635
> URL: https://issues.apache.org/jira/browse/LUCENE-8635
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/FSTs
> Environment: I used the setup below for es_rally tests:
> single node i3.xlarge running ES 6.5
> es_rally was running on another i3.xlarge instance
>Reporter: Ankit Jain
>Priority: Major
> Attachments: offheap.patch, rally_benchmark.xlsx
>
>
> Currently, the FST loads all the terms into heap memory during index open. 
> This causes frequent JVM OOM issues if the term size gets big. A better 
> approach would be to load the FST lazily using mmap, which ensures that only 
> the required terms get loaded into memory.
>  
> Lucene can expose an API for providing a list of fields whose terms should be 
> loaded offheap. I'm planning to take the following approach for this:
>  # Add a boolean property fstOffHeap in FieldInfo
>  # Pass the list of offheap fields to Lucene during index open (ALL can be a 
> special keyword for loading all fields offheap)
>  # Initialize the fstOffHeap property during Lucene index open

[jira] [Created] (LUCENE-8635) Lazy loading Lucene FST offheap using mmap

2019-01-11 Thread Ankit Jain (JIRA)
Ankit Jain created LUCENE-8635:
--

 Summary: Lazy loading Lucene FST offheap using mmap
 Key: LUCENE-8635
 URL: https://issues.apache.org/jira/browse/LUCENE-8635
 Project: Lucene - Core
  Issue Type: New Feature
  Components: core/FSTs
 Environment: I used the setup below for es_rally tests:

single node i3.xlarge running ES 6.5

es_rally was running on another i3.xlarge instance
Reporter: Ankit Jain
 Attachments: offheap.patch, rally_benchmark.xlsx

Currently, the FST loads all the terms into heap memory during index open. 
This causes frequent JVM OOM issues if the term size gets big. A better 
approach would be to load the FST lazily using mmap, which ensures that only 
the required terms get loaded into memory.
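
To illustrate the difference between the two loading strategies, here is a minimal sketch using only the JDK. It is not the attached patch and does not use any Lucene API: the class name MmapSketch and both helper methods are made up for illustration, and a small temporary file stands in for the FST data.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; not part of Lucene or of offheap.patch.
public class MmapSketch {

    // Eager strategy: the whole file is copied onto the Java heap up front.
    public static byte[] loadOnHeap(Path file) throws IOException {
        return Files.readAllBytes(file);
    }

    // Lazy strategy: the file is memory-mapped. The OS pages data in on first
    // access, and the bytes live outside the Java heap.
    public static MappedByteBuffer mapOffHeap(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping stays valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        }
    }

    public static void main(String[] args) throws IOException {
        // A tiny temporary file stands in for serialized FST bytes.
        Path tmp = Files.createTempFile("fst-sketch", ".bin");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, new byte[] {10, 20, 30, 40});

        byte[] onHeap = loadOnHeap(tmp);
        MappedByteBuffer offHeap = mapOffHeap(tmp);

        // Both views expose the same bytes; only where they reside differs.
        System.out.println("same byte at offset 2: " + (onHeap[2] == offHeap.get(2)));
    }
}
```

With the mapped variant, heap usage does not grow with the file size, so the cost moves from the JVM heap to the OS page cache.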

 
Lucene can expose an API for providing a list of fields whose terms should be 
loaded offheap. I'm planning to take the following approach for this:
 # Add a boolean property fstOffHeap in FieldInfo
 # Pass the list of offheap fields to Lucene during index open (ALL can be a 
special keyword for loading all fields offheap)
 # Initialize the fstOffHeap property during Lucene index open
 # FieldReader invokes the default FST constructor or the OffHeap constructor 
based on the fstOffHeap property
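
The four steps above can be sketched roughly as follows. This is only an illustration of the control flow under stated assumptions: FieldInfoSketch, openField and chooseFstLoader are hypothetical stand-ins, not Lucene's FieldInfo or FieldReader, and the strings returned in step 4 merely mark which constructor would be chosen.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the proposed flow; none of these types exist in Lucene.
public class OffHeapFieldSketch {

    // Special keyword from step 2: load every field offheap.
    public static final String ALL = "ALL";

    // Step 1: a stand-in for FieldInfo carrying a boolean fstOffHeap property.
    public static final class FieldInfoSketch {
        public final String name;
        public final boolean fstOffHeap;

        public FieldInfoSketch(String name, boolean fstOffHeap) {
            this.name = name;
            this.fstOffHeap = fstOffHeap;
        }
    }

    // Steps 2 and 3: the caller passes the offheap field list at index open,
    // and the flag is initialized from it.
    public static FieldInfoSketch openField(String fieldName, Set<String> offHeapFields) {
        boolean offHeap = offHeapFields.contains(ALL) || offHeapFields.contains(fieldName);
        return new FieldInfoSketch(fieldName, offHeap);
    }

    // Step 4: the reader picks a loading path based on the flag. A real reader
    // would call the matching FST constructor here.
    public static String chooseFstLoader(FieldInfoSketch info) {
        return info.fstOffHeap ? "off-heap" : "on-heap";
    }

    public static void main(String[] args) {
        Set<String> offHeapFields = new HashSet<>(Arrays.asList("title"));
        for (String field : Arrays.asList("title", "body")) {
            FieldInfoSketch info = openField(field, offHeapFields);
            System.out.println(info.name + " -> " + chooseFstLoader(info));
        }
    }
}
```

Keeping the decision in a per-field flag means the reader does not need to know how the list was supplied, only whether a given field was selected.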

 
I created a patch (which loads all fields offheap) and ran some benchmarks 
using es_rally; the results look good.


