[jira] [Updated] (LUCENE-4556) FuzzyTermsEnum creates tons of objects

2014-05-28 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-4556:


Assignee: Michael McCandless  (was: Simon Willnauer)

 FuzzyTermsEnum creates tons of objects
 --

 Key: LUCENE-4556
 URL: https://issues.apache.org/jira/browse/LUCENE-4556
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search, modules/spellchecker
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Michael McCandless
Priority: Critical
 Fix For: 4.9, 5.0

 Attachments: LUCENE-4556.patch, LUCENE-4556.patch


 I ran into this problem in production using the DirectSpellchecker. The 
 number of objects created by the spellchecker shoots through the roof very 
 quickly. We ran about 130 queries and ended up with 2M transitions / 
 states. We spend 50% of the time in GC just because of transitions. Other 
 parts of the system behave just fine here.
 I talked quickly to Robert and gave a POC a shot, providing a 
 LevenshteinAutomaton#toRunAutomaton(prefix, n) method to optimize this case 
 and build an array-based structure converted into UTF-8 directly instead of 
 going through the object-based APIs. This involved quite a few changes, but 
 they are all package-private at this point. I have a patch that still has a 
 fair set of nocommits, but it shows that it's possible and IMO worth the 
 trouble to make this really usable in production. All tests pass with the 
 patch. It's a start.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4556) FuzzyTermsEnum creates tons of objects

2014-03-15 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4556:
-

Fix Version/s: (was: 4.7)
   4.8

 FuzzyTermsEnum creates tons of objects
 --

 Key: LUCENE-4556
 URL: https://issues.apache.org/jira/browse/LUCENE-4556
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search, modules/spellchecker
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Critical
 Fix For: 4.8

 Attachments: LUCENE-4556.patch, LUCENE-4556.patch





[jira] [Updated] (LUCENE-4556) FuzzyTermsEnum creates tons of objects

2013-05-09 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-4556:
--

Fix Version/s: (was: 4.3)
   4.4

 FuzzyTermsEnum creates tons of objects
 --

 Key: LUCENE-4556
 URL: https://issues.apache.org/jira/browse/LUCENE-4556
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search, modules/spellchecker
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Critical
 Fix For: 4.4

 Attachments: LUCENE-4556.patch, LUCENE-4556.patch



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (LUCENE-4556) FuzzyTermsEnum creates tons of objects

2013-01-15 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated LUCENE-4556:
---

Fix Version/s: (was: 4.1)
   4.2

 FuzzyTermsEnum creates tons of objects
 --

 Key: LUCENE-4556
 URL: https://issues.apache.org/jira/browse/LUCENE-4556
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search, modules/spellchecker
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Critical
 Fix For: 4.2, 5.0

 Attachments: LUCENE-4556.patch, LUCENE-4556.patch





[jira] [Updated] (LUCENE-4556) FuzzyTermsEnum creates tons of objects

2012-11-18 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4556:
---

Attachment: LUCENE-4556.patch

I'm attaching a possible alternate way to reduce objects ... it's
only just a start ...

I created a new LightAutomaton class (I'm not wed to that name!) which
places a severe append-only restriction on how you are allowed to
build up the FSA: you must add all transitions for a given state
before adding another state's transitions.

It identifies states only by int, and stores all transitions in a
private int[].
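
A minimal Java sketch of the append-only idea described above may help
make the restriction concrete. It is not the class from the attached
patch: the name AppendOnlyIntAutomaton, every method name, and the
layout of three ints per transition (destination, min label, max label)
are assumptions made for illustration only.

```java
import java.util.Arrays;

/** Hypothetical sketch of an append-only automaton; NOT the class from the patch. */
final class AppendOnlyIntAutomaton {
    // Assumed layout: each transition takes three ints (dest, min label, max label).
    private int[] transitions = new int[12];
    private int transitionCount = 0;

    // Per-state bookkeeping, indexed by the int state id.
    private int[] firstTransition = new int[4]; // index of first transition, -1 if none yet
    private int[] numTransitions = new int[4];
    private boolean[] accept = new boolean[4];
    private int stateCount = 0;
    private int currentState = -1; // the only state that may still receive transitions

    /** Creates a new state and returns its int id. */
    int createState() {
        if (stateCount == firstTransition.length) {
            int newLength = stateCount * 2;
            firstTransition = Arrays.copyOf(firstTransition, newLength);
            numTransitions = Arrays.copyOf(numTransitions, newLength);
            accept = Arrays.copyOf(accept, newLength);
        }
        firstTransition[stateCount] = -1;
        return stateCount++;
    }

    void setAccept(int state, boolean isAccept) {
        accept[state] = isAccept;
    }

    /** Appends a transition; all transitions of one state must be added consecutively. */
    void addTransition(int source, int dest, int min, int max) {
        if (source < 0 || source >= stateCount || dest < 0 || dest >= stateCount) {
            throw new IllegalArgumentException("unknown state");
        }
        if (source != currentState) {
            if (firstTransition[source] != -1) {
                // Append-only restriction: we already moved on from this state.
                throw new IllegalStateException("state " + source + " is already finished");
            }
            firstTransition[source] = transitionCount;
            currentState = source;
        }
        int offset = transitionCount * 3;
        if (offset + 3 > transitions.length) {
            transitions = Arrays.copyOf(transitions, transitions.length * 2);
        }
        transitions[offset] = dest;
        transitions[offset + 1] = min;
        transitions[offset + 2] = max;
        transitionCount++;
        numTransitions[source]++;
    }

    /** Returns the destination state for the given label, or -1 if there is none. */
    int step(int state, int label) {
        int start = firstTransition[state];
        for (int i = 0; i < numTransitions[state]; i++) {
            int offset = (start + i) * 3;
            if (transitions[offset + 1] <= label && label <= transitions[offset + 2]) {
                return transitions[offset];
            }
        }
        return -1;
    }

    /** Runs the automaton over the code points of s, starting from state 0. */
    boolean run(String s) {
        if (stateCount == 0) {
            return false;
        }
        int state = 0;
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            state = step(state, codePoint);
            if (state == -1) {
                return false;
            }
            i += Character.charCount(codePoint);
        }
        return accept[state];
    }
}
```

In this sketch a state's transitions sit in one contiguous run of the
int[], so no per-transition or per-state objects are allocated. The
cost is that going back to add a transition to an earlier state throws
an exception.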

This is a big restriction, but I think a number of our FSA ops would
work fine with this.  I'm pretty sure building the LevA, and doing the
UTF32-UTF8 conversion would work fine append-only...

In the patch, I added Automaton.toLightAutomaton to convert from
heavy to LightAutomaton, and then fixed CompiledAutomaton (and its
consumers) to use that.  Tests pass.

I think it shouldn't be too hard to cut over the Lev building to this
too ... but wanted to get feedback first.

Simon, it'd be great if you could try this patch on your benchmark
since I can't reproduce the too-heavy GC in my benchmark ... I'm
particularly curious whether the 50% time spent in GC you see is due
to 1) creating too many objects vs 2) holding onto those objects for
too long (in CompiledAutomaton, while the query runs...).  So this
patch would test whether it's case 2).


 FuzzyTermsEnum creates tons of objects
 --

 Key: LUCENE-4556
 URL: https://issues.apache.org/jira/browse/LUCENE-4556
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search, modules/spellchecker
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Critical
 Fix For: 4.1, 5.0

 Attachments: LUCENE-4556.patch, LUCENE-4556.patch





[jira] [Updated] (LUCENE-4556) FuzzyTermsEnum creates tons of objects

2012-11-13 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-4556:


Attachment: LUCENE-4556.patch

here is a patch ...scary™

 FuzzyTermsEnum creates tons of objects
 --

 Key: LUCENE-4556
 URL: https://issues.apache.org/jira/browse/LUCENE-4556
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search, modules/spellchecker
Affects Versions: 4.0
Reporter: Simon Willnauer
Assignee: Simon Willnauer
Priority: Critical
 Fix For: 4.1, 5.0

 Attachments: LUCENE-4556.patch

