[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-07-01 Thread Mitsu Hadeishi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058836#comment-13058836
 ] 

Mitsu Hadeishi commented on SOLR-2462:
--

Oh now you tell us. :) Well, we already built the patched 3.2 so we're going 
with that for now :)

 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Assignee: Robert Muir
Priority: Critical
 Fix For: 3.3, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-29 Thread Mitsu Hadeishi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057483#comment-13057483
 ] 

Mitsu Hadeishi commented on SOLR-2462:
--

We just ran into this bug when we upgraded to 3.2, and suddenly SOLR was 
blowing up as soon as we built the spellcheck dictionary. I attempted to apply 
the patch to the 3.2 source code tgz file downloadable from 
http://www.apache.org/dyn/closer.cgi/lucene/solr, but it didn't apply cleanly. 
I manually applied the patch, as we're using the released version of 3.2.

 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Assignee: Robert Muir
Priority: Critical
 Fix For: 3.3, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-29 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13057486#comment-13057486
 ] 

Simon Willnauer commented on SOLR-2462:
---

bq. We just ran into this bug when we upgraded to 3.2
3.3 should be released in the next two days which has a fix for this. So maybe 
you just check the mailinglist for the release mail tomorrow or the day after!

simon

 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Assignee: Robert Muir
Priority: Critical
 Fix For: 3.3, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-21 Thread Peter Wolanin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052601#comment-13052601
 ] 

Peter Wolanin commented on SOLR-2462:
-

I generated a patch for 3.2 looking at the commit on branch_3x.  It looks 
somewhat different from the last patch by James.

I also just compared the trunk commit to the last patch and it doesn't match 
https://issues.apache.org/jira/secure/attachment/12481574/SOLR-2462.patch  

Did the wrong patch get committed, or was the final patch just never get posted 
to this issue before commit?

 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Assignee: Robert Muir
Priority: Critical
 Fix For: 3.3, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-21 Thread James Dyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13052623#comment-13052623
 ] 

James Dyer commented on SOLR-2462:
--

Peter,

I reviewed Robert's commits (r1132730 to branch_3x ; r1132729 to trunk), and 
they appear to match the 06/Jun/11 15:10 version of the patch.  I looked mostly 
at the change in TestSpellCheckResponse.java, which is the last tweak that was 
made.  Keep in mind there are a few things that were committed that aren't in 
the patch (changes.txt, etc).  Did you have other specific discrepancies in 
mind?

 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Assignee: Robert Muir
Priority: Critical
 Fix For: 3.3, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-06 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045028#comment-13045028
 ] 

Robert Muir commented on SOLR-2462:
---

Thanks for the explanation and updated patch James... I'll test this out 
shortly!

 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Assignee: Robert Muir
Priority: Critical
 Fix For: 3.1.1, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-06 Thread James Dyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13045082#comment-13045082
 ] 

James Dyer commented on SOLR-2462:
--

I added spellcheck.maxCollationEvaluations to the wiki.  Thanks, Robert for 
taking time helping get this fixed!

 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Assignee: Robert Muir
Priority: Critical
 Fix For: 3.3, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043913#comment-13043913
 ] 

Robert Muir commented on SOLR-2462:
---

Hi, this is looking much better!

To eliminate your last concerns, I think what is needed is a tie-breaker in 
RankedSpellPossibility's comparator, ideally something like a sequential 
identifier.

This way, when the PQ fills, and the lowest score is 20, then subsequent items 
that also have a score of 20 will not be competitive, and rejected by the 
competitive check... remember peek() is constant time :)

This is the way Lucene collectors work (tiebreaker is lucene docid), and the 
way e.g. FuzzyQuery/FuzzyLikeThisQuery works (tiebreaker is term text)


 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Priority: Critical
 Fix For: 3.1.1, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043981#comment-13043981
 ] 

Robert Muir commented on SOLR-2462:
---

great! So what about the 50ms time? can we eliminate this now?

it seems unrelated to this issue (memory usage), as the memory usage is now 
constant... 

I think I would prefer if we want to have a time-based limit that we do this 
independently (and in a way where it can be configured)

Other than this, I think its ready to commit!

 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Priority: Critical
 Fix For: 3.1.1, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-03 Thread James Dyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043991#comment-13043991
 ] 

James Dyer commented on SOLR-2462:
--

Yeah, the I agree the time limit is a bit of a hack.  On the other hand, the 
list of possibilities it needs to evaluate can get really long really fast.  If 
you're returning 15 or 20 suggestions per word and the user misspells 10 or so 
words, you get a pretty big list of combinations (in our case users were 
pasting the URL in the search box generating a query with 12 misspelled 
words...)  Then again, this latest version is much faster than what I had put 
out there originally...

Maybe we can just put a hard limit on the number of possibilities it will 
evaluate?  It could be really high like a million or something.  We could make 
it a configurable parameter, something like 
spellcheck.maxCollationPossibilitiesToEval , but then again that seems silly. 
 Who would really change it if a million was the default ?

At the end of the day, I'd feel better where I am at if Solr had some kind of 
secondary fallback here.  One thing that really made me nervous about our 
previous search engine is it wasn't terribly hard to send a query over to it 
that would crash the thing or make it churn a long time just to return nothing. 
 So far my experience is that Solr is less prone to this kind of failure and 
I'd really like to keep it that way...

 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Priority: Critical
 Fix For: 3.1.1, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043992#comment-13043992
 ] 

Robert Muir commented on SOLR-2462:
---

{quote}
Maybe we can just put a hard limit on the number of possibilities it will 
evaluate? It could be really high like a million or something. We could make it 
a configurable parameter, something like 
spellcheck.maxCollationPossibilitiesToEval , but then again that seems silly. 
Who would really change it if a million was the default ?
{quote}

Well, I think this sounds much better than being time-based? And you know, use 
your best judgement as a default, definitely I'm ok with it as long as its 
configurable and has good defaults.


 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Priority: Critical
 Fix For: 3.1.1, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-03 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13044125#comment-13044125
 ] 

Robert Muir commented on SOLR-2462:
---

Hi James, when applying the latest patch, I noticed a test fail:
{noformat}
[junit] Testsuite: 
org.apache.solr.client.solrj.response.TestSpellCheckResponse
[junit] Testcase: 
testSpellCheckCollationResponse(org.apache.solr.client.solrj.response.TestSpellCheckResponse):
FAILED
[junit] 
[junit] junit.framework.AssertionFailedError: 
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1348)
[junit] at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1266)
[junit] at 
org.apache.solr.client.solrj.response.TestSpellCheckResponse.testSpellCheckCollationResponse(TestSpellCheckResponse.java:153)
{noformat}

This seemed odd... maybe a comparator is off somewhere?

 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Assignee: Robert Muir
Priority: Critical
 Fix For: 3.1.1, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042675#comment-13042675
 ] 

Robert Muir commented on SOLR-2462:
---

Hi James, this sounds like an important issue to fix!

I'm sorry you have all these open improvements to the spellchecker, lets see if 
we can try to get some of them resolved.

{quote}
This sets the maximum limit to 1000 possibilities. When this limit is reached, 
the list is sorted by rank then reduced to the top 100. From then on, only 
collations with a rank equal or better than the 100th are added. This process 
repeats until finished or until it has taken 50ms, at which time it quits.
{quote}

Maybe we want to use a priority queue instead? It sorta seems like this is what 
you are doing.


 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Priority: Critical
 Fix For: 3.1.1, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042885#comment-13042885
 ] 

Robert Muir commented on SOLR-2462:
---

bq. 1. With a PriorityQueue, there is no simple way to get the 100th element 
and find its rank to determine whether or not to add subsequent elements.

Hi James, you might consider using peek() here to check

as an example, you can take a look at the main loop of DirectSpellChecker: 
suggestSimilar(Term, int, IndexReader, int, int, float, CharsRef)
http://svn.apache.org/repos/asf/lucene/dev/trunk/modules/suggest/src/java/org/apache/lucene/search/spell/DirectSpellChecker.java

 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Priority: Critical
 Fix For: 3.1.1, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042940#comment-13042940
 ] 

Robert Muir commented on SOLR-2462:
---

Now that we have a PQ (thanks for iterating here!), maybe we should just bound 
the size by some value (say 20, or whatever the top-N the user requested is).

Then, we only add competitive possibilities always once the size reaches 20.
(this is how the spellchecker link i provided works, see the line // possibly 
drop entries from queue)



 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Priority: Critical
 Fix For: 3.1.1, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13042988#comment-13042988
 ] 

Robert Muir commented on SOLR-2462:
---

OK, this is looking better, but we can now keep the PQ at a fixed size, pretend 
for this example that maximumRequiredSuggestions = 20.

the idea is after your offer(), you check if the pq's size is  
maximumRequiredSuggestions (say its 21), and then if so, you poll() to remove 
the lowest element (its now 20 again)

this way, the size never grows past 20... and i then think you can then remove 
the rankedPossibilities  1 check because it will never get larger than the 
user's requested amount.


 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Priority: Critical
 Fix For: 3.1.1, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-02 Thread James Dyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043018#comment-13043018
 ] 

James Dyer commented on SOLR-2462:
--

Robert,

I'm not sure we can do this last suggestion.  If as the queue grows we poll(), 
we will in fact be discarding all of the best (lowest-ranked) suggestions.  
What we would need to do is insert the 21st element, letting it fall into its 
place in order, and then we'd need an operation to remove the _worst_ 
suggestion from the tail of the queue.  Looking at PriorityQueue's API docs I'm 
not sure there is a method exposed that does that.

Did I miss something here?

 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Priority: Critical
 Fix For: 3.1.1, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043058#comment-13043058
 ] 

Robert Muir commented on SOLR-2462:
---

this the general workflow

1. check if competitive
2. offer
3. if  size, poll.

there are more examples than mine in directspellchecker, we use this in lots of 
places.

 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Priority: Critical
 Fix For: 3.1.1, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

2011-06-02 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13043100#comment-13043100
 ] 

Robert Muir commented on SOLR-2462:
---

James sorry for my short-barely-comprehensible response... in addition what i 
should have said is that if poll() returns the best suggestion, you need to 
reverse the comparator for this to work :)

 Using spellcheck.collate can result in extremely high memory usage
 --

 Key: SOLR-2462
 URL: https://issues.apache.org/jira/browse/SOLR-2462
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Priority: Critical
 Fix For: 3.1.1, 4.0

 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
 SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch


 When using spellcheck.collate, class SpellPossibilityIterator creates a 
 ranked list of *every* possible correction combination.  But if returning 
 several corrections per term, and if several words are misspelled, the 
 existing algorithm uses a huge amount of memory.
 This bug was introduced with SOLR-2010.  However, it is triggered anytime 
 spellcheck.collate is used.  It is not necessary to use any features that 
 were added with SOLR-2010.
 We were in Production with Solr for 1 1/2 days and this bug started taking 
 our Solr servers down with infinite GC loops.  It was pretty easy for this 
 to happen as occasionally a user will accidently paste the URL into the 
 Search box on our app.  This URL results in a search with ~12 misspelled 
 words.  We have spellcheck.count set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org