[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-27 Thread Koji Sekiguchi (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677639#action_12677639 ] Koji Sekiguchi commented on LUCENE-1500: Peter, thank you. bq. In the thread you

[jira] Updated: (LUCENE-1550) Add N-Gram String Matching for Spell Checking

2009-02-27 Thread Thomas Morton (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Morton updated LUCENE-1550: -- Attachment: LUCENE-1550.patch Patch includes implementation of n-gram string matching. This i

[jira] Created: (LUCENE-1550) Add N-Gram String Matching for Spell Checking

2009-02-27 Thread Thomas Morton (JIRA)
Add N-Gram String Matching for Spell Checking - Key: LUCENE-1550 URL: https://issues.apache.org/jira/browse/LUCENE-1550 Project: Lucene - Java Issue Type: New Feature Components: contrib/

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-27 Thread Peter Wolanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677629#action_12677629 ] Peter Wolanin commented on LUCENE-1500: --- Koji - thanks - I was aware that not all wo

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-27 Thread Koji Sekiguchi (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677626#action_12677626 ] Koji Sekiguchi commented on LUCENE-1500: bq. for that field type. I will investiga

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-27 Thread Peter Wolanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677620#action_12677620 ] Peter Wolanin commented on LUCENE-1500: --- Ah, it occurs to me that we first saw this

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-27 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677608#action_12677608 ] Hoss Man commented on LUCENE-1500: -- Peter: I tried some experiments with the analyzer spe

[jira] Commented: (LUCENE-1186) [PATCH] Clear ThreadLocal instances in close()

2009-02-27 Thread Robert Starzer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677601#action_12677601 ] Robert Starzer commented on LUCENE-1186: "EG are you picturing a single opaque cla

[jira] Commented: (LUCENE-1186) [PATCH] Clear ThreadLocal instances in close()

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677600#action_12677600 ] Michael McCandless commented on LUCENE-1186: bq. -> is this true (only for Seg

[jira] Commented: (LUCENE-1186) [PATCH] Clear ThreadLocal instances in close()

2009-02-27 Thread Robert Starzer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677584#action_12677584 ] Robert Starzer commented on LUCENE-1186: ok thanks! https://issues.apache.org/jir

Re: Bitmap index

2009-02-27 Thread Earwin Burrfoot
> Maybe we can use the > compression technology mentioned in this Wikipedia article to further > optimize filters and their DocIdSetIterators. We already use WAH-encoded bitmap filters over here for roughly a year. And yes, they are nice. -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) H

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-27 Thread Peter Wolanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677561#action_12677561 ] Peter Wolanin commented on LUCENE-1500: --- I'm still trying to get a handle on how the

Re: Bitmap index

2009-02-27 Thread Michael McCandless
Right, I think Lucene could decide under-the-hood what's the best data structure when writing the column-stride field. Sort of like how BitVector has two ways (sparse vs unsparse) of storing itself on disk. Mike Otis Gospodnetic wrote: So that would require Lucene to dynamically/periodically
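The idea of picking a storage form under the hood can be illustrated with a small sketch. This is only an assumption-laden example, not Lucene's BitVector code: the class name `BitmapEncodingChooser`, the `Encoding` enum, and the cost estimates (one bit per document for the dense form, a flat 4 bytes per set bit for the sparse form) are all hypothetical.

```java
// Hypothetical sketch; the size estimates are assumptions, not Lucene's BitVector format.
class BitmapEncodingChooser {
    enum Encoding { DENSE, SPARSE }

    // Dense form: one bit per document, rounded up to whole bytes.
    static long denseBytes(int numDocs) {
        return (numDocs + 7L) / 8L;
    }

    // Sparse form: assume a fixed 4 bytes per set bit (a plain int doc id).
    static long sparseBytes(int setBits) {
        return 4L * setBits;
    }

    // Pick the sparse form only when it is strictly smaller than the dense one.
    static Encoding choose(int numDocs, int setBits) {
        return sparseBytes(setBits) < denseBytes(numDocs) ? Encoding.SPARSE : Encoding.DENSE;
    }
}
```

Under these assumed costs, a bitmap with very few set bits over many documents would be written sparsely, while a well-populated one would stay dense. A real format would likely use a more compact sparse encoding than 4 bytes per entry.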

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677551#action_12677551 ] Michael McCandless commented on LUCENE-1500: But that class either uses TermVe

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-27 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677549#action_12677549 ] Hoss Man commented on LUCENE-1500: -- bq. The extent of my tracing suggests it's coming whe

Re: Bitmap index

2009-02-27 Thread Otis Gospodnetic
So that would require Lucene to dynamically/periodically check field values and their frequencies and switch from a regular inverted index to a bitmap index or just create an additional bitmap index for those fields and their values? Otis - Original Message > From: Michael McCandles

Re: Bitmap index

2009-02-27 Thread Otis Gospodnetic
OK, so that bit about filters, OpenBitSet and friends was my feeling/understanding, too. That sort of matches what that Wikipedia page describes as in-memory usage of bitmaps a la PostgreSQL. The reason I mentioned Solr is because I was thinking of low-cardinality fields, perhaps the same on

Re: Bitmap index

2009-02-27 Thread Michael McCandless
I think with column stride fields we should use Bitmap Index to represent fields that have few values across many docs. Mike Uwe Schindler wrote: In my opinion, we currently use some type of bitmap index with our filters. OpenBitSet and SortedVIntList used in filters can be seen as bitmap
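For readers unfamiliar with the term, a bitmap index for a field with few distinct values can be sketched as one bitmap per value, with bit i set when document i carries that value. The following is a minimal illustration using only `java.util.BitSet`; the class `BitmapIndexSketch` and its methods are hypothetical and are not part of Lucene or of any column-stride field proposal.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a bitmap index; not Lucene code.
class BitmapIndexSketch {
    // One bitmap per distinct field value; bit i is set when document i has that value.
    private final Map<String, BitSet> bitmaps = new HashMap<>();

    void add(int docId, String value) {
        bitmaps.computeIfAbsent(value, v -> new BitSet()).set(docId);
    }

    // Documents whose field equals the given value (a copy, so callers may modify it).
    BitSet docsWith(String value) {
        BitSet bits = bitmaps.get(value);
        return bits == null ? new BitSet() : (BitSet) bits.clone();
    }

    // Documents matching either value: a bitwise OR of the two bitmaps.
    BitSet docsWithAny(String a, String b) {
        BitSet result = docsWith(a);
        result.or(docsWith(b));
        return result;
    }
}
```

With few values and many documents, the number of bitmaps stays small and set operations such as OR and AND reduce to word-wise bit arithmetic, which is why the structure suits low-cardinality fields.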

[jira] Commented: (LUCENE-1186) [PATCH] Clear ThreadLocal instances in close()

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677543#action_12677543 ] Michael McCandless commented on LUCENE-1186: bq. could you please explain to m

RE: Bitmap index

2009-02-27 Thread Uwe Schindler
In my opinion, we currently use some type of bitmap index with our filters. OpenBitSet and SortedVIntList used in filters can be seen as bitmap indexes specifying if a document is a hit of the filter or not. Maybe we can use the compression technology mentioned in this Wikipedia article to further

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-27 Thread Peter Wolanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677540#action_12677540 ] Peter Wolanin commented on LUCENE-1500: --- I am using Solr, but with a single value fi

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677535#action_12677535 ] Michael McCandless commented on LUCENE-1500: I thought the bug was in the anal

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-27 Thread Peter Wolanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677531#action_12677531 ] Peter Wolanin commented on LUCENE-1500: --- The bug we are seeing now happens on pretty

[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677533#action_12677533 ] Michael McCandless commented on LUCENE-1516: bq. Does this mean that the point

[jira] Updated: (LUCENE-1314) IndexReader.clone

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1314: --- Attachment: LUCENE-1314.patch Attached patch. I plan to commit in a day or two. O

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677524#action_12677524 ] Michael McCandless commented on LUCENE-1500: bq. This feels to me like one of

[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

2009-02-27 Thread Jeremy Volkman (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677523#action_12677523 ] Jeremy Volkman commented on LUCENE-1516: I noticed the comments about IW.getReader

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-27 Thread Peter Wolanin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677517#action_12677517 ] Peter Wolanin commented on LUCENE-1500: --- Well, this patch does not (obviously) solve

Bitmap index

2009-02-27 Thread Otis Gospodnetic
Hi, I've had http://en.wikipedia.org/wiki/Bitmap_index open in my browser for weeks, thinking I'd bring it up here -- would a bitmap index make sense anywhere in Lucene (or perhaps Solr)? Otis

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-27 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677507#action_12677507 ] Mark Harwood commented on LUCENE-1500: -- OK - choices are: 1) Throw a RuntimeExceptio

Re: segments.gen file

2009-02-27 Thread Michael McCandless
Michael Busch wrote: On 2/26/09 1:50 PM, Michael McCandless wrote: Michael Busch wrote: On 2/24/09 4:05 AM, Michael McCandless wrote: I believe we still need this, for remote filesystems (like NFS) that have inconsistent client-side caching. The fsync() ensures the local IO system has

Re: segments.gen file

2009-02-27 Thread Michael Busch
On 2/26/09 1:50 PM, Michael McCandless wrote: Michael Busch wrote: On 2/24/09 4:05 AM, Michael McCandless wrote: I believe we still need this, for remote filesystems (like NFS) that have inconsistent client-side caching. The fsync() ensures the local IO system has moved the bytes/file me

[jira] Commented: (LUCENE-1186) [PATCH] Clear ThreadLocal instances in close()

2009-02-27 Thread Robert Starzer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677423#action_12677423 ] Robert Starzer commented on LUCENE-1186: you could use e.g. spring and specific sp

Re: Getting tokens from search results. Simple concept

2009-02-27 Thread HPDrifter
Yes, I have, but it is too memory-intensive. I used the highlighter as my first attempt, but it was not a good solution because I have to send the entire text to the highlighter. What I did instead is similar to your suggestion. 1. use the analyzer to return me a token stream. 2. search the token stre

[jira] Commented: (LUCENE-1186) [PATCH] Clear ThreadLocal instances in close()

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677376#action_12677376 ] Michael McCandless commented on LUCENE-1186: bq. IMHO, some kind of IOC contai

Re: 2.4.1 release?

2009-02-27 Thread Michael McCandless
OK, we're now down to three 2.4.1 issues: https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310110&fixfor=12313516 I've got 2 of them and I think Mark Harwood has the 3rd (LUCENE-1500). Once we get this dow

[jira] Resolved: (LUCENE-1548) LevenshteinDistance code normalization is incorrect

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1548. Resolution: Fixed Fix Version/s: 2.9 Thanks Thomas! > LevenshteinDistance

[jira] Assigned: (LUCENE-1548) LevenshteinDistance code normalization is incorrect

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1548: -- Assignee: Michael McCandless > LevenshteinDistance code normalization is incor

[jira] Commented: (LUCENE-1548) LevenshteinDistance code normalization is incorrect

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677373#action_12677373 ] Michael McCandless commented on LUCENE-1548: Looks good, I'll commit. Thanks

[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677372#action_12677372 ] Michael McCandless commented on LUCENE-1500: Mark, do you want/have time to ta

[jira] Updated: (LUCENE-1549) Strengthen CheckIndex a bit

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1549: --- Attachment: LUCENE-1549.patch Attached patch. I plan to commit in a day or so, to 2

[jira] Created: (LUCENE-1549) Strengthen CheckIndex a bit

2009-02-27 Thread Michael McCandless (JIRA)
Strengthen CheckIndex a bit --- Key: LUCENE-1549 URL: https://issues.apache.org/jira/browse/LUCENE-1549 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.4

[jira] Resolved: (LUCENE-1546) Add IndexReader.flush(commitUserData)

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1546. Resolution: Fixed Committed revision 748493. Thanks Jason! > Add IndexReader.flu

[jira] Commented: (LUCENE-1186) [PATCH] Clear ThreadLocal instances in close()

2009-02-27 Thread Robert Starzer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677340#action_12677340 ] Robert Starzer commented on LUCENE-1186: great! thanks! IMHO, some kind of IOC con

[jira] Updated: (LUCENE-1186) [PATCH] Clear ThreadLocal instances in close()

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1186: --- Attachment: LUCENE-1186.patch New patch, giving credit to Christian. Thanks Christi

[jira] Issue Comment Edited: (LUCENE-1186) [PATCH] Clear ThreadLocal instances in close()

2009-02-27 Thread Robert Starzer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677281#action_12677281 ] rviper edited comment on LUCENE-1186 at 2/27/09 3:53 AM: -

[jira] Updated: (LUCENE-1186) [PATCH] Clear ThreadLocal instances in close()

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1186: --- Attachment: LUCENE-1186.patch Re-using a single analyzer should work around this...

[jira] Updated: (LUCENE-1186) [PATCH] Clear ThreadLocal instances in close()

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-1186: --- Fix Version/s: 2.9 2.4.1 > [PATCH] Clear ThreadLocal instances in

[jira] Assigned: (LUCENE-1186) [PATCH] Clear ThreadLocal instances in close()

2009-02-27 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1186: -- Assignee: Michael McCandless > [PATCH] Clear ThreadLocal instances in close()

Re: Getting tokens from search results. Simple concept

2009-02-27 Thread Erik Hatcher
Have you looked at the contrib Highlighter? Or using an Analyzer directly to give you the offsets? Erik On Feb 26, 2009, at 9:32 AM, HPDrifter wrote: When I get a search result based on my index, I need the exact tokens which were identified in the index as part of the result.

[jira] Issue Comment Edited: (LUCENE-1186) [PATCH] Clear ThreadLocal instances in close()

2009-02-27 Thread Robert Starzer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677281#action_12677281 ] rviper edited comment on LUCENE-1186 at 2/27/09 12:38 AM: --

Re: Integrating Language Models into Lucene

2009-02-27 Thread José Ramón Pérez Agüera
There is a Lucene LM implementation, intended only for research purposes, at http://ilps.science.uva.nl/resources/lm-lucene. It is a very old implementation, but it may be useful to you. jose On Thu, Feb 26, 2009 at 9:25 AM, Paul Elschot wrote: > On Thursday 26 February 2009 02:21:41 Koren Krupko wrote: >

[jira] Commented: (LUCENE-1186) [PATCH] Clear ThreadLocal instances in close()

2009-02-27 Thread Robert Starzer (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677281#action_12677281 ] Robert Starzer commented on LUCENE-1186: I'm using Quartz schedules to trigger ind