[jira] Updated: (LUCENE-841) Replace UTF8 characters in stemmer code with integer values.

2007-03-21 Thread Paul Cowan (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Cowan updated LUCENE-841: -- Attachment: lucene-841.patch Patch which replaces all non-ASCII characters in the 4 mentioned stemmer f

[jira] Updated: (LUCENE-806) Synchronization bottleneck in FieldSortedHitQueue with many concurrent readers

2007-03-21 Thread Paul Cowan (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Cowan updated LUCENE-806: -- Attachment: lucene-806-proposed-direction.patch Hi all, Attached is a patch which BEGINS to address (b

[jira] Commented: (LUCENE-841) Replace UTF8 characters in stemmer code with integer values.

2007-03-21 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482923 ] Hoss Man commented on LUCENE-841: - there are lots of OSes and editors where changing the file encoding is somewhat h

[jira] Commented: (LUCENE-841) Replace UTF8 characters in stemmer code with integer values.

2007-03-21 Thread Karl Wettin (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482918 ] Karl Wettin commented on LUCENE-841: Escaped Unicode, integer value, whatnot, just not raw UTF8 please. I for o

[jira] Commented: (LUCENE-841) Replace UTF8 characters in stemmer code with integer values.

2007-03-21 Thread Daniel Naber (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482914 ] Daniel Naber commented on LUCENE-841: - Which environments still don't handle UTF-8? Using anything that escapes t

[jira] Commented: (LUCENE-841) Replace UTF8 characters in stemmer code with integer values.

2007-03-21 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482902 ] Doron Cohen commented on LUCENE-841: > All environments do not handle that With Eclipse, modifying the text fi

[jira] Updated: (LUCENE-840) contrib/benchmark unit tests

2007-03-21 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen updated LUCENE-840: --- Attachment: 840-benchmark-tests.patch Patch updated to apply cleanly on current code. Should be appl

[jira] Commented: (LUCENE-841) Replace UTF8 characters in stemmer code with integer values.

2007-03-21 Thread Hoss Man (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482850 ] Hoss Man commented on LUCENE-841: - Karl: it would probably be better to use the Unicode escape sequences rather than
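To make the suggestion concrete, here is a minimal Java sketch (not taken from the LUCENE-841 patch) showing that a Unicode escape and an integer value denote the same char, so either one keeps a stemmer source file pure ASCII. The class and method names are hypothetical, and U+00E4 (a with diaeresis) is just an example character.

```java
// Hypothetical example class, not part of Lucene.
class StemmerCharLiterals {
    // Two ASCII-only ways to write U+00E4 (LATIN SMALL LETTER A WITH DIAERESIS).
    // Neither depends on the encoding the editor or compiler assumes for the file.
    static final char A_UMLAUT_ESCAPE = '\u00e4'; // Unicode escape
    static final char A_UMLAUT_INT = (char) 0xE4; // integer value (228 decimal)

    // Hypothetical stemmer-style check: does the word end with this character?
    static boolean endsWithAUmlaut(String word) {
        return !word.isEmpty() && word.charAt(word.length() - 1) == A_UMLAUT_ESCAPE;
    }

    public static void main(String[] args) {
        System.out.println(A_UMLAUT_ESCAPE == A_UMLAUT_INT);     // true
        System.out.println(endsWithAUmlaut("h" + A_UMLAUT_INT)); // true
        System.out.println(endsWithAUmlaut("ha"));               // false
    }
}
```

Because javac translates Unicode escapes before it parses the source, the escaped form compiles the same way under any ASCII-compatible source encoding, and it still names the code point more readably than a bare 228.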

Re: How does segment merging work

2007-03-21 Thread Matt Chaput
robert engels wrote: It seeks back at the end to the location and writes the size. Ah! Sorry, I didn't get that. Thanks for your help! Matt

[jira] Resolved: (LUCENE-839) WildcardQuery do not find documents if leading and trailing * is used

2007-03-21 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-839. Resolution: Duplicate Was fixed by LUCENE-813. > WildcardQuery do not find documents if leading an

Re: How does segment merging work

2007-03-21 Thread robert engels
It seeks back at the end to the location and writes the size.
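The technique robert describes (write a placeholder, stream the data, then seek back and fill in the real value) can be sketched with plain java.io.RandomAccessFile. The file layout below is a made-up example rather than Lucene's actual term dictionary format, and RandomAccessFile stands in for Lucene's own output classes.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical file layout, not Lucene's term dictionary format:
// [8-byte term count][term 1 as writeUTF][term 2 as writeUTF]...
class BackpatchedCount {
    static long writeTerms(File file, Iterator<String> mergedTerms) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(file, "rw")) {
            out.setLength(0);                         // discard any previous contents
            long countPointer = out.getFilePointer(); // remember where the header lives
            out.writeLong(0L);                        // placeholder: count unknown while merging
            long count = 0;
            while (mergedTerms.hasNext()) {           // stream merged terms; no pre-count needed
                out.writeUTF(mergedTerms.next());
                count++;
            }
            out.seek(countPointer);                   // seek back to the header...
            out.writeLong(count);                     // ...and overwrite the placeholder
            return count;
        }
    }

    static long readCount(File file) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            return in.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("terms", ".bin");
        f.deleteOnExit();
        writeTerms(f, Arrays.asList("apple", "banana", "cherry").iterator());
        System.out.println(readCount(f)); // 3
    }
}
```

The trick relies on the placeholder having a fixed width: overwriting an 8-byte long with another 8-byte long does not shift any of the bytes written after it, which would not hold for a variable-length encoding.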

[jira] Resolved: (LUCENE-838) WildcardQuery do not find documents

2007-03-21 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-838. Resolution: Invalid This is known behavior: wildcard queries are not analyzed. In the Lucene FAQ:
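A stdlib-only simulation of what "not analyzed" means in practice. Assumptions: the index-time analyzer only splits on whitespace and lowercases (hypothetical), and a hand-written wildcard-to-regex matcher stands in for WildcardQuery's term matching. None of this is Lucene code, and the snippet does not say which analysis step caused the reported miss.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical simulation: NOT Lucene's WildcardQuery or analyzer classes.
class WildcardNotAnalyzed {
    // Stand-in for an index-time analyzer that splits on whitespace and lowercases.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Converts a wildcard pattern (* = any run of chars, ? = exactly one char) to a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString());
    }

    // The pattern is matched against the indexed terms exactly as typed: no analysis.
    static boolean anyTermMatches(Collection<String> indexedTerms, String wildcard) {
        Pattern p = toRegex(wildcard);
        for (String term : indexedTerms) {
            if (p.matcher(term).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> indexed = analyze("Apache Lucene Java"); // [apache, lucene, java]
        System.out.println(anyTermMatches(indexed, "Luc*"));  // false
        System.out.println(anyTermMatches(indexed, "Luc*".toLowerCase(Locale.ROOT))); // true
    }
}
```

In this sketch, applying the same normalization to the pattern that the analyzer applied to the indexed text is what restores the match.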

How does segment merging work

2007-03-21 Thread Matt Chaput
Aside from the useful exchange I had with Robert, I'd still like to know how Lucene knows what value to write in the "term count" part of the term dictionary header when it's merging segments -- even if I decide to forgo it in my own re-implementation. Of course, I can always just dive into the c

[jira] Commented: (LUCENE-839) WildcardQuery do not find documents if leading and trailing * is used

2007-03-21 Thread Doron Cohen (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482829 ] Doron Cohen commented on LUCENE-839: Michael, This problem has already been fixed since 2.1.0. * * * In general,

[jira] Created: (LUCENE-841) Replace UTF8 characters in stemmer code with integer values.

2007-03-21 Thread Karl Wettin (JIRA)
Replace UTF8 characters in stemmer code with integer values. Key: LUCENE-841 URL: https://issues.apache.org/jira/browse/LUCENE-841 Project: Lucene - Java Issue Type: Improvement

[jira] Resolved: (LUCENE-837) contrib/benchmark QueryMaker and Task Refactorings

2007-03-21 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll resolved LUCENE-837. Resolution: Fixed Lucene Fields: (was: [New]) committed revision 520890 > contrib

Re: Is this correct: term.field() == fieldName ?

2007-03-21 Thread mark harwood
>>Is it correct to compare using '==', or should equals() be used instead? In this context it is OK. Term field names are deliberately interned using String.intern(), so this equality test can be used. The intention is to make comparisons faster. Cheers, Mark
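A small Java sketch of why the == test in PrefixFilter is safe only when both sides are interned. SimpleTerm is a hypothetical stand-in for Lucene's Term that interns its field name in the constructor, as Mark describes; PrefixCheck mirrors the quoted condition and is also hypothetical.

```java
// Hypothetical stand-in for Lucene's Term: NOT the real class.
final class SimpleTerm {
    private final String field;
    private final String text;

    SimpleTerm(String field, String text) {
        this.field = field.intern(); // canonical instance, so == works between interned names
        this.text = text;
    }

    String field() { return field; }
    String text() { return text; }
}

class PrefixCheck {
    // Mirrors the condition quoted from PrefixFilter; correct only if prefixField is interned too.
    static boolean matches(SimpleTerm term, String prefixField, String prefixText) {
        return term != null
                && term.text().startsWith(prefixText)
                && term.field() == prefixField;
    }

    public static void main(String[] args) {
        String literal = "title";                                        // string literals are interned
        String built = new StringBuilder("tit").append("le").toString(); // same text, new object
        System.out.println(literal.equals(built)); // true
        System.out.println(literal == built);      // false
        SimpleTerm term = new SimpleTerm(built, "lucene");
        System.out.println(PrefixCheck.matches(term, literal, "luc")); // true: the term interned its field
        System.out.println(PrefixCheck.matches(term, built, "luc"));   // false: caller's name not interned
    }
}
```

The speed-up comes from replacing a character-by-character equals() with a single reference comparison; the cost is that every code path producing a field name must go through intern().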

Is this correct: term.field() == fieldName ?

2007-03-21 Thread dmitri
In org.apache.lucene.search.PrefixFilter I found: .. if (term != null && term.text().startsWith(prefixText) && term.field() == prefixField) { Is it correct to compare using '==', or should equals() be used instead?

[jira] Reopened: (LUCENE-839) WildcardQuery do not find documents if leading and trailing * is used

2007-03-21 Thread Michael Schlegel (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Schlegel reopened LUCENE-839: - > WildcardQuery do not find documents if leading and trailing * is used

Re: [jira] Updated: (LUCENE-725) NovelAnalyzer - wraps your choice of Lucene Analyzer and filters out all "boilerplate" text

2007-03-21 Thread jian chen
Hi Mark, Thanks a lot for your explanation. This code is very useful; it could even live in a separate library for text extraction. Again, thanks for taking the time to answer my question. Jian

[jira] Reopened: (LUCENE-838) WildcardQuery do not find documents

2007-03-21 Thread Michael Schlegel (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Schlegel reopened LUCENE-838: - > WildcardQuery do not find documents

[jira] Commented: (LUCENE-839) WildcardQuery do not find documents if leading and trailing * is used

2007-03-21 Thread Michael Schlegel (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482671 ] Michael Schlegel commented on LUCENE-839: - I use release 2.1.0. Here is an example to demonstrate the proble

[jira] Commented: (LUCENE-838) WildcardQuery do not find documents

2007-03-21 Thread Michael Schlegel (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482670 ] Michael Schlegel commented on LUCENE-838: - Hi! Sorry, but this was my first bug report. You resolved the bug

Re: [jira] Updated: (LUCENE-725) NovelAnalyzer - wraps your choice of Lucene Analyzer and filters out all "boilerplate" text

2007-03-21 Thread markharw00d
The Analyzer keeps a window of (by default) the last 300 documents. Every token created in these cached documents is stored for reference, and as new documents arrive, their token sequences are examined to see if any of the sequences was seen before, in which case the analyzer does not emit them
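A rough Java sketch of the sliding-window idea described above; it is not NovelAnalyzer's code. The snippet is truncated before it says how sequences are compared, so this sketch assumes a "sequence" means a fixed run of consecutive whitespace tokens; the sequence length, the class name, and the method names are all hypothetical. Only the window of recent documents (300 by default) comes from the description.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the sliding-window idea described for NovelAnalyzer; not its actual code.
class BoilerplateFilter {
    private final int windowSize; // how many recent documents are remembered (300 in the description)
    private final int seqLength;  // assumed: a "sequence" is this many consecutive tokens
    private final Deque<Set<String>> window = new ArrayDeque<>(); // sequences of each remembered doc
    private final Map<String, Integer> seen = new HashMap<>();    // sequence -> docs in window containing it

    BoilerplateFilter(int windowSize, int seqLength) {
        if (windowSize < 1 || seqLength < 1) {
            throw new IllegalArgumentException("windowSize and seqLength must be >= 1");
        }
        this.windowSize = windowSize;
        this.seqLength = seqLength;
    }

    // Returns one document's tokens minus any token covered by a sequence
    // that already occurred in one of the remembered documents.
    List<String> filter(List<String> tokens) {
        boolean[] suppress = new boolean[tokens.size()];
        Set<String> docSequences = new HashSet<>();
        for (int i = 0; i + seqLength <= tokens.size(); i++) {
            String key = String.join(" ", tokens.subList(i, i + seqLength));
            docSequences.add(key);
            if (seen.containsKey(key)) {
                Arrays.fill(suppress, i, i + seqLength, true);
            }
        }
        List<String> novel = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!suppress[i]) {
                novel.add(tokens.get(i));
            }
        }
        remember(docSequences);
        return novel;
    }

    // Adds this document's sequences to the window and evicts the oldest document if full.
    private void remember(Set<String> docSequences) {
        window.addLast(docSequences);
        for (String s : docSequences) {
            seen.merge(s, 1, Integer::sum);
        }
        if (window.size() > windowSize) {
            for (String s : window.removeFirst()) {
                seen.computeIfPresent(s, (k, count) -> count == 1 ? null : count - 1);
            }
        }
    }

    public static void main(String[] args) {
        BoilerplateFilter f = new BoilerplateFilter(300, 3);
        f.filter(Arrays.asList("copyright acme corp all rights reserved".split(" ")));
        List<String> out = f.filter(Arrays.asList(
                "lucene release notes copyright acme corp all rights reserved".split(" ")));
        System.out.println(out); // [lucene, release, notes]
    }
}
```

Counting how many remembered documents contain each sequence lets the oldest document be evicted without rescanning the whole window.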