Re: ThaiAnalyzer for Lucene

2006-04-11 Thread Samphan Raruenrom
I've finished the work. It no longer uses ICU4j. Here: http://issues.apache.org/jira/browse/LUCENE-503?page=all To contribute the code, what should I do next? Otis Gospodnetic wrote: Hi Samphan, Please create an "issue" in JIRA, and attach your code to it. We can put the analyzers in the con

[jira] Commented: (LUCENE-503) Contrib: ThaiAnalyzer to enable Thai full-text search in Lucene

2006-04-11 Thread Samphan Raruenrom (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-503?page=comments#action_12374136 ] Samphan Raruenrom commented on LUCENE-503: -- I've changed the code to use java.text.BreakIterator instead of ICU4j to remove the dependency on ICU4j. The ThaiAnalyzer i

[jira] Updated: (LUCENE-503) Contrib: ThaiAnalyzer to enable Thai full-text search in Lucene

2006-04-11 Thread Samphan Raruenrom (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-503?page=all ] Samphan Raruenrom updated LUCENE-503: - Attachment: ThaiWordFilter.java ThaiWordFilter which uses java.text.BreakIterator to break Thai words into tokens > Contrib: ThaiAnalyzer to enable Th
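
The word-breaking approach described in this attachment can be illustrated with the JDK alone. The following is only a minimal sketch, not the attached ThaiWordFilter: the class name ThaiBreakSketch and the method words() are hypothetical, and the actual Thai segmentation depends on the locale data shipped with the JVM in use.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Lucene or of the LUCENE-503 attachment.
class ThaiBreakSketch {

    // Splits text at the word boundaries reported by java.text.BreakIterator
    // and keeps only the pieces that contain at least one letter or digit,
    // so whitespace and punctuation segments are dropped.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String piece = text.substring(start, end);
            if (piece.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Space-delimited text is split at the spaces.
        System.out.println(words("hello world", Locale.ENGLISH));
        // Thai has no spaces between words; the result here depends on the
        // JVM's Thai break data, so no particular output is assumed.
        System.out.println(words("\u0e20\u0e32\u0e29\u0e32\u0e44\u0e17\u0e22", Locale.forLanguageTag("th")));
    }
}
```

In a Lucene analysis chain, a filter built on this idea would emit one token per returned piece; how the attached filter handles offsets and non-Thai tokens is not shown in this snippet.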

[jira] Updated: (LUCENE-503) Contrib: ThaiAnalyzer to enable Thai full-text search in Lucene

2006-04-11 Thread Samphan Raruenrom (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-503?page=all ] Samphan Raruenrom updated LUCENE-503: - Attachment: ThaiAnalyzer.java ThaiAnalyzer which simply returns a TokenFilter chain with ThaiWordFilter in the middle > Contrib: ThaiAnalyzer to enab

Re: bytecount as prefix

2006-04-11 Thread Chris Hostetter
1) not only does ConstantScoreRangeQuery use a RangeFilter, but TestConstantScoreRangeQuery and TestRangeFilter share a base class that creates the index. 2) perhaps the issue is that corruption is happening when segments are merged -- and most tests don't surface the problem because they tend to

Re: bytecount as prefix

2006-04-11 Thread Marvin Humphrey
On Apr 11, 2006, at 12:05 PM, Marvin Humphrey wrote: TestRangeFilter. A phantom blank Term shows up out of nowhere in the middle of the merge process. When you stick a System.err.println into TermInfosWriter's writeTerm, you ordinarily see it adding Terms in proper sort order: [j

when and how to use GCJIndexInput, GCJDirectory

2006-04-11 Thread Charlie
Would you please shed some light on when and how to use GCJIndexInput and GCJDirectory? Do any demos or tests actually use them? Doug has a one-line comment below, but I can hardly get any clue from it. /** Native file-based {@link IndexInput} implementation, using GCJ. * * @author Doug Cut

Re: bytecount as prefix

2006-04-11 Thread Marvin Humphrey
On Apr 11, 2006, at 2:27 PM, Marvin Humphrey wrote: "all but last", "all but first" and "all but ends" pass! Scratch that, it's totally untrue. I'd forgotten that these compound test cases bail as soon as there's a single failure. "all but last" also fails to return any docs at all. M

Re: bytecount as prefix

2006-04-11 Thread Marvin Humphrey
On Apr 11, 2006, at 2:08 PM, Yonik Seeley wrote: On 4/11/06, Marvin Humphrey <[EMAIL PROTECTED]> wrote: What do the failing tests have in common? On TestIndexModifier, only a small portion of the deletions fail, and they're all for fairly high values of delId -- sometimes the highest, but not

Re: bytecount as prefix

2006-04-11 Thread Yonik Seeley
On 4/11/06, Marvin Humphrey <[EMAIL PROTECTED]> wrote: > What do the failing tests have in common? > > On TestIndexModifier, only a small portion of the deletions fail, and > they're all for fairly high values of delId -- sometimes the highest, > but not always. For RangeFilter and ConstantScoreRa

Re: bytecount as prefix

2006-04-11 Thread Marvin Humphrey
On Apr 11, 2006, at 12:18 PM, Doug Cutting wrote: Marvin Humphrey wrote: I'm back working on converting Lucene to use a byte count instead of a char count as a prefix at the head of each String. Three tests are failing: TestIndexModifier, TestConstantScoreRangeQuery, and TestRang

Re: bytecount as prefix

2006-04-11 Thread Doug Cutting
Marvin Humphrey wrote: I'm back working on converting Lucene to use a byte count instead of a char count as a prefix at the head of each String. Three tests are failing: TestIndexModifier, TestConstantScoreRangeQuery, and TestRangeFilter. Why those and not others? - private static f

Jbuilder to build lucene, package gnu.gcj does not exist

2006-04-11 Thread Charlie
Hi, In Jbuilder, I can use ant to build lucene-core-1.9.2-dev.jar, but if I use Jbuilder itself to build, I get these errors. "GCJIndexInput.java": package gnu.gcj does not exist at line 20, column 16 "GCJIndexInput.java": cannot find symbol; symbol : class RawData, location: class o

bytecount as prefix

2006-04-11 Thread Marvin Humphrey
Greets, I'm back working on converting Lucene to use a byte count instead of a char count as a prefix at the head of each String. Three tests are failing: TestIndexModifier, TestConstantScoreRangeQuery, and TestRangeFilter. Why those and not others? Marvin Humphrey Rectangular Rese
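
The difference between a char-count prefix and a byte-count prefix only shows up for non-ASCII text. The sketch below is not Lucene code: the class PrefixSketch and its methods are hypothetical, and it assumes standard UTF-8 encoding, which may not match the encoding Lucene's file format used at the time.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not taken from Lucene's index-writing code.
class PrefixSketch {

    // Prefix value if the String is prefixed by its number of Java chars
    // (UTF-16 code units).
    static int charCount(String s) {
        return s.length();
    }

    // Prefix value if the String is prefixed by its encoded length in bytes.
    // Assumption: standard UTF-8.
    static int byteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "abc", "e" with acute accent, and a three-letter Thai string.
        String[] samples = { "abc", "\u00e9", "\u0e44\u0e17\u0e22" };
        for (String s : samples) {
            System.out.println(charCount(s) + " chars, " + byteCount(s) + " bytes");
        }
    }
}
```

For pure ASCII terms the two prefixes are identical, so a test that indexes only ASCII data would not distinguish the two formats; whether that explains the three failing tests is not established in this thread.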

Re: ant one test

2006-04-11 Thread Erik Hatcher
There is no standard way to do this with Ant per se, but I always build this capability in. For Lucene, it's this incantation: ant -Dtestcase=TestQueryParser test where you can put the simple class name of any JUnit test case after testcase=... Erik On Apr 11, 2006, at 2

ant one test

2006-04-11 Thread Marvin Humphrey
Greets, Quick question: how do I build then run just one test in the test suite? For a Perl distribution, it's one of these: make; perl -Mblib t/test_file.t make test TEST_FILES=t/test_file.t What's the Java equivalent? I'm working on the bytecount-as-String-prefix problem again, a

HITS

2006-04-11 Thread Anton Feldmann
Hi, I would like to know how you get the Hits. Do you use tokens? If you use tokens, could I write a tokenizer to make tokens out of a sentence? Thanks, Anton Feldmann

Re: Contextual suggestions (for spelling)

2006-04-11 Thread karl wettin
On 11 Apr 2006, at 18:01, Rajesh Munavalli wrote: Hi Karl, Could you elaborate on the kind of features you would use to train Markov chains? I worked some on CRF's (Conditional Random Fields) for one of the information extraction projects. It would be useful to know your approach as

RE: Contextual suggestions (for spelling)

2006-04-11 Thread Rajesh Munavalli
Hi Karl, Could you elaborate on the kind of features you would use to train Markov chains? I worked some on CRF's (Conditional Random Fields) for one of the information extraction projects. It would be useful to know your approach as well as I might be able to pitch in some ideas.

Re: SpanNearQuery with minimum slop

2006-04-11 Thread Doug Cutting
Erik Hatcher wrote: I have a potential need for a SpanNearQuery with an exact non-zero gap specified Ironically, you can now easily specify this with PhraseQuery, but not with SpanNearQuery. You can construct a phrase query with explicit positions, e.g.: PhraseQuery pq = new PhraseQuery()

Re: Contextual suggestions (for spelling)

2006-04-11 Thread karl wettin
On 11 Apr 2006, at 02:20, [EMAIL PROTECTED] wrote: Any comments on this? Questions? You've sketched a slightly degenerate version of Shannon's noisy channel model. Here are three refs to help you sort out the usual approach: Wow! Thanks! --

[jira] Updated: (LUCENE-544) MultiFieldQueryParser field boost multiplier

2006-04-11 Thread Karl Wettin (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-544?page=all ] Karl Wettin updated LUCENE-544: --- Attachment: MultiFieldQueryParser.java The updated code > MultiFieldQueryParser field boost multiplier > > >

[jira] Created: (LUCENE-544) MultiFieldQueryParser field boost multiplier

2006-04-11 Thread Karl Wettin (JIRA)
MultiFieldQueryParser field boost multiplier Key: LUCENE-544 URL: http://issues.apache.org/jira/browse/LUCENE-544 Project: Lucene - Java Type: Improvement Components: QueryParser Reporter: Karl Wettin Priority

[jira] Closed: (LUCENE-322) [PATCH] Add IndexSearcher.numDocs() method

2006-04-11 Thread Yonik Seeley (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-322?page=all ] Yonik Seeley closed LUCENE-322: --- Fix Version: 2.0 Resolution: Won't Fix Assign To: (was: Lucene Developers) > [PATCH] Add IndexSearcher.numDocs() method >

SpanNearQuery with minimum slop

2006-04-11 Thread Erik Hatcher
I have a potential need for a SpanNearQuery with an exact non-zero gap specified, and possibly a need for a non-zero minimum gap. Would this be as easy as modifying SpanNearQuery to have a minimum and maximum slop feature, and modifying NearSpans.checkSlop() to add a condition that the diff

[jira] Commented: (LUCENE-322) [PATCH] Add IndexSearcher.numDocs() method

2006-04-11 Thread Alexey Panchenko (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-322?page=comments#action_12374026 ] Alexey Panchenko commented on LUCENE-322: - Yes, now that the getIndexReader() method has been added, this patch is not needed and this issue can be closed. > [PATCH] Add IndexSearche

[jira] Commented: (LUCENE-130) org.apache.lucene.search.Query.toString(String field) ignores it's only parameter

2006-04-11 Thread Daniel Naber (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-130?page=comments#action_12373990 ] Daniel Naber commented on LUCENE-130: - I fixed that, thanks. > org.apache.lucene.search.Query.toString(String field) ignores it's only > parameter >

[jira] Commented: (LUCENE-130) org.apache.lucene.search.Query.toString(String field) ignores it's only parameter

2006-04-11 Thread Nadav Har'El (JIRA)
[ http://issues.apache.org/jira/browse/LUCENE-130?page=comments#action_12373980 ] Nadav Har'El commented on LUCENE-130: - Daniel, sorry for the mess, but I actually misspelled the word "omitted" in that sentence. Should have just one "m"... > org.apache