Re: TokenStream and Token APIs

2008-10-13 Thread DM Smith
On Oct 13, 2008, at 3:34 PM, Doug Cutting wrote: Michael Busch wrote: public abstract boolean nextToken() throws IOException; What's the point of a separate Token and TokenStream if there's only a single Token per TokenStream? If that's really the direction we'll go, then all of the

[jira] Issue Comment Edited: (LUCENE-1410) PFOR implementation

2008-10-13 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639195#action_12639195 ] [EMAIL PROTECTED] edited comment on LUCENE-1410 at 10/13/08 2:17 PM:

[jira] Updated: (LUCENE-1410) PFOR implementation

2008-10-13 Thread Paul Elschot (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Elschot updated LUCENE-1410: Attachment: LUCENE-1410d.patch 1410d patch: as 1410c, with the following further changes: - move

Re: TokenStream and Token APIs

2008-10-13 Thread Doug Cutting
Michael Busch wrote: public abstract boolean nextToken() throws IOException; What's the point of a separate Token and TokenStream if there's only a single Token per TokenStream? If that's really the direction we'll go, then all of the Token methods should be on TokenStream, and Token sho

Re: How can I find the field from search result

2008-10-13 Thread Chris Hostetter
1) take a look at the Searcher.explain method and the Explanation class. 2) http://people.apache.org/~hossman/#java-dev Please Use "[EMAIL PROTECTED]" Not "[EMAIL PROTECTED]" Your question is better suited for the [EMAIL PROTECTED] mailing list ... not the [EMAIL PROTECTED] list. java-dev is fo

[jira] Resolved: (LUCENE-1419) Expert API to specify indexing chain

2008-10-13 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch resolved LUCENE-1419. Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [Patch Available, New])

Lucene Indexer Encoding problem

2008-10-13 Thread svirid
Good day guys, hope you can help me. I am trying to index French and Russian documents with Lucene and have had no luck. I am new to Java, so I really need your help. I was able to get text from PDFs; when I save it, it's all fine and I can clearly see Russian characters in the txt file, but when I ad

[jira] Commented: (LUCENE-1420) Similarity.lengthNorm and positionIncrement=0

2008-10-13 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639112#action_12639112 ] Michael McCandless commented on LUCENE-1420: Looks good, thanks Andrzej. I pl

[jira] Assigned: (LUCENE-1420) Similarity.lengthNorm and positionIncrement=0

2008-10-13 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-1420: Assignee: Michael McCandless > Similarity.lengthNorm and positionIncrement=0 >

[jira] Updated: (LUCENE-1420) Similarity.lengthNorm and positionIncrement=0

2008-10-13 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki updated LUCENE-1420: Attachment: similarity.patch This patch adds Similarity.length(fieldName, numTokens, n

[jira] Created: (LUCENE-1421) Ability to group search results by field

2008-10-13 Thread Artyom Sokolov (JIRA)
Ability to group search results by field Key: LUCENE-1421 URL: https://issues.apache.org/jira/browse/LUCENE-1421 Project: Lucene - Java Issue Type: New Feature Reporter: Artyom Sokolov I

[jira] Created: (LUCENE-1420) Similarity.lengthNorm and positionIncrement=0

2008-10-13 Thread Andrzej Bialecki (JIRA)
Similarity.lengthNorm and positionIncrement=0 Key: LUCENE-1420 URL: https://issues.apache.org/jira/browse/LUCENE-1420 Project: Lucene - Java Issue Type: Improvement Components: Index

Re: TokenStream and Token APIs

2008-10-13 Thread Michael McCandless
This looks good! One question on back compatibility: currently, TokenStream.nextToken takes a Token arg in, and returns a Token back, such that the method is encouraged but not required to use the passed-in Token as its prototype. You are adding a boolean nextToken() method, which then f

Re: Similarity.lengthNorm and positionIncrement=0

2008-10-13 Thread Michael McCandless
OK, this & Andrzej's logic makes sense -- let's add it as an option, but leave the default to the current approach of counting all tokens towards length norm. Mike Nadav Har'El wrote: On Sun, Oct 12, 2008, Michael McCandless wrote about "Re: Similarity.lengthNorm and positionIncrement=0

[jira] Commented: (LUCENE-1419) Expert API to specify indexing chain

2008-10-13 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638975#action_12638975 ] Michael McCandless commented on LUCENE-1419: This looks great Michael! > Expe

Re: Similarity.lengthNorm and positionIncrement=0

2008-10-13 Thread Nadav Har'El
On Sun, Oct 12, 2008, Michael McCandless wrote about "Re: Similarity.lengthNorm and positionIncrement=0": > > I agree we should make this possible. A field should not be > "penalized" just because many of its terms had synonyms. I guess it won't do any harm to make this an option, but we need