[jira] Updated: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-03-12 Thread Hiroaki Kawai (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroaki Kawai updated LUCENE-1224: -- Attachment: NGramTokenFilter.patch NGramTokenFilter creates bad TokenStream

[jira] Created: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-03-12 Thread Hiroaki Kawai (JIRA)
NGramTokenFilter creates bad TokenStream Key: LUCENE-1224 URL: https://issues.apache.org/jira/browse/LUCENE-1224 Project: Lucene - Java Issue Type: Bug Components: contrib/*

[jira] Commented: (LUCENE-1221) DocumentsWriter truncates term text at \uFFFF

2008-03-12 Thread Marcel Reutegger (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1256#action_1256 ] Marcel Reutegger commented on LUCENE-1221: -- Indeed there are some characters that

an API for synonym in Lucene-core

2008-03-12 Thread Mathieu Lecarme
Why doesn't Lucene have a clean synonym API? The WordNet contrib is not an answer: it provides an interface for its own needs, and most of the world doesn't speak English. Compass provides a tool, just like Solr. Lucene is the framework for applications like Solr, Nutch or Compass, so why not backport

[jira] Resolved: (LUCENE-1217) use isBinary cached variable instead of instanceof in Filed

2008-03-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1217. Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [Patch

[jira] Updated: (LUCENE-1219) support array/offset/ length setters for Field with binary data

2008-03-12 Thread Eks Dev (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eks Dev updated LUCENE-1219: Attachment: LUCENE-1219.patch latest patch updated to the trunk (Lucene-1217 is there. Michael you did

[jira] Resolved: (LUCENE-1221) DocumentsWriter truncates term text at \uFFFF

2008-03-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1221. Resolution: Invalid OK thanks Marcel. DocumentsWriter truncates term text at

[jira] Created: (LUCENE-1225) NGramTokenizer creates bad TokenStream

2008-03-12 Thread Hiroaki Kawai (JIRA)
NGramTokenizer creates bad TokenStream -- Key: LUCENE-1225 URL: https://issues.apache.org/jira/browse/LUCENE-1225 Project: Lucene - Java Issue Type: Bug Components: contrib/*

[jira] Updated: (LUCENE-1225) NGramTokenizer creates bad TokenStream

2008-03-12 Thread Hiroaki Kawai (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroaki Kawai updated LUCENE-1225: -- Attachment: NGramTokenizer.patch This patch will fix the issue. NGramTokenizer creates bad

[jira] Updated: (LUCENE-1217) use isBinary cached variable instead of instanceof in Field

2008-03-12 Thread Doug Cutting (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doug Cutting updated LUCENE-1217: - Description: Field class can hold three types of values, See: AbstractField.java protected
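
The change behind LUCENE-1217 is a common micro-optimization: instead of answering isBinary() with an `instanceof` test on every call, cache a boolean once when the value is set. The class below is an illustrative stand-in for the pattern, not Lucene's actual AbstractField; its names are hypothetical:

```java
// Sketch of the LUCENE-1217 pattern: a field that may hold a String or a
// byte[] caches its "binary-ness" at set time rather than testing the
// stored value with instanceof on every accessor call.
public class CachedTypeField {
    private Object fieldsData;  // holds either a String or a byte[]
    private boolean isBinary;   // cached answer, updated whenever the value is set

    public void setValue(String text) {
        fieldsData = text;
        isBinary = false;
    }

    public void setValue(byte[] data) {
        fieldsData = data;
        isBinary = true;
    }

    // Plain flag check instead of "fieldsData instanceof byte[]".
    public boolean isBinary() {
        return isBinary;
    }

    public byte[] binaryValue() {
        return isBinary ? (byte[]) fieldsData : null;
    }

    public String stringValue() {
        return isBinary ? null : (String) fieldsData;
    }
}
```

The accessors stay correct because every setter updates the flag together with the value, so the two can never disagree.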

TokenFilter question

2008-03-12 Thread Hiroaki Kawai
I was trying to apply both org.apache.solr.analysis.WordDelimiterFilter and org.apache.lucene.analysis.ngram.NGramTokenFilter. Can I achieve this with Lucene's TokenStream? While thinking about TokenFilters, I came to the idea that the TokenStream should have a structured representation. It is
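
Chaining filters as the question describes is how Lucene composes analysis: each TokenFilter wraps an upstream TokenStream and rewrites its tokens. The sketch below illustrates that decorator idea without Lucene itself; the interface and the two toy filters (a '-' splitter standing in for WordDelimiterFilter, a bigram emitter standing in for NGramTokenFilter) are simplified, hypothetical stand-ins, not the real API:

```java
import java.util.ArrayList;
import java.util.List;

// Lucene-free sketch of TokenFilter chaining: each stage wraps an upstream
// token source and transforms its output, so stages compose by nesting.
public class TokenPipeline {

    interface TokenSource {
        String next();  // returns null when the stream is exhausted
    }

    // A simple whitespace tokenizer (the start of the chain).
    static TokenSource whitespaceTokenizer(String text) {
        String[] parts = text.trim().split("\\s+");
        return new TokenSource() {
            int i = 0;
            public String next() { return i < parts.length ? parts[i++] : null; }
        };
    }

    // Splits each incoming token at '-' (toy WordDelimiterFilter).
    static TokenSource delimiterFilter(TokenSource in) {
        return new TokenSource() {
            String[] pending = new String[0];
            int i = 0;
            public String next() {
                while (i >= pending.length) {
                    String t = in.next();
                    if (t == null) return null;
                    pending = t.split("-");
                    i = 0;
                }
                return pending[i++];
            }
        };
    }

    // Emits the character bigrams of each incoming token (toy NGramTokenFilter).
    static TokenSource bigramFilter(TokenSource in) {
        return new TokenSource() {
            String token;
            int i;
            public String next() {
                while (token == null || i + 2 > token.length()) {
                    token = in.next();       // current token exhausted: fetch next
                    if (token == null) return null;
                    i = 0;
                }
                String gram = token.substring(i, i + 2);
                i++;
                return gram;
            }
        };
    }

    // Chain: tokenizer -> delimiter filter -> n-gram filter.
    public static List<String> analyze(String text) {
        TokenSource s = bigramFilter(delimiterFilter(whitespaceTokenizer(text)));
        List<String> out = new ArrayList<>();
        for (String t = s.next(); t != null; t = s.next()) out.add(t);
        return out;
    }
}
```

For example, "foo-bar baz" tokenizes to foo-bar/baz, the delimiter stage splits that to foo/bar/baz, and the bigram stage emits fo, oo, ba, ar, ba, az.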

unique-id to doc-num

2008-03-12 Thread Jae Kwon
I'd like to have an up-to-date map from unique-ids to lucene internal doc-nums. This will allow me to create a custom filter based on the result of an external process (like mysql). There doesn't seem to be a straightforward efficient way AFAIK. I'll be looking for a way but any help or guidance
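
In the Lucene 2.x API a custom Filter hands back a BitSet with one bit per matching internal doc-num, so the question reduces to keeping a uid-to-docnum map current and setting bits for the ids an external system returns. The sketch below shows just that mapping step using only the standard library; the class and method names are hypothetical, and in real use the map would have to be rebuilt whenever the index changes, since doc-nums shift after deletes and merges:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch: map an application-level unique id (an indexed
// field) to Lucene's internal doc-num, then turn an external result set
// (e.g. ids returned by MySQL) into the BitSet a Lucene 2.x custom
// Filter would return from bits(IndexReader).
public class UidFilterSketch {

    // In real code this would be built by walking the index (e.g. with
    // TermEnum/TermDocs over the "uid" field); here it is just a map.
    private final Map<String, Integer> uidToDocNum = new HashMap<>();

    public void put(String uid, int docNum) {
        uidToDocNum.put(uid, docNum);
    }

    // Analog of Filter.bits(): one set bit per matching document.
    public BitSet bitsFor(List<String> externalUids, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String uid : externalUids) {
            Integer doc = uidToDocNum.get(uid);
            if (doc != null) bits.set(doc);  // ids unknown to the index are skipped
        }
        return bits;
    }
}
```

The expensive part in practice is keeping uidToDocNum up to date, which is exactly the difficulty the question raises.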

[jira] Resolved: (LUCENE-1214) Possible hidden exception on SegmentInfos commit

2008-03-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1214. Resolution: Fixed Possible hidden exception on SegmentInfos commit

Looking to Index Various Document Types.

2008-03-12 Thread DURGA DEEP
Hi folks, I was looking at the Lucene FAQ and I found this very interesting: How can I index OpenOffice.org files? These files (.sxw, .sxc, etc.) are ZIP archives that contain XML files. Uncompress the file using Java's ZIP support, then parse meta.xml to get the title etc. and content.xml to get
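
A minimal sketch of the FAQ recipe quoted above, using only the JDK's java.util.zip: open the archive, find the named entry (content.xml or meta.xml), and return its text so it can be XML-parsed and indexed. The class and method names are illustrative, not a published API:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// An OpenOffice.org file (.sxw, .sxc, ...) is a ZIP archive; this pulls
// out one named XML entry, e.g. "content.xml" or "meta.xml".
public class OpenOfficeExtractor {

    // Returns the text of the named entry, or null if the archive has no
    // such entry. Callers would parse the returned XML before indexing.
    public static String readEntry(InputStream zipData, String entryName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(zipData)) {
            for (ZipEntry e = zip.getNextEntry(); e != null; e = zip.getNextEntry()) {
                if (e.getName().equals(entryName)) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[4096];
                    for (int n = zip.read(buf); n > 0; n = zip.read(buf)) {
                        out.write(buf, 0, n);  // copy the entry's bytes
                    }
                    return out.toString("UTF-8");  // OpenOffice XML is UTF-8
                }
            }
        }
        return null;
    }
}
```

From there the usual route is a SAX or DOM parse of content.xml to collect the body text into a Lucene Document field.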

Re: an API for synonym in Lucene-core

2008-03-12 Thread Grant Ingersoll
On Mar 12, 2008, at 5:47 AM, Mathieu Lecarme wrote: Why doesn't Lucene have a clean synonym API? Because no one has donated one. WordNet contrib is not an answer, it provides an interface for its own needs, and most of the world doesn't speak English. Compass provides a tool, just

Re: [jira] Updated: (LUCENE-1217) use isBinary cached variable instead of instanceof in Field

2008-03-12 Thread eks dev
fix typo that's been bugging me -- excuse my ignorance, but I do not understand this entry. Which typo do we need to fix?

Re: [jira] Updated: (LUCENE-1217) use isBinary cached variable instead of instanceof in Field

2008-03-12 Thread Chris Hostetter
: fix typo that's been bugging me : : excuse my ignorance, but i do not understand this entry. Typo we need to fix, which one? If you view the change history for the issue, you'll see that comment was attached to change to the summary and description of the bug where Filed was fixed to be

RE: Looking to Index Various Document Types.

2008-03-12 Thread Steven A Rowe
'sup, DD: You should have posted your question, which is about *using* Lucene, to the java-user mailing list; the java-dev mailing list is instead intended for discussion of *development of* Lucene. Here's a Lius tutorial, in both French and English: http://www.doculibre.com/lius/ And here's

Re: an API for synonym in Lucene-core

2008-03-12 Thread Otis Gospodnetic
Grant, I think Mathieu is hinting at his JIRA contribution (I looked at it briefly the other day, but haven't had the chance to really understand it). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Mathieu Lecarme [EMAIL PROTECTED] To:

[jira] Commented: (LUCENE-1216) CharDelimiterTokenizer

2008-03-12 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578023#action_12578023 ] Otis Gospodnetic commented on LUCENE-1216: -- This looks useful. Could you please

[jira] Issue Comment Edited: (LUCENE-1216) CharDelimiterTokenizer

2008-03-12 Thread Otis Gospodnetic (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12578023#action_12578023 ] otis edited comment on LUCENE-1216 at 3/12/08 2:37 PM: ---

Re: Going to Java 5. Was: Re: A bit of planning

2008-03-12 Thread Otis Gospodnetic
I agree with Grant and would prefer seeing 3.0 to seeing 4.0 (down with inflation!) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Grant Ingersoll [EMAIL PROTECTED] To: java-dev@lucene.apache.org Sent: Monday, March 10, 2008 4:05:54 PM

[jira] Updated: (LUCENE-1226) IndexWriter.addIndexes(IndexReader[]) fails to create compound files

2008-03-12 Thread Michael Busch (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-1226: -- Attachment: lucene-1226.patch IndexWriter.addIndexes(IndexReader[]) fails to create compound

[jira] Created: (LUCENE-1226) IndexWriter.addIndexes(IndexReader[]) fails to create compound files

2008-03-12 Thread Michael Busch (JIRA)
IndexWriter.addIndexes(IndexReader[]) fails to create compound files Key: LUCENE-1226 URL: https://issues.apache.org/jira/browse/LUCENE-1226 Project: Lucene - Java Issue

Re: [jira] Updated: (LUCENE-1226) IndexWriter.addIndexes(IndexReader[]) fails to create compound files

2008-03-12 Thread Michael McCandless
Woops! Thanks Michael. Mike On Mar 12, 2008, at 5:46 PM, Michael Busch (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Busch updated LUCENE-1226: --

[jira] Resolved: (LUCENE-1223) lazy fields don't enforce binary vs string value

2008-03-12 Thread Michael McCandless (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-1223. Resolution: Fixed lazy fields don't enforce binary vs string value

Build failed in Hudson: Lucene-trunk #399

2008-03-12 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/399/changes Changes: [mikemccand] LUCENE-1223: fix lazy field loading to not allow string field to be loaded as binary, nor vice versa [mikemccand] LUCENE-1214: preserve original exception in SegmentInfos write commit [mikemccand]

[jira] Updated: (LUCENE-1216) CharDelimiterTokenizer

2008-03-12 Thread Hiroaki Kawai (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroaki Kawai updated LUCENE-1216: -- Attachment: CharDelimiterTokenizer.java Update CharDelimiterTokenizer.java 1. replaced TAB -

[jira] Updated: (LUCENE-1216) CharDelimiterTokenizer

2008-03-12 Thread Hiroaki Kawai (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hiroaki Kawai updated LUCENE-1216: -- Attachment: TestCharDelimiterTokenizer.java Add test file (TestCharDelimiterTokenizer.java)