Puzzled about Hits

2003-03-18 Thread Peter Courcoux
Hi all, I have just created a simple index and search routine as a start with Lucene. I have created a new index with 8 pages using the StandardAnalyzer. A given search returns 6 hits. 0. doc id = 1 : highest score : exact match 4 times in text. 1. doc id = 2 : next ranked score

AW: Advice on Stop words

2003-03-18 Thread Marcel Stör
-Original Message- From: James Berrettini [mailto:[EMAIL PROTECTED] Sent: Tuesday, 18 March 2003 16:53 To: 'Lucene User listserv' Subject: Advice on Stop words Hi, I'm in the middle of a project to improve the Lucene functionality that we've embedded in our

Indexing and searching non-latin languages using utf-8

2003-03-18 Thread MERCIER ALEXANDRE
Hi all, I have a problem with indexing and then searching docs written in non-Latin languages and encoded in UTF-8 (Russian, for example). I have a web application with a simple form to search the contents of the docs. When I submit the form, I encode the query term in UTF-8 with encodeURI(String)

RE: Indexing and searching non-latin languages using utf-8

2003-03-18 Thread Eric Isakson
Have you verified that your form inputs are getting to your query objects without the String being mangled due to encoding problems? I'm getting Japanese in UTF-8 and use the technique described at http://w6.metronet.com/~wjm/tomcat/2001/Aug/msg00230.html to get the data from the browser to

RE: Indexing and searching non-latin languages using utf-8

2003-03-18 Thread Eric Isakson
There are a bunch of other issues... I should have qualified that. There really aren't any issues with the Lucene core to support Japanese, just other issues in my app that uses Lucene and working with my content providers to ensure consistent use of encodings, etc. I have found what I think

Parsing and Indexing XML Docs

2003-03-18 Thread David Kendig
I am having problems with the lucene-sandbox/contributions/XML-Indexing-Demo. I get the following error when I index my XML documents with the SAX parser in Java 1.4.1: java.lang.StringIndexOutOfBoundsException: String index out of range: 200 at

Re: Parsing and Indexing XML Docs

2003-03-18 Thread Otis Gospodnetic
Doesn't that look like an error in Crimson? If I were you I'd use Xerces instead, I always had a better feeling about Xerces, and I think that demo code doesn't have anything Crimson-specific hard-coded in it. Otis --- David Kendig [EMAIL PROTECTED] wrote: I am having problems with the

Re: Parsing and Indexing XML Docs

2003-03-18 Thread David Kendig
Bummer, I get the same thing with Xerces. I do not suspect the XML file itself since it is from a separate app that has been operational for over a year. Does anyone maintain the sandbox contributions? Dave Traceback (innermost last): File ./indexTest.py, line 22, in ?

Re: Parsing and Indexing XML Docs

2003-03-18 Thread Rishabh Bajpai
Just a hunch, but worth trying... when you implement the methods of the parser, try handling (try-catch) the one that handles the characters that are parsed. I had this problem when some of the values weren't returning values - it will be valid XML with no/null values, but when you try to get