Wildcard and Fuzzy queries in GermanAnalyzer

2003-02-24 Thread Volker Luedeling
Hi, I have noticed that FuzzyQueries and WildcardQueries don't do stemming. Since all terms in the index are in stemmed forms, this causes some problems: Etagenwohnung gets stemmed to nwohnung. So a search for Etagenwohnung will find Etagenwohnung and nwohnung. Fuzzy search for Etagenwohnung~

Correction: Wildcard and Fuzzy queries in GermanAnalyzer

2003-02-24 Thread Volker Luedeling
I made a small mistake in my example. My application converted all characters to lowercase while indexing. When I comment this out, Etagenwohnung remains unchanged after stemming. So, my example is bad. However, the basic problem remains (at least for all words that do not start with a capital

Best HTML Parser !!

2003-02-24 Thread Pierre Lacchini
Hello, I'm trying to index HTML files with Lucene. Do you know what's the best HTML parser in Java? The most powerful? I need to extract meta tags and many other different text fields... Thanks for your help ;)

Re: Best HTML Parser !!

2003-02-24 Thread Otis Gospodnetic
It's not possible to generalize like that. I like NekoHTML. Otis --- Pierre Lacchini [EMAIL PROTECTED] wrote: Hello, I'm trying to index HTML files with Lucene. Do you know what's the best HTML parser in Java? The most powerful? I need to extract meta tags and many other different text

Re: IndexWriter addDocument NullPointerException

2003-02-24 Thread Otis Gospodnetic
My guess is that your 2 getDocument calls are the source, that is, that those PDF and TXT classes don't return a proper Document. I also don't see the output created by log("doc: " + doc); Otis if (path.matches("\\d+_\\d{4}_[a-z]{2,3}\\.pdf")) { doc =
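The file-name pattern in the snippet above appears to have lost its string quotes in the archive. As a minimal sketch, assuming the quotes belong around the whole expression, the check can be tried in isolation from the indexing code. The class name, method name and sample file names below are hypothetical and not taken from the original message.

```java
public class PdfNameCheck {
    // Pattern from the snippet above, with the string quotes assumed restored:
    // digits, underscore, four digits, underscore, 2-3 lowercase letters, ".pdf".
    static final String PDF_NAME = "\\d+_\\d{4}_[a-z]{2,3}\\.pdf";

    /** Returns true only if the WHOLE string matches the pattern. */
    public static boolean isPdfName(String path) {
        return path.matches(PDF_NAME);
    }

    public static void main(String[] args) {
        // Hypothetical file names, for illustration only.
        System.out.println(isPdfName("12_2003_de.pdf"));      // prints "true"
        System.out.println(isPdfName("docs/12_2003_de.pdf")); // prints "false"
    }
}
```

String.matches tests the entire string, so a path that still carries a directory prefix does not match. In that case neither branch would assign doc, which is one possible (unconfirmed) way for a null Document to reach addDocument.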

Re: IndexWriter addDocument NullPointerException

2003-02-24 Thread Günter Kukies
log("doc: " + doc); is handled by Tomcat and directed into special log files, so you can't see it. System.err.println("hallo1 " + doc); ex.printStackTrace(); System.err.println("hallo2"); These calls print the relevant output. doc is never null,

Score per Term

2003-02-24 Thread Andrzej Bialecki
Hello, Is there any simple way to get the information from the search results on which of the query terms contributed the most to the document's score? I'm working on an application which could use this sort of information to give a hint to the user why a particular document scores the way it

Re: IndexWriter addDocument NullPointerException

2003-02-24 Thread Otis Gospodnetic
If I were you I would make things simpler for myself by converting the code to something that I could run from the command line instead of having to go through Tomcat. You really need to capture your exception stack trace with line numbers, and then we can try helping. Otis --- Günter Kukies

Sorting Hits

2003-02-24 Thread Pierre Lacchini
Heya, is it possible to sort the Hits array on a given field (for example, a field containing the date)? Thanks for your help!

Re: Score per Term

2003-02-24 Thread Doug Cutting
Check out the new Explanation API in the latest CVS sources. It permits one to get a detailed explanation of how a query was scored against a document. Note that these explanations are designed for user perusal, not for further computation, and are as expensive to construct as re-running the

Re: IndexWriter addDocument NullPointerException

2003-02-24 Thread Günter Kukies
I switched off the -server switch in the Java command-line options and everything works fine now. I changed nothing in my code. So is it possible in principle to throw an Exception with no stack trace? Any comments about this phenomenon? Günter - Original Message - From: Otis

Re: Score per Term

2003-02-24 Thread Andrzej Bialecki
Doug Cutting wrote: Check out the new Explanation API in the latest CVS sources. It permits one to get a detailed explanation of how a query was scored against a document. Note that these explanations are designed for user perusal, not for further computation, and are as expensive to

2 questions regarding phrase query indexing

2003-02-24 Thread alex wong
My first question: I tried to write a phrase query; my attempt is below. When I do a search, the search content is in, but it does not work. Any idea what is wrong? I'm using the index created by the Lucene demo. PhraseQuery query = new PhraseQuery(); BooleanQuery bQuery = new BooleanQuery();

Indexing Tips and Hints

2003-02-24 Thread Michael Barry
All, I'm in need of some pointers, hints or tips on indexing large collections of data. I know I saw some tips on this list before, but when I tried searching the list, I came up blank. I have a large collection of XML files (336,000 files, around 5K apiece) that I'm indexing and it's taking

Re: Indexing Tips and Hints

2003-02-24 Thread Terry Steichen
Mike, By way of comparison, I've got a collection of about 50,000 XML files, each of which averages about 8K. It takes about 1.25 hours to index (on a 1.8 GHz machine). I use basically the standard configuration (mergeFactor, etc.) and I've got about 30 fields per document. I add about 200 new

Re: Indexing Tips and Hints

2003-02-24 Thread Andrzej Bialecki
Hello, Since you are trying this anyway, and looking for ways to improve indexing times... Could you perhaps try replacing the use of java.io.RandomAccessFile in the FSDirectory implementation with the attached implementation? It supposedly increases I/O throughput by orders of magnitude, by using

Re: Indexing Tips and Hints

2003-02-24 Thread Otis Gospodnetic
Things to consider:
- disk speed and whether it is busy satisfying other processes' requests
- CPU speed
- amount of free RAM in the machine and amount of RAM given to your JVM
- the bottleneck: it could be a slow XML parser, for instance; profile it
I'm about to submit another Lucene article to

Re: Indexing Tips and Hints

2003-02-24 Thread Terry Steichen
Hi Andrzej, Thanks for the code. I'll try it as soon as I have time. If you had a copy of the modified FSDirectory implementation you could also share, that would make testing it a bit quicker and easier. BTW, when you said it supposedly increases I/O, I gather that you are not the author?

AW: Best HTML Parser !!

2003-02-24 Thread Borkenhagen, Michael (ofd-ko zdfin)
I prefer JTidy http://lempinen.net/sami/jtidy/. Michael -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Monday, 24 February 2003 15:03 To: Lucene Users List; [EMAIL PROTECTED] Subject: Re: Best HTML Parser !! It's not possible to generalize like that.

AW: IndexWriter addDocument NullPointerException

2003-02-24 Thread Borkenhagen, Michael (ofd-ko zdfin)
Yes, it is possible. When catching an Exception you can do anything else instead of printing its stack trace, e.g. try { ... } catch (MyException e) { System.err.println(e.getClass().getName()); } But this is off-topic here; it's a general question about Java. Michael -Original Message- From: Günter Kukies
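As a minimal sketch of the point above, the following self-contained example shows one way an exception can legitimately carry no stack trace, and how a catch block can still report the exception's class. It is not a diagnosis of the case discussed in this thread. The class and method names are hypothetical, and the four-argument constructor used here requires Java 7 or later. Separately, HotSpot's server compiler can omit stack traces for some implicit exceptions thrown repeatedly from optimized code (controlled by -XX:-OmitStackTraceInFastThrow); whether that applies to the report above is an assumption.

```java
public class StackTraceDemo {
    /** Hypothetical exception type that skips stack trace capture. */
    public static class NoTraceException extends RuntimeException {
        public NoTraceException(String message) {
            // cause = null, enableSuppression = false, writableStackTrace = false
            super(message, null, false, false);
        }
    }

    /** Describes a throwable by its class name and number of stack frames. */
    public static String describe(Throwable t) {
        return t.getClass().getName() + " frames=" + t.getStackTrace().length;
    }

    public static void main(String[] args) {
        try {
            throw new NoTraceException("no trace recorded");
        } catch (NoTraceException e) {
            // The class name is still available even though no frames were recorded.
            System.err.println(describe(e));
        }
    }
}
```

Logging the class name and message alongside printStackTrace() keeps at least some information available when the trace itself turns out to be empty.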