date:20030224

Wildcard and Fuzzy queries in GermanAnalyzer

2003-02-24 Thread Volker Luedeling

Hi, I have noticed that FuzzyQueries and WildcardQueries don't do stemming. Since all terms in the index are in stemmed forms, this causes some problems: Etagenwohnung gets stemmed to nwohnung. So a search for Etagenwohnung will find Etagenwohnung and nwohnung. Fuzzy search for Etagenwohnung~

Correction: Wildcard and Fuzzy queries in GermanAnalyzer

2003-02-24 Thread Volker Luedeling

I made a small mistake in my example. My application converted all characters to lowercase while indexing. When I comment this out, Etagenwohnung remains unchanged after stemming. So, my example is bad. However, the basic problem remains (at least for all words that do not start with a capital
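The basic problem described in this thread can be shown without Lucene. The sketch below is only an illustration under stated assumptions: `stem` is a toy stemmer invented for this example (it is not the GermanAnalyzer algorithm), and `prefixMatches` stands in for a trailing-`*` wildcard that is compared against index terms exactly as typed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StemmedWildcardSketch {

    /** Toy stemmer (assumption, not GermanAnalyzer): lowercases and strips a trailing "en". */
    static String stem(String word) {
        String w = word.toLowerCase();
        return w.endsWith("en") ? w.substring(0, w.length() - 2) : w;
    }

    /** Stand-in for a wildcard query "prefix*": the prefix is NOT stemmed before matching. */
    static List<String> prefixMatches(List<String> indexTerms, String prefix) {
        return indexTerms.stream()
                .filter(term -> term.startsWith(prefix))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The index only ever contains stemmed terms.
        List<String> index = Arrays.asList(stem("Wohnungen"), stem("Etagen"));

        // A plain term query is stemmed the same way, so it finds its term.
        System.out.println("term query hit: " + index.contains(stem("Wohnungen")));

        // The wildcard prefix keeps its unstemmed ending and misses the stemmed term.
        System.out.println("wohnunge* -> " + prefixMatches(index, "wohnunge"));
        System.out.println("wohn*     -> " + prefixMatches(index, "wohn"));
    }
}
```

A prefix that happens to stop before the stripped ending still matches, which is why the problem only shows up for some query strings.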

Best HTML Parser !!

2003-02-24 Thread Pierre Lacchini

Hello, i'm trying to index HTML files with Lucene. Do u know what's the best HTML Parser in Java ? The most Powerful ? I need to extract meta-tags, and many other different text fields... Thx for ur help ;)

Re: Best HTML Parser !!

2003-02-24 Thread Otis Gospodnetic

It's not possible to generalize like that. I like NekoHTML. Otis --- Pierre Lacchini [EMAIL PROTECTED] wrote: Hello, i'm trying to index HTML files with Lucene. Do u know what's the best HTML Parser in Java ? The most Powerful ? I need to extract meta-tags, and many other different text

Re: IndexWriter addDocument NullPointerException

2003-02-24 Thread Otis Gospodnetic

My guess is that your 2 getDocument calls are the source, that is, that those PDF and TXT classes don't return a proper Document. I also don't see the output created by log("doc: " + doc); Otis if(path.matches("\\d+_\\d{4}_[a-z]{2,3}\\.pdf")) { doc =

Re: IndexWriter addDocument NullPointerException

2003-02-24 Thread Günter Kukies

log("doc: " + doc); is handled by Tomcat and directed into special log files, so you can't see them. System.err.println("hallo1 " + doc); ex.printStackTrace(); System.err.println("hallo2"); this is printing the relevant output. doc is never null,

Score per Term

2003-02-24 Thread Andrzej Bialecki

Hello, Is there any simple way to find out from the search results which of the query terms contributed the most to the document's score? I'm working on an application which could use this sort of information to give a hint to the user why a particular document scores the way it

Re: IndexWriter addDocument NullPointerException

2003-02-24 Thread Otis Gospodnetic

If I were you I would make things simpler for myself by converting the code to something that I could run from the command line instead of having to go through Tomcat. You really need to capture your exception stack trace with line numbers, and then we can try helping. Otis --- Günter_Kukies

Sorting Hits

2003-02-24 Thread Pierre Lacchini

Heya, is it possible to sort the Hits Array on a given Field ? (for example a field containing the Date) Thx for ur help !
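One general way to approach this, independent of any Lucene API, is to copy the results into a list and sort that list on the field value. The sketch below is a minimal illustration under assumptions: the `Hit` class is a hypothetical stand-in for a search result, the date field is assumed to be stored as a `yyyyMMdd` string (so string order equals chronological order), and the result set is assumed to be small enough to hold in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortHitsSketch {

    /** Hypothetical stand-in for one search result with a stored date field. */
    static final class Hit {
        final String title;
        final String date;   // assumed format: yyyyMMdd
        final float score;

        Hit(String title, String date, float score) {
            this.title = title;
            this.date = date;
            this.score = score;
        }
    }

    /** Returns a copy of the hits ordered by the date field, oldest first. */
    static List<Hit> sortByDate(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparing((Hit h) -> h.date));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> byScore = Arrays.asList(
                new Hit("newer document", "20030224", 0.9f),
                new Hit("older document", "20021101", 0.5f));
        for (Hit h : sortByDate(byScore)) {
            System.out.println(h.date + " " + h.title);
        }
    }
}
```

The original relevance order is left untouched because the sort works on a copy.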

Re: Score per Term

2003-02-24 Thread Doug Cutting

Check out the new Explanation API in the latest CVS sources. It permits one to get a detailed explanation of how a query was scored against a document. Note that these explanations are designed for user perusal, not for further computation, and are as expensive to construct as re-running the

Re: IndexWriter addDocument NullPointerException

2003-02-24 Thread Günter Kukies

I switched off the -server switch from the java command-line options and everything works fine now. I changed nothing in my code. So is it possible in principle to throw an Exception with no stack trace? Any comments about this phenomenon? Günter - Original Message - From: Otis

Re: Score per Term

2003-02-24 Thread Andrzej Bialecki

Doug Cutting wrote: Check out the new Explanation API in the latest CVS sources. It permits one to get a detailed explanation of how a query was scored against a document. Note that these explanations are designed for user perusal, not for further computation, and are as expensive to

2 questions regarding phrase query indexing

2003-02-24 Thread alex wong

My first question: I tried to write a phrase query; my attempt is below. The content I search for is in the index, but the search does not work. Any idea what is wrong? I'm using the index created by the Lucene Demo. PhraseQuery query = new PhraseQuery(); BooleanQuery bQuery = new BooleanQuery();

Indexing Tips and Hints

2003-02-24 Thread Michael Barry

All, I'm in need of some pointers, hints or tips on indexing large collections of data. I know I saw some tips on this list before but when I tried searching the list, I came up blank. I have a large collection of XML files (336000 files around 5K apiece) that I'm indexing and it's taking

Re: Indexing Tips and Hints

2003-02-24 Thread Terry Steichen

Mike, By way of comparison, I've got a collection of about 50,000 XML files, each of which averages about 8K. It takes about 1.25 hours to index (on a 1.8 GHz machine). I use basically the standard configuration (mergeFactor, etc.) and I've got about 30 fields per document. I add about 200 new

Re: Indexing Tips and Hints

2003-02-24 Thread Andrzej Bialecki

Hello, Since you are trying this anyway, and looking for ways to improve indexing times... Could you perhaps try to replace use of java.io.RandomAccessFile in FSDirectory implementation, with the attached implementation? It supposedly increases I/O throughput by orders of magnitude, by using

Re: Indexing Tips and Hints

2003-02-24 Thread Otis Gospodnetic

Things to consider: - disk speed and whether it is busy satisfying other processes' requests - CPU speed - amount of free RAM in the machine and amount of RAM given to your JVM - the bottleneck - could be a slow XML parser, for instance; profile it. I'm about to submit another Lucene article to

Re: Indexing Tips and Hints

2003-02-24 Thread Terry Steichen

Hi Andrzej, Thanks for the code. I'll try it as soon as I have time. If you had a copy of the modified FSDirectory implementation you could also share, that would make testing it a bit quicker and easier. BTW, when you said it supposedly increases I/O, I gather that you are not the author?

AW: Best HTML Parser !!

2003-02-24 Thread Borkenhagen, Michael (ofd-ko zdfin)

I prefer JTidy http://lempinen.net/sami/jtidy/. Michael -Original Message- From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] Sent: Monday, 24 February 2003 15:03 To: Lucene Users List; [EMAIL PROTECTED] Subject: Re: Best HTML Parser !! It's not possible to generalize like that.

AW: IndexWriter addDocument NullPointerException

2003-02-24 Thread Borkenhagen, Michael (ofd-ko zdfin)

Yes it is possible. Instead of catching an Exception you can do anything else, e.g. try { ...} catch (MyException e) { System.err.println(e.getClass().getName()); } But this is off-topic here, it's a general question about Java. Michael -Original Message- From: Günter Kukies
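On the question of whether an exception can arrive with no stack trace: in current Java it can be demonstrated directly. The sketch below assumes Java 7 or later, where `Throwable` and `RuntimeException` have a protected constructor with a `writableStackTrace` argument; `QuietException` and `framesOf` are names made up for this example. Separately, HotSpot documents that its server compiler may switch to preallocated built-in exceptions (such as NullPointerException) that carry no stack trace once they have been thrown repeatedly, and that `-XX:-OmitStackTraceInFastThrow` disables this. Whether that is what happened in this thread cannot be confirmed from the messages shown here.

```java
public class NoStackTraceDemo {

    /** Hypothetical exception type that never records a stack trace. */
    static class QuietException extends RuntimeException {
        QuietException(String message) {
            // Arguments: message, cause, enableSuppression, writableStackTrace.
            // writableStackTrace = false means no trace is captured.
            super(message, null, false, false);
        }
    }

    /** Throws the given exception, catches it, and reports how many frames it carries. */
    static int framesOf(RuntimeException toThrow) {
        try {
            throw toThrow;
        } catch (RuntimeException e) {
            return e.getStackTrace().length;
        }
    }

    public static void main(String[] args) {
        System.out.println("ordinary exception frames: "
                + framesOf(new RuntimeException("ordinary")));
        System.out.println("quiet exception frames:    "
                + framesOf(new QuietException("quiet")));
    }
}
```

Calling `printStackTrace()` on the quiet exception prints only the class name and message, which matches the symptom of a log entry with no line numbers.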


20 matches
