Re: Suggestion for Token.java

2004-04-13 Thread Tatu Saloranta
On Tuesday 13 April 2004 15:31, Holger Klawitter wrote: > Hi Erik, > > > What is wrong with simply creating a new token that replaces an > > incoming one for synonyms? > > I'm just playing devil's advocate here since you can already get > > the termText() through the public _method_. > > Well,

Re: ANN: Docco 0.3

2004-04-13 Thread Peter Becker
The underlying assumption was that File.isDirectory() does return false on symlinks, but we never tested under UNIX or Linux and JavaDoc is not very explicit about this (as so often). If that is wrong, can someone mail me some hint how to do it properly? I assume it involves getCanonicalPath()
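A note on the symlink question above: on UNIX-like systems File.isDirectory() follows symbolic links, so it returns true for a link that points at a directory. The following is a minimal sketch of one common way to detect links with getCanonicalPath()-style calls. It is not Docco's code; the class name SymlinkCheck and the helpers isSymlink and countFiles are hypothetical names chosen for illustration. The idea is to canonicalize the parent directory first and then check whether the entry itself still resolves to a different path.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of Docco or Lucene.
final class SymlinkCheck {

    // Returns true if the last path component of 'file' is a symbolic link.
    // The parent is canonicalized first so that links higher up in the path
    // do not cause false positives.
    static boolean isSymlink(File file) throws IOException {
        File absolute = file.getAbsoluteFile();
        File parent = absolute.getParentFile();
        File entry = (parent == null)
                ? absolute
                : new File(parent.getCanonicalFile(), absolute.getName());
        return !entry.getCanonicalFile().equals(entry.getAbsoluteFile());
    }

    // Counts regular files below 'dir', skipping every symbolic link so that
    // a link pointing back up the tree cannot cause an endless loop.
    static int countFiles(File dir) throws IOException {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return 0; // not a directory, or not readable
        }
        int count = 0;
        for (File entry : entries) {
            if (isSymlink(entry)) {
                continue;
            }
            if (entry.isDirectory()) {
                count += countFiles(entry);
            } else {
                count++;
            }
        }
        return count;
    }
}
```

A directory walker that indexes files could apply the same isSymlink check before descending. Skipping all links is a deliberately blunt choice; tracking already-visited canonical paths would be an alternative if linked content should still be indexed once.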

Re: Simple spider demo

2004-04-13 Thread Stephane James Vaucher
I've uploaded it to the wiki: http://wiki.apache.org/jakarta-lucene/HttpUnitExample It's not anywhere close to production quality, especially since it's based on a unit test framework. sv On Tue, 13 Apr 2004, Stephane James Vaucher wrote: > I'm wondering if there is interest for a simple sp

Re: Suggestion for Token.java

2004-04-13 Thread Holger Klawitter
Hi Erik, > What is wrong with simply creating a new token that replaces an > incoming one for synonyms? > I'm just playing devil's advocate here since you can already get > the termText() through the public _method_. Well, you're right; I forgot

Re: ANN: Docco 0.3

2004-04-13 Thread Stephane James Vaucher
Looks cool, but I've got a question: How do you handle symlinks on *nix? I think it's stuck in a loop. When indexing my home dir, I see it indexing: /home/vauchers/.Cirano-gnome/.gnome-desktop/Home directory/.Cirano-gnome/... cheers, sv On Wed, 14 Apr 2004, Peter Becker wrote: > Hello, > > we

i411 Faceted Metadata Search

2004-04-13 Thread William W
Hi, Who knows the difference between i411 Faceted Metadata Search and the Lucene search engine? Thanks, William.

Simple spider demo

2004-04-13 Thread Stephane James Vaucher
I'm wondering if there is interest in a simple spider demo. I've got an example of how to use HttpUnit to spider a web site and index it on disk (only HTML pages for now). I can send it to the list if anyone is interested (it's one class, < 200 loc). cheers, sv ---

ANN: Docco 0.3

2004-04-13 Thread Peter Becker
Hello, we released Docco 0.3 along with two updates for its plugins. Docco is a personal document retrieval tool based on Apache's Lucene indexing engine and Formal Concept Analysis. It allows you to create an index for files on your file system which you can then search for keywords. It can i

Re: Suggestion for Token.java

2004-04-13 Thread Erik Hatcher
What is wrong with simply creating a new token that replaces an incoming one for synonyms? I'm just playing devil's advocate here since you can already get the termText() through the public _method_. Erik On Apr 13, 2004, at 9:52 AM, Holger Klawitter wrote: -BEGIN PGP SIGNED MESSAGE--

Suggestion for Token.java

2004-04-13 Thread Holger Klawitter
Hi there, Just a short suggestion: It would be useful to make Token.termText public (or to provide a reader/writer pair). That way one can create TokenFilters altering termText (for synonyms, for example) in packages other than org.apache.lucene.anal

Re: index update (was Re: Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI)

2004-04-13 Thread Stephane James Vaucher
I'm actually pretty lazy about index updates, and haven't had the need for efficiency, since my requirement is that new documents should be available on a next working day basis. I reindex everything from scratch every night (400,000 docs) and store it in a timestamped index. When the reindexin

Re: Closing IndexWriter object after each file causes NullPointerException?

2004-04-13 Thread Brisbart Franck
If you close an IndexWriter more than once, the release of the writeLock causes a NullPointerException. You should clean up your code and close your writer only once. Anyway, I don't know why there's no test on the 'writeLock' as in the 'finalize' method. I think it's a little error, so I suggest
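To illustrate the "close only once" advice above, here is a minimal sketch of a guard that makes a second close() call a no-op. It is not Lucene code; the class name CloseOnce is hypothetical, and it assumes the writer can be adapted to java.io.Closeable (for instance with a method reference such as writer::close on a recent JDK). Restructuring the calling code so that close() is reached only once remains the cleaner fix.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical wrapper, not part of Lucene: forwards close() to the
// wrapped resource the first time only and ignores later calls.
final class CloseOnce implements Closeable {
    private final Closeable delegate;
    private final AtomicBoolean closed = new AtomicBoolean(false);

    CloseOnce(Closeable delegate) {
        this.delegate = delegate;
    }

    @Override
    public void close() throws IOException {
        // compareAndSet succeeds for exactly one caller, so the delegate
        // is closed at most once even if close() is called repeatedly.
        if (closed.compareAndSet(false, true)) {
            delegate.close();
        }
    }
}
```

Usage would look like `Closeable guard = new CloseOnce(writer::close);`, after which every code path closes `guard` instead of the writer itself.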

index update (was Re: Large InputStream.BUFFER_SIZE causes OutOfMemoryError.. FYI)

2004-04-13 Thread petite_abeille
On Apr 13, 2004, at 02:45, Kevin A. Burton wrote: He mentioned that I might be able to squeeze 5-10% out of index merges this way. Talking of which... what strategy(ies) do people use to minimize downtime when updating an index? My current "strategy" is as follows: (1) use a temporary RAMDirect