New Word Document text extractor released

2004-03-03 Thread Ryan Ackley
Version 0.4 of the TextMining.org text extraction library has been released! I have finally gotten around to releasing a new version of the textmining.org text extractor. This is a pure java library for extracting text from Word 6.0/97/2000/XP/2003. Some highlights from this release: -I removed

Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread Erik Hatcher
On Mar 3, 2004, at 4:25 PM, hui wrote: Another similar issue: if we could have a parameter to control the maximum number of files within the index, that would avoid the problem of running out of file handles. When the file number within one index reaches the limit, optimization is goi

Re: FuzzyQuery info

2004-03-03 Thread Otis Gospodnetic
For solving misspellings, you could try encoding and indexing the content using the soundex or double metaphone algorithms. There is at least one free product that uses Lucene and those two algos, and it's linked from one of the pages on Lucene site. Phonetix? Otis --- Supun Edirisinghe <[EMAIL PROT
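A minimal sketch of the Soundex idea Otis mentions: encode each word as a four-character phonetic key, so that differently spelled but similar-sounding words share a key. This is classic American Soundex written from scratch in plain Java. It is not Double Metaphone and not the code of Phonetix or any other product, and the class name `Soundex` is hypothetical. The assumed integration is to index the key alongside the original token and to encode query words the same way.

```java
import java.util.Locale;

/** Hypothetical helper: classic American Soundex (first letter + three digits). */
final class Soundex {
    // Digit codes for A..Z; '0' marks letters that are not coded (A, E, I, O, U, Y, H, W).
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        // Keep only ASCII letters, upper-cased.
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) return "";
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char last = CODES.charAt(s.charAt(0) - 'A'); // the first letter's code suppresses an identical next code
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            char code = CODES.charAt(c - 'A');
            if (code != '0' && code != last) out.append(code);
            // H and W do not separate letters with the same code; vowels do.
            if (c != 'H' && c != 'W') last = code;
        }
        while (out.length() < 4) out.append('0'); // pad short keys with zeros
        return out.toString();
    }
}
```

For example, `Soundex.encode("Robert")` and `Soundex.encode("Rupert")` both yield `R163`, so a search for one spelling would match documents containing the other.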

RE: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread hui
Another similar issue: if we could have a parameter to control the maximum number of files within the index, that would avoid the problem of running out of file handles. When the file number within one index reaches the limit, optimization is going to be called. Right now, if the file n
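A sketch of the policy hui describes, under stated assumptions: count the entries in the index directory and run an optimize step once a limit is reached. `FileCountPolicy` and the `Optimizer` hook are hypothetical names. In practice the hook would wrap the optimize call of your index writer. Counting directory entries is only a rough proxy for the number of file handles a searcher will need.

```java
import java.io.File;

/** Hypothetical hook; an adapter would call e.g. IndexWriter.optimize() here and handle its IOException. */
interface Optimizer {
    void optimize();
}

/** Hypothetical policy: optimize once the index directory holds maxFiles or more entries. */
final class FileCountPolicy {
    private final File indexDir;
    private final int maxFiles;

    FileCountPolicy(File indexDir, int maxFiles) {
        this.indexDir = indexDir;
        this.maxFiles = maxFiles;
    }

    /** Number of entries in the index directory (0 if it does not exist or cannot be read). */
    int fileCount() {
        String[] names = indexDir.list();
        return names == null ? 0 : names.length;
    }

    /** Calls optimize() if the limit has been reached; returns whether it did. */
    boolean maybeOptimize(Optimizer optimizer) {
        if (fileCount() >= maxFiles) {
            optimizer.optimize();
            return true;
        }
        return false;
    }
}
```

An indexing loop would call `maybeOptimize(...)` after each batch of added documents, so the check costs one directory listing per batch rather than per document.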

Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread Stephane James Vaucher
How about (looking big rather than small): - MaxClause from BooleanQuery (I know there has been discussions on the dev list, but I haven't been following it) - default commit_lock_name - default commit_lock_timeout - default maxFieldLength - default maxMergeDocs - default mergeFactor - default mi
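A minimal sketch of the "system property with a built-in fallback" pattern, applied to some of the settings listed above. The property names (`lucene.mergeFactor` and so on) are hypothetical and are not names Lucene actually reads. The fallback values are placeholders to check against the Lucene release you use. Each value is read on every call rather than cached in a `static final` field, so a property set at runtime takes effect on the next call.

```java
/**
 * Hypothetical holder for tunables read from system properties.
 * Property names are illustrative only; fallback values are placeholders
 * to verify against your Lucene version.
 */
final class LuceneDefaults {
    private LuceneDefaults() {}

    // Integer.getInteger returns the fallback when the property is unset or not a valid integer.
    static int maxClauseCount() { return Integer.getInteger("lucene.maxClauseCount", 1024); }
    static int maxFieldLength() { return Integer.getInteger("lucene.maxFieldLength", 10000); }
    static int mergeFactor()    { return Integer.getInteger("lucene.mergeFactor", 10); }
    static int maxMergeDocs()   { return Integer.getInteger("lucene.maxMergeDocs", Integer.MAX_VALUE); }
}
```

With this pattern, `java -Dlucene.mergeFactor=20 ...` would change the merge factor without code changes, using the hypothetical property name.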

Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread Doug Cutting
Stephane James Vaucher wrote: As I've stated in my earlier mail, I like this change. More importantly, could this become a "standard" way of changing configurations at runtime? For example, the default merge factor could also be set in this manner. Sure, that's reasonable, so this would be someth

Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread Stephane James Vaucher
As I've stated in my earlier mail, I like this change. More importantly, could this become a "standard" way of changing configurations at runtime? For example, the default merge factor could also be set in this manner. sv On Wed, 3 Mar 2004, Michael Duval wrote: > > I agree with both the prop

Re: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread Michael Duval
I agree with both the property name change and also making it static. Mike Doug Cutting wrote: Michael Duval wrote: > I've hacked the code for the time being by updating FSDirectory and replaced all System.getProperty("java.io.tmpdir") calls with a call to a new method "getLockDir()". This me

Re: Best Practices for indexing in Web application

2004-03-03 Thread Michael Steiger
Doug Cutting wrote: Michael Steiger wrote: I'm surprised that there are no samples for this job. I doubt that I am the first one looking for this. If you found this confusing, and would have been helped by some examples, please take the time to donate some good examples. Lucene is f

Re: FuzzyQuery info

2004-03-03 Thread Supun Edirisinghe
Thanks again for the info, Erik. I am looking at a great big index; I expect that we will always have . So, the FuzzyQuery is less viable now. As for what I'm trying to do: We have a site search that uses Lucene in a basic way. It needs to be improved and I'm trying to research all the features
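To make the cost concrete, here is a naive fuzzy lookup: compute the Levenshtein edit distance between the query word and every term in a vocabulary, and keep the close ones. This is an illustration written for this note, not Lucene's FuzzyQuery implementation. It does show why a term-by-term scan grows with the number of distinct terms in a big index. `NaiveFuzzy` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of fuzzy matching by scanning every term. */
final class NaiveFuzzy {
    /** Classic two-row dynamic-programming Levenshtein distance (insert, delete, substitute each cost 1). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp; // the finished row becomes the previous row
        }
        return prev[b.length()];
    }

    /** Compares the query against every term: O(#terms * |query| * |term|) work. */
    static List<String> matches(String query, Iterable<String> terms, int maxEdits) {
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (editDistance(query, t) <= maxEdits) out.add(t);
        }
        return out;
    }
}
```

With a few million distinct terms, every fuzzy query pays for millions of distance computations before any document is scored.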

Fw: [Plucene] Plucene::Plugin::WeightedQueryParser

2004-03-03 Thread Michael A. Schoen
This recently submitted query parser for Plucene (the Perl port of Lucene) seems very handy. Does something similar already exist for Lucene? Any volunteers? It seems like a pretty straightforward extension to MultiFieldQueryParser. - Original Message - From: "Simon Cozens" <[EMAIL PROTE
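This sketch assumes the weighted parser's job is to search one user term across several fields, with a different weight per field. The snippet does not spell this out. Under that assumption, one minimal approach is to expand the term into a boosted disjunction in standard Lucene query syntax (`field:term^boost`) and hand that string to the query parser. `WeightedFieldExpander` is hypothetical, and it does not escape query-syntax characters in the term.

```java
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: expands one term into a per-field boosted disjunction. */
final class WeightedFieldExpander {
    /** Builds e.g. "(title:lucene^2.0 body:lucene^1.0)"; iteration order follows the map. */
    static String expand(String term, Map<String, Float> fieldWeights) {
        StringJoiner clauses = new StringJoiner(" ", "(", ")");
        for (Map.Entry<String, Float> e : fieldWeights.entrySet()) {
            clauses.add(e.getKey() + ":" + term + "^" + e.getValue());
        }
        return clauses.toString();
    }
}
```

Using a `LinkedHashMap` keeps the clause order predictable. An alternative would be to build one `TermQuery` per field and set each one's boost directly instead of going through query syntax.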

Re: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread Doug Cutting
Michael Duval wrote: > I've hacked the code for the time being by updating FSDirectory and replaced all System.getProperty("java.io.tmpdir") calls with a call to a new method "getLockDir()". This method checks for a "lucene.lockdir" prop before the "java.io.tmpdir" prop giving the end user a bi
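A minimal sketch of the lookup Michael Duval describes: check the `lucene.lockdir` system property first and fall back to `java.io.tmpdir`. The property name is taken from his message. The `LockDirs` class is hypothetical and is not FSDirectory's actual code.

```java
import java.io.File;

/** Hypothetical helper mirroring the getLockDir() idea from the message. */
final class LockDirs {
    /** Property checked first, as named in the message. */
    static final String LOCK_DIR_PROPERTY = "lucene.lockdir";

    /** Returns the lucene.lockdir directory if set and non-empty, otherwise java.io.tmpdir. */
    static File getLockDir() {
        String dir = System.getProperty(LOCK_DIR_PROPERTY);
        if (dir == null || dir.isEmpty()) {
            dir = System.getProperty("java.io.tmpdir");
        }
        return new File(dir);
    }
}
```

Starting the JVM with `-Dlucene.lockdir=/some/dir` would redirect lock files. Leaving the property unset keeps the old `java.io.tmpdir` behavior.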

Re: Best Practices for indexing in Web application

2004-03-03 Thread Doug Cutting
Michael Steiger wrote: I'm surprised that there are no samples for this job. I doubt that I am the first one looking for this. If you found this confusing, and would have been helped by some examples, please take the time to donate some good examples. Lucene is free, but requires donati

Re: Problem with search results

2004-03-03 Thread Doug Cutting
Morus Walter wrote: Now I think this can be fixed in the query parser alone by simply allowing '-' within words. That is change <#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> ) > to <#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" ) > As a result, query parser will read '-' within w
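The grammar change relies on the TERM token being a `_TERM_START_CHAR` followed by `_TERM_CHAR`s. Under that assumption, adding `-` only to `_TERM_CHAR` keeps `foo-bar` as one term, while a leading `-` is still the prohibit operator. The hand-written tokenizer below mimics just that rule for illustration and is not JavaCC-generated code. It simplifies the grammar heavily: any non-whitespace character other than `-` and `+` counts as a start character, and escapes, quotes, and the other operators are ignored. `HyphenTerms` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tokenizer illustrating '-' as a term-continuation (but not term-start) character. */
final class HyphenTerms {
    /** Simplified _TERM_START_CHAR: anything except whitespace and the '-'/'+' operators. */
    static boolean isTermStart(char c) {
        return !Character.isWhitespace(c) && c != '-' && c != '+';
    }

    /** Simplified _TERM_CHAR: start characters plus '-', mirroring the proposed grammar change. */
    static boolean isTermChar(char c) {
        return isTermStart(c) || c == '-';
    }

    /** Splits a query; a '-' or '+' before a term is emitted as a separate operator token. */
    static List<String> tokenize(String q) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < q.length()) {
            char c = q.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c == '-' || c == '+') { out.add(String.valueOf(c)); i++; continue; }
            int start = i;
            while (i < q.length() && isTermChar(q.charAt(i))) i++; // consumes at least one char
            out.add(q.substring(start, i));
        }
        return out;
    }
}
```

With this rule, `tokenize("foo-bar -baz")` gives the term `foo-bar`, the operator `-`, and the term `baz`.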

Highlighting problem

2004-03-03 Thread umamahesh bayireddya
Hi, I am trying to use the highlighting package (http://home.clara.net/markharwood/lucene/highlight.htm). In it I have to use the query.rewrite(reader) method, which is not available on the QueryParser object. How can I fix this?

Re: Highlighting problem

2004-03-03 Thread Vladimir Yuryev
Hi! Mark Harwood has written a file for you, HighlightExtractorTest, that demonstrates how highlighting works. Besides that, by replacing the tags, for example with <B style="color:black; background-color:#66">, you get a yellow highlight. If you apply a mapping between found words and colors, it will
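A minimal sketch of the color-mapping idea Vladimir describes: wrap each matched word in a `<B style=...>` tag whose background color comes from a word-to-color map. This plain regex approach stands in for Mark Harwood's highlighter and does not use its API. `SimpleHighlighter` is hypothetical, matching is case-insensitive on `\w+` words, and the text is not HTML-escaped. On the original question, `rewrite(IndexReader)` is declared on `Query`, the object returned by `QueryParser.parse`, rather than on `QueryParser` itself.

```java
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical highlighter: colors each word found in a term-to-color map. */
final class SimpleHighlighter {
    private static final Pattern WORD = Pattern.compile("\\w+");

    /** Wraps words whose lower-case form is a key of termColors in a styled <B> tag. */
    static String highlight(String text, Map<String, String> termColors) {
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String color = termColors.get(m.group().toLowerCase(Locale.ROOT));
            String replacement = color == null
                    ? m.group()
                    : "<B style=\"color:black; background-color:" + color + "\">" + m.group() + "</B>";
            // quoteReplacement stops '$' or '\' in the text from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Giving each query term its own entry in the map produces a different highlight color per term.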

Re: Best Practices for indexing in Web application

2004-03-03 Thread Morus Walter
Michael Steiger writes: > > > > Depends on your application, but if you can, it's better to keep IndexSearcher > > open until the index changes. > > Otherwise you will have to open all the index files for each search. > > Good tip. So I have to synchronize (logically) my search routine with > an
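A sketch of the pattern Morus suggests: keep one searcher open and swap in a new one only when the index changes. The check is synchronized so that concurrent requests see a consistent instance. The type parameter `S` stands in for `IndexSearcher`, and the version and opener hooks are hypothetical. You would wire them to however your application detects index changes and opens a searcher.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical cache: one open searcher, replaced only when the index version changes. */
final class CachedSearcher<S> {
    private final LongSupplier versionSource; // hypothetical hook reporting the index's current version
    private final Supplier<S> opener;         // hypothetical hook opening a fresh searcher
    private S current;
    private long currentVersion;

    CachedSearcher(LongSupplier versionSource, Supplier<S> opener) {
        this.versionSource = versionSource;
        this.opener = opener;
    }

    /** Returns the open searcher, reopening first if the index changed since it was opened. */
    synchronized S get() {
        long v = versionSource.getAsLong();
        if (current == null || v != currentVersion) {
            current = opener.get();
            currentVersion = v;
        }
        return current;
    }
}
```

The old searcher is deliberately not closed here. Closing it while another request is still searching with it is unsafe, so a real application needs some way to know when it is idle, for example reference counting.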