date:20040303

New Word Document text extractor released

2004-03-03 Thread Ryan Ackley

Version 0.4 of the TextMining.org text extraction library has been released! I have finally gotten around to releasing a new version of the textmining.org text extractor. This is a pure Java library for extracting text from Word 6.0/97/2000/XP/2003. Some highlights from this release: -I removed

Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread Erik Hatcher

On Mar 3, 2004, at 4:25 PM, hui wrote: Another similar issue. If we could have a parameter to control the maximum number of files within the index, that would avoid the problem of running out of file handles. When the number of files within one index reaches the limit, optimization is goi

Re: FuzzyQuery info

2004-03-03 Thread Otis Gospodnetic

For solving misspellings, you could try encoding and indexing the content using the soundex or double metaphone algorithms. There is at least one free product that uses Lucene and those two algos, and it's linked from one of the pages on the Lucene site. Phonetix? Otis --- Supun Edirisinghe <[EMAIL PROT
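The phonetic-encoding idea in this message can be illustrated with a minimal sketch of classic American Soundex in plain Java. This is an assumption-laden illustration, not the Phonetix API or anything shipped with Lucene; the class name `SoundexSketch` and method `encode` are hypothetical. The idea would be to index the encoded form alongside the original term, so that a misspelled query encodes to the same key.

```java
// Hypothetical helper: a plain-JDK sketch of American Soundex.
// It is not part of Lucene or Phonetix.
final class SoundexSketch {
    // Digit codes for letters A..Z; '0' marks letters with no code (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        StringBuilder out = new StringBuilder();
        char lastCode = '0';
        for (int i = 0; i < word.length() && out.length() < 4; i++) {
            char c = Character.toUpperCase(word.charAt(i));
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not an ASCII letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c);      // the first letter is kept as-is
                lastCode = code;
            } else if (c == 'H' || c == 'W') {
                continue;           // H and W do not separate equal codes
            } else {
                if (code != '0' && code != lastCode) {
                    out.append(code);
                }
                lastCode = code;    // a vowel resets the "same code" check
            }
        }
        while (out.length() < 4) {
            out.append('0');        // pad short codes to four characters
        }
        return out.toString();
    }
}
```

With this sketch, `encode("Lucene")` and the misspelling `encode("Lusene")` produce the same key, which is what makes a phonetic field useful for catching misspellings.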

RE: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread hui

Another similar issue. If we could have a parameter to control the maximum number of files within the index, that would avoid the problem of running out of file handles. When the number of files within one index reaches the limit, optimization is going to be called. Right now, if the file n

Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread Stephane James Vaucher

How about (looking big rather than small): - MaxClause from BooleanQuery (I know there have been discussions on the dev list, but I haven't been following them) - default commit_lock_name - default commit_lock_timeout - default maxFieldLength - default maxMergeDocs - default mergeFactor - default mi

Re: Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread Doug Cutting

Stephane James Vaucher wrote: As I've stated in my earlier mail, I like this change. More importantly, could this become a "standard" way of changing configurations at runtime? For example, the default merge factor could also be set in this manner. Sure, that's reasonable, so this would be someth
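A minimal sketch of what "setting a default through a system property" could look like, using only the JDK. The property name `example.lucene.mergeFactor`, the fallback value of 10, and the class `DefaultsSketch` are assumptions made for illustration; the thread does not say which names were eventually chosen.

```java
// Hypothetical holder for tunable defaults; not Lucene's actual code.
final class DefaultsSketch {
    // Assumed property name, chosen only for this example.
    static final String MERGE_FACTOR_PROP = "example.lucene.mergeFactor";
    // Assumed fallback used when the property is absent or not a number.
    static final int FALLBACK_MERGE_FACTOR = 10;

    // Reads the property on each call, so a change made at runtime is picked up.
    static int defaultMergeFactor() {
        return Integer.getInteger(MERGE_FACTOR_PROP, FALLBACK_MERGE_FACTOR);
    }
}
```

The value could then be supplied at startup with `-Dexample.lucene.mergeFactor=25` or set in code with `System.setProperty`. Reading the property once into a static field instead would be cheaper but would ignore later changes.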

Sys properties Was: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread Stephane James Vaucher

As I've stated in my earlier mail, I like this change. More importantly, could this become a "standard" way of changing configurations at runtime? For example, the default merge factor could also be set in this manner. sv On Wed, 3 Mar 2004, Michael Duval wrote: > > I agree with both the prop

Re: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread Michael Duval

I agree with both the property name change and also making it static. Mike Doug Cutting wrote: Michael Duval wrote: > I've hacked the code for the time being by updating FSDirectory and replaced all System.getProperty("java.io.tmpdir") calls with a call to a new method "getLockDir()". This me

Re: Best Practices for indexing in Web application

2004-03-03 Thread Michael Steiger

Doug Cutting wrote: Michael Steiger wrote: I'm surprised that there are no samples for this job. I do not think that I am the first one looking for this. If you found this confusing, and would have been helped by some examples, please take the time to donate some good examples. Lucene is f

Re: FuzzyQuery info

2004-03-03 Thread Supun Edirisinghe

thanks again for the info, Erik. I am looking at a great big index; I expect that we will always have . So, the FuzzyQuery is less viable now. as for what I'm trying to do: We have a site search that uses Lucene in a basic way. It needs to be improved and I'm trying to research all the features

Fw: [Plucene] Plucene::Plugin::WeightedQueryParser

2004-03-03 Thread Michael A. Schoen

This recently submitted query parser for Plucene (the Perl port of Lucene) seems very handy. Does something similar already exist for Lucene? Any volunteers? It seems like a pretty straightforward extension to MultiFieldQueryParser. - Original Message - From: "Simon Cozens" <[EMAIL PROTE

Re: java.io.tmpdir as lock dir .... once again

2004-03-03 Thread Doug Cutting

Michael Duval wrote: > I've hacked the code for the time being by updating FSDirectory and replaced all System.getProperty("java.io.tmpdir") calls with a call to a new method "getLockDir()". This method checks for a "lucene.lockdir" prop before the "java.io.tmpdir" prop giving the end user a bi
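The lookup order described in this message can be sketched in plain Java as follows. Only the property names `lucene.lockdir` and `java.io.tmpdir` and the method name `getLockDir()` come from the message; the wrapper class `LockDirSketch` is hypothetical, and this is not the actual FSDirectory patch.

```java
// Hypothetical stand-in for the lookup described above; not FSDirectory itself.
final class LockDirSketch {
    // Prefer "lucene.lockdir" when it is set, otherwise fall back to the JVM temp dir.
    static String getLockDir() {
        return System.getProperty("lucene.lockdir",
                System.getProperty("java.io.tmpdir"));
    }
}
```

Starting the JVM with `-Dlucene.lockdir=/some/writable/dir` would then redirect lock files without touching `java.io.tmpdir`. This sketch re-reads the property on every call; caching the result in a static field is an alternative if the value should be fixed at class-load time.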

Re: Best Practices for indexing in Web application

2004-03-03 Thread Doug Cutting

Michael Steiger wrote: I'm surprised that there are no samples for this job. I do not think that I am the first one looking for this. If you found this confusing, and would have been helped by some examples, please take the time to donate some good examples. Lucene is free, but requires donati

Re: Problem with search results

2004-03-03 Thread Doug Cutting

Morus Walter wrote: Now I think this can be fixed in the query parser alone by simply allowing '-' within words. That is change <#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> ) > to <#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" ) > As a result, query parser will read '-' within w

Highlighting problem

2004-03-03 Thread umamahesh bayireddya

Hi, I am trying to use the highlighting package (http://home.clara.net/markharwood/lucene/highlight.htm). It requires me to call the query.rewrite(reader) method, which is not available on the QueryParser object. How can I fix this?

Re: Highlighting problem

2004-03-03 Thread Vladimir Yuryev

Hi! Mark Harwood has provided a file, HighlightExtractorTest, that shows how the highlighter works. In addition, by replacing the tags with, for example, < B style = " color:black; background-color:#66 " >, you get a yellow highlight. If you map each found word to a color, it will

Re: Best Practices for indexing in Web application

2004-03-03 Thread Morus Walter

Michael Steiger writes: > > > > Depends on your application, but if you can, it's better to keep IndexSearcher > > open until the index changes. > > Otherwise you will have to open all the index files for each search. > > Good tip. So I have to synchronize (logically) my search routine with > an
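The advice to keep the searcher open until the index changes can be sketched generically, without depending on any Lucene class. Everything here is hypothetical: `CachedSearcher` is an illustrative name, the version supplier stands in for whatever "has the index changed?" check the application has, and the opener stands in for code that builds a new IndexSearcher.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical holder: reuses one searcher until the reported index version changes.
final class CachedSearcher<T> {
    private final LongSupplier currentVersion; // assumed "index version" check
    private final Supplier<T> opener;          // assumed code that opens a new searcher
    private T cached;
    private long cachedVersion;

    CachedSearcher(LongSupplier currentVersion, Supplier<T> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
    }

    // Synchronized so concurrent web requests never open two searchers at once.
    synchronized T get() {
        long version = currentVersion.getAsLong();
        if (cached == null || version != cachedVersion) {
            cached = opener.get();   // reopen only when the index has changed
            cachedVersion = version;
        }
        return cached;
    }
}
```

The sketch deliberately does not close the old searcher, because requests that are still running may be using it; a real application would need its own policy for that.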


17 matches
