Version 0.4 of the TextMining.org text extraction library has been released!
I have finally gotten around to releasing a new version of the
textmining.org text extractor. This is a pure Java library for extracting
text from Word 6.0/97/2000/XP/2003 documents.
Some highlights from this release:
-I removed
On Mar 3, 2004, at 4:25 PM, hui wrote:
Another similar issue. If we could have a parameter to control the max
number of files within the index, that would avoid the problem of
running out of file handles.
When the number of files within one index reaches the limit, optimization
is
goi
For solving misspellings, you could try encoding and indexing the
content using the Soundex or Double Metaphone algorithm.
There is at least one free product that uses Lucene and those two
algos, and it's linked from one of the pages on Lucene site.
Phonetix?
Otis
--- Supun Edirisinghe <[EMAIL PROT
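To illustrate the phonetic-encoding idea suggested above, here is a minimal sketch of American Soundex in plain Java. The class name SoundexSketch is hypothetical and the code is not taken from Lucene or Phonetix; it only shows why differently spelled words can end up with the same indexed key.

```java
import java.util.Locale;

/** Minimal American Soundex encoder; an illustrative sketch, not Lucene or Phonetix code. */
final class SoundexSketch {
    // Soundex digit for each letter A..Z; '0' marks letters with no code (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    static String encode(String word) {
        String s = word.toUpperCase(Locale.ROOT);
        StringBuilder out = new StringBuilder();
        char last = '0';
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') {
                continue; // ignore anything that is not a plain ASCII letter
            }
            char code = CODES.charAt(c - 'A');
            if (out.length() == 0) {
                out.append(c); // the first letter is kept as-is
                last = code;
            } else if (c != 'H' && c != 'W') {
                // H and W are skipped entirely, so they do not separate equal codes
                if (code != '0' && code != last) {
                    out.append(code);
                }
                last = code; // a vowel resets 'last', allowing a repeated code afterwards
            }
            if (out.length() == 4) {
                break;
            }
        }
        while (out.length() < 4) {
            out.append('0'); // pad short codes with zeros
        }
        return out.toString();
    }
}
```

Under this scheme one would index SoundexSketch.encode(term) in a separate field and encode the query terms the same way, so that a misspelled query term still matches on the phonetic key.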
Another similar issue. If we could have a parameter to control the max
number of files within the index, that would avoid the problem of
running out of file handles.
When the number of files within one index reaches the limit, optimization is
going to be called.
Right now, if the file n
How about (looking big rather than small):
- MaxClause from BooleanQuery (I know there have been discussions on
the dev list, but I haven't been following them)
- default commit_lock_name
- default commit_lock_timeout
- default maxFieldLength
- default maxMergeDocs
- default mergeFactor
- default mi
Stephane James Vaucher wrote:
As I've stated in my earlier mail, I like this change. More importantly,
could this become a "standard" way of changing configurations at runtime?
For example, the default merge factor could also be set in this manner.
Sure, that's reasonable, so this would be someth
As I've stated in my earlier mail, I like this change. More importantly,
could this become a "standard" way of changing configurations at runtime?
For example, the default merge factor could also be set in this manner.
sv
On Wed, 3 Mar 2004, Michael Duval wrote:
>
> I agree with both the prop
I agree with both the property name change and also making it static.
Mike
Doug Cutting wrote:
Michael Duval wrote:
> I've hacked the code for the time being by updating FSDirectory and
replacing all System.getProperty("java.io.tmpdir")
calls with a call to a new method "getLockDir()". This me
Doug Cutting wrote:
Michael Steiger wrote:
I'm surprised that there are no samples for this job. I do not think
that I am the first one looking for this.
If you found this confusing, and would have been helped by some
examples, please take the time to donate some good examples. Lucene is
f
thanks again for the info, Erik.
I am looking at a great big index; I expect that we will always have .
So, the FuzzyQuery is less viable now.
as for what I'm trying to do: We have a site search that uses Lucene in
a basic way. It needs to be improved and I'm trying to research all the
features
This recently submitted query parser for Plucene (the Perl port of Lucene)
seems very handy.
Does something similar already exist for Lucene? Any volunteers? It seems
like a pretty straightforward extension to MultiFieldQueryParser.
- Original Message -
From: "Simon Cozens" <[EMAIL PROTE
Michael Duval wrote:
> I've hacked the code for the time being by updating FSDirectory and
replacing all System.getProperty("java.io.tmpdir")
calls with a call to a new method "getLockDir()". This method checks
for a "lucene.lockdir" prop before the
"java.io.tmpdir" prop giving the end user a bi
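The lookup order described in this message can be sketched roughly as follows. This is not the actual FSDirectory patch; the class name LockDirSketch is hypothetical, and only the method name getLockDir() and the two property names come from the message itself.

```java
/** Sketch of the described fallback: prefer "lucene.lockdir", else use "java.io.tmpdir". */
final class LockDirSketch {
    static String getLockDir() {
        // The second argument is the default returned when "lucene.lockdir" is not set.
        return System.getProperty("lucene.lockdir", System.getProperty("java.io.tmpdir"));
    }
}
```

With such a method, starting the JVM with -Dlucene.lockdir=<some directory> would redirect the lock files without changing where other code puts its temporary files.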
Michael Steiger wrote:
I'm surprised that there are no samples for this job. I do not think
that I am the first one looking for this.
If you found this confusing, and would have been helped by some
examples, please take the time to donate some good examples. Lucene is
free, but requires donati
Morus Walter wrote:
Now I think this can be fixed in the query parser alone by simply allowing
'-' within words.
That is change
<#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> ) >
to
<#_TERM_CHAR: ( <_TERM_START_CHAR> | <_ESCAPED_CHAR> | "-" ) >
As a result, query parser will read '-' within w
Hi,
I am trying to use the highlighting package
(http://home.clara.net/markharwood/lucene/highlight.htm).
It requires me to call the query.rewrite(reader) method, which is not on the
QueryParser object.
How can I fix this?
Hi!
Mark Harwood has provided a file, HighlightExtractorTest, which shows
how the highlighter works.
Also, by replacing the tags with, for example, <B style="color:black;
background-color:#66">, you get a yellow highlight.
If to apply conformity found word and color, it will
Michael Steiger writes:
> >
> > Depends on your application, but if you can, it's better to keep IndexSearcher
> > open until the index changes.
> > Otherwise you will have to open all the index files for each search.
>
> Good tip. So I have to synchronize (logically) my search routine with
> an
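The advice above (keep the searcher open and reopen it only when the index changes) could be sketched as a small synchronized holder. Everything named here is an assumption for illustration: ReopenOnChange is a hypothetical class, the version supplier stands for whatever change signal the application has, and the opener stands for the code that creates a new searcher. No Lucene API is used.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical holder that reuses one resource until a version number changes. */
final class ReopenOnChange<T> {
    private final LongSupplier version; // assumed change signal, e.g. an index version
    private final Supplier<T> opener;   // assumed factory, e.g. code that opens a searcher
    private T current;
    private long openedAtVersion;

    ReopenOnChange(LongSupplier version, Supplier<T> opener) {
        this.version = version;
        this.opener = opener;
    }

    /** Returns the cached resource, reopening it only if the version has changed. */
    synchronized T get() {
        long v = version.getAsLong();
        if (current == null || v != openedAtVersion) {
            // A real application would also close the old resource once no search uses it.
            current = opener.get();
            openedAtVersion = v;
        }
        return current;
    }
}
```

The synchronized get() is the "logical" synchronization point: searches call get() each time, and the writer only has to make sure the version value changes after it finishes updating the index.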
17 matches