Lucene does support stemming, but that is not what your example requires
(stemming equates "roaming", "roam", "roamed", etc.). For stemming,
look at PorterStemFilter or better, the Snowball stemmers in the
sandbox. For your similar word list, I think you are looking for the
class FuzzyTermEnum.
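
To make the distinction concrete: fuzzy matching in Lucene is based on Levenshtein edit distance, so a "similar word list" is essentially the set of indexed terms within a small edit distance of the search word. The following is a minimal sketch of that idea in plain Java. It does not use the Lucene API; the class SimilarWords, its methods, and the sample vocabulary are hypothetical names chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
class SimilarWords {

    /** Classic dynamic-programming Levenshtein edit distance. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the vocabulary terms (other than the word itself) within maxDistance edits. */
    static List<String> similar(String word, String[] vocabulary, int maxDistance) {
        List<String> out = new ArrayList<String>();
        for (String term : vocabulary) {
            if (!term.equals(word) && editDistance(word, term) <= maxDistance) {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample vocabulary; in a real index these would be the indexed terms.
        String[] vocabulary = {"roam", "foam", "road", "roaming", "java"};
        System.out.println(similar("roam", vocabulary, 1));
    }
}
```

Note that "roaming" is three edits away from "roam", so an edit-distance approach does not find it; that is the case stemming is meant to cover.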
On Wed, 24 Nov 2004 13:04:20 +0530, Santosh <[EMAIL PROTECTED]> wrote:
> I have gone through IndexReader and found the method delete(int docNum),
> but where do I get the document number from? Is it predefined, or do we have
> to assign a number prior to indexing?
The number (aka doc-id) is given by
A good way to do this is to add a keyword field with whatever unique id
you have for the document. Then you can delete the term containing a
unique id to delete the document from the index (look at
IndexReader.delete(Term)). You can look at the demo class IndexHTML to
see how it does incremental
Terence Lai writes:
>
> It looks like the wildcard query disappeared. In fact, I am expecting
> text:"java* developer" to be returned. It seems to me that the QueryParser
> cannot handle the wildcard within a quoted String.
>
That's not just QueryParser.
Lucene itself doesn't handle wildcard
I have gone through IndexReader and found the method delete(int docNum),
but where do I get the document number from? Is it predefined, or do we have
to assign a number prior to indexing?
- Original Message -
From: "Luke Francl" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Is Lucene able to do stemming?
If I am searching for "roam" then I know that it can give results for "foam"
using a fuzzy query. But my requirement is: if I search for "roam", can I get the
similar word list as output, so that I can show it to the end user in the column
--- do you me
Hi guys,
Apologies.
I have a MERGERINDEX [1000 merged subindexes].
The question is:
Does somebody have a solution for repairing the MERGERINDEX [in
case of corruption]?
If so, please let the forum know about it, so developers like us can use
the same.
Thx in Advanc
Hi all,
I am trying to use the QueryParser.parse() to parse a query string like "java*
developer". Note that I want the wildcard string, java*, followed by the word
developer. The following is the code.
-
String qryStr = "\"java* developer\"";
String fieldname = "text";
StandardAnal
Hi Ken,
I'm glad our replies were helpful. It sounds like you looked at the
code in MaxDisjunctionQuery, so you probably noticed that it also
implements skipTo(). Your suggestion sounds like a good thing to do. I
thought about that when writing MaxDisjunctionQuery, but didn't need the
generalit
On Tue, 23 Nov 2004 22:47:21 +0100, Paul <[EMAIL PROTECTED]> wrote:
> Hi,
> I'm creating a document and adding it with a writer to the index. For
> some reason I need to add data to this specific document later on
> (minutes, not hours or days). Is it possible to retrieve it and add
> additional dat
Hi,
Thanks for the pointers in your replies. Would it be possible to include
some sort of accrual scorer interface somewhere in the Lucene Query
APIs? This could be passed into a query similar to
MaxDisjunctionQuery; and combine the sum, max, tieBreaker, etc.,
according to the implementor's discreti
On Nov 23, 2004, at 6:02 PM, Kevin A. Burton wrote:
Erik Hatcher wrote:
Also, there is a DBDirectory in the sandbox to store a Lucene index
inside Berkeley DB.
I assume this would prevent prefix queries from working...
Huh? Why would you assume that? As far as I know, and I've tested
this some,
Thanks Chuck! I missed the call: getIndexOffset.
I am profiling it again to pin-point where the performance problem is.
-John
On Tue, 23 Nov 2004 16:13:22 -0800, Chuck Williams <[EMAIL PROTECTED]> wrote:
> Are you sure you have a performance problem with
> TermInfosReader.get(Term)? It looks to
Are you sure you have a performance problem with
TermInfosReader.get(Term)? It looks to me like it scans sequentially
only within a small buffer window (of size
SegmentTermEnum.indexInterval) and that it uses binary search otherwise.
See TermInfosReader.getIndexOffset(Term).
Chuck
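
The lookup strategy described above (binary search over a sparse in-memory index, then a short sequential scan within one interval) can be sketched roughly as follows. This is a simplified illustration and not the actual TermInfosReader code: the class SparseTermIndex, the plain string array standing in for the on-disk term file, and the interval value used in main are all assumptions made for the example.

```java
// Hypothetical illustration class; not Lucene's TermInfosReader.
class SparseTermIndex {
    private final String[] terms;      // all terms, sorted (stands in for the term file)
    private final String[] indexTerms; // every interval-th term, kept in memory
    private final int interval;

    SparseTermIndex(String[] sortedTerms, int interval) {
        this.terms = sortedTerms;
        this.interval = interval;
        int n = (sortedTerms.length + interval - 1) / interval;
        this.indexTerms = new String[n];
        for (int i = 0; i < n; i++) {
            indexTerms[i] = sortedTerms[i * interval];
        }
    }

    /** Binary search: position of the last index entry that is <= term, or -1. */
    int getIndexOffset(String term) {
        int lo = 0;
        int hi = indexTerms.length - 1;
        while (hi >= lo) {
            int mid = (lo + hi) >>> 1;
            int cmp = term.compareTo(indexTerms[mid]);
            if (cmp < 0) {
                hi = mid - 1;
            } else if (cmp > 0) {
                lo = mid + 1;
            } else {
                return mid;
            }
        }
        return hi;
    }

    /** Returns the position of term in the full term list, or -1 if absent. */
    int get(String term) {
        int block = getIndexOffset(term);
        if (block < 0) {
            return -1; // term sorts before the first term
        }
        int start = block * interval;
        int end = Math.min(start + interval, terms.length);
        // Sequential scan, bounded by the interval size.
        for (int i = start; i < end; i++) {
            int cmp = terms[i].compareTo(term);
            if (cmp == 0) {
                return i;
            }
            if (cmp > 0) {
                break; // passed the spot where the term would be
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] terms = {"apple", "banana", "cherry", "date", "fig", "grape", "kiwi"};
        SparseTermIndex index = new SparseTermIndex(terms, 3);
        System.out.println(index.get("fig"));
        System.out.println(index.get("coconut"));
    }
}
```

With this layout, the cost of one lookup is a logarithmic search over the in-memory entries plus at most one interval's worth of sequential comparisons.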
> -Origi
Hi:
I am trying to index 1M documents, with batches of 500 documents.
Each document has a unique text key, which is added as a
Field.Keyword(name,value).
For each batch of 500, I need to make sure I am not adding a
document with a key that is already in the current index.
To do this
Erik Hatcher wrote:
Also, there is a DBDirectory in the sandbox to store a Lucene index
inside Berkeley DB.
I assume this would prevent prefix queries from working...
Kevin
--
Use Rojo (RSS/Atom aggregator). Visit http://rojo.com. Ask me for an
invite! Also see irc.freenode.net #rojo if you wan
: Note that I said FilteredQuery, not QueryFilter.
Doh .. right, sorry, I confused myself by thinking you were still referring
to your comments of 2004-03-29 comparing DateFilter with RangeQuery wrapped
in a QueryFilter.
: I debate (with myself) on whether add-ons that can be done with other
: code i
Hi,
I'm creating a document and adding it with a writer to the index. For
some reason I need to add data to this specific document later on
(minutes, not hours or days). Is it possible to retrieve it and add
additional data?
I found the document(int n) - method within the IndexReader (btw: the
descr
On Nov 23, 2004, at 3:41 PM, Erik Hatcher wrote:
On Nov 23, 2004, at 2:16 PM, Chris Hostetter wrote:
First: Is there any reason Matt Quail's "LongField" class hasn't been
added to CVS (or has it and I'm just not seeing it?)
Laziness is the only reason, at least on my part. I think adding it
is a
On Nov 23, 2004, at 2:16 PM, Chris Hostetter wrote:
: I did a little code cleanup, Chris, renaming some RangeFilter variables
: and correcting typos in the Javadocs. Let me know if everything looks
: ok.
Wow ... that was fast. Things look fine to me (typo's in javadocs are my
specialty) but
Hmmm, scratch that. I explained the tradeoff of a
filter vs a range query - not between the different
types of filters you talk about.
--- Yonik Seeley <[EMAIL PROTECTED]> wrote:
> I think it depends on the query. If the query (q1)
> covers a large number of documents and the filter
> covers a ve
I think it depends on the query. If the query (q1)
covers a large number of documents and the filter
covers a very small number, then using a RangeFilter
will probably be slower than a RangeQuery.
-Yonik
> See, this is what I'm not getting: what is the
> advantage of the second
> world? :) ... i
On Nov 23, 2004, at 10:01 AM, Praveen Peddi wrote:
Chris's RangeFilter does not cache anything, whereas QueryFilter does
caching. Is it better to add the caching functionality to RangeFilter
as well, or does it not make any difference?
Caching is a different _aspect_. Filtering and caching are not r
Hi,
I read a lot of mails about time-consuming PDF parsing and tried
some solutions myself. My example PDF file has 181 pages in 1.5 MB
(mostly text, nearly no graphics).
-with pdfbox.org's toolkit it took 17m32s to parse and read its content
-after installing ghostscript and ps2text / ps2ascii my p
To update a document you need to insert the modified document, then delete the
old one.
Here is some code that I use to get you going in the right direction (it won't
compile, but if you follow it closely you will see how I take an array of
lucene documents with new properties and add them, then
On Tue, 2004-11-23 at 13:59, Santosh wrote:
> I am using lucene for indexing, when I am creating Index the documents are
> added. but when I want to modify the single existing document and reIndex
> again, it is taking as new document and adding one more time, so that I am
> getting same documen
I am using lucene for indexing, when I am creating Index the documents are
added. but when I want to modify the single existing document and reIndex
again, it is taking as new document and adding one more time, so that I am
getting same document twice in the results.
To overcome this I am deleti
: Done. I deprecated DateField and DateFilter, and added the RangeFilter
: class contributed by Chris.
:
: I did a little code cleanup, Chris, renaming some RangeFilter variables
: and correcting typos in the Javadocs. Let me know if everything looks
: ok.
Wow ... that was fast. Things look fi
Chris's RangeFilter does not cache anything, whereas QueryFilter does
caching. Is it better to add the caching functionality to RangeFilter as well,
or does it not make any difference?
Praveen
- Original Message -
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROT
On Nov 23, 2004, at 4:18 AM, Doug Cutting wrote:
Hoss wrote:
The attachment contains my RangeFilter, a unit test that demonstrates it,
and a Benchmarking unit test that does a side-by-side comparison with
RangeQuery [6]. If developers feel that this class is useful, then by all
means roll it int
On Nov 22, 2004, at 9:25 PM, Hoss wrote:
I'm rather new to Lucene (and this list), so if I'm grossly
misunderstanding things, forgive me.
You're spot on!
But I was surprised then to see the following quote from "Erik Hatcher" in
the archives:
"In fact, DateFilter by itself is practically of no
Hoss wrote:
The attachment contains my RangeFilter, a unit test that demonstrates it,
and a Benchmarking unit test that does a side-by-side comparison with
RangeQuery [6]. If developers feel that this class is useful, then by all
means roll it into the code base. (90% of it is cut/pasted from
Dat
Hi Dmitry,
Thank you so much for your reply.
I'd like to answer your specific questions.
>>It also depends on whether you are using "compound files" or not (this
is a flag on the IndexWriter). >>With compound files flag on, segments
have fixed number of files, regardless of how many fields
Also, there is a DBDirectory in the sandbox to store a Lucene index
inside Berkeley DB.
Erik
On Nov 22, 2004, at 6:06 PM, Kevin A. Burton wrote:
It seems that when compared to other datastores that Lucene starts to
fall down. For example lucene doesn't perform online index
optimization
Hoss writes:
>
> (c) Filtering. Filters in general make a lot of sense to me. They are a
> way to specify (at query time) that only a certain subset of the index
> should be considered for results. The Filter class has a very
> straightforward API that seems very easy to subclass to get the be
Chris,
On Tuesday 23 November 2004 03:25, Hoss wrote:
> (NOTE: numbers in [] indicate Footnotes)
>
> I'm rather new to Lucene (and this list), so if I'm grossly
> misunderstanding things, forgive me.
>
> One of my main needs as I investigate Search technologies is to restrict
> results based on
On Tuesday 23 November 2004 00:06, Kevin A. Burton wrote:
> I'm wondering about the potential for a generic JDBCDirectory for
> keeping the lucene index within a database.
Such a thing already exists: http://ppinew.mnis.com/jdbcdirectory/, but I
don't know about its scalability.
Regards
Daniel