StandardAnalyzer I have got no option.
Thanks a lot for your reply.
Please suggest how I can go ahead.
SHAKTI SAREEN
GE-GDC
STC HYDERABAD
994894
-----Original Message-----
From: Shai Erera [mailto:[EMAIL PROTECTED]
Sent: Thursday, November 22, 2007 9:25 PM
To: java-user
uncertain about one detail: How do I achieve a search for multiple
keywords? Not just "green tree" but also "short road", "sky", "bird"?
Is there a chance to add those keywords to the
Query q = qp.parse("green tree"); command?
Shai Erera wrote:
How about using MultiFieldQueryParser. Here is a short
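The example is cut off in the archive; a minimal sketch of the idea
(field names and keyword list are made up, 2.3-era API):

String[] queries = { "green tree", "short road", "sky", "bird" };
String[] fields  = { "keywords", "keywords", "keywords", "keywords" };
// each query string is parsed against its field; the results are OR'ed together
Query q = MultiFieldQueryParser.parse(queries, fields, new StandardAnalyzer());

Alternatively, parse each keyword separately and add the results as SHOULD
clauses of a BooleanQuery.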
that leveraged the SpanQuery family of queries to do
something like this.
-Hoss
I just noticed MultiPhraseQuery has a setSlop method, so I think this Query
is what you're looking for.
On Dec 15, 2007 7:04 AM, Shai Erera [EMAIL PROTECTED] wrote:
You can look at org.apache.lucene.search.MultiPhraseQuery which does
something similar to what you ask. From its javadoc
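The javadoc quote is cut off; a minimal usage sketch along the same lines
(field and terms hypothetical):

MultiPhraseQuery mpq = new MultiPhraseQuery();
mpq.add(new Term("body", "green"));                // exact first word
mpq.add(new Term[] { new Term("body", "tree"),     // alternatives for the
                     new Term("body", "trees") }); // second position
mpq.setSlop(2); // the setSlop method mentioned above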
and somehow assign Field.Store, Field.Index, etc., based on a string value.
like to do is for it to return "mouse cat apple" and "mouse cat house" and
not "cat house mouse"
it will only do this if the field is untokenized, I believe.
is there any way to get the desired behavior?
best,
-C.B.
Hi
I didn't find a proper API on IndexWriter or IndexReader to retrieve the
total number of deleted documents.
Will IndexReader.maxDoc() - IndexReader.numDocs() give the correct result,
or is this just a heuristic?
Thanks,
Shai
Thanks
I guess I should have looked in the code before asking those silly questions
:-)
I wonder why there isn't a specific API for that though ...
On Jan 11, 2008 7:36 PM, Steven A Rowe [EMAIL PROTECTED] wrote:
Hi Shai,
On 01/11/2008 at 7:42 AM, Shai Erera wrote:
Will IndexReader.maxDoc
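For reference, a sketch of the arithmetic being discussed:

IndexReader reader = IndexReader.open(dir);
// maxDoc() counts deleted slots as well, numDocs() does not, so the
// difference is exactly the number of deleted documents
int numDeleted = reader.maxDoc() - reader.numDocs();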
appreciated.
Best Regards,
C.B.
PM, Shai Erera [EMAIL PROTECTED] wrote:
What is the default Operator of your QueryParser? Is it AND_OPERATOR or
OR_OPERATOR? If it's OR ... then it's strange. If it's AND, then once you
add more terms than what exists, it won't find anything.
On Feb 13, 2008 6:54 PM, Cam Bazz [EMAIL
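A sketch of setting the default operator (field name hypothetical):

QueryParser qp = new QueryParser("contents", new StandardAnalyzer());
// with OR, extra terms only broaden the match; with AND they narrow it
qp.setDefaultOperator(QueryParser.OR_OPERATOR);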
private static void add(BooleanQuery q, String name, String value) {
  q.add(new BooleanClause(new TermQuery(new Term(name, value)),
      BooleanClause.Occur.SHOULD));
}
On Thu, Feb 14, 2008 at 8:44 AM, Shai Erera [EMAIL PROTECTED] wrote:
Is this Speller
I think you should write your own Analyzer and use:
* StandardTokenizer for tokenization and ACRONYM detection.
* StopFilter for stopwords handling.
The Analyzer you write should override tokenStream() and do something like:
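The promised example is cut off in the archive; a minimal sketch of such an
Analyzer (pre-2.9 API, class name hypothetical):

import java.io.Reader;
import org.apache.lucene.analysis.*;
import org.apache.lucene.analysis.standard.*;

public class MyAnalyzer extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    TokenStream ts = new StandardTokenizer(reader); // tokenization + ACRONYM detection
    ts = new StandardFilter(ts);
    ts = new StopFilter(ts, StopAnalyzer.ENGLISH_STOP_WORDS); // stopword handling
    return ts;
  }
}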
FWIW, I had implemented a sort-by-payload feature which performs quite well.
It has a very small memory footprint (actually close to 0), and reads values
from a payload. Payloads, at least from my experience, perform better than
stored fields.
In a comparison I once made, the sort-by-payload
Maybe add to each doc a field numVolunteers and then constrain the query to
vol:krish AND vol:raj AND numvol:2 (something like that)?
On Wed, Jul 22, 2009 at 9:49 AM, ba3 sbadhrin...@gmail.com wrote:
Hi,
In the documents which contain the volunteer information:
Doc1:
volunteer krish
From my experience, you shouldn't have any problems indexing that amount of
content even into one index. I've successfully indexed 450 GB of data w/
Lucene, and I believe it can scale much higher if rich text documents are
indexed. Though I haven't tried yet, I believe it can scale into the 1-5 TB
There shouldn't be a problem searching such an index. It depends on the machine
you use. If it's a strong enough machine, I don't think you should have any
problems.
But like I said, you can always try it out on your machine before you make a
decision.
Also, Lucene has a Benchmark package which
Perhaps I misunderstood something, but how do you update a document?
I mean, if a document contains vol:a, vol:b and vol:c and then you want to
add vol:d to it, don't you remove the document and add it back?
If that's what you do, then you can also update the numvols field, right?
Or .. you
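A sketch of that delete-and-re-add update in one call (the "id" field is
hypothetical; field names follow the thread):

// updateDocument atomically deletes every doc matching the term and adds
// the new one, so the numvols field can be refreshed at the same time
doc.removeField("numvols");
doc.add(new Field("numvols", "4", Field.Store.NO, Field.Index.NOT_ANALYZED));
writer.updateDocument(new Term("id", docId), doc);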
Hi Robert,
What you could do is use the Stemmer (as a TokenFilter I assume) and always
produce two tokens - the stem and the original. Index both of them in the
same position.
Then tell your users that if they search for [testing], it will find results
for 'testing', 'test' etc (the stems) and
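A minimal sketch of such a filter (2.4-style Token API; the class name and
stem() are placeholders for whatever stemmer you use):

import java.io.IOException;
import org.apache.lucene.analysis.*;

public class KeepOriginalStemFilter extends TokenFilter {
  private Token pendingStem;

  public KeepOriginalStemFilter(TokenStream in) { super(in); }

  public Token next() throws IOException {
    if (pendingStem != null) {
      Token stemToken = pendingStem; // emit the stem queued last round
      pendingStem = null;
      return stemToken;
    }
    Token t = input.next();
    if (t == null) return null;
    String stem = stem(t.term());
    if (!stem.equals(t.term())) {
      pendingStem = new Token(stem, t.startOffset(), t.endOffset());
      pendingStem.setPositionIncrement(0); // same position as the original
    }
    return t;
  }

  private String stem(String term) { return term; } // plug in your stemmer here
}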
Can you be more specific? What do you mean by re-rank? Reverse the sort?
Give different weights?
Shai
On Wed, Jul 22, 2009 at 4:35 PM, henok sahilu henok_sah...@yahoo.com wrote:
hello there
I'd like to re-rank the Lucene TopDocs result set.
Where shall I start?
thanks
Hi Alex,
You can start with this article:
http://www.manning.com/free/green_HotBackupsLucene.html (you'll need to
register w/ your email). It describes how one can write Hot Backups w/
Lucene, and capture just the delta since the last backup.
I'm about to try it myself, so if you get to do it
sahilu henok_sah...@yahoo.com wrote:
I'd like to write code that re-assigns weights to documents so that they
can be reranked
--- On Wed, 7/22/09, Shai Erera ser...@gmail.com wrote:
From: Shai Erera ser...@gmail.com
Subject: Re: reranking Lucene TopDocs
To: java-user@lucene.apache.org
Date
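A sketch of one way to re-assign weights and re-sort the returned hits
(customWeight() is a placeholder for your own weighting logic):

// re-score the hits, then re-sort them by the new score
ScoreDoc[] hits = topDocs.scoreDocs;
for (ScoreDoc sd : hits) {
  sd.score *= customWeight(sd.doc);
}
Arrays.sort(hits, new Comparator<ScoreDoc>() {
  public int compare(ScoreDoc a, ScoreDoc b) {
    return Float.compare(b.score, a.score); // descending by the new score
  }
});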
It's not accurate to say that Lucene scans the index for each search.
Rather, every Query reads a set of posting lists, each typically read
from disk. If you pass Query[] which have nothing to do in common (for
example no terms in common), then you won't gain anything, b/c each Query
will
Queries cannot be ordered sequentially. Let's say that you run 3 Queries,
w/ one term each: 'a', 'b' and 'c'. On disk, the posting lists of the terms
can look like this: post1(a), post1(c), post2(a), post1(b), post2(c),
post2(b) etc. They are not guaranteed to be consecutive. The code makes sure
the
that the presence of the $ will short-circuit stemming, but you'll have to
be sure that whatever analyzer you use doesn't strip it.
Best
Erick
On Wed, Jul 22, 2009 at 9:16 AM, Shai Erera ser...@gmail.com wrote:
Hi Robert,
What you could do is use the Stemmer (as a TokenFilter I
of Lucene for some
time, so I could be way off.
But you'd sure want to use a different token G
Erick
On Wed, Jul 22, 2009 at 4:12 PM, Shai Erera ser...@gmail.com wrote:
Actually my stemming Analyzer adds a similar character to stems, to
distinguish between original tokens (like
off the top of my head, if you have in hand all the doc IDs that were
returned so far, you can do this:
1) Build a Filter which will return any doc ID that is not in that list. For
example, pass it the list of doc IDs and every time next() or skipTo() is
called, it will skip over the given doc IDs.
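A sketch of such a filter (2.4-style Filter API; class name hypothetical):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.search.Filter;
import org.apache.lucene.util.OpenBitSet;

public class ExcludeDocsFilter extends Filter {
  private final int[] alreadySeen; // doc IDs returned so far

  public ExcludeDocsFilter(int[] alreadySeen) { this.alreadySeen = alreadySeen; }

  public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
    OpenBitSet bits = new OpenBitSet(reader.maxDoc());
    bits.set(0, reader.maxDoc());                    // allow every doc...
    for (int doc : alreadySeen) bits.fastClear(doc); // ...except those already returned
    return bits;
  }
}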
Generally you shouldn't hit OOM. But it may change depending on how you use
the index. For example, if you have millions of documents spread across the
100 GB, and you use sorting for various fields, then it will consume lots of
RAM. Also, if you run hundreds of queries in parallel, each with a
There are a couple of things I can think of:
1) From IndexReader's javadoc (
http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/index/IndexReader.html#deleteDocument%28int%29):
An IndexReader can be opened on a directory for which an IndexWriter is
opened already, but it cannot be used to
indexWriter.setMaxBufferedDocs(10);
No difference - it continues to create one document in each RAM segment
before the first merge.
-venkat
-----Original Message-----
From: Shai Erera [mailto:ser...@gmail.com]
Sent: Saturday, July 25, 2009 10:55 PM
To: java-user@lucene.apache.org
Subject: Re: Number
You write that you index the string under the url field. Do you also index
it under title? If not, that can explain why title:"Rahul Dravid" does not
work for you.
Also, did you try to look at the index w/ Luke? It will show you what are
the terms in the index.
Another thing which is always good
You can always create your own Analyzer which creates a TokenStream just
like StandardAnalyzer, but instead of using StandardFilter, write another
TokenFilter which receives the HOST token type, and breaks it further to its
components (e.g., extract "en", "wikipedia" and "org"). You can also return
the
I can think of another approach - during indexing, capture the word
"aboutus" and index it as "about us" and "aboutus" in the same position.
That way both queries will work. You'd need to write your own TokenFilter,
maybe a SynonymTokenFilter (since this reminds me of synonyms usage) that
accepts a list
I don't see that you use the Analyzer anywhere (i.e., it's created but not
used?).
Also, the wildcard query you create may be very inefficient, as it will
expand all the terms under the DEFAULT_FIELD. If the DEFAULT_FIELD is the
field where all your default searchable terms are indexed, there could
If you don't know which tokens you'll face, then it's really a much harder
problem. If you know where the token is, e.g. it's always in
http://some.example.site/a/b/here will be the token to break/index.html,
then it eases the task a bit. Otherwise you'll need to search every single
token
Interesting ... I don't have access to a Japanese dictionary, so I just
extract bi-grams. But I guess that in this case, if one can access an
English dictionary (are you aware of an open-source one, or free one
BTW?), one can use the method you mention.
But still, doing this for every Token you
From: Shai Erera ser...@gmail.com
To: java-user@lucene.apache.org
Sent: Tuesday, August 4, 2009 10:31:46 AM
Subject: Re: Searching doubt
Hi Darren,
The question was: given a string "aboutus" in a document, how can you
return that document as a result for the query "about us" (note the
space). So
If you pass reader.maxDoc(), it will create a heap (array) of size
reader.maxDoc(), which is not recommended.
Instead, if you display the first page of results, you should pass 10
(assuming you display 10 results).
You can call TopFieldDocs.totalHits to get the total number of matching
results.
Then
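A paging sketch (sort field hypothetical):

Sort sort = new Sort(new SortField("date", SortField.STRING));
TopFieldDocs tfd = searcher.search(query, null, 10, sort); // heap of 10, not maxDoc()
int total = tfd.totalHits; // total matches, e.g. for "showing 1-10 of N"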
Robert - can you elaborate on what you mean by "just treat it at the script
level"?
On Thu, Aug 6, 2009 at 10:55 PM, Robert Muir rcm...@gmail.com wrote:
Bradford, there is an arabic analyzer in trunk. for farsi there is
currently a patch available:
Thanks Robert for the explanation. I thought that you meant something
different, like doing stemming in some sophisticated manner by somehow
detecting the language. Doing these normalizations makes sense of course,
especially if the letters look similar.
Thanks again,
Shai
On Thu, Aug 6, 2009
you should also make sure the data is indexed twice, once w/ the original
case and once w/o. It's like putting a TokenFilter after WhitespaceTokenizer
which returns two tokens - lowercased and the original, both in the same
position (set posIncr to 0).
On Wed, Aug 12, 2009 at 6:20 AM, Max Lynch
If this file has a predefined construct, e.g.:
title: something
location: new york
then you can write a simple parser that extracts that information.
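A sketch of such a parser, assuming one "key: value" pair per line:

BufferedReader in = new BufferedReader(new FileReader(file));
Document doc = new Document();
String line;
while ((line = in.readLine()) != null) {
  int colon = line.indexOf(':');
  if (colon > 0) {
    doc.add(new Field(line.substring(0, colon).trim(),  // e.g. "title"
                      line.substring(colon + 1).trim(), // e.g. "something"
                      Field.Store.YES, Field.Index.ANALYZED));
  }
}
in.close();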
But I think otherwise this falls outside the scope of Lucene, unless I
misunderstood you.
If I had to give it a long shot though, I'd try to
Is that a local file system, or a network share?
On Thu, Aug 13, 2009 at 1:07 PM, rishisinghal singhal.ri...@gmail.com wrote:
Is there any chance that two writers are open on this directory?
No, that's not true.
something external to Lucene is removing files from the directory.
No this also
experience that? Can you try to create the index somewhere else, or
on another drive?
Shai
On Thu, Aug 13, 2009 at 3:00 PM, rishisinghal singhal.ri...@gmail.com wrote:
It is a local file system.
We are using lucene 2.4 and java 1.5
Regards,
Rishi
Shai Erera wrote:
Is that a local file
this set of
documents added in sync with the index reader on the index (before it has
been written to).
What I'd like is to have access to the stuff the index writer has
written but not yet committed. Is there something that can access that data?
Daniel Shane
Shai Erera wrote:
How many
2.4]
Checking only these segments: _61:
No problems were detected with this index.
Regards,
Rishi
Shai Erera wrote:
I noticed the exception is Caused by: java.io.FileNotFoundException:
/SYS$SYSDEVICE/RISHI/melon_1600/_61.cfs (i/o error (errno:5))
I searched for i/o error
I think you should also delete files that don't exist anymore in the index,
from the backup?
Shai
On Fri, Aug 14, 2009 at 10:02 PM, Michael McCandless
luc...@mikemccandless.com wrote:
Could you boil this down to a small standalone program showing the problem?
Optimizing in between backups
Hi
If I can guarantee only one JVM will update an index (not at a time - truly
just one JVM), can I disable locks, or is it really necessary only for
read-only devices? If I disable locks, will I see any performance
improvements?
Thanks
Shai
Thanks Mike.
Shai
On Sat, Aug 15, 2009 at 12:49 PM, Michael McCandless
luc...@mikemccandless.com wrote:
You could also use NoLockFactory.
Disabling locks just means Lucene stops checking if another writer has
the index open (the write.lock file).
It's extremely dangerous to do, unless
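A sketch of opening a directory with locking disabled (2.9-style API; the
path is made up):

Directory dir = FSDirectory.open(new File("/path/to/index"),
                                 NoLockFactory.getNoLockFactory());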
Hi
I'd like to extend Lucene's FieldCache such that it will read native values
from a different place (in my case, payloads). That is, instead of iterating
on a field's terms and parsing each String to long (for example), I'd like
to iterate over one term (sort:long, again - an example) and
Can you please elaborate more on the use case? Why, if a certain document
is irrelevant to a certain query, would you like to give it a score? Are you
perhaps talking about certain documents which should always appear in search
results, no matter what the query is? And instead of always showing them,
That's strange ... how do you execute your searches - each search opens up
an IndexReader? Do you make sure to close them? Maybe those are file
descriptors of files you index?
Forgive the silly questions, but I've never seen Lucene run out of
file handles ...
Shai
On Wed, Aug 26, 2009 at
Thanks a lot for the response !
I wanted to avoid two things:
* Writing the logic that invokes cache-refresh upon IndexReader reload.
* Write my own TopFieldCollector which uses this cache.
I guess I don't have any other choice but to write both of them, or try to
make TFC more customizable such
When you run optimize(), you consume CPU and do lots of IO operations which
can really mess up the OS IO cache. Optimize is a very heavy process and
therefore is recommended to run at off hours. Sometimes, when your index is
large enough, it's recommended to run it during weekends, since the
If I'm not mistaken, IndexReader reads the .del file into memory, and
therefore subsequent updates to it won't be visible to it.
Shai
On Tue, Sep 1, 2009 at 3:54 PM, Ted Stockwell emorn...@yahoo.com wrote:
Hi All,
I am interested in using Lucene to index RDF (Resource Description Format)
What do you mean by first result in the group? What is a group?
On Wed, Sep 2, 2009 at 1:36 PM, Ganesh emailg...@yahoo.co.in wrote:
Hello all,
I want to retrieve the first result in the group. How to achieve this?
Currently I am parsing all the results, using a hash and avoiding duplicate
Thanks
I plan to look into two things, and then probably create two separate
issues:
1) Refactor the FieldCache API (and TopFieldCollector) such that one can
provide its own Cache of native values. I'd hate to rewrite the
FieldComparators logic just because the current API is not extendable.
Thanks Mike. I did not phrase my understanding of Cache reload well. I
didn't mean literally as part of the reopen, but *because* of the reopen.
Because FieldCache is tied to an IndexReader instance, after reopen it gets
refreshed. If I keep my own Cache, I'll need to code that logic, and I
prefer
I didn't say we won't need CSF, but that at least conceptually, CSF and my
sort-by-payload are the same. If however it turns out that CSF performs
better, then I'll definitely switch my sort-by-payload package to use it. I
thought that CSF is going to be implemented using payloads, but perhaps I'm
I can think of a way where you rely solely on scores and therefore there is
still a chance to get results not ordered the way you want, but you can try
it - run the query [foo bar OR "foo bar"^10]. That way, your first result
should be scored by [foo], [bar] and ["foo bar"]. Also, the phrase is added
a
I agree. If you need sort-by-score, it's better to use the fast search
methods. IndexSearcher will create the appropriate TSDC instance for you,
based on the Query that was passed.
If you need to create multiple Collectors and pass a kind of Multi-Collector
to IndexSearcher, then you should
the right TSDC ... I like it,
option 1 it is, minimum user code.
Cheers, eks
----- Original Message -----
From: Shai Erera ser...@gmail.com
To: java-user@lucene.apache.org
Sent: Wednesday, 30 September, 2009 17:12:38
Subject: Re: TSDC, TopFieldCollector co
I agree. If you need sort
are these objects mutable?
----- Original Message -----
From: Shai Erera ser...@gmail.com
To: java-user@lucene.apache.org
Sent: Wednesday, 30 September, 2009 18:11:03
Subject: Re: TSDC, TopFieldCollector co
BTW eks, you asked about reusing TSDC. PQ has a clear() method, so it can
Hi
I index documents with numeric fields using the new Numeric package. I
execute two types of queries: range queries (for example, [1 TO 20}) and
equality queries (for example 24.75). Don't mind the syntax.
Currently, to execute the equality query, I create a NumericRangeQuery with
the
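For reference, equality expressed as a degenerate range (field name
hypothetical):

// both endpoints inclusive, min == max
Query eq = NumericRangeQuery.newDoubleRange("price", 24.75, 24.75, true, true);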
Thanks a lot for the super fast response !
Shai
On Wed, Nov 11, 2009 at 4:21 PM, Uwe Schindler u...@thetaphi.de wrote:
No.
-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
Hi
I started to migrate my Analyzers, Tokenizer, TokenStreams and TokenFilters
to the new API. Since the entire set of classes handled Token before, I
decided to not change it for now, and was happy to discover that Token
extends AttributeImpl, which makes the migration easier.
So I started w/
(Token)
-- hasA(Term) -- getA(Term) -- cast to Token ...
I don't know if this is a bug or not, but it's strange.
Shai
On Sun, Nov 22, 2009 at 1:12 PM, Shai Erera ser...@gmail.com wrote:
Hi
I started to migrate my Analyzers, Tokenizer, TokenStreams and TokenFilters
to the new API. Since
But I do use addAttribute(Token.class), so I don't understand why you say
it's not possible. And I completely don't understand why the new API allows
me to just work w/ interfaces and not impls ... A while ago I got the
impression that we're trying to get rid of interfaces because they're not
easy
ok so from what I understand, I should stop working w/ Token, and move to
working w/ the Attributes.
addAttribute indeed does not work. Even though it does not throw an
exception, if I call in.addAttribute(Token.class), I get a new instance of
Token and not the one that was added by in. So
with restoreState to the TokenStream. CachingTokenFilter does this.
So the new API uses the State object to put away tokens for later
reference.
-
Uwe Schindler
is that?
Shai
On Sun, Nov 22, 2009 at 3:57 PM, Shai Erera ser...@gmail.com wrote:
Perhaps I misunderstand something. The current use case I'm trying to solve
is - I have an abbreviations TokenFilter which reads a token and stores it.
If the next token is end-of-sentence, it checks whether
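A minimal sketch of that one-token lookahead using captureState/restoreState
(new TS API; class name hypothetical):

import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.util.AttributeSource;

public class LookaheadFilter extends TokenFilter {
  private AttributeSource.State held; // the token being held back

  protected LookaheadFilter(TokenStream input) { super(input); }

  public boolean incrementToken() throws IOException {
    if (held == null) {
      if (!input.incrementToken()) return false;
      held = captureState();          // hold the current token back
    }
    if (!input.incrementToken()) {
      restoreState(held);             // stream ended: release the held token
      held = null;
      return true;
    }
    AttributeSource.State next = captureState(); // stash the peeked token
    restoreState(held);               // emit the held token, possibly after
                                      // inspecting/modifying its attributes
    held = next;
    return true;
  }
}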
-----Original Message-----
From: Shai Erera [mailto:ser...@gmail.com]
Sent: Sunday, November 22, 2009 3:28 PM
To: java-user@lucene.apache.org
Subject: Re: How to deal with Token in the new TS API
What I've done
-----Original Message-----
From: Shai Erera [mailto:ser...@gmail.com]
Sent: Sunday, November 22, 2009 7:53 PM
To: java-user@lucene.apache.org
Subject: Re: How to deal with Token in the new TS API
Yes I can clone the term itself
(TermAttribute.class);
By that you guarantee that both are from the same implementation type.
-
Uwe Schindler
of clones, I'll create
Token and populate it w/ what I need, just for convenience ...
Thanks,
Shai
On Sun, Nov 22, 2009 at 9:23 PM, Shai Erera ser...@gmail.com wrote:
I assume termAtt is the input's TermAttribute, right? Therefore it has no
copyTo ...
What I've done so far is create a TermAttribute
Hi
First you can use MatchAllDocsQuery, which matches all documents. It will
save reading a HUGE posting list (TAG:TAG), and performs much faster. For
example, TAG:TAG computes a score for each doc, even though you don't need
it. MatchAllDocsQuery doesn't.
Second, move away from Hits ! :) Use Collectors
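A sketch:

// matches every document without reading a giant posting list or scoring
TopDocs td = searcher.search(new MatchAllDocsQuery(), null, 10);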
Robert, what if I need to do additional filtering after CollationKeyFilter,
like stopwords removal, abbreviations handling, stemming etc? Will that be
possible if I use CollationKeyFilter?
I also noticed CKF creates a String out of the char[]. If the code already
does that, why not use
An easy example: the lowercase of ß is ß itself, it is already lowercase.
It will not match 'SS' if you use the lowercase filter.
If you use case folding, these two will match.
On Mon, Nov 30, 2009 at 2:53 PM, Shai Erera ser...@gmail.com wrote:
Robert, what if I need to do additional
double normalization and folding for
better performance.
On Mon, Nov 30, 2009 at 3:41 PM, Shai Erera ser...@gmail.com wrote:
Thanks Robert. In my Analyzer I do case folding according to Unicode
tables.
So ß is converted to SS. I do the same for diacritic removal and
Hiragana/Katakana folding
Hi
We've run into problems w/ LockFactory usage on our system. The problem is
that the system can be such that the index is configured on a local file
system, or a remote, shared one. If remote/shared, the protocol is sometimes
SMB 1.0/2.0, Samba, NFS 3/4 etc. In short, we have no control over the
I have multiple JVMs on different machines accessing the shared file system.
I don't really have multiple IndexWriters on the same JVM, I asked this just
out of curiosity.
So I don't understand from your reply if it's safe to use NoLockFactory, or
whether I should use SimpleFSLockFactory and unlock if
Mike
Hi Max,
In 3.0.0 (actually in 2.9.0 already), Lucene moved to execute its searches
one sub-reader at a time. As a consequence, absolute docIDs are not passed
to the collect method anymore, but instead the relative docIDs of that
reader. An example: suppose you have 2 segments, with 6 documents
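A sketch of a 2.9-style Collector that rebases the per-segment docIDs back
to absolute IDs (class name hypothetical):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

public class AbsoluteIdCollector extends Collector {
  private int docBase;

  public void setScorer(Scorer scorer) {}

  public void setNextReader(IndexReader reader, int docBase) {
    this.docBase = docBase;          // offset of this segment's first doc
  }

  public void collect(int doc) throws IOException {
    int absoluteId = docBase + doc;  // doc is relative to the current segment
    // ... use absoluteId
  }

  public boolean acceptsDocsOutOfOrder() { return true; }
}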
Hi
I remember a while ago a discussion around the efficiency of TermDocs.seek
and how it is inefficient and it's better to call IndexReader.termDocs
instead (actually someone was proposing to remove seek entirely from the
interface because of that). I've looked at FieldCacheImpl's
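For reference, the IndexReader.termDocs pattern being discussed, as a sketch
(field and value made up):

TermDocs td = reader.termDocs(new Term("field", "value"));
while (td.next()) {
  int doc = td.doc(); // process each posting
}
td.close();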
We've worked around that problem by doing two things:
1) We notify all nodes in the cluster when the index has committed (we use
JMS for that).
2) On each node there is a daemon which waits on this JMS queue, and once
the index has committed it reopens an IR, w/o checking isCurrent(). I think
that
a reopen (normal or NRT) and warming is taking place. (NOTE:
I'm one of the authors on Lucene in Action 2nd edition!). But it
doesn't do the communication part, to know when it's time to reopen.
Mike
On Wed, Jan 20, 2010 at 9:32 AM, Shai Erera ser...@gmail.com wrote:
We've worked around
What Analyzer are you using? zzBuffer belongs to the tokenizer's automaton
that is generated by JFlex. I've checked StandardTokenizerImpl and zzBuffer
can grow beyond the default 16KB, but yours looks to be a lot bigger (33 MB
!?). The only explanation I have for this is that you're trying to (or