int gap = (pp[pp.length-1] - pp[0]) - (pp.length - 1);
Don't want to cause an IndexOutOfBoundsException
Right... that's what I meant with (boundary cases)...
In my particular case I add album catalog numbers to my index as a keyword
field, but of course if the catalog number contains a space, as they often
do (i.e. cad 6), there is a mismatch. I've now changed my indexing to index
the value as 'cad6', removing spaces. Now if the query sent to the query
Hi,
The code here ignores the PhraseQuery's (PQ) positions:
int[] pp = PQ.getPositions();
These positions have extra gaps when stop words are removed.
To account for this, the overall extra gap can be added to the slop:
int gap = (pp[pp.length] - pp[0]) - (pp.length - 1); // (+/-
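For reference, a minimal sketch of the corrected computation (the helper name is made up; note the pp.length - 1 fix):

import org.apache.lucene.search.PhraseQuery;

// Hypothetical helper: widen the slop by the extra position gaps
// left behind when stop words were removed from the phrase.
static void addStopWordGapToSlop(PhraseQuery pq) {
  int[] pp = pq.getPositions();
  if (pp.length < 2) {
    return; // nothing to adjust for 0- or 1-term phrases
  }
  // pp[pp.length - 1], not pp[pp.length] - the latter throws
  // an IndexOutOfBoundsException
  int gap = (pp[pp.length - 1] - pp[0]) - (pp.length - 1);
  pq.setSlop(pq.getSlop() + gap);
}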
The sequence of operations seems logical; I don't immediately see why this
does not work.
Could you minimize this to a small stand-alone program that does not work
as expected? This will allow to recreate the problem here and debug it.
It is interesting that facet 3.5 is used with core 3.4 and queries
Could you minimize this to a small stand-alone program that does not work
as expected?
This will be hard, because the bug only appears after a couple of days
or more, and I'm starting to think that it is triggered by high data
volumes. I'll try to minimize the code and serve more data
However there are at least two issues with this:
1) the info would be in the lower level of the internal index writer, and
not in that of the categories logic.
2) one cannot just call super.openIndexWriter(directory, openMode) and
modify the result before returning it, because once IW is
I'm having an issue with using NRT and Tax. After a couple of days of
running continuously, the TaxonomyReader doesn't return results anymore
(but the taxonomy index has them).
TaxonomyReader does not support NRT - see
https://issues.apache.org/jira/browse/LUCENE-3441 (Add NRT support to
To my understanding this stems from V(q) · V(d) (see the *Conceptual
Scoring Formula*) - the elements in those vectors are *Tf-idf* values, and
so, implementation wise (see the *Practical Scoring Function*), idf(t) is
multiplied by itself: once for the query and once for the document.
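In symbols: V(q) · V(d) = sum over terms t of (tf(t,q) · idf(t)) · (tf(t,d) · idf(t)),
so each matched term contributes idf(t)^2 - hence the squared idf in the
practical formula.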
HTH,
Doron
Looking into this with Shai I think we see how this can happen, in this code
of LTW:
private ParentArray getParentArray() throws IOException {
  if (parentArray == null) {         // [1]
    if (reader == null) {
      reader = openReader();
    }
    parentArray = new ParentArray(); // [2]
On Tue, Oct 4, 2011 at 11:29 AM, Mihai Caraman caraman.mi...@gmail.com wrote:
I also think that there is nothing special in the second restart, except
that by that time there were other servlets up (?) which were able to
trigger simultaneous AddDoc requests, exposing this bug...
LUCENE-3484 is resolved.
Mihai, could you give it a try and see if this solves the NPE problem in
your setup?
You would need to download a nightly build that contains the fix - see the
issue for revision numbers...
On Tue, Oct 4, 2011 at 7:51 PM, Mihai Caraman caraman.mi...@gmail.com wrote:
Hi Rich,
SeetSpotSimilarity looks promising. Does it not favor shorter docs by not
normalizing, or does it make some attempt to standardize?
- using e.g. SeetSpotSimilarity which does not favor shorter documents.
SweetSpotSimilarity (I misspelled it previously) defines a range of lengths
Hi Greg,
I created http://issues.apache.org/jira/browse/LUCENE-3120 for this problem,
and attached there a more general test that exposes this problem, based on
your test case.
I am not sure yet that this is indeed a problem to be fixed with regard to
span queries (see more there in JIRA) but
Hi Greg,
On Thu, May 19, 2011 at 12:26 PM, Gregory Tarr gregory.t...@detica.com wrote:
We let our users decide whether they want to force the order or not, so
in effect they pass in inOrder.
I would have to detect a repeated term and change the parameter as a
result of that in order to
Hi Rich,
If I understand correctly you are concerned that short documents are
preferred too much over long ones, is this really the case?
To understand what goes on, it would help to look at the Explanation of the
score for, say, two result documents - one that you think is ranked too low,
and one
IIUC what you are trying to achieve I think the following could help,
without setting all words in a line to be in the same position:
At indexing, set a position increment of N (e.g. 100) at line start tokens.
This would set a position gap of N between the last token of line x and the
first token of line
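A sketch of one way to emit that increment (Lucene 2.x Token API; detecting
line starts via offsets against the raw text is just one possible approach):

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

class LineGapFilter extends TokenFilter {
  private final String rawText; // the original field text, to spot newlines
  private final int gap;        // e.g. 100
  private int prevEnd = 0;
  LineGapFilter(TokenStream input, String rawText, int gap) {
    super(input);
    this.rawText = rawText;
    this.gap = gap;
  }
  public Token next() throws IOException {
    Token t = input.next();
    if (t == null) {
      return null; // end of stream
    }
    // if a newline occurs between the previous token and this one,
    // this token starts a new line - widen its position increment
    if (rawText.substring(prevEnd, t.startOffset()).indexOf('\n') >= 0) {
      t.setPositionIncrement(t.getPositionIncrement() + gap);
    }
    prevEnd = t.endOffset();
    return t;
  }
}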
On Mon, Jan 10, 2011 at 7:44 PM, Ryan Aylward r...@glassdoor.com wrote:
We do leverage synonyms but they are not appropriate for this case. We use
synonyms for words that are truly synonymous for the entire index such as
inc and incorporated. Those words are always interchangeable. However,
I have an app that seems to be locking on some search calls. I am including
the stacktrace for the blocked and blocker threads.
Is it deadlock for sure?
No search deadlock fixes were done since 2.1.0, so perhaps it is something
else...
TP-Processor177 daemon prio=10
I could make an exception in the patch creation program to detect
that there is a Lucene directory, and diff the .cfs files, even if
they have different names, but was seeing if I can avoid that
so the patch program can be agnostic about the contents of the
directory tree.
Doing only this is
Also, when taking the Similarity suggestion below note two things in
Lucene's default behavior that you seem to wish to avoid:
The first is IDF - but only for multi-term queries - otherwise ignore this
comment.
For multi term queries to only consider term frequency and doc length, you
may want to
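The snippet is truncated here; presumably the direction is a Similarity that
neutralizes idf, along these lines (a sketch):

import org.apache.lucene.search.DefaultSimilarity;

// Sketch: score multi-term queries on tf and doc length only,
// by making idf a constant.
class NoIdfSimilarity extends DefaultSimilarity {
  public float idf(int docFreq, int numDocs) {
    return 1.0f; // ignore term rarity
  }
}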
Searcher is quite light.
It is the index reader that is heavier.
So create a single index reader; for each of the similarities
to be used concurrently, create a searcher over that single
reader, set its similarity, and so on.
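A minimal sketch of that setup (Lucene 2.x API; the path and similarities are
placeholders):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Similarity;

static IndexSearcher[] openSearchers(String indexPath, Similarity[] sims)
    throws IOException {
  IndexReader reader = IndexReader.open(indexPath); // one heavy, shared reader
  IndexSearcher[] searchers = new IndexSearcher[sims.length];
  for (int i = 0; i < sims.length; i++) {
    searchers[i] = new IndexSearcher(reader); // light, per-similarity searcher
    searchers[i].setSimilarity(sims[i]);
  }
  return searchers;
}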
Doron
On Mon, Apr 27, 2009 at 7:53 PM, Rakesh Sinha
On Fri, Apr 24, 2009 at 12:28 AM, Steven Bethard beth...@stanford.edu wrote:
On 4/23/2009 2:08 PM, Marcus Herou wrote:
But perhaps one could use a FieldCache somehow ?
Some code snippets that may help. I add the PageRank value as a field of
the documents I index with Lucene like this:
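(The code snippet itself is cut off in this excerpt; presumably something
along these lines, with a hypothetical field name:)

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// Sketch: store the PageRank value as its own untokenized field.
static void addPageRank(Document doc, float pageRank) {
  doc.add(new Field("pagerank", Float.toString(pageRank),
      Field.Store.YES, Field.Index.UN_TOKENIZED));
}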
On Thu, Apr 23, 2009 at 11:52 PM, bill.che...@sungard.com wrote:
I figured it out. We are using Hibernate Search and in my ORM class I
am doing the following:
@Field(index=Index.TOKENIZED,store=Store.YES)
protected String objectId;
So when I persisted a new object to our database I was
On Thu, Apr 23, 2009 at 10:39 PM, bill.che...@sungard.com wrote:
I'm getting a strange error when I make a Lucene (2.2.0) query:
java.lang.RuntimeException: there are more terms than documents in field
objectId, but it's impossible to sort on tokenized fields
Is it possible that, for at
I think we are doing similar things; at least I am trying to implement
document boosting with PageRank. I'm having issues with how to apply the
scoring of specific docs without actually reindexing them. I feel something
should be done at query time which looks at external data, but I do not
know how to
On 4/21/2009 10:09 AM, Doron Cohen wrote:
It could, but (historically and) currently it doesn't... :)
I actually have code for this.
Would you like to open a JIRA issue for this - I'll attach my wrapper there?
Done.
https://issues.apache.org/jira/browse/LUCENE-1608
Steve
On Tue, Apr 21
CustomScoreQuery expects the VSQs to have a score for each document matching
the (main) subQuery - this does not hold for arbitrary queries.
On Sat, Apr 18, 2009 at 2:35 AM, Steven Bethard beth...@stanford.edu wrote:
CustomScoreQuery only allows the secondary queries to be of type
ValueSourceQuery
IndexWriter.deleteDocuments(Query query) may be handy too
(http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/index/IndexWriter.html#deleteDocuments%28org.apache.lucene.search.Query%29 ;
Query: http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Query.html)
(but note that it
Depending on the problem you are trying to solve there may be other
solutions to it, not requiring setting wrong (?) values for term
frequencies.
If you can explain what you are trying to solve, people on the list may
be able to suggest such alternatives.
- Doron
On Sun, Apr 19, 2009 at 2:39 PM,
didn't quite understand how to use it.
Is there a better way to approach it?
I hope I explained it well.
Thanks,
Liat
2009/4/21 Doron Cohen cdor...@gmail.com
Depending on the problem you are trying to solve there may be other
solutions to it, not requiring setting wrong (?) values
It could, but (historically and) currently it doesn't... :)
I actually have code for this.
Would you like to open a JIRA issue for this - I'll attach my wrapper there?
Doron
On Tue, Apr 21, 2009 at 7:58 PM, Steven Bethard beth...@stanford.edu wrote:
On 4/21/2009 12:47 AM, Doron Cohen wrote
On Tue, Aug 19, 2008 at 2:15 AM, Antony Bowesman [EMAIL PROTECTED] wrote:
Thanks for your time and I appreciate your valuable insight, Doron.
Antony
I'm glad I could help!
Doron
payload and the other part for storing, i.e. something like this:
Token token = new Token(...);
token.setPayload(...);
SingleTokenTokenStream ts = new SingleTokenTokenStream(token);
Field f1 = new Field("f", "some-stored-content", Store.YES, Index.NO);
Field f2 = new Field("f", ts);
On Mon, Aug 18, 2008 at 7:28 AM, blazingwolf7 [EMAIL PROTECTED] wrote:
Thanks for the info. But do you know where this is actually performed in
Lucene? I mean the method involved, that will calculate the value before
storing it into the index. I tracked it to one method known as lengthNorm()
in
Implementing payloads via Tokens explicitly prevents the use of payloads
for untokenized fields, as they only support field.stringValue(). There
seems to be no way to override this.
I assume you already know this but just to make sure what I meant was clear
- no tokenization but still indexing
Norms information comes mainly from lengths of documents - allowing the
search time scoring to take into account the effect of document lengths
(actually
field length within a document). In practice, norms stored within the index
may include
other information, such as index time boosts - for a
Hi Sergey, seems like cases 4 and 5 are equivalent,
both meaning case insensitive, right? Otherwise please
explain the difference.
If it is required to support both case sensitive
(cases 1,2,3) and case insensitive (case 4/5) then
both forms must be saved in the index - in two separate
fields (as
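A sketch of feeding such a two-field setup (core Lucene 2.x classes; the
field name is hypothetical):

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

static Analyzer caseAwareAnalyzer() {
  // default: StandardAnalyzer lowercases - the case-insensitive form
  PerFieldAnalyzerWrapper analyzer =
      new PerFieldAnalyzerWrapper(new StandardAnalyzer());
  // the case-sensitive field keeps the original case
  analyzer.addAnalyzer("text_cs", new WhitespaceAnalyzer());
  return analyzer;
}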
IIRC first versions of patches that added payloads support had this notion
of payload by field rather than by token, but later it was modified to be by
token only.
I have seen two code patterns to add payloads to tokens.
The first one created the field text with a reserved separator/delimiter
In the example I want to show that I stored the field as Field.Index.NO_NORMS.
As I understand it, this means that the field contains the original string
regardless of which analyzer I chose (StandardAnalyzer by default).
This would be achieved by UN_TOKENIZED.
The NO_NORMS just guides Lucene to avoid normalizing
The code seems correct (although it doesn't show
which analyzer was used at indexing).
Note that when adding numbers like this there's no real point in analyzing
them,
so I would add that field as UN_TOKENIZED. This would be more efficient,
and would also comply with the query parser, which does not
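For example (field name and value made up):

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

static Document docWithNumber(String value) {
  Document doc = new Document();
  // UN_TOKENIZED: indexed as a single term, no analysis applied
  doc.add(new Field("number", value, Field.Store.YES, Field.Index.UN_TOKENIZED));
  return doc;
}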
I can't see how to accomplish this without writing some special code,
and not just because of query parsing.
Phrases are searched by iterating the participating term
positions, and when a match is found, say for "b c", there is no way
to know whether another query "a b c d" matches exactly the
I think it should look something like this:
"white house" NOT "russian white house"~1
"a b c"~1 just matches more 'easily' than "a b c".
It will match, for instance, "a b d c". The NOT however
excludes all documents which match this, unlike the requested logic.
In fact,
Q1: "a b" NOT "a b c"~1
is worse
I believe Highlighter.setMaxDocBytesToAnalyze(int byteCount) should be used
for this.
On Mon, Aug 11, 2008 at 11:40 AM, [EMAIL PROTECTED] wrote:
Hello
I am using Highlighter to highlight query terms in documents retrieved from
a database, found via a Lucene search.
My problem is that when I
doc.add(new Field(ID_FIELD, id, Field.Store.YES, Field.Index.NO));
writer.deleteDocuments(new Term(ID_FIELD, id));
int i = reader.deleteDocuments(new Term(ID_FIELD, id)); //i returns 0
Both failed. I tried to delete one id value that I know for sure was added
in the first step.
For
writer = new IndexWriter("C:\\", new StandardAnalyzer(), true);
Term term = new Term("line", "KOREA");
PhraseQuery query = new PhraseQuery();
query.add(term);
StandardAnalyzer - used here while indexing - applies lowercasing.
The query is created programmatically - i.e. without a QueryParser
Ok, I'm not near any documentation now, but I think
throwing an exception is overkill. As I remember
all you have to do is return false from your collector
and that'll stop the search. But verify that.
That would have been much cleaner; however collect() returns void,
so throwing a (runtime)
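A sketch of that approach (class names are made up; Lucene 2.x HitCollector
API assumed):

import org.apache.lucene.search.HitCollector;

class StopCollectingException extends RuntimeException {}

class FirstNCollector extends HitCollector {
  private final int maxHits;
  private int count = 0;
  FirstNCollector(int maxHits) { this.maxHits = maxHits; }
  public void collect(int doc, float score) {
    if (++count > maxHits) {
      throw new StopCollectingException(); // unwinds out of search()
    }
    // ... record doc and score as needed
  }
}

// The caller catches the exception to end the search early:
// try { searcher.search(query, new FirstNCollector(100)); }
// catch (StopCollectingException e) { /* expected - enough hits seen */ }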
When combining any sub queries, a scorer has at least two things to decide:
which docs to match, and once matched, how to score. BooleanQuery applies
specific logic for this, and some queries allow some control of the way to
score. For the current CustomScoreQuery things are more straightforward -
Nothing built in that I'm aware of will do this, but it can be done by
searching with your own HitCollector.
There is a related feature - stop search after a specified time - using
TimeLimitedCollector.
It is not released yet, see issue LUCENE-997.
In short, the collector's collect() method is
On Tue, Jun 10, 2008 at 3:50 AM, Otis Gospodnetic
[EMAIL PROTECTED] wrote:
Hi Glen,
Thanks for sharing. Does your benchmarking tool build on top of
contrib/benchmark? (not sure if that one lets you specify the number of
concurrent threads -- if it does not, perhaps this is an opportunity
Hi John,
IndexReader newInner = in.reopen();
if (in != newInner) {
  in.close();
  this.in = newInner;
  // code to clean up my data
  _cache.clear();
  _indexData.load(this, true);
  init(_fieldConfig);
}
Just to be sure on this, could you
: The crux of the issue seems to be that lucene cannot open segments file
that
: is inside the jar (under luceneFiles/index directory)
i'm not entirely sure why it would have problems finding the segments
file, but a larger problem is that Lucene needs random access which (last
time i
Hi Jarvis,
I have a problem: how to combine two scores to sort the search
result documents.
For example, I have 10 million pages in a Lucene index, and I know their
PageRank scores. I give a query to it; every doc returned has a
Lucene score, mark it as R (relevance score), and
hi Jake, yes it was committed in Lucene - this is visible in the JIRA issue
if you switch to the Subversion Commits tab, where you can also
see the actual diffs that took place.
Best,
Doron
On Tue, Mar 18, 2008 at 7:14 PM, Jake Mannix [EMAIL PROTECTED] wrote:
Hey folks,
I was wondering
Hi Daniel, LUCENE-1228 fixes a problem in IndexWriter.commit().
I suspect this can be related to the problem you see though I am not sure.
Could you try with the patch there?
Thanks,
Doron
On Thu, Mar 13, 2008 at 10:46 AM, Michael McCandless
[EMAIL PROTECTED] wrote:
Daniel Noll wrote:
On
On Thu, Mar 13, 2008 at 9:30 PM, Doron Cohen [EMAIL PROTECTED] wrote:
Hi Daniel, LUCENE-1228 fixes a problem in IndexWriter.commit().
I suspect this can be related to the problem you see though I am not sure.
Could you try with the patch there?
Thanks,
Doron
Daniel, I was wrong about
Take a look at the quality package under contrib/benchmark.
Regards,
Doron
On Sat, Feb 9, 2008 at 2:59 AM, Panos Konstantinidis [EMAIL PROTECTED]
wrote:
Hello I am a new lucene user. I am trying to calculate the
recall/precision of
a query and I was wondering if lucene provides an easy way
It should be the parentheses, which are part of the query syntax.
Try escaping - \( \)
Also see
http://lucene.apache.org/java/2_3_0/queryparsersyntax.html#Escaping%20Special%20Characters
Doron
On Sun, Feb 10, 2008 at 9:03 AM, saikrishna venkata pendyala
[EMAIL PROTECTED] wrote:
Hi,
I am facing
PhraseQuery.extractTerms() returns the terms making up the phrase,
and so it is not adequate for 'finding' a single term that represents
the phrase query, one that represents the searched entire text.
It seems you are trying to obtain a string that can be matched against
the displayed text for
I was once involved in modifying a search index
implementation (not Lucene) to write posting lists so that
they can be traversed (only) in reverse order. Docids
were preserved but you got higher IDs first. This was
a non-trivial code change.
Now the suggestion to (optionally) order merged
segments
This may help:
http://www.nabble.com/Updating-Lucene-Index-with-Unstored-fields-tt15188818.html#a15188818
Doron
On Thu, Jan 31, 2008 at 2:42 AM, John Wang [EMAIL PROTECTED] wrote:
Hi all:
We have a large index and it is difficult to reindex.
We want to add another field to the index
Hi Grant, I initially thought of doing so, but after working on the Million
Queries Track, where running the 10,000 queries could take more than a day
(depending on the settings) and where indexing was done once and took a few
days, I felt that tighter control is needed than that provided by the
Hi Ajay,
IndexReader.unlock() is a brute force call to be used by applications/users
knowing that a lock can be safely removed.
finalize() on the other hand is a method that Java will call when garbage
collecting a no-more-referenced object. So it is often a place for cleanup
code. However the
You can add phrase on the writer field.
I.e. with high boost of 3 and low boost of 2,
writing 'h' for 'heading' and 'w' for 'writer',
try this query:
h:sachin^3 d:tendulkar^3 w:sachin^2 w:tendulkar^2 w:h:Sachin
Tendulkar^6
On Jan 29, 2008 9:23 AM, Sure [EMAIL PROTECTED] wrote:
Hi All,
Hi Marjan,
Lucene processes the query in what can be called
"one-doc-at-a-time" fashion.
For the example query - x y - (not the phrase query "x y") - all
documents containing either x or y are considered a match.
When processing the query - x y - the posting lists of these two
index terms are traversed, and
Hi Chris,
A null pointer exception can be caused by not checking
newToken for null after this line:
Token newToken = input.next();
I think Hoss meant to call next() on the input as long as returned
tokens do not satisfy the check for being a named entity.
Also, this code assumes white space
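Roughly, the intended loop might look like this (a sketch; isNamedEntity()
stands in for whatever check is actually used):

import java.io.IOException;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

class NamedEntityFilter extends TokenFilter {
  NamedEntityFilter(TokenStream input) { super(input); }
  public Token next() throws IOException {
    Token newToken = input.next();
    // keep pulling tokens until a named entity shows up or the stream ends;
    // the null check prevents the NPE at end-of-stream
    while (newToken != null && !isNamedEntity(newToken)) {
      newToken = input.next();
    }
    return newToken; // null signals end of stream
  }
  private boolean isNamedEntity(Token t) {
    // placeholder check: treat capitalized tokens as named entities
    String text = t.termText();
    return text.length() > 0 && Character.isUpperCase(text.charAt(0));
  }
}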
Hi Michael, I think you mean the exception thrown when you
search and sort with a field that was not yet indexed:
RuntimeException: field BBC does not appear to be indexed
I think the current behavior is correct, otherwise an application
might (by a bug) attempt to sort by a wrong field,
This is done by Lucene's scorers. You should however start
in http://lucene.apache.org/java/docs/scoring.html - scorers
are described in the Algorithm section. Offsets are used
by Phrase Scorers and by Span Scorer.
Doron
On Jan 8, 2008 11:24 PM, Marjan Celikik [EMAIL PROTECTED] wrote:
Doron
On Jan 8, 2008 11:48 PM, chris.b [EMAIL PROTECTED] wrote:
Wrapping the WhitespaceAnalyzer with the n-gram filter creates unigrams and
the n-grams that I indicate, while maintaining the whitespace. :)
The reason i'm doing this is because I only wish to index names with more
than one token.
Or, very similar, wrap the 'real' analyzer A with your analyzer that
delegates to A but also keeps the returned tokens, possibly by
using a CachingTokenFilter.
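A sketch of such a wrapper (class name made up; assumes CachingTokenFilter,
available since Lucene 2.3):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.analysis.TokenStream;

class TokenKeepingAnalyzer extends Analyzer {
  private final Analyzer delegate;
  CachingTokenFilter lastTokens; // replayable copy of the last stream
  TokenKeepingAnalyzer(Analyzer delegate) { this.delegate = delegate; }
  public TokenStream tokenStream(String fieldName, Reader reader) {
    lastTokens = new CachingTokenFilter(delegate.tokenStream(fieldName, reader));
    return lastTokens;
  }
}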
On Jan 7, 2008 7:11 AM, Daniel Noll [EMAIL PROTECTED] wrote:
On Monday 07 January 2008 11:35:59 chris.b wrote:
is it possible to add
This is not a self contained program - it is incomplete, and it depends
on files on *your* disk...
Still, can you show why you're saying it indexes stopwords?
Can you print here a few samples of IndexReader.terms().term()?
BR, Doron
On Dec 27, 2007 10:22 AM, Liaqat Ali [EMAIL PROTECTED] wrote:
On Dec 27, 2007 11:49 AM, Liaqat Ali [EMAIL PROTECTED] wrote:
I got your point. The given program does not give any error during
compilation and it is interpreted well. But it does not create any
index. When StandardAnalyzer() is called without the stop-words list it
works well, but
[EMAIL PROTECTED] wrote:
Doron Cohen wrote:
On Dec 27, 2007 11:49 AM, Liaqat Ali [EMAIL PROTECTED] wrote:
I got your point. The given program does not give any error during
compilation and it is interpreted well. But it does not create any
index. When StandardAnalyzer() is called
can we modify the StopAnalyzer to insert stop words of
another language, instead of English, like the Urdu given below:
public static final String[] URDU_STOP_WORDS = { "پر", "کا", "کی", "کو" };
new StandardAnalyzer(URDU_STOP_WORDS) should work.
Regards,
Doron
On Dec 26, 2007 10:33 PM, Liaqat Ali [EMAIL PROTECTED] wrote:
Using javac -encoding UTF-8 still raises the following error.
urduIndexer.java : illegal character: \65279
?
^
1 error
What am I doing wrong?
If you have the stop-words in a file, say one word in a line,
they can be read like
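(the reading code is cut off in this excerpt; one possible sketch, assuming
a UTF-8 file with one word per line:)

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;

static String[] readStopWords(String path) throws IOException {
  ArrayList words = new ArrayList();
  BufferedReader in = new BufferedReader(
      new InputStreamReader(new FileInputStream(path), "UTF-8"));
  try {
    String line;
    while ((line = in.readLine()) != null) {
      line = line.trim();
      if (line.length() > 0) {
        words.add(line); // one stop word per line
      }
    }
  } finally {
    in.close();
  }
  return (String[]) words.toArray(new String[words.size()]);
}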
document.add(new Field("contents", sb.toString(),
Field.Store.NO, Field.Index.TOKENIZED));
In addition, for tokenized but not stored like here, the Field()
constructor that takes a Reader param can be handy here.
Regards, Doron
etc... when will they be gone?
thanks
- Original Message
From: Doron Cohen [EMAIL PROTECTED]
To: java-user@lucene.apache.org
Sent: Wednesday, December 19, 2007 1:13:56 PM
Subject: Re: document deletion problem
On Dec 19, 2007 5:45 PM, Tushar B wrote:
Hi Doron,
I
On Dec 19, 2007 5:45 PM, Tushar B [EMAIL PROTECTED] wrote:
Hi Doron,
I was just playing around with deletion because I wanted to delete
documents due to spurious entries in one particular field. Could you tell me
how to file a JIRA issue?
See Lucene's wiki, at page HowToContribute.
Hi Rakesh,
Perhaps the confusion comes from the asymmetry
between +X and -X. I.e., for the query:
A B -C +D
one might think that, similar to how -C only disqualifies docs
containing C (but not qualifying docs not containing C), also
+D only disqualifies docs not containing D. But this is
Hi Rakesh,
It just occurred to me that your code has
String searchCriteria = "Indoor*";
Assuming StandardAnalyzer used at indexing time, all text words were
lowercased. Now, QueryParser by default does not lowercase wildcard
queries. You can however instruct it to do so by calling:
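(The snippet is cut off here; the call meant is presumably QueryParser's
setLowercaseExpandedTerms - a sketch, with a hypothetical default field:)

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;

static QueryParser lowercasingParser() {
  QueryParser parser = new QueryParser("contents", new StandardAnalyzer());
  parser.setLowercaseExpandedTerms(true); // lowercases wildcard/prefix/fuzzy terms
  return parser;
}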
See in Lucene FAQ:
Are Wildcard, Prefix, and Fuzzy queries case sensitive?
On Dec 17, 2007 11:27 AM, Helmut Jarausch [EMAIL PROTECTED]
wrote:
Hi,
please help I am totally puzzled.
The same query, once with a direct call to FuzzyQuery
succeeds while the same query with QueryParser fails.
It seems that documents having fewer fields satisfying
the query are worth more than those satisfying more fields
of the query, because the first ones are more to
the point.
At least it seems like it in the example.
If this makes sense I would try to compose a top level
boolean query out of the
Yes that's right, my mistake.
In fact even after reading your comment I was puzzled
because PhraseScorer indeed requires *all* phrase-positions
to be satisfied in order to match. The answer is that
the OR logic is taken care of by MultipleTermPositions,
so the scorer does not need to be aware of
while (termDocs.next()) {
  termDocs.next();
}
For one, this loop calls next() twice in each iteration,
so every second doc is skipped... ?
chris.b [EMAIL PROTECTED] wrote on 10/12/2007 12:58:15:
Here goes,
I'm developing an application using lucene which will
Seeing as that solved all my problems (I think),
Glad it helped! (btw it's always like this with
debugging - others see stuff in my code that I don't)
smokey [EMAIL PROTECTED] wrote on 04/12/2007 16:54:32:
Thanks for the information on o.a.l.search.spans.
I was thinking of parsing the phrase query string into a sequence of terms,
then constructing a phrase query object using the add(Term term, int position)
method in
1) Downloaded http://www.ehatchersolutions.com/downloads/LuceneInAction.zip
- sorry, lucenebook.com is broken at the moment :(
This one works too -
http://www.manning.com/hatcher2/ -- Downloads -- Source Code
I didn't have performance issues when using the spell checker.
Can you describe what you tried and how long it took, so
people can relate to that.
AFAIK the spell checker in o.a.l.search.spell does not expand
a query by adding all the permutations of a potentially misspelled
word. It is based on
It doesn't make sense to optimize() after every document add.
Lucene in fact implements logic in the spirit of what you
describe below, when it decides to merge segments on the fly.
There are various ways to tell Lucene how often to flush
recently added/updated documents and what to merge.
But
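As an aside, the flush/merge knobs mentioned above look like this in the
Lucene 2.x API (a sketch; the values are only illustrative):

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;

static IndexWriter openTunedWriter(String path) throws IOException {
  IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), true);
  writer.setMaxBufferedDocs(1000); // flush after this many buffered docs
  writer.setMergeFactor(10);       // how many segments get merged at a time
  return writer;
  // no optimize() here - call it once at the end, if at all
}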
See below -
smokey [EMAIL PROTECTED] wrote on 03/12/2007 05:14:23:
Suppose I have an index containing the terms impostor,
imposter, fraud, and
fruad, then presumably regardless of whether I spell impostor and fraud
correctly, Lucene SpellChecker will offer the improperly
spelled versions as
MultiReader is more efficient and is preferred when possible.
MultiSearcher allows further functionality.
Every time an index has more than a single segment (which is
to say almost every index, except right after calling optimize()),
opening an IndexReader (or an IndexSearcher) above that index
This is from Lucene's CHANGES.txt:
LUCENE-773: Deprecate the FSDirectory.getDirectory(*) methods that
take a boolean create argument. Instead you should use
IndexWriter's create argument to create a new index.
(Mike McCandless)
So you should create the FSDir with
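I.e., something like this sketch (the path is a placeholder):

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

static IndexWriter createNewIndex(String path) throws IOException {
  Directory dir = FSDirectory.getDirectory(path); // no boolean create arg
  // the create=true argument moved here, onto IndexWriter:
  return new IndexWriter(dir, new StandardAnalyzer(), true);
}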
You can also rely on the fact that by default documents are
collected in docid order. You can therefore use your own
hit collector that, when collecting doc with id n2,
assuming the previous doc collected had id n1,
would (know to) assign score 0 to all docs
with n1 < id < n2.
In other words, you can know
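A sketch of such a collector (names made up; relies on the default
in-docid-order collection):

import org.apache.lucene.search.HitCollector;

class GapZeroingCollector extends HitCollector {
  private int lastDoc = -1;
  public void collect(int doc, float score) {
    // every id in (lastDoc, doc) was skipped, i.e. did not match
    for (int id = lastDoc + 1; id < doc; id++) {
      handle(id, 0.0f); // assign score 0 to the non-matching doc
    }
    handle(doc, score); // the matching doc itself
    lastDoc = doc;
  }
  private void handle(int doc, float score) {
    // ... application-specific bookkeeping
  }
}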
Try java -verbose to see more info on class loading.
Also try java -classpath yourClassPath from the command line.
Note that separators in the classpath may differ between operating
systems - e.g. ';' in Windows but ':' in Linux...
Doron
Liaqat Ali [EMAIL PROTECTED] wrote on 19/11/2007 15:43:30:
Hi
is that the network can give you
reasonable results even for words which haven't been used
before (at least that is what the book seems to claim).
Regards,
Lukas
On 10/16/07, Doron Cohen [EMAIL PROTECTED] wrote:
Where and how do you store this type of info:
If user U1 searches for query Q7 boost
Lukas Vlcek [EMAIL PROTECTED] wrote on 25/10/2007 10:25:23:
Doron,
You definitely added a few important (crucial) questions. There are
important concerns and I am glad to hear that the Lucene community is
debating them. I am not a Lucene internals expert, thus I can hardly
compare simple search
You could use ValueSourceQuery for this - see
o.a.l.search.function. The trick is to create
your own ValueSource class that uses two
FieldCacheSource objects - one for each location.
See http://issues.apache.org/jira/browse/LUCENE-1019
for a related example.
Note however that this solution would
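A sketch of such a ValueSource (o.a.l.search.function, Lucene 2.3+; the field
names "x"/"y" and the squared-distance formula are just for illustration):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.function.DocValues;
import org.apache.lucene.search.function.FloatFieldSource;
import org.apache.lucene.search.function.ValueSource;

class SquaredDistanceSource extends ValueSource {
  private final ValueSource xSrc = new FloatFieldSource("x");
  private final ValueSource ySrc = new FloatFieldSource("y");
  private final float x0, y0; // the query location
  SquaredDistanceSource(float x0, float y0) { this.x0 = x0; this.y0 = y0; }
  public DocValues getValues(IndexReader reader) throws IOException {
    final DocValues xs = xSrc.getValues(reader);
    final DocValues ys = ySrc.getValues(reader);
    return new DocValues() {
      public float floatVal(int doc) {
        float dx = xs.floatVal(doc) - x0, dy = ys.floatVal(doc) - y0;
        return dx * dx + dy * dy; // smaller = closer
      }
      public String toString(int doc) { return "sqDist=" + floatVal(doc); }
    };
  }
  public String description() { return "squared distance from query point"; }
  public boolean equals(Object o) {
    if (!(o instanceof SquaredDistanceSource)) return false;
    SquaredDistanceSource other = (SquaredDistanceSource) o;
    return other.x0 == x0 && other.y0 == y0;
  }
  public int hashCode() {
    return Float.floatToIntBits(x0) ^ Float.floatToIntBits(y0);
  }
}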
Hi Grant,
Grant Ingersoll wrote:
I think the answer is:
[{ "MAddDocs" AddDoc } : 5000] : 4
Is this the functional equivalent of doing:
{ "MAddDocs" AddDoc } : 2
in parallel?
Yes, this is correct; it reads as: create 4 threads, each
adding 5000 docs to the index, and start/run the 4
Where and how do you store this type of info:
If user U1 searches for query Q7, boost doc D5 by B17.
If user U2 searches for query Q3, boost doc D15 by B2.
Seems lots of info, and it must be persistent.
Perhaps o.a.l.search.function can help - assuming you
have this info available at search time, and
Hi Scott,
Would indexing time field boosts work for you?
http://lucene.apache.org/java/docs/scoring.html#Score%20Boosting
Doron
Scott Phillips wrote:
Hi everyone,
I have a question that I can't quite seem to find the answer to by
googling or searching the archives of this mailing list. The
For an already optimized index calling optimize() is a no-op.
You may try this: after opening the writer and setting compound=false, add
a dummy (even empty) document to the index, then optimize(), and finally
optionally remove the dummy document.
Note that calling optimize() might be lengthy as
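A sketch of that sequence (field name and setup are hypothetical; Lucene 2.x
API assumed):

import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

static void forceNonCompound(String path) throws IOException {
  IndexWriter writer = new IndexWriter(path, new StandardAnalyzer(), false);
  writer.setUseCompoundFile(false);
  // add a dummy doc so the index is no longer optimized...
  Document dummy = new Document();
  dummy.add(new Field("dummy_id", "dummy",
      Field.Store.NO, Field.Index.UN_TOKENIZED));
  writer.addDocument(dummy);
  // ...so that optimize() actually rewrites it, without compound files
  writer.optimize();
  // finally, optionally remove the dummy document again
  writer.deleteDocuments(new Term("dummy_id", "dummy"));
  writer.close();
}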