On 1/15/15 11:23 AM, danield wrote:
Hi Mike,
Thank you for your reply. Yes, I had thought of this, but it is not a
solution to my problem, and this is because the Term Frequency and therefore
the results will still be wrong, as prepending or appending a string to the
term will still make it a
Is your index on a remote file system?
You could be hitting https://issues.apache.org/jira/browse/LUCENE-5541
Best to upgrade Lucene (we stopped relying on File.exists a while back).
Mike McCandless
http://blog.mikemccandless.com
On Thu, Jan 15, 2015 at 12:10 PM, Ian Koelliker
Thanks for the information. We are going to be upgrading to a newer version of
Lucene, but we cannot upgrade to version 4.x yet due to the fact that 4.x
cannot read older index formats. The best we can do currently is upgrade to the
latest 3.x which appears to have the same problem.
Hello,
We are seeing some weird instances of index corruption periodically when using
Lucene 2.9.4. There are two specific cases we are seeing.
1) We are using the compound format and have noticed that sometimes we get
errors when searching noting that files are missing (i.e. .fnm, .fdt,
Oh thanks Mike, it did say somewhere. I guess it wouldn't hurt to make that
explanation more prominent, as I clearly missed it.
Never mind, I am working on my own solution for this, through subclassing
QueryParser, BooleanQuery, BooleanScorer, Similarity and a bunch of other
classes.
Cheers,
Hi Mike,
Thank you for your reply. Yes, I had thought of this, but it is not a
solution to my problem, and this is because the Term Frequency and therefore
the results will still be wrong, as prepending or appending a string to the
term will still make it a different term.
Similarly, I could
Hello,
I have a question about how Lucene indexes text. If I have a sentence and
tokenize it, does the index save only the tokens, or the original sentence
too? When I want to see, for example, the sentence with id 1, does Lucene
rebuild it from the tokens saved in the index, or is the sentence itself
stored as well? And
File a Jira for this particular doc fix since it is significant and not
just mere wordsmithing. Better yet, submit a patch since that's Javadoc,
although the exact form of the doc fix might be debatable, so a general
description of the problem should be sufficient, unless you feel motivated.
--
Basically there is a stored fork and an indexed fork.
If you specify the input should be stored, a verbatim
copy is put in a special segment file with the
extension .fdt.
This is entirely orthogonal to indexing the tokens,
which are what search operates on.
So you can store and index, store but
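
As a rough illustration of the stored/indexed split described in the reply above, here is a toy model in plain Java. It is not Lucene code: the class name, method names, and the whitespace tokenizer are all made up for this sketch. It only shows that keeping a verbatim copy and building searchable postings are two independent choices.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only (hypothetical names, not the Lucene API).
final class StoredVsIndexedSketch {
    // "Stored fork": verbatim copy of the input, keyed by document id.
    private final Map<Integer, String> stored = new HashMap<>();
    // "Indexed fork": token -> ids of the documents containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    void add(int docId, String text, boolean store, boolean index) {
        if (store) {
            stored.put(docId, text);
        }
        if (index) {
            // Assumption: lowercase + whitespace split stands in for an analyzer.
            for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Search only ever consults the postings, never the stored copies.
    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    // Returns the verbatim text, or null if the document was not stored.
    String storedValue(int docId) {
        return stored.get(docId);
    }

    public static void main(String[] args) {
        StoredVsIndexedSketch idx = new StoredVsIndexedSketch();
        idx.add(1, "Hello Lucene world", true, true);   // stored and indexed
        idx.add(2, "hidden lucene text", false, true);  // indexed only
        idx.add(3, "stored only", true, false);         // stored only
        System.out.println(idx.search("lucene"));       // docs 1 and 2 match
        System.out.println(idx.storedValue(2));         // null: nothing to display
        System.out.println(idx.search("stored"));       // empty: doc 3 is not searchable
    }
}
```

In this sketch an indexed-only document can be found but not displayed, and a stored-only document can be displayed but not found.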
Normally Lucene will count your d1 as having length=2.
However, if la was added as a synonym for los angeles, such that
it overlaps its position, then the default similarity discounts that
and will count it as length=1.
But for that to work, the position of the 2nd token must be the same
as the
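
The length counting described in the reply above can be sketched in plain Java. This is a toy model, not Lucene's similarity code: the class and method names are hypothetical, each token is reduced to its position increment, and each city value is assumed to be indexed as a single token, matching the length=2 count quoted above.

```java
// Toy model only (hypothetical names, not the Lucene API).
final class FieldLengthSketch {

    // Counts the tokens of a field. A position increment of 0 means the
    // token sits at the same position as the previous one (an overlap,
    // as with an injected synonym). When discountOverlaps is true,
    // such tokens are not counted toward the length.
    static int fieldLength(int[] positionIncrements, boolean discountOverlaps) {
        int length = 0;
        for (int increment : positionIncrements) {
            boolean overlapsPrevious = increment == 0;
            if (discountOverlaps && overlapsPrevious) {
                continue;
            }
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        int[] separateValues = {1, 1}; // two tokens at consecutive positions
        int[] synonymOverlap = {1, 0}; // second token stacked on the first
        System.out.println(fieldLength(separateValues, true));  // 2
        System.out.println(fieldLength(synonymOverlap, true));  // 1
        System.out.println(fieldLength(synonymOverlap, false)); // 2
    }
}
```

The discount only takes effect when the second token really has a position increment of 0; two values added at different positions still count as length 2.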
Hi,
I am using lucene to index documents that have a multivalued text field
named ‘city’.
Each document might have multiple values for this field, like la, los
angeles etc.
Assuming
document d1 contains city = la ; city = los angeles
document d2 contains city = la mirada
document d3 contains city
How are you storing the id field? A wild guess might be that this
error might be caused by having some documents with id stored,
perhaps, as a StringField or TextField and some as an IntField.
--
Ian.
On Wed, Jan 14, 2015 at 2:07 PM, Sascha Janz sascha.j...@gmx.net wrote:
hello,
i am
They are all stored like this:
Field fid = new Field("id", "", Field.Store.YES, Field.Index.NOT_ANALYZED,
Field.TermVector.NO);
fid.setStringValue(Integer.toString(id));
Because I am reusing fid, I have to set the value this way.
Sent: Thursday, 15 January 2015 at 13:12
From: Ian Lea
I'm struggling with a migration from Lucene 2.4 to 2.9.
I'm trying to migrate from SortComparatorSource to FieldComparator.
I cannot get it to work correctly, even after a lot of tests.
I noticed that inside the setNextReader method not all of the stored field's
terms are retrieved.
For example I have one
On 1/15/15 4:34 AM, rama44ster wrote:
Hi,
I am using lucene to index documents that have a multivalued text field
named ‘city’.
Each document might have multiple values for this field, like la, los
angeles etc.
Assuming
document d1 contains city = la ; city = los angeles
document d2 contains
15 matches