I did ask if there was anything else relevant you'd forgotten to mention ...
How fast are general file operations on the NFS files? Your times are
still extremely long and my guess is that your network/NFS setup is
to blame.
Can you run your code on the server that is exporting the index, if
Hi,
If I index several similar values in a multivalued field (e.g. many authors to
one book), is there any way to know which of these matched during a query?
e.g.
Book The art of Stuff, with authors Bob Thingummy and Belinda Bootstrap
If we queried for +(author:Be*) and matched this
Thanks for the response. I wrote some new custom payload functions to verify
that I'm getting the value correctly and I think I am, but I did unearth
this clue.
In the docs below, the score should be the sum of all the payloads for the
term bing. It appears to be using the value for the first
all statistics in lucene are per field so is document frequency
simon
On Fri, Mar 22, 2013 at 10:48 AM, Nicole Lacoste niki.laco...@gmail.com wrote:
Hi
I am trying to figure out whether the document frequency of a term is used in
calculating the score. Is it per field? Or is it independent of the
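A minimal sketch of what "per field" means here (Lucene 4.x API; the reader and the field/term names are placeholders, not from the original thread):

```java
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;

// Sketch: document frequency is a per-field statistic. The same token in two
// different fields yields two independent counts, and only the queried
// field's count feeds the score.
public class DocFreqDemo {
    static void show(IndexReader reader) throws IOException {
        int dfAuthor = reader.docFreq(new Term("author", "bob"));
        int dfTitle  = reader.docFreq(new Term("title",  "bob"));
        System.out.println(dfAuthor + " vs " + dfTitle); // independent values
    }
}
```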
OK, I've played with all these solutions and basically only one gave me
satisfying results. Using build()
with TermFreqPayload argument gave me horrible performance, because it
takes more than 5 mins
to iterate through all Terms in the index and to filter them based on the
doc id. Not sure if this
I don't think there is a way of identifying which of the values of a
multivalued field matched. But... I haven't checked the code to be
absolutely certain that there isn't some expert way.
Also, realize that multiple values could match, such as if you queried for
B*.
-- Jack Krupansky
Most likely the cause is what I said: when you convert the payload bytes to a
number, you probably didn't use payload.offset to locate the correct start of
the bytes. Before 4.1, the payload started at the expected position, but since
4.1 you must use the offset and length to get the correct bytes you
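A self-contained illustration of the pitfall above. The decode below mirrors what Lucene's PayloadHelper.decodeFloat(bytes, offset) does internally (big-endian int bits to float); the byte layout is an invented example:

```java
// Since 4.1, BytesRef.bytes can be a shared buffer, so the payload does NOT
// necessarily start at index 0 -- always honor offset (and length).
public class PayloadDecode {
    // Mirrors PayloadHelper.decodeFloat(byte[], int): big-endian float bits.
    static float decodeFloat(byte[] bytes, int offset) {
        int bits = ((bytes[offset] & 0xFF) << 24)
                 | ((bytes[offset + 1] & 0xFF) << 16)
                 | ((bytes[offset + 2] & 0xFF) << 8)
                 |  (bytes[offset + 3] & 0xFF);
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        // 1.0f encoded at offset 4, after 4 bytes of unrelated shared data:
        byte[] shared = {9, 9, 9, 9, 0x3F, (byte) 0x80, 0, 0};
        System.out.println(decodeFloat(shared, 0)); // wrong: reads the junk prefix
        System.out.println(decodeFloat(shared, 4)); // correct: 1.0
    }
}
```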
Hi all,
I'm evaluating using Lucene for some data that would not be stored anywhere
else, and I'm concerned about reliability. Having a database storing the
data in addition to Lucene would be a problem, and I want to know if Lucene
is reliable enough.
Reading this article,
You might be able to get close if you use PostingsHighlighter: it
tells you the offset of each matched Passage, and you can correlate
that to which field value (assuming you stored the multi-valued
fields).
You must index offsets into your postings.
But there are caveats ... if you use
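To make the correlation step concrete: Lucene indexes a multi-valued field as one token stream with an offset gap between values (1 by default, if I recall correctly), so a match offset from a Passage can be mapped back to a value index. A self-contained sketch with a hypothetical helper (names and the gap value are assumptions, not from the thread):

```java
import java.util.List;

// Hypothetical helper: given the stored values of a multi-valued field and
// the offset gap the analyzer inserts between them, map a match's character
// offset (e.g. from a PostingsHighlighter passage) back to the value index.
public class ValueLocator {
    static int valueIndexForOffset(List<String> values, int offsetGap, int matchOffset) {
        int start = 0;
        for (int i = 0; i < values.size(); i++) {
            int end = start + values.get(i).length();
            if (matchOffset < end) {
                return i;            // match falls inside the i-th value
            }
            start = end + offsetGap; // skip the gap before the next value
        }
        return -1;                   // offset is past the last value
    }

    public static void main(String[] args) {
        List<String> authors = List.of("Bob Thingummy", "Belinda Bootstrap");
        // "Belinda..." starts after "Bob Thingummy" (13 chars) + a gap of 1:
        System.out.println(valueIndexForOffset(authors, 1, 14)); // 1
        System.out.println(valueIndexForOffset(authors, 1, 0));  // 0
    }
}
```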
I am new to Lucene and going through the Lucene in Action 2nd edition book. I
have a quick question on the best way to add fields to a document now that
Field.Index is deprecated.
Here is what I am doing and what most example online suggest:
doc.add(new Field("id", dbID, Store.YES,
Hello,
I'm looking for an analyzer that allows performing accent insensitive search in
latin languages. I'm currently using the StandardAnalyzer but it doesn't
fulfill this need. Could you please point me to the one I need to use? I've
checked the javadoc for the various analyzer packages but
We badly need Lucene in Action 3rd edition!
The easiest approach is to use one of the new XXXField classes under
oal.document, eg StringField for your example.
If none of the existing XXXFields fit, you can make a custom
FieldType, tweak all of its settings, and then create a Field from
that.
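A sketch of both suggestions against the Lucene 4.x API (the field names and values are placeholders carried over from the example above, not a definitive implementation):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

// Sketch: the XXXField replacements for the deprecated Field.Index flags.
public class FieldExamples {
    static Document build(String dbID, String text) {
        Document doc = new Document();
        // was: new Field("id", dbID, Store.YES, Index.NOT_ANALYZED)
        doc.add(new StringField("id", dbID, Field.Store.YES));
        // was: new Field("contents", text, Store.NO, Index.ANALYZED)
        doc.add(new TextField("contents", text, Field.Store.NO));
        // If no XXXField fits, tweak a FieldType, e.g. to add term vectors:
        FieldType ft = new FieldType(TextField.TYPE_STORED);
        ft.setStoreTermVectors(true);
        ft.freeze();
        doc.add(new Field("body", text, ft));
        return doc;
    }
}
```

Note the difference: StringField indexes the whole value as a single un-analyzed token, while TextField runs the value through the analyzer.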
Try the ASCII Folding Filter:
https://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html
-- Jack Krupansky
-Original Message-
From: Jerome Blouin
Sent: Friday, March 22, 2013 12:22 PM
To: java-user@lucene.apache.org
Subject:
I understand that I can't configure it on an analyzer, so on which class can I
apply it?
Thanks,
Jerome
-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com]
Sent: Friday, March 22, 2013 12:38 PM
To: java-user@lucene.apache.org
Subject: Re: Accent insensitive analyzer
Start with the Standard Tokenizer:
https://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html
-- Jack Krupansky
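To illustrate what the folding step does, here is a JDK-only sketch: decompose to NFD and strip combining marks. Inside Lucene you would instead build an Analyzer chaining StandardTokenizer with a LowerCaseFilter and ASCIIFoldingFilter; also note ASCIIFoldingFilter handles more than this trick does (e.g. letters like 'ø' that have no combining-mark decomposition):

```java
import java.text.Normalizer;

// JDK-only illustration of accent folding: "café" -> "cafe". This is roughly
// the subset of ASCIIFoldingFilter's behavior that covers accented Latin
// letters with canonical decompositions.
public class FoldDemo {
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                         .replaceAll("\\p{M}+", ""); // drop combining marks
    }

    public static void main(String[] args) {
        System.out.println(fold("résumé café")); // resume cafe
    }
}
```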
-Original Message-
From: Jerome Blouin
Sent: Friday, March 22, 2013 12:53 PM
To: java-user@lucene.apache.org
Subject:
Hi Jerome,
How about this one?
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ISOLatin1AccentFilterFactory
Regards,
Sujit
On Mar 22, 2013, at 9:22 AM, Jerome Blouin wrote:
Hello,
I'm looking for an analyzer that allows performing accent insensitive search
in latin
Thanks. I'll check that later.
-Original Message-
From: Sujit Pal [mailto:sujitatgt...@gmail.com] On Behalf Of SUJIT PAL
Sent: Friday, March 22, 2013 2:52 PM
To: java-user@lucene.apache.org
Subject: Re: Accent insensitive analyzer
Hi Jerome,
How about this one?
can you send this to d...@lucene.apache.org?
simon
On Fri, Mar 22, 2013 at 7:52 PM, Ravikumar Govindarajan
ravikumar.govindara...@gmail.com wrote:
Most of us writing custom codecs use the segment name as a handle and push
data to a different storage.
Would it be possible to get a hook in the
On Fri, Mar 22, 2013 at 2:00 PM, Pablo Guerrero sir...@gmail.com wrote:
Hi all,
I'm evaluating using Lucene for some data that would not be stored anywhere
else, and I'm concerned about reliability. Having a database storing the
data in addition to Lucene would be a problem, and I want to know
On Fri, Mar 22, 2013 at 5:28 PM, Michael McCandless
luc...@mikemccandless.com wrote:
We badly need Lucene in Action 3rd edition!
go mike go!!!
;)
The easiest approach is to use one of the new XXXField classes under
oal.document, eg StringField for your example.
If none of the existing
+1
I own a copy of 2nd Edition and will gladly purchase 3rd Edition when it's
out.
--
typos, misspels, and other weird words brought to you courtesy of my mobile
device and its auto-(in)correct feature.
On Mar 22, 2013 3:21 PM, Uwe Schindler u...@thetaphi.de wrote:
Come on! :-)
-
Uwe