Re: high memory usage by indexreader

2013-03-22 Thread Ian Lea
I did ask if there was anything else relevant you'd forgotten to mention ... How fast are general file operations on the NFS files? Your times are still extremely long and my guess is that your network/NFS setup is to blame. Can you run your code on the server that is exporting the index, if

Multi-value fields in Lucene 4.1

2013-03-22 Thread Chris Bamford
Hi, If I index several similar values in a multivalued field (e.g. many authors to one book), is there any way to know which of these matched during a query? e.g. Book The art of Stuff, with authors Bob Thingummy and Belinda Bootstrap If we queried for +(author:Be*) and matched this

Re: PayloadFunctions don't work the same since 4.1

2013-03-22 Thread jimtronic
Thanks for the response. I wrote some new custom payload functions to verify that I'm getting the value correctly and I think I am, but I did unearth this clue. In the docs below, the score should be the sum of all the payloads for the term bing. It appears to be using the value for the first

Re: question about document-frequency in score

2013-03-22 Thread Simon Willnauer
All statistics in Lucene are per field, and so is document frequency. simon On Fri, Mar 22, 2013 at 10:48 AM, Nicole Lacoste niki.laco...@gmail.com wrote: Hi I am trying to figure out if the document-frequency of a term is used in calculating the score. Is it per field? Or is independent of the

Re: Getting documents from suggestions

2013-03-22 Thread Bratislav Stojanovic
OK, I've played with all these solutions and basically only one gave me satisfying results. Using build() with TermFreqPayload argument gave me horrible performance, because it takes more than 5 mins to iterate through all Terms in the index and to filter them based on the doc id. Not sure if this

Re: Multi-value fields in Lucene 4.1

2013-03-22 Thread Jack Krupansky
I don't think there is a way of identifying which of the values of a multivalued field matched. But... I haven't checked the code to be absolutely certain whether there isn't some expert way. Also, realize that multiple values could match, such as if you queried for B*. -- Jack Krupansky

Re: PayloadFunctions don't work the same since 4.1

2013-03-22 Thread Duke DAI
Most likely, the cause is what I said. I guess when you try to convert bytes to number you didn't use the payload.offset to locate the right start of bytes. Before 4.1, the start of payload is the expected value. But since 4.1, you must use the offset and length to get the correct bytes you
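
The point about offsets can be illustrated with a minimal JDK-only sketch. It is not the poster's code: `Slice` is a hypothetical stand-in for Lucene's `BytesRef` (which exposes `bytes`, `offset` and `length`), and a big-endian 4-byte float encoding of the payload is an assumption.

```java
import java.nio.ByteBuffer;

public class PayloadOffsetSketch {

    /** Hypothetical stand-in for a BytesRef-style view: a shared array plus a window into it. */
    static final class Slice {
        final byte[] bytes;
        final int offset;
        final int length;

        Slice(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Decodes a big-endian float that starts at slice.offset, not at index 0. */
    static float decodeFloat(Slice slice) {
        if (slice.length < Float.BYTES) {
            throw new IllegalArgumentException("payload shorter than 4 bytes");
        }
        // wrap(array, offset, length) positions the buffer at the window start.
        return ByteBuffer.wrap(slice.bytes, slice.offset, slice.length).getFloat();
    }

    public static void main(String[] args) {
        // Three payloads packed into one shared array, as a reused buffer might hold them.
        byte[] shared = ByteBuffer.allocate(12)
                .putFloat(1.0f).putFloat(2.5f).putFloat(4.0f).array();

        Slice second = new Slice(shared, 4, 4);
        System.out.println(decodeFloat(second));                // honours the offset
        System.out.println(ByteBuffer.wrap(shared).getFloat()); // ignores it, reads the first payload
    }
}
```

Reading from index 0 of the shared array returns the first payload regardless of which one the slice points at, which would match the symptom of every score using the first value.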

Lucene reliability as primary store

2013-03-22 Thread Pablo Guerrero
Hi all, I'm evaluating using Lucene for some data that would not be stored anywhere else, and I'm concerned about reliability. Having a database storing the data in addition to Lucene would be a problem, and I want to know if Lucene is reliable enough. Reading this article,

Re: Multi-value fields in Lucene 4.1

2013-03-22 Thread Michael McCandless
You might be able to get close if you use PostingsHighlighter: it tells you the offset of each matched Passage, and you can correlate that to which field value (assuming you stored the multi-valued fields). You must index offsets into your postings. But there are caveats ... if you use

Field.Index deprecation ?

2013-03-22 Thread jeffthorne
I am new to Lucene and going through the Lucene in Action 2nd edition book. I have a quick question on the best way to add fields to a document now that Field.Index is deprecated. Here is what I am doing and what most examples online suggest: doc.add(new Field(id, dbID, Store.YES,

Accent insensitive analyzer

2013-03-22 Thread Jerome Blouin
Hello, I'm looking for an analyzer that allows performing accent-insensitive search in Latin languages. I'm currently using the StandardAnalyzer but it doesn't fulfill this need. Could you please point me to the one I need to use? I've checked the javadoc for the various analyzer packages but

Re: Field.Index deprecation ?

2013-03-22 Thread Michael McCandless
We badly need Lucene in Action 3rd edition! The easiest approach is to use one of the new XXXField classes under oal.document, eg StringField for your example. If none of the existing XXXFields fit, you can make a custom FieldType, tweak all of its settings, and then create a Field from that.

Re: Accent insensitive analyzer

2013-03-22 Thread Jack Krupansky
Try the ASCII Folding Filter: https://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/ASCIIFoldingFilter.html -- Jack Krupansky -Original Message- From: Jerome Blouin Sent: Friday, March 22, 2013 12:22 PM To: java-user@lucene.apache.org Subject:
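
As a rough illustration of what accent folding does to a term, here is a JDK-only sketch. It is not `ASCIIFoldingFilter` itself: it swaps in `java.text.Normalizer` (decompose, then strip combining marks), and the `fold` helper is a hypothetical name. Characters with no canonical decomposition, such as "ø" or "ß", are left untouched by this approach.

```java
import java.text.Normalizer;
import java.util.Locale;

public class AccentFoldSketch {

    /** Folds a term by decomposing it (NFD), dropping combining marks, and lowercasing. */
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        // \p{M} matches the combining marks that NFD splits off the base letters.
        return decomposed.replaceAll("\\p{M}+", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        System.out.println(fold("J\u00e9r\u00f4me"));               // prints "jerome"
        System.out.println(fold("caf\u00e9").equals(fold("CAFE"))); // prints "true"
    }
}
```

For a search to be accent insensitive, the same folding has to be applied to both the indexed text and the query terms.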

RE: Accent insensitive analyzer

2013-03-22 Thread Jerome Blouin
I understand that I can't configure it on an analyzer, so on which class can I apply it? Thanks, Jerome -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Friday, March 22, 2013 12:38 PM To: java-user@lucene.apache.org Subject: Re: Accent insensitive analyzer

Re: Accent insensitive analyzer

2013-03-22 Thread Jack Krupansky
Start with the Standard Tokenizer: https://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/standard/StandardTokenizer.html -- Jack Krupansky -Original Message- From: Jerome Blouin Sent: Friday, March 22, 2013 12:53 PM To: java-user@lucene.apache.org Subject:

Re: Accent insensitive analyzer

2013-03-22 Thread SUJIT PAL
Hi Jerome, How about this one? http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ISOLatin1AccentFilterFactory Regards, Sujit On Mar 22, 2013, at 9:22 AM, Jerome Blouin wrote: Hello, I'm looking for an analyzer that allows performing accent-insensitive search in Latin

RE: Accent insensitive analyzer

2013-03-22 Thread Jerome Blouin
Thanks. I'll check that later. -Original Message- From: Sujit Pal [mailto:sujitatgt...@gmail.com] On Behalf Of SUJIT PAL Sent: Friday, March 22, 2013 2:52 PM To: java-user@lucene.apache.org Subject: Re: Accent insensitive analyzer Hi Jerome, How about this one?

Re: Segment file clean-up and codecs

2013-03-22 Thread Simon Willnauer
can you send this to d...@lucene.apache.org? simon On Fri, Mar 22, 2013 at 7:52 PM, Ravikumar Govindarajan ravikumar.govindara...@gmail.com wrote: Most of us, writing custom codec use segment-name as a handle and push data to a different storage Would it be possible to get a hook in the

Re: Lucene reliability as primary store

2013-03-22 Thread Simon Willnauer
On Fri, Mar 22, 2013 at 2:00 PM, Pablo Guerrero sir...@gmail.com wrote: Hi all, I'm evaluating using Lucene for some data that would not be stored anywhere else, and I'm concerned about reliability. Having a database storing the data in addition to Lucene would be a problem, and I want to know

Re: Field.Index deprecation ?

2013-03-22 Thread Simon Willnauer
On Fri, Mar 22, 2013 at 5:28 PM, Michael McCandless luc...@mikemccandless.com wrote: We badly need Lucene in Action 3rd edition! go mike go!!! ;) The easiest approach is to use one of the new XXXField classes under oal.document, eg StringField for your example. If none of the existing

RE: Field.Index deprecation ?

2013-03-22 Thread Igal Sapir
+1 I own a copy of 2nd Edition and will gladly purchase 3rd Edition when it's out. -- typos, misspels, and other weird words brought to you courtesy of my mobile device and its auto-(in)correct feature. On Mar 22, 2013 3:21 PM, Uwe Schindler u...@thetaphi.de wrote: Come on! :-) - Uwe