Identifying the most relevant document

2014-04-29 Thread Vishnu
I am trying to solve the following search problem. Say we have 10 different documents, d1..d10. Each document contains one type of data, say d1 -> list of movie names, d2 -> list of actor names, d3 -> list of addresses, etc. Each document contains a list of entities and scores. So d1 contains movie names

Re: Fields, Index segments and docIds (second Try)

2014-04-29 Thread Jose Carlos Canova
My suggestion is that you not worry about the docId. In practice it is an "internal Lucene" id, quite similar to a rowId in a database: each index may generate a different docId for a translated document (that is its concern, not yours). You may use your own ID that relates one document to another on different
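Jose's point can be illustrated with a toy model in plain Java (not Lucene; `ToyIndex` and its methods are hypothetical). Internal docIds behave like list positions that get renumbered when deleted documents are compacted away, which is what happens to docIds when Lucene merges segments, whereas an application-level "id" field stays the same and can always be used to find the document again.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT Lucene code: internal docIds are just list positions,
// much like a rowId. compact() mimics what a segment merge does to
// deleted documents, so surviving documents get new docIds, while the
// application-level "id" field never changes.
class ToyIndex {
    private final List<Map<String, String>> docs = new ArrayList<>();
    private final List<Boolean> deleted = new ArrayList<>();

    // Adds a document and returns the internal docId it was assigned.
    int add(String externalId, String title) {
        Map<String, String> doc = new HashMap<>();
        doc.put("id", externalId);
        doc.put("title", title);
        docs.add(doc);
        deleted.add(false);
        return docs.size() - 1;
    }

    // Marks a document as deleted; docIds do not change until compact().
    void delete(int docId) {
        deleted.set(docId, true);
    }

    // Drops deleted documents; survivors are renumbered densely from 0.
    void compact() {
        List<Map<String, String>> kept = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i)) {
                kept.add(docs.get(i));
            }
        }
        docs.clear();
        deleted.clear();
        for (Map<String, String> doc : kept) {
            docs.add(doc);
            deleted.add(false);
        }
    }

    // Finds the current docId for a stable application id, or -1 if absent.
    int docIdFor(String externalId) {
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted.get(i) && externalId.equals(docs.get(i).get("id"))) {
                return i;
            }
        }
        return -1;
    }

    String title(int docId) {
        return docs.get(docId).get("title");
    }
}
```

For example, after adding `m1`, `m2` and `m3` (docIds 0, 1 and 2), deleting docId 0 and compacting, `m3` moves to docId 1, but `docIdFor("m3")` still finds it.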

Fields, Index segments and docIds (second Try)

2014-04-29 Thread Olivier Binda
Hello. Sorry to bring this up again. I don't want to be rude and I mean no disrespect, but after thinking it through today, I need to, and would really love to, have the answer to the following question: 1) At Lucene indexing time, is it possible to rewrite a read-only index so that some field

Re: Encryption

2014-04-29 Thread rulinma
You can do it. Choose a reasonable algorithm. You will also need to write your own Analyzer.

Re: No Compound Files

2014-04-29 Thread Varun Thacker
Created LUCENE-5633 for it. On Tue, Apr 29, 2014 at 6:28 PM, Shai Erera wrote: > NoMP means no merges, and indeed it seems silly that NoMP distinguishes > between compound/non-compound settings. Perhaps it's rooted somewhere in > the past, I don't remember. > > I checked and IndexWriter.addInde

Re: No Compound Files

2014-04-29 Thread Shai Erera
NoMP means no merges, and indeed it seems silly that NoMP distinguishes between compound/non-compound settings. Perhaps it's rooted somewhere in the past, I don't remember. I checked and IndexWriter.addIndexes consults MP.useCompoundFile(segmentInfo) when it adds the segments. But maybe NoMP.useCo

Re: No Compound Files

2014-04-29 Thread Michael McCandless
+1 to just have NoMergePolicy.INSTANCE Mike McCandless http://blog.mikemccandless.com On Tue, Apr 29, 2014 at 8:07 AM, Robert Muir wrote: > I think NoMergePolicy.NO_COMPOUND_FILES and > NoMergePolicy.COMPOUND_FILES should be removed, and replaced with > NoMergePolicy.INSTANCE > > If you want t

Re: No Compound Files

2014-04-29 Thread Varun Thacker
Thanks for the response. I was not aware of IWC.setUseCompoundFile. @Shai, this is what I find confusing: from what I understand, NoMergePolicy means no merges, so why have two separate options? On Tue, Apr 29, 2014 at 5:44 PM, Shai Erera wrote: > The problem is that compound files se

Re: No Compound Files

2014-04-29 Thread Robert Muir
On Tue, Apr 29, 2014 at 8:14 AM, Shai Erera wrote: > > If we only offer NoMP.INSTANCE, what would it do w/ merged segments? always > compound? always not-compound? It doesn't merge, though.

Re: No Compound Files

2014-04-29 Thread Shai Erera
The problem is that compound files settings are split between MergePolicy and IndexWriterConfig. As documented on IWC.setUseCompoundFile, this setting controls how new segments are flushed, while the MP setting controls how merged segments are written. If we only offer NoMP.INSTANCE, what would it

Re: No Compound Files

2014-04-29 Thread Robert Muir
I think NoMergePolicy.NO_COMPOUND_FILES and NoMergePolicy.COMPOUND_FILES should be removed, and replaced with NoMergePolicy.INSTANCE. If you want to change whether CFS is used by IndexWriter flush, you need to set that in IndexWriterConfig. On Tue, Apr 29, 2014 at 8:03 AM, Varun Thacker wrote: >

No Compound Files

2014-04-29 Thread Varun Thacker
I wanted to use NoMergePolicy.NO_COMPOUND_FILES to ensure that no merges take place on the index. However, I was unsuccessful. What am I doing wrong here? Attaching a gist with - 1. Output when using NoMergePolicy.NO_COMPOUND_FILES 2. Output when using TieredMergePolicy with policy.setNoC

Terms of a given set of documents (subset of the full index)

2014-04-29 Thread iut483
Hi, I am trying to retrieve the Terms for a given set of documents (an int array or a BitSet), which is the result of a query.
    // Index creation
    // Query with an IndexSearcher
    IndexSearcher searcher = new IndexSearcher(ir);
    TopDocs docs = searcher.search(query, 100);
From the "docs", an array of int c
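One way to get the terms of a subset of documents is to walk the selected docIds and union their per-document terms. The plain-Java sketch below assumes the per-document term lists are already available; in Lucene they could come from `IndexReader.getTermVector(docId, field)`, but only if the field was indexed with term vectors. `SubsetTerms` and `termsPerDoc` are hypothetical names.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch, not Lucene code. termsPerDoc.get(docId) stands in for the terms of
// one document, e.g. as read from a term vector (an assumption: the field
// must have been indexed with term vectors for that to work in Lucene).
class SubsetTerms {

    // Turns the docIds of a result set (e.g. the ScoreDoc.doc values of a
    // TopDocs) into a BitSet.
    static BitSet toBitSet(int[] docIds) {
        BitSet bits = new BitSet();
        for (int id : docIds) {
            bits.set(id);
        }
        return bits;
    }

    // Maps every term found in the selected documents to the number of
    // selected documents containing it (a document frequency within the subset).
    static Map<String, Integer> termsOf(BitSet selected, List<List<String>> termsPerDoc) {
        Map<String, Integer> docFreq = new TreeMap<>();
        for (int doc = selected.nextSetBit(0); doc >= 0; doc = selected.nextSetBit(doc + 1)) {
            // De-duplicate within the document so each doc counts once per term.
            Set<String> unique = new TreeSet<>(termsPerDoc.get(doc));
            for (String term : unique) {
                docFreq.merge(term, 1, Integer::sum);
            }
        }
        return docFreq;
    }
}
```

Iterating with `nextSetBit` visits only the selected documents, so the cost grows with the size of the result set rather than with the size of the whole index.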

Re: Fields, Index segments and docIds

2014-04-29 Thread Olivier Binda
On 04/29/2014 08:46 AM, Uwe Schindler wrote: Hi Oliver, To me it looks like you want to make it much too complicated. It also seems that you misunderstood join queries, which seems to be your problem. Comments inside: My lucene Index is built and stored in a zip file (uncompressed) which is use

Re: Getting multi-values to use in filter?

2014-04-29 Thread Shai Erera
Hi Rob, While the demo code uses a fixed number of 3 values, you don't need to encode the number of values up front. Since you read the byte[] of a document up front, you can read in a while loop as long as in.position() < in.length(). Shai On Tue, Apr 29, 2014 at 10:04 AM, Rob Audenaerde wrot
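Shai's suggestion, reading values until the input is exhausted instead of storing a count, can be sketched in plain Java. This uses `java.nio.ByteBuffer` rather than Lucene's own data-input classes, and the `MultiValueCodec` name is hypothetical; the variable-length int encoding is modeled on the scheme Lucene's `DataOutput.writeVInt` uses, not taken from the blog post's code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch: any number of ints (including none) packed into one byte[],
// with no count prefix. Each int uses a variable-length encoding: 7 bits per
// byte, high bit set on every byte except the last one of that value.
class MultiValueCodec {

    static byte[] encode(int... values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int value : values) {
            int v = value;
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80); // more bytes follow
                v >>>= 7;
            }
            out.write(v); // last byte of this value
        }
        return out.toByteArray();
    }

    // Reads values until the buffer is exhausted, which is what makes a
    // stored value count unnecessary.
    static List<Integer> decode(byte[] bytes) {
        ByteBuffer in = ByteBuffer.wrap(bytes);
        List<Integer> values = new ArrayList<>();
        while (in.position() < in.limit()) {
            int value = 0;
            int shift = 0;
            byte b;
            do {
                b = in.get();
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            values.add(value);
        }
        return values;
    }
}
```

With this layout, a document with no values simply stores an empty byte[], and `decode` returns an empty list for it, which covers the "0 or 10 values" case from the same thread.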

Re: Fields, Index segments and docIds

2014-04-29 Thread Olivier Binda
This really helps! I didn't know about MultiReader. This looks like exactly what I need for 1 & 2. For 3, remapping docIds would allow me to use them as ids for my data, instead of having a stored field with my ids (which is usually the officially recommended way to do this in Lucene). It may no
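MultiReader presents several indexes as one by giving each sub-reader a contiguous range of docIds, offset by the total maxDoc of the readers before it. The plain-Java sketch below shows roughly how that bookkeeping works; `CompositeDocIds` and its methods are hypothetical, not Lucene's code. It also makes visible that a document's composite docId depends on the size and order of the sub-readers in front of it.

```java
// Sketch of the docId bookkeeping behind a composite reader such as
// MultiReader (hypothetical class, not Lucene's code): sub-reader i owns
// the global docIds [starts[i], starts[i + 1]).
class CompositeDocIds {
    // starts[i] = total maxDoc of sub-readers before i; last entry = total.
    private final int[] starts;

    CompositeDocIds(int... maxDocs) {
        starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    int maxDoc() {
        return starts[starts.length - 1];
    }

    // Finds the sub-reader holding a global docId: the largest i with
    // starts[i] <= docId (binary search; empty sub-readers are skipped).
    int subIndex(int docId) {
        if (docId < 0 || docId >= maxDoc()) {
            throw new IllegalArgumentException("docId out of range: " + docId);
        }
        int lo = 0;
        int hi = starts.length - 2;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so lo always advances
            if (starts[mid] <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // docId within its own sub-reader.
    int localDocId(int docId) {
        return docId - starts[subIndex(docId)];
    }

    // Inverse mapping: (sub-reader, local docId) -> global docId.
    int globalDocId(int subIndex, int localDocId) {
        return starts[subIndex] + localDocId;
    }
}
```

With sub-readers of 2, 0 and 3 documents, global docId 2 is the first document of the third sub-reader; adding, removing or reordering sub-readers shifts these numbers.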

Re: Getting multi-values to use in filter?

2014-04-29 Thread Rob Audenaerde
Hi Shai, I read the article on your blog, thanks for it! It seems a natural fit for multi-values, and it is indeed helpful. For my specific problem, the number of values is not fixed: a document can have 0 values or 10. I think the best way to solve this i