Collector is collecting more than the specified hits

2014-02-13 Thread saisantoshi
The problem with the collector below is that the collect method does not stop after the numHits count has been reached. Is there a way to stop the collector from collecting docs once it has reached the specified numHits? For example: * TopScoreDocCollector topScore = TopScoreDocCollector.create(numHits, t
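The usual way a Lucene search is cut short is by throwing an unchecked exception out of collect() once the hit budget is spent; the searcher catches it and finalizes whatever was gathered. The sketch below shows that pattern in plain Java — all class names are illustrative stand-ins, not the actual Lucene API.

```java
// Sketch of early termination by exception (illustrative names only):
// once the hit budget is exhausted, collect() throws and the scoring
// loop stops, mirroring how Lucene's early-terminating collectors work.
class CollectionStopped extends RuntimeException {}

class BudgetedCollector {
    private final int numHits;
    private int collected = 0;

    BudgetedCollector(int numHits) { this.numHits = numHits; }

    void collect(int doc) {
        if (collected >= numHits) {
            throw new CollectionStopped(); // abort the scoring loop
        }
        collected++;
    }

    int hitCount() { return collected; }
}

public class EarlyTerminationDemo {
    public static void main(String[] args) {
        BudgetedCollector c = new BudgetedCollector(3);
        try {
            for (int doc = 0; doc < 10; doc++) {
                c.collect(doc); // only the first 3 docs are accepted
            }
        } catch (CollectionStopped stopped) {
            // expected: budget of 3 reached, remaining docs skipped
        }
        System.out.println(c.hitCount()); // prints 3
    }
}
```

In real Lucene code the analogous mechanism is a collector throwing an exception that the searcher is prepared to catch, rather than collect() silently ignoring extra hits.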

Re: Actual min and max-value of NumericField during codec flush

2014-02-13 Thread Ravikumar Govindarajan
Yeah, now I understand a bit better. Since LogMP always merges adjacent segments, that should pretty much serve my use-case when used with a SortingMP. Early query termination quits by throwing an exception, right? Is it OK to individually search using SegmentReader and then break off, instead of

Re: Adding custom weights to individual terms

2014-02-13 Thread lukai
Hi, Rune: For your requirement, you can generate a separate field for the document before sending the document to Lucene. Let's say its name is score_field. The content of this field would look like this: Doc 1#score_field: Lucene:0.7 is:0 ... Doc 2#score_field: Lucene:0.5 is:0 ... Store the field with
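The side-field approach above amounts to serializing term:weight pairs into one string per document. A minimal sketch of that serialization in plain Java (the class and the space-separated `term:weight` format are illustrative choices, not anything Lucene prescribes):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: pack per-term weights into a single field value such as
// "Lucene:0.7 is:0.0", which custom scoring code could parse back.
public class ScoreFieldEncoder {
    static String encode(Map<String, Float> weights) {
        StringJoiner sj = new StringJoiner(" ");
        for (Map.Entry<String, Float> e : weights.entrySet()) {
            sj.add(e.getKey() + ":" + e.getValue());
        }
        return sj.toString();
    }

    static Map<String, Float> decode(String fieldValue) {
        Map<String, Float> weights = new LinkedHashMap<>();
        for (String token : fieldValue.split(" ")) {
            int i = token.lastIndexOf(':');
            weights.put(token.substring(0, i),
                        Float.parseFloat(token.substring(i + 1)));
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Float> w = new LinkedHashMap<>();
        w.put("Lucene", 0.7f);
        w.put("is", 0f);
        System.out.println(encode(w)); // prints: Lucene:0.7 is:0.0
    }
}
```

The encoded string would then be stored in the hypothetical score_field alongside the document's regular indexed fields.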

Re: Adding custom weights to individual terms

2014-02-13 Thread Rune Stilling
Den 13/02/2014 kl. 12.36 skrev Michael McCandless : > You could stuff your custom weights into a payload, and index that, > but this is per term per document per position, while it sounds like > you just want one float for each term regardless of which > documents/positions where that term occurre

Re: Adding custom weights to individual terms

2014-02-13 Thread Rune Stilling
I’m not sure how I would do that, given that Lucene is meant to use my custom weights when calculating document scores while executing a search query. Doc 1 Lucene(0.7) is(0) a(0) powerful(0.9) indexing(0.62) and(0) search(0.99) API(0.3) Doc 2 Lucene(0.5) is(0) used by(0) a(0) lot of(0) smart(0) peopl

Re: IndexWriter and IndexReader

2014-02-13 Thread Michael McCandless
Overwriting an index in place while open IndexReaders are actively searching works fine. You can either open a new IW with OpenMode.CREATE, or call IW.deleteAll() if you already have an IW open. Writing to a shared index directory mapped to N machines is not generally done, beca

Re: MAX_TERM_LENGTH

2014-02-13 Thread Marcio Napoli
Thanks for the note, Marcio Napoli Go beyond Apache Lucene(tm) features with Numere(R) http://numere.stela.org.br 2014-02-13 14:56 GMT-02:00 Michael McCandless : > You can use IndexReader.getBinaryDocValues(field). > > BTW your site should reference *Apache* Lucene, not just Lucene. > > Mike McCa

Re: MAX_TERM_LENGTH

2014-02-13 Thread Michael McCandless
You can use IndexReader.getBinaryDocValues(field). BTW your site should reference *Apache* Lucene, not just Lucene. Mike McCandless http://blog.mikemccandless.com On Thu, Feb 13, 2014 at 11:51 AM, Marcio Napoli wrote: > Hey Mike, > > I need quick access to values per document. The use of bina

Re: MAX_TERM_LENGTH

2014-02-13 Thread Marcio Napoli
Hey Mike, I need quick access to values per document. Is the use of binary values possible via FieldCache -> FieldCacheSource.getValues()? Thanks, Marcio Napoli Go beyond Lucene(tm) features with Numere(R) http://numere.stela.org.br 2014-02-13 13:16 GMT-02:00 Michael McCandless : > Why d

RE: Getting term ords during collect

2014-02-13 Thread Kyle Judson
The SortedSetDocValuesField worked great. Thanks. Kyle > From: luc...@mikemccandless.com > Date: Wed, 12 Feb 2014 05:39:24 -0500 > Subject: Re: Getting term ords during collect > To: java-user@lucene.apache.org > > It sounds like you are just indexing a TextField and then calling > getDocTermOr

IndexWriter and IndexReader

2014-02-13 Thread Cemo
I am quite new to Lucene. I am trying to prepare an application where: 1. ~100K documents exist. 2. ~4 search servers will be utilized. 3. Documents are not frequently updated, and I want to check for deletions or additions every minute. 4. I am ready to sacrifice some system resources to

Re: simple question about index reader

2014-02-13 Thread Michael McCandless
The reader holds all the underlying files still open, and relies on the filesystem to "protect" still-open files that are deleted. Windows does this by refusing to allow deletion. Unix does it by keeping the file bytes available on disk but removing the directory entry ("delete on last close").

Re: MAX_TERM_LENGTH

2014-02-13 Thread Michael McCandless
Why do you index such immense terms? What's the end user use case? Do they really need to be inverted? Maybe use binary doc values instead? Mike McCandless http://blog.mikemccandless.com On Thu, Feb 13, 2014 at 8:36 AM, Marcio Napoli wrote: > Hi All, > > I have a need to work with big terms.

simple question about index reader

2014-02-13 Thread Yonghui Zhao
Hi, I am new to Lucene and I have a simple question about the index reader. If I open a DirectoryReader, say reader1, based on a disk directory, and then the Lucene index directory is changed, to get new results I need to open a new DirectoryReader. I suppose reader1 will return the pre-change results forever.

MAX_TERM_LENGTH

2014-02-13 Thread Marcio Napoli
Hi All, I have a need to work with big terms, so the 32k limit is not enough. How can I increase the maximum size of a term? I found the MAX_TERM_LENGTH constant in IndexWriter, which refers to FieldCache and DocumentsWriterPerThread (BYTE_BLOCK_SIZE-2). Thanks, Marcio Napoli Go beyond Lucene(tm) featur

Re: Adding custom weights to individual terms

2014-02-13 Thread Shai Erera
I often prefer to manage such weights outside the index. Managing them inside the index usually leads to problems later when, e.g., the weights change: if they are encoded in the index, that means re-indexing. Also, if a weight changes, then in some segments the weight will be different than ot

Re: Which is better, search through query and whole text document or search through query with document field.

2014-02-13 Thread Ian Lea
The one that meets your requirements most easily will be the best. If people will want to search for words in particular fields, you'll need to split it; but if they only ever want to search across all fields, there's no point. A common requirement is to want both, in which case you can split it and

Re: Adding custom weights to individual terms

2014-02-13 Thread Michael McCandless
You could stuff your custom weights into a payload, and index that, but this is per term per document per position, while it sounds like you just want one float for each term regardless of which documents/positions where that term occurred? Doing your own custom attribute would be a challenge: not
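A payload is just a small byte[] stored at each term position, so "stuffing a float into a payload" reduces to encoding one float as four bytes and decoding it at scoring time. A minimal sketch with the JDK's ByteBuffer (the IEEE-754 big-endian encoding shown here is an illustrative choice; Lucene does not mandate a particular payload format):

```java
import java.nio.ByteBuffer;

// Sketch: one float per posting as a 4-byte payload.
// encode() would run at index time, decode() at scoring time.
public class PayloadFloat {
    static byte[] encode(float weight) {
        return ByteBuffer.allocate(4).putFloat(weight).array();
    }

    static float decode(byte[] payload) {
        return ByteBuffer.wrap(payload).getFloat();
    }

    public static void main(String[] args) {
        byte[] p = encode(0.7f);
        System.out.println(p.length);   // prints 4
        System.out.println(decode(p));  // prints 0.7
    }
}
```

As the message notes, this gives you one weight per term per document per position — heavier than a single global per-term float — which is why the thread also discusses keeping such weights outside the index entirely.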

Re: [Suggestions Required] 110 concurrent users indexing on Lucene don't finish in 200 ms.

2014-02-13 Thread Michael McCandless
For better performance, you should not send 100 threads to IndexWriter, but rather a number of threads in proportion to how many CPUs the machine has. E.g. if your CPU has 8 cores then use at most 12 (=8 * 1.5) indexing threads. It's fine to have 100 client threads sending documents, but drop thes
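The sizing rule described above — cap indexing threads at roughly 1.5x the core count, no matter how many client threads are producing documents — can be sketched with a bounded JDK thread pool (the class and helper names are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: bound concurrent indexing at ~1.5x the CPU core count.
// Client threads enqueue work; only this pool would call
// IndexWriter.addDocument in a real setup.
public class IndexingPool {
    static int indexingThreads(int cores) {
        return Math.max(1, (int) (cores * 1.5)); // e.g. 8 cores -> 12
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool =
                Executors.newFixedThreadPool(indexingThreads(cores));
        // ... 100 client threads can submit tasks here; the pool caps
        // how many actually index at once.
        pool.shutdown();
        System.out.println(indexingThreads(8)); // prints 12
    }
}
```

The point of the bounded pool is that the 100 client threads queue up behind it instead of contending inside IndexWriter all at once.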

Which is better, search through query and whole text document or search through query with document field.

2014-02-13 Thread Rajendra Rao
Hello, I have a query and a document. It's unstructured, natural text. I used Lucene for searching documents with a query. If I separate the document into fields and then search, what will the difference be? I can't check it because I don't have field-separated data now, but in the future we will. Thanks.

Re: [Suggestions Required] 110 concurrent users indexing on Lucene don't finish in 200 ms.

2014-02-13 Thread sree
Thanks for your reply. We are using 100 threads and each indexes 100 documents. We created a standalone project which uses Lucene to index 100 documents for each of 100 threads concurrently, and we can see that each thread takes an average of more than 1 sec. lucene-group.zip