Term vector Lucene 4.2

2013-04-02 Thread andi rexha
Hi, I have a problem while trying to extract a term vector's attributes (i.e. the positions of the terms). What I have done is: Terms termVector = indexReader.getTermVector(docId, fieldName); TermsEnum reuse = null; TermsEnum iterator = termVector.iterator(reuse); PositionIncr
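A minimal sketch of how the truncated snippet above is usually completed on the Lucene 4.2 API (the class and method names here are mine, not from the thread): each term of the vector is visited with a TermsEnum, and the positions come from a DocsAndPositionsEnum, which holds a single pseudo-document per term vector.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

import org.apache.lucene.index.DocsAndPositionsEnum;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// Hypothetical helper (Lucene 4.2): map each term of a document's term
// vector to the list of positions where it occurs.
public class TermVectorPositions {
    public static Map<String, List<Integer>> positions(IndexReader reader, int docId, String field)
            throws IOException {
        Map<String, List<Integer>> result = new TreeMap<String, List<Integer>>();
        Terms termVector = reader.getTermVector(docId, field);
        if (termVector == null) return result;       // field has no term vectors
        TermsEnum iterator = termVector.iterator(null);
        DocsAndPositionsEnum dpe = null;
        BytesRef term;
        while ((term = iterator.next()) != null) {
            dpe = iterator.docsAndPositions(null, dpe);
            if (dpe == null) continue;               // positions were not indexed
            dpe.nextDoc();                           // a term vector has one pseudo-document
            List<Integer> pos = new ArrayList<Integer>();
            for (int i = 0; i < dpe.freq(); i++) pos.add(dpe.nextPosition());
            result.put(term.utf8ToString(), pos);
        }
        return result;
    }
}
```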

RE: Term vector Lucene 4.2

2013-04-02 Thread andi rexha
Hi Adrien, Thank you very much for the reply. I have two other small questions about this: 1) Is "final int freq = docsAndPositions.freq();" the same as "iterator.totalTermFreq()"? In my tests it returns the same result, and from the documentation it seems that the result should be the same.

Segment readers in Lucene 4.2

2013-04-02 Thread andi rexha
Hi, I have a question about the index readers in Lucene. As far as I understand from the documentation, with Lucene 4 we can create an IndexReader from DirectoryReader.open(directory); From the code of DirectoryReader, I have seen that it uses SegmentReader to create the reader.
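A short sketch of the composite-reader structure being discussed (Lucene 4.x API; the class name is mine): a DirectoryReader's leaves() are the per-segment atomic readers, which in practice are SegmentReader instances.

```java
import java.io.IOException;

import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.store.Directory;

// Hypothetical helper (Lucene 4.x): print each segment-level leaf of a
// composite DirectoryReader and return how many leaves there are.
public class SegmentLeaves {
    public static int countLeaves(Directory directory) throws IOException {
        DirectoryReader reader = DirectoryReader.open(directory);
        try {
            for (AtomicReaderContext ctx : reader.leaves()) {
                AtomicReader leaf = ctx.reader();   // typically a SegmentReader
                System.out.println(leaf.getClass().getSimpleName()
                    + " docBase=" + ctx.docBase + " maxDoc=" + leaf.maxDoc());
            }
            return reader.leaves().size();
        } finally {
            reader.close();
        }
    }
}
```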

RE: Segment readers in Lucene 4.2

2013-04-02 Thread andi rexha
Hi, Thanks for the reply ;) > > this is all not public to the code because it is also subject to change! > > With Lucene 4.x, you can assume: > directoryReader.leaves().get(i) corresponds to segmentsinfos.info(i) > > WARNING: But this is only true if: > - the reader is instanceof DirectoryR

WhitespaceTokenizer, incrementToken() ArrayIndexOutOfBoundsException

2013-04-15 Thread andi rexha
Hi, I have tried to get all the tokens from a TokenStream in the same way as I did in the 3.x version of Lucene, but now (at least with WhitespaceTokenizer) I get an exception: Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1 at java.lang.Character.codePointAtIm
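As the reply below points out, the 4.x TokenStream contract requires reset() before the first incrementToken(); omitting it is what triggers this exception. A minimal sketch of the full consumer workflow (class and method names are mine):

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Sketch (Lucene 4.2): collect all tokens from a WhitespaceTokenizer using
// the mandatory reset() / incrementToken() / end() / close() sequence.
public class TokenizeAll {
    public static List<String> tokens(String text) throws IOException {
        Tokenizer tok = new WhitespaceTokenizer(Version.LUCENE_42, new StringReader(text));
        CharTermAttribute term = tok.addAttribute(CharTermAttribute.class);
        List<String> out = new ArrayList<String>();
        tok.reset();                           // mandatory since 4.x; skipping it causes the AIOOBE
        while (tok.incrementToken()) {
            out.add(term.toString());
        }
        tok.end();
        tok.close();
        return out;
    }
}
```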

RE: WhitespaceTokenizer, incrementToken() ArrayIndexOutOfBoundsException

2013-04-15 Thread andi rexha
reset" that is now mandatory > and throws AIOOBE if not present? > > -- Jack Krupansky > > -Original Message- > From: andi rexha > Sent: Monday, April 15, 2013 10:21 AM > To: java-user@lucene.apache.org > Subject: WhitespaceTokenizer, incrementToken() Arr

Merge policy!

2013-05-02 Thread andi rexha
Hi, I want to create a simulation of the old "optimize" for an index. I can do it with the 4.1 API, because I can use the method "setUseCompoundFile(true)" on the merge policy and call indexWriter.forceMerge(1); but I don't find a way to do it with the 4.2.1 API. Do you have any suggestion on how to
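A hedged sketch of the usual 4.2.x answer (assuming the version where the per-policy setUseCompoundFile(boolean) was replaced by a compound-file ratio; the class name is mine): set noCFSRatio to 1.0 so every merged segment is written as a compound file, then force-merge to one segment.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

// Sketch (assuming Lucene 4.2.x): approximate the old optimize() by forcing
// a single compound-file segment.
public class OptimizeLike {
    public static void forceMergeCompound(Directory dir, Analyzer analyzer) throws IOException {
        TieredMergePolicy mp = new TieredMergePolicy();
        mp.setNoCFSRatio(1.0);                               // always build compound files
        mp.setMaxCFSSegmentSizeMB(Double.POSITIVE_INFINITY); // even for very large segments
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_42, analyzer);
        iwc.setMergePolicy(mp);
        IndexWriter writer = new IndexWriter(dir, iwc);
        writer.forceMerge(1);                                // 4.x replacement for optimize()
        writer.close();
    }
}
```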

RE: Merge policy!

2013-05-02 Thread andi rexha
e using? > > Mike McCandless > > http://blog.mikemccandless.com > > > On Thu, May 2, 2013 at 6:53 AM, andi rexha wrote: > > Hi, > > I want to create a simulation of the old "optimize" for an index. I can do > > it for 4.1 api, because I can use the method

Cache Field Lucene 3.6.0

2013-07-30 Thread andi rexha
Hi, I have a stored and tokenized field, and I want to cache all the field values. I have one document in the index, with "field.value" => "hello world" and with tokens => "hello", "world". I try to extract the field's content: String [] cachedFields = FieldCache.DEFAULT.getStri
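A hedged sketch of the likely surprise here (Lucene 3.6; the class name is mine): FieldCache un-inverts the *indexed terms*, so for a tokenized field it keeps at most one term per document (the last in term order), never the full stored value. Getting "hello world" back requires a stored-field lookup instead.

```java
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

// Sketch (Lucene 3.6): contrast the FieldCache view of a tokenized field
// with its stored value for the first document.
public class CacheVsStored {
    public static String[] cachedAndStored(IndexReader reader, String field) throws IOException {
        String cached = FieldCache.DEFAULT.getStrings(reader, field)[0]; // one indexed term per doc
        String stored = reader.document(0).get(field);                   // the full stored value
        return new String[] { cached, stored };
    }
}
```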

RE: Cache Field Lucene 3.6.0

2013-07-30 Thread andi rexha
Hi Adrien, Thank you very much. I will have a look at your suggestion ;) > From: jpou...@gmail.com > Date: Tue, 30 Jul 2013 16:16:03 +0200 > Subject: Re: Cache Field Lucene 3.6.0 > To: java-user@lucene.apache.org > > Hi, > > On Tue, Jul 30, 2013 at 4:09 PM, andi rexha

Are term vector positions returned ordered in Lucene 3.6?

2014-02-03 Thread andi rexha
Hi, I am extracting term vector positions in Lucene 3.6. First I extract the term vector and get the indexes from the "indexOf" method, as suggested, for extracting the positions. TermPositionVector termFreqVector = (TermPositionVector) reader.getTermFreqVector(i, termField); String[] terms = ter
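A sketch of the 3.6 access pattern in question (the class name is mine). To my knowledge, the positions returned for a *single* term are in increasing order, but getTerms() is sorted lexicographically by term, so positions are not globally ordered across terms.

```java
import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermPositionVector;

// Sketch (Lucene 3.6): collect the positions of every term in a document's
// term vector via indexOf()/getTermPositions().
public class Tv36Positions {
    public static Map<String, int[]> positions(IndexReader reader, int docId, String field)
            throws IOException {
        TermPositionVector tpv = (TermPositionVector) reader.getTermFreqVector(docId, field);
        Map<String, int[]> out = new TreeMap<String, int[]>();
        for (String t : tpv.getTerms()) {
            out.put(t, tpv.getTermPositions(tpv.indexOf(t))); // positions of one term, in order
        }
        return out;
    }
}
```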

Multi-thread indexing, should the commit be called from each thread?

2014-05-21 Thread andi rexha
Hi! I have a question about multi-thread indexing. When I perform multi-thread indexing, should I commit from each thread that adds documents, or should the commit be done only when all the threads are done with their indexing task? Thank you!
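A minimal sketch of the pattern the reply below recommends (the class name is mine): IndexWriter is thread-safe for addDocument(), so worker threads can share one writer, and a single commit() once all workers finish is sufficient and cheaper than per-thread commits.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;

// Sketch: index a batch of documents from a thread pool, committing once at
// the end. IndexWriter.addDocument() is safe to call concurrently.
public class ParallelIndexer {
    public static void indexAll(final IndexWriter writer, List<Document> docs, int threads)
            throws InterruptedException, IOException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (final Document doc : docs) {
            pool.submit(new Runnable() {
                public void run() {
                    try {
                        writer.addDocument(doc);
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        writer.commit();   // one durable commit for the whole batch
    }
}
```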

RE: Multi-thread indexing, should the commit be called from each thread?

2014-05-22 Thread andi rexha
> > You don't need to commit from each thread, you can definitely commit when > > all threads are done. In general, you should commit only when you want to > > ensure the data is "safe" on disk. > > > > Shai > > > > > > On Wed, May 21,

Exception while using a custom analyzer in a parallel indexing!

2014-09-15 Thread andi rexha
Hi, I have an index writer that is used from a pool of threads to index. The index writer is using a "PerFieldAnalyzerWrapper": this.analyzer = new PerFieldAnalyzerWrapper(DEFAULT_ANALYZER, fields); If I add the documents single-threaded I don't get any exception. In the case that I add th
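A hedged sketch of a thread-safe setup (Lucene 4.x; class and field names are mine): PerFieldAnalyzerWrapper is only as thread-safe as the analyzers it wraps. The stock analyzers are safe to share across indexing threads, whereas a custom analyzer that hands out a shared TokenStream instance will fail in exactly this way once two threads call addDocument() concurrently.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordAnalyzer;
import org.apache.lucene.analysis.miscellaneous.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

// Sketch (Lucene 4.2): a per-field analyzer built only from stock analyzers,
// which can safely be shared by many indexing threads.
public class AnalyzerSetup {
    public static Analyzer build() {
        Map<String, Analyzer> perField = new HashMap<String, Analyzer>();
        perField.put("id", new KeywordAnalyzer());            // keep ids untokenized
        return new PerFieldAnalyzerWrapper(
            new StandardAnalyzer(Version.LUCENE_42), perField);
    }
}
```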

Lucene Free Text Suggester, get only single tokens as suggestion!

2014-10-13 Thread andi rexha
Hi! I have a field in an index for which I want to have a "free text suggestion". The field is analyzed, stored, and indexed with term vectors. I tried two approaches to get the suggestions from the field. I have tried to apply the free text suggester with a dictionary like:
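A hedged sketch of the suggester setup in question (assuming Lucene 4.10; the class name is mine). Note that FreeTextSuggester is an n-gram language model: a lookup predicts the likely *next token(s)* after the typed prefix rather than returning whole phrases from the field, which can look like "single tokens only".

```java
import java.io.IOException;
import java.util.List;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.spell.LuceneDictionary;
import org.apache.lucene.search.suggest.Lookup;
import org.apache.lucene.search.suggest.analyzing.FreeTextSuggester;

// Sketch (assuming Lucene 4.10): build a trigram FreeTextSuggester from the
// indexed terms of a field and look up suggestions for a prefix.
public class Suggest {
    public static List<Lookup.LookupResult> suggest(IndexReader reader, String field,
            String prefix, int n) throws IOException {
        FreeTextSuggester suggester =
            new FreeTextSuggester(new StandardAnalyzer(), new StandardAnalyzer(), 3);
        suggester.build(new LuceneDictionary(reader, field)); // terms of the field as input
        return suggester.lookup(prefix, false, n);            // predicts the next token(s)
    }
}
```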

How to find out opened readers in Lucene

2014-12-09 Thread andi rexha
Hi, I have some IndexReaders and a single IndexWriter open. I index some documents, and in the end only one reader should be open and the old ones should be closed. The readers should be closed, but when I restart the system (meaning I restart the IndexWriter), actually the size of the inde
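A hedged sketch of the usual "only one live reader" pattern (Lucene 4.x; the class name is mine): refresh with openIfChanged and close (or decRef) the old reader. Readers that are never closed pin the files of their commit point, which would explain the index only shrinking after a restart releases them.

```java
import java.io.IOException;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;

// Sketch (Lucene 4.x): swap in a fresh near-real-time reader and release the
// old one so its files become deletable.
public class ReaderRefresh {
    public static DirectoryReader refresh(DirectoryReader current, IndexWriter writer)
            throws IOException {
        DirectoryReader newer = DirectoryReader.openIfChanged(current, writer, true);
        if (newer == null) {
            return current;      // nothing changed, keep the old reader
        }
        current.close();         // or decRef() if the old reader is shared elsewhere
        return newer;
    }
}
```

For reference-counted sharing across threads, SearcherManager wraps this same acquire/release cycle.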

RE: How to find out opened readers in Lucene

2014-12-09 Thread andi rexha
ing the readers. > > Mike McCandless > > http://blog.mikemccandless.com > > > On Tue, Dec 9, 2014 at 5:59 AM, andi rexha wrote: > > Hi, > > I have some IndexReaders and a single IndexWriter opened. I index some > > documents, and in the end only one reader should be

Getting new token stream from analyzer for legacy projects!

2014-12-12 Thread andi rexha
Hi, I have a legacy problem with the token stream. In my application I create a batch of documents from a unique analyzer (this is due to configuration). I add the field using the tokenStream from the analyzer (for internal reasons). In pseudo-code this translates into: Analyzer analyzer = getFr
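A hedged sketch of the field-from-TokenStream pattern being described (Lucene 4.x; class and method names are mine). The usual pitfall: an Analyzer reuses a cached TokenStream per thread, so each stream obtained from it must be fully consumed (which addDocument does) before requesting the next one.

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;

// Sketch (Lucene 4.x): add a field backed by a pre-built TokenStream; the
// stream is consumed when the document is added.
public class TokenStreamField {
    public static void addWithStream(IndexWriter writer, Analyzer analyzer,
            String field, String text) throws IOException {
        TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
        Document doc = new Document();
        doc.add(new TextField(field, ts));   // field backed by the stream
        writer.addDocument(doc);             // the stream is consumed here
    }
}
```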

Is the merge executed asynchronously?

2016-02-16 Thread andi rexha
Hi! I have a library which keeps cached information about the current segments of an index. I assume that after each (indexing => commit) sequence, I have the information about the segments which represent the current status of the index ("merged" ones as well). Checking the system, it seems like t
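A hedged sketch of the answer (assuming a 4.x index; class and method names are mine): with the default ConcurrentMergeScheduler, merges run in background threads, so segment info captured right after a commit can lag behind the merged state. Two ways to make the picture deterministic are a serial scheduler, or waiting for merges before committing.

```java
import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.SerialMergeScheduler;
import org.apache.lucene.store.Directory;
import org.apache.lucene.util.Version;

// Sketch (assuming Lucene 4.10.x): make the segment state after a commit
// deterministic with respect to merging.
public class DeterministicMerges {
    // Option 1: run merges synchronously in the indexing thread.
    public static IndexWriter serialMergingWriter(Directory dir, Analyzer analyzer)
            throws IOException {
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_4_10_4, analyzer);
        iwc.setMergeScheduler(new SerialMergeScheduler());
        return new IndexWriter(dir, iwc);
    }

    // Option 2: block until background merges finish, then commit.
    public static void commitAfterMerges(IndexWriter writer) throws IOException {
        writer.waitForMerges();   // wait for running/pending merges
        writer.commit();          // the commit now reflects the merged segments
    }
}
```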

Integer Range Query in Lucene 4.10.4 not working as expected.

2017-06-30 Thread andi rexha
I have a numeric range query to perform in an index. I begin by indexing a document with a field value of "300". When I search for a range [100 TO 400] I get results from the search operation. Strangely enough, when I search for [100 TO 4000], I don't get any search results. Here is a code sni
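A hedged sketch of the usual culprit and fix on 4.10.4 (the class name is mine): numeric fields written with IntField are trie-encoded, so they must be queried with NumericRangeQuery. The classic QueryParser turns [100 TO 4000] into a lexicographic TermRangeQuery over strings, which does not match IntField's encoded terms, and mixing the two encodings in one index can produce exactly this kind of inconsistent hit count between ranges.

```java
import java.io.IOException;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.store.Directory;

// Sketch (Lucene 4.10.4): count the documents whose IntField value falls in
// [lo, hi], using the numeric query that matches IntField's encoding.
public class NumericRange {
    public static int countInRange(Directory dir, String field, int lo, int hi)
            throws IOException {
        DirectoryReader reader = DirectoryReader.open(dir);
        try {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query q = NumericRangeQuery.newIntRange(field, lo, hi, true, true);
            return searcher.search(q, 10).totalHits;
        } finally {
            reader.close();
        }
    }
}
```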