Adding Docvalues to a Field

2017-05-05 Thread aravinth thangasami
Hi all,

While moving from Lucene 4 to Lucene 5, we ran into the following issue.
We have enabled doc values in Lucene 5; we did not use doc values in
Lucene 4.

Using UninvertingReader, sorting works fine until the first merge happens.
After a merge, documents from the older version that have no doc values
affect the sort order.

Is there any way to solve this issue without reindexing?

What is your opinion on it?

I was thinking about these two options. Are they possible?

1. Can UninvertingReader be made to store the doc values it builds to
disk?
2. During a merge, can IndexWriter be made to write doc values for
documents that have none?



Thanks
Aravinth


Re: Un-used index files are not getting released

2017-05-05 Thread Ian Lea
The most common cause is unclosed index readers.  If you run lsof against
the tomcat process id and see that some deleted files are still open,
that's almost certainly the problem.  Then all you have to do is track it
down in your code.
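
On Linux, the same check that lsof performs can be sketched from Java by
reading /proc/<pid>/fd, where every entry is a symlink to an open file and
the link target of a deleted file ends in " (deleted)". The following is a
minimal sketch under those assumptions; the class and method names are made
up for illustration, it is Linux-only, and it is not part of Lucene.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: lists files that are still open although they were deleted. */
final class DeletedOpenFiles {
    private static final String DELETED_SUFFIX = " (deleted)";

    /** fdDir is normally /proc/<pid>/fd; each entry is a symlink to an open file. */
    static List<String> find(Path fdDir) throws IOException {
        List<String> deleted = new ArrayList<>();
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(fdDir)) {
            for (Path fd : fds) {
                String target;
                try {
                    target = Files.readSymbolicLink(fd).toString();
                } catch (IOException e) {
                    continue; // descriptor was closed while scanning, or is not a link
                }
                if (target.endsWith(DELETED_SUFFIX)) {
                    // Strip the marker so only the original path is reported.
                    deleted.add(target.substring(0, target.length() - DELETED_SUFFIX.length()));
                }
            }
        }
        Collections.sort(deleted);
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        // Pass the Tomcat process id; "self" inspects the current JVM instead.
        String pid = args.length > 0 ? args[0] : "self";
        for (String path : find(Paths.get("/proc", pid, "fd"))) {
            System.out.println(path);
        }
    }
}
```

Run it as the same user as Tomcat and pass the Tomcat process id. Any index
file it prints is still held open by something in that process, most often a
reader that was never closed.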


--
Ian.


On Thu, May 4, 2017 at 10:09 PM, Siraj Haider wrote:

> Hi all,
> We recently switched from Lucene 2.9 to 6.5, and we have an issue: the
> files in the index directory are not released after the IndexWriter
> finishes writing a batch of documents. We are using
> IndexFolder.listFiles().length to check the number of files in the index
> folder. We have even tried closing and reopening the
> IndexWriter/SearcherManager/MMapDirectory after indexing each batch to
> see if that would release the files, but it didn't. Only when we shut down
> Tomcat and restart it do we see that number drop, which proves that
> some deleted files were still held by Lucene somewhere. Can you
> please direct me on what needs to be checked?
>
> P.S. I apologize for the duplicate email, as I didn't see my yesterday's
> email in the list.
>
> Regards
> -Siraj


Re: Adding Docvalues to a Field

2017-05-05 Thread Erick Erickson
In a word, "no". You must re-index from scratch. Worse, now that you
have some segments thinking the fields are docValues and some not and
maybe some mixed, I know of no way to un-entangle them.

I'd create a new collection and re-index it entirely, then use
collection aliasing to point the applications at the new collection.

Best,
Erick


-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org



Re: Adding Docvalues to a Field

2017-05-05 Thread aravinth thangasami
Thanks Erick



Re: Adding Docvalues to a Field

2017-05-05 Thread Uwe Schindler
Hi Aravinth,

Erick was referring to Solr. To fix your issue without fully reindexing, you
can use merging to rewrite the whole index. To do this, use the following
approach:

Wrap your index using UninvertingReader. Then get all LeafReaders using the
leaves() method (each LeafReaderContext exposes its LeafReader via reader()).
Wrap all these leaves with SlowCodecReaderWrapper.wrap(). You may use the
Java 8 stream API to do this. 😃

Then create a new index with IndexWriter, call
IndexWriter.addIndexes(CodecReader...), and pass in the previously created
wrappers, ideally one by one. Those readers are slow, but ready to be merged
into a new index with DocValues. The empty writer will then import the wrapped
index and take over the emulated DocValues. This may take some time, but
afterwards you have an index with all fields having their DocValues on disk.
Uninverting is no longer needed.

I hope that helps. I can post code that should do this. There is no
ready-to-use tool available, because you need to configure the uninverter
correctly.
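
For intuition about why this works without the original documents:
"uninverting" rebuilds a per-document column from the inverted index, and the
merge then writes that column to disk. The following is a toy model in plain
Java, not Lucene code. The class and method names are made up for
illustration, and it assumes a single-valued string field.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of uninverting; hypothetical names, no Lucene involved. */
final class UninvertToy {

    /** Turns term -> docIDs (inverted index) into docID -> term (a doc-values-like column). */
    static String[] uninvert(Map<String, int[]> postings, int maxDoc) {
        String[] column = new String[maxDoc];
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int docId : entry.getValue()) {
                column[docId] = entry.getKey(); // single-valued field assumed
            }
        }
        return column;
    }

    /** Returns docIDs ordered by their column value; docs without a value sort last. */
    static Integer[] sortDocs(String[] column) {
        Integer[] docs = new Integer[column.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Arrays.sort(docs, Comparator.comparing((Integer d) -> column[d],
                Comparator.nullsLast(Comparator.naturalOrder())));
        return docs;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<>();
        postings.put("apple", new int[] {2});
        postings.put("banana", new int[] {0});
        postings.put("cherry", new int[] {1});

        String[] column = uninvert(postings, 3);
        System.out.println(Arrays.toString(column));           // [banana, cherry, apple]
        System.out.println(Arrays.toString(sortDocs(column))); // [2, 0, 1]
    }
}
```

In the real workflow the column is never built by hand like this: the
uninverting wrapper produces it on demand, and adding the wrapped reader to a
fresh IndexWriter persists it.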

Uwe


--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de

RE: Adding Docvalues to a Field

2017-05-05 Thread Uwe Schindler
Hi Aravinth,

To get rid of the partially merged (mixed) docvalues fields you can use the 
following additional approach on top of my previous mail:
 
> Erick was referring to Solr. To fix your issue without fully reindexing, you
> can use merging to rewrite the whole index. To do this, use the following
> approach:
>
> Wrap your index using UninvertingReader. Then get all LeafReaders using
> the leaves() method.

The problem with that approach is that all leaves that have partial (!)
docvalues are seen by UninvertingReader as already having DocValues, so they
just return the partial DocValues and no uninverting is done. We therefore
have to trick UninvertingReader into ignoring the existing (partial)
DocValues. Instead of wrapping the whole IndexReader, we change the workflow:

- Get all leaves() of the broken docvalues/non-docvalues index.
- Wrap each of those LeafReader instances in an anonymous FilterLeafReader
subclass that overrides all the DocValues-related methods to return "null"
instead of calling super. This hides all partially existing doc values (not
from FieldInfos, but that should not hurt). The consumer of this reader will
see no DocValues.
- Then wrap those filtered readers with new UninvertingReader(filteredLeaf).
This adds back fresh DocValues, recalculated from the uninverted fields. Be
sure to get the types right, otherwise you will get merge errors (incompatible
field types).
- Then wrap all those uninverting leaves with SlowCodecReaderWrapper.wrap().
This makes them mergeable (it's slow and costs memory, but it works).
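
The order of the wrappers is the important part of this workflow. The
following toy model in plain Java illustrates it. It is not Lucene code: Leaf,
hideStored() and column() are made-up stand-ins for a segment reader, the
filtering wrapper, and the uninverting wrapper, and a single-valued string
field is assumed.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of "hide partial values, then uninvert"; hypothetical names, no Lucene. */
final class HidePartialToy {

    /** Stand-in for a segment reader. */
    interface Leaf {
        int maxDoc();
        Map<String, int[]> postings(); // term -> docIDs, complete for every document
        String[] storedColumn();       // null if absent; null entries mean "partial"
    }

    static Leaf segment(int docCount, Map<String, int[]> terms, String[] stored) {
        return new Leaf() {
            public int maxDoc() { return docCount; }
            public Map<String, int[]> postings() { return terms; }
            public String[] storedColumn() { return stored; }
        };
    }

    /** Filter analogue: passes everything through except the stored column. */
    static Leaf hideStored(Leaf in) {
        return segment(in.maxDoc(), in.postings(), null);
    }

    /** Uninverting analogue: trusts a visible stored column, otherwise rebuilds it. */
    static String[] column(Leaf in) {
        if (in.storedColumn() != null) {
            return in.storedColumn();
        }
        String[] rebuilt = new String[in.maxDoc()];
        for (Map.Entry<String, int[]> entry : in.postings().entrySet()) {
            for (int docId : entry.getValue()) {
                rebuilt[docId] = entry.getKey();
            }
        }
        return rebuilt;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new TreeMap<>();
        postings.put("apple", new int[] {2});
        postings.put("banana", new int[] {0});
        postings.put("cherry", new int[] {1});
        // Docs 0 and 1 were merged in from an old segment without stored values.
        Leaf merged = segment(3, postings, new String[] {null, null, "apple"});

        System.out.println(Arrays.toString(column(merged)));             // [null, null, apple]
        System.out.println(Arrays.toString(column(hideStored(merged)))); // [banana, cherry, apple]
    }
}
```

Without the hiding step the partial column is returned as-is, which is the
broken sort order seen after the first merge. With it, every document gets a
value rebuilt from the postings.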

The remaining steps are as described before:

> Then create a new index with IndexWriter, call
> IndexWriter.addIndexes(CodecReader...), and pass in the previously created
> wrappers, ideally one by one. Those readers are slow, but ready to be
> merged into a new index with DocValues. The empty writer will then import
> the wrapped index and take over the emulated DocValues. This may take some
> time, but afterwards you have an index with all fields having their
> DocValues on disk. Uninverting is no longer needed.



Re: Adding Docvalues to a Field

2017-05-05 Thread aravinth thangasami
Will try it.

Thanks Uwe :)
