Adding Docvalues to a Field
Hi all,

While moving from Lucene 4 to Lucene 5, we ran into the following issue. We have enabled doc values in Lucene 5; we did not use doc values in Lucene 4.

With UninvertingReader, sorting works fine until the first merge happens. After a merge, documents from the older segments, which have no doc values, affect the sort order.

Is there any way to solve this issue without reindexing? What is your opinion on it?

I was thinking about these two approaches. Are they possible?

1. Can UninvertingReader be made to store the doc values it builds to disk?
2. During a merge, can IndexWriter be made to write doc values for documents that have none?

Thanks,
Aravinth
Re: Un-used index files are not getting released
The most common cause is unclosed index readers. If you run lsof against the Tomcat process ID and see that some deleted files are still open, that's almost certainly the problem. Then all you have to do is track it down in your code.

--
Ian.

On Thu, May 4, 2017 at 10:09 PM, Siraj Haider wrote:
> Hi all,
> We recently switched to Lucene 6.5 from 2.9 and we have an issue that the
> files in index directory are not getting released after the IndexWriter
> finishes up writing a batch of documents. We are using
> IndexFolder.listFiles().length to check the number of files in index
> folder. We have even tried closing/reopening the
> IndexWriter/SearcherManager/MMapDirectory after indexing each batch to
> see if that would release the files but it didn't. When we shutdown the
> tomcat and restart it, only then we see that number drop, which proves that
> there were some deleted files still held by Lucene somewhere. Can you
> please direct me on what needs to be checked?
>
> P.S. I apologize for the duplicate email, as I didn't see my yesterday's
> email in the list.
>
> Regards
> -Siraj
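For anyone who cannot easily run lsof on the server, here is a minimal sketch of the same check done from inside the JVM. It assumes Linux (it reads /proc/self/fd and /proc/self/maps); the class name DeletedOpenFiles is made up for this illustration and is not part of Lucene or Tomcat. It looks at the memory maps as well as the file descriptors because a memory-mapped file can stay mapped after its descriptor has been closed, which matters when MMapDirectory is in use.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Stream;

/** Hypothetical helper: reports deleted files this JVM still holds open (Linux only). */
final class DeletedOpenFiles {
    private DeletedOpenFiles() {}

    /** Returns entries such as "/index/_0.cfs (deleted)"; empty if /proc is unavailable. */
    static Set<String> list() throws IOException {
        Set<String> held = new LinkedHashSet<>();

        // 1. Open file descriptors: each entry is a symlink to the file it refers to.
        Path fdDir = Paths.get("/proc/self/fd");
        if (Files.isDirectory(fdDir)) {
            try (Stream<Path> fds = Files.list(fdDir)) {
                for (Path fd : (Iterable<Path>) fds::iterator) {
                    try {
                        String target = Files.readSymbolicLink(fd).toString();
                        if (target.endsWith(" (deleted)")) {
                            held.add(target);
                        }
                    } catch (IOException e) {
                        // The descriptor was closed between listing and reading; skip it.
                    }
                }
            }
        }

        // 2. Memory mappings: the sixth column of each line is the mapped path, if any.
        Path maps = Paths.get("/proc/self/maps");
        if (Files.isReadable(maps)) {
            for (String line : Files.readAllLines(maps, StandardCharsets.ISO_8859_1)) {
                String[] cols = line.trim().split("\\s+", 6);
                if (cols.length == 6 && cols[5].endsWith(" (deleted)")) {
                    held.add(cols[5]);
                }
            }
        }
        return held;
    }

    public static void main(String[] args) throws IOException {
        for (String entry : list()) {
            System.out.println(entry);
        }
    }
}
```

Calling DeletedOpenFiles.list() from a debug servlet or a scheduled task after each indexing batch should show whether the set of held index files keeps growing, which would point at readers or searchers that are never closed or released.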
Re: Adding Docvalues to a Field
In a word, "no". You must re-index from scratch. Worse, now that you have some segments that treat the field as docValues, some that do not, and maybe some that are mixed, I know of no way to untangle them.

I'd create a new collection and re-index it entirely, then use collection aliasing to point the applications at the new collection.

Best,
Erick

On Fri, May 5, 2017 at 2:49 AM, aravinth thangasami wrote:
> Is there any way to solve this issue without reindexing ???

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Adding Docvalues to a Field
Thanks, Erick.

On Fri, May 5, 2017 at 9:19 PM, Erick Erickson wrote:
> In a word, "no". You must re-index from scratch.
> [...]
Re: Adding Docvalues to a Field
Hi Aravinth,

Erick was referring to Solr. To fix your issue without fully reindexing, you can use merging to rewrite the whole index. To do this, use the following approach:

Wrap your index using UninvertingReader. Then get all LeafReaders using the leaves() method. Wrap all these leaves with SlowCodecReaderWrapper.wrap(). You may use the Java 8 stream API to do this. 😃

Then create a new index with IndexWriter, use IndexWriter.addIndexes(CodecReader...), and pass in the previously created wrappers, ideally one by one. Those readers are slow, but ready to be merged into a new index with DocValues. The empty writer will then import the wrapped index and take over the emulated DocValues. This may take some time, but afterwards you have an index with all fields having their DocValues on disk. Uninverting is no longer needed.

I hope that helps. I can post code that should do this. There is no ready-to-use tool available, because you need to configure the uninverter correctly.

Uwe

On May 5, 2017, 22:12:13 CEST, aravinth thangasami wrote:
> Thanks Erick

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de
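A rough sketch of the steps Uwe describes, not code he posted. It assumes Lucene 5.1 through 6.x with the lucene-misc module on the classpath (where org.apache.lucene.uninverting.UninvertingReader lives). The class name DocValuesRewriter and the method rewrite are made up for this illustration, and the field-to-type mapping has to match how your fields were actually indexed.

```java
import java.io.IOException;
import java.util.Map;

import org.apache.lucene.index.CodecReader;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.SlowCodecReaderWrapper;
import org.apache.lucene.uninverting.UninvertingReader;

/** Hypothetical helper: copies an index into a new one, writing emulated doc values to disk. */
final class DocValuesRewriter {
    private DocValuesRewriter() {}

    /**
     * @param source  reader on the old index (the caller opens and closes it)
     * @param mapping field name to uninverting type, e.g. "title" to Type.SORTED
     * @param target  writer on a new, empty index (the caller closes it)
     */
    static void rewrite(DirectoryReader source,
                        Map<String, UninvertingReader.Type> mapping,
                        IndexWriter target) throws IOException {
        // Emulate doc values for the mapped fields by uninverting the indexed terms.
        DirectoryReader uninverting = UninvertingReader.wrap(source, mapping);

        // Feed the segments to the writer one by one; each wrapper is slow but mergeable.
        for (LeafReaderContext leaf : uninverting.leaves()) {
            CodecReader mergeable = SlowCodecReaderWrapper.wrap(leaf.reader());
            target.addIndexes(mergeable);
        }
        target.commit();
    }
}
```

A call could look like rewrite(DirectoryReader.open(oldDir), Collections.singletonMap("title", UninvertingReader.Type.SORTED), newWriter), where oldDir, newWriter and the field name are placeholders. This covers only the plain case where no segment has doc values for the field yet.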
RE: Adding Docvalues to a Field
Hi Aravinth,

To get rid of the partially merged (mixed) docvalues fields, you can use the following additional approach on top of my previous mail:

> Erick was referring to Solr. To fix your issue without fully reindexing you can
> use merging to rewrite the whole index. To do this use the following
> approach:
>
> Wrap your index using UninvertingReader. Then get all LeafReaders using
> the leaves() method.

The problem with that approach is that all those leaves that have partial (!) docvalues are seen by UninvertingReader as having DocValues already, so they just return the partial DocValues and no uninverting is done. So we have to trick UninvertingReader into ignoring the already existing (partial) DocValues. Instead of wrapping the whole IndexReader, we change the workflow:

- Get all leaves() of the broken docvalues/non-docvalues index.
- Wrap each of those LeafReader instances in an anonymous FilterLeafReader subclass that overrides all the DocValues-related methods to return "null" instead of calling super. This hides all partially existing doc values (not from FieldInfos, but that should not hurt). The consumer of this reader will see no DocValues.
- Then wrap those filtered readers with new UninvertingReader(filteredLeaf, mapping). This adds back fresh DocValues, recalculated from the uninverted fields. Be sure to get the types right, otherwise you will get merge errors (incompatible field types).
- Then wrap all those uninverting leaves with SlowCodecReaderWrapper.wrap(). This makes them mergeable (it's slow and costs memory, but works).

The remaining steps are as said before:

> Then create a new index with IndexWriter and use
> IndexWriter.addIndexes(CodecReader...) and pass in the previously created
> wrappers, ideally one by one. Those readers are slow, but ready to be
> merged into a new index with DocValues. The empty writer will then import
> the wrapped index and take over the emulated DocValues. This may take some
> time, but afterwards you have an index with all fields having their DocValues
> on disk. Uninverting is no longer needed.
>
> I hope that helps. I can post code that should do this. There is no ready-to-use
> tool available, because you need to configure the uninverter correctly.
Re: Adding Docvalues to a Field
I will try it. Thanks, Uwe :)

On Sat, May 6, 2017 at 4:02 AM, Uwe Schindler wrote:
> To get rid of the partially merged (mixed) docvalues fields you can use
> the following additional approach on top of my previous mail:
> [...]