[ https://issues.apache.org/jira/browse/LUCENE-5693?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated LUCENE-5693:
---------------------------------------

    Attachment: LUCENE-5693.patch

Patch, decoupled from LUCENE-5675. Tests pass.

The trickiest test to fix was the new TestFieldCacheVsDocValues: it relies heavily on being able to read deleted docs from postings, which I think is invalid.

I also had to fix CheckIndex to not verify term vectors for deleted docs; I think that's fair.

The core fix is easy: FreqProxFields (passed to the PostingsWriter at flush) just skips the deleted docs.

Also, this uncovered a bug in ToParentBJQ.explain's handling of deleted docs.

> don't write deleted documents on flush
> --------------------------------------
>
>                 Key: LUCENE-5693
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5693
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-5693.patch
>
>
> When we flush a new segment, sometimes some documents are "born deleted",
> e.g. if the app did an IW.deleteDocuments that matched some not-yet-flushed
> documents.
> We already compute the liveDocs on flush, but then we continue (wastefully)
> to send those known-deleted documents to all Codec parts.
> I started to implement this on LUCENE-5675 but it was too controversial.
> Also, I expect typically the number of deleted docs is 0, or small, so not
> writing "born deleted" docs won't be much of a win for most apps. Still it
> seems silly to write them, consuming IO/CPU in the process, only to consume
> more IO/CPU later for merging to re-delete them.
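
For readers unfamiliar with the idea, here is a minimal sketch of "skip known-deleted docs at flush". It is not the patch and does not use Lucene's classes: the class SkipDeletedOnFlush, the helper livePostings, and the use of java.util.BitSet as a stand-in for liveDocs are all assumptions made for illustration. It only shows the filtering step: given the buffered doc IDs for one term and a live-docs bitset computed at flush, only the live IDs are handed on to the writer.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Illustrative only: a stand-in for the filtering idea, not Lucene code.
class SkipDeletedOnFlush {

    // Hypothetical helper: returns the buffered doc IDs whose bit is set in
    // liveDocs, i.e. the documents that were not "born deleted".
    static List<Integer> livePostings(int[] bufferedDocIds, BitSet liveDocs) {
        List<Integer> live = new ArrayList<>();
        for (int docId : bufferedDocIds) {
            if (liveDocs.get(docId)) {
                live.add(docId); // only live docs would reach the postings writer
            }
        }
        return live;
    }

    public static void main(String[] args) {
        int[] postings = {0, 1, 3, 4};   // doc IDs buffered for one term

        BitSet liveDocs = new BitSet(5); // assumed: bit set means "live"
        liveDocs.set(0, 5);              // all five docs start out live
        liveDocs.clear(1);               // docs 1 and 4 were deleted before flush
        liveDocs.clear(4);

        System.out.println(livePostings(postings, liveDocs)); // prints [0, 3]
    }
}
```

In this sketch the deleted doc IDs never reach the writer, so a later merge has nothing to re-delete for them; how the real patch threads liveDocs through FreqProxFields is not shown here.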