[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...
Github user asfgit closed the pull request at: https://github.com/apache/lucene-solr/pull/516

---

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
Github user jpountz commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/516#discussion_r239448053

--- Diff: lucene/backward-codecs/src/java/org/apache/lucene/codecs/lucene70/Lucene70Codec.java ---
@@ -66,7 +67,7 @@ public PostingsFormat getPostingsFormatForField(String field) {
   private final DocValuesFormat docValuesFormat = new PerFieldDocValuesFormat() {
     @Override
     public DocValuesFormat getDocValuesFormatForField(String field) {
-      throw new IllegalStateException("This codec should only be used for reading, not writing");
+      return defaultDVFormat;
--- End diff --

I don't think it's a requirement, but I like using the old format better:
- it's probably easier to debug, as all fields would be using the same format
- given that PFDVFormat creates different files per format, forcing the new format would increase the number of files for old segments
Github user mikemccand commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/516#discussion_r239426275

--- Diff: lucene/backward-codecs/src/java/org/apache/lucene/codecs/lucene70/Lucene70Codec.java ---
@@ -66,7 +67,7 @@ public PostingsFormat getPostingsFormatForField(String field) {
   private final DocValuesFormat docValuesFormat = new PerFieldDocValuesFormat() {
     @Override
     public DocValuesFormat getDocValuesFormatForField(String field) {
-      throw new IllegalStateException("This codec should only be used for reading, not writing");
+      return defaultDVFormat;
--- End diff --

Ahh, this is because the DV updates must also be written with the old codec?
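The question above gets at the core mechanic of the fix. The following is a toy sketch in plain Java of that mechanic — all class and method names here are hypothetical stand-ins, not the real Lucene classes: once a field has been written, its DV format is baked into the FieldInfo and the codec is never consulted again, so only updates to *brand-new* fields hit `getDocValuesFormatForField`, where a read-only codec that throws breaks the update while one returning the old default format succeeds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not the real Lucene API): why a read-only codec that
// throws from getDocValuesFormatForField() breaks DV updates on new fields,
// and why returning the segment's old default format fixes it.
public class ReadOnlyCodecSketch {

    interface DocValuesFormatLookup {
        String formatForField(String field);
    }

    // Models the pre-fix backward-compat Lucene70Codec: refuses to write.
    static final DocValuesFormatLookup THROWING = field -> {
        throw new IllegalStateException("This codec should only be used for reading, not writing");
    };

    // Models the fixed codec: hands back the segment's original default format.
    static final DocValuesFormatLookup OLD_DEFAULT = field -> "Lucene70";

    // Models FieldInfo: the format is "baked in" once a field has been written.
    static String resolveFormat(Map<String, String> bakedInFormats, String field,
                                DocValuesFormatLookup codec) {
        String baked = bakedInFormats.get(field);
        if (baked != null) {
            return baked;                       // existing field: codec is never consulted
        }
        return codec.formatForField(field);     // brand-new field: must ask the codec
    }

    public static void main(String[] args) {
        Map<String, String> baked = new HashMap<>();
        baked.put("existing_dv", "Lucene70");

        // Updating an existing field never hits the codec, so it works either way.
        System.out.println(resolveFormat(baked, "existing_dv", THROWING));    // Lucene70

        // Updating a field the old segment never saw must ask the codec:
        try {
            resolveFormat(baked, "new_numeric", THROWING);
        } catch (IllegalStateException e) {
            System.out.println("update failed: " + e.getMessage());
        }
        System.out.println(resolveFormat(baked, "new_numeric", OLD_DEFAULT)); // Lucene70
    }
}
```

This also illustrates why the existing `testDocValuesUpdates` test passed while updates on a newly introduced field failed: the failing path is only reachable when no baked-in format exists.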
Github user s1monw commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/516#discussion_r239409024

--- Diff: lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java ---
@@ -1587,6 +1590,76 @@ public void testDocValuesUpdates() throws Exception {
     writer.close();
     dir.close();
   }
 
+  public void testSoftDeletes() throws Exception {
+    Path oldIndexDir = createTempDir("dvupdates");
+    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
+    Directory dir = newFSDirectory(oldIndexDir);
+    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
+    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random())).setSoftDeletesField("__soft_delete");
+    IndexWriter writer = new IndexWriter(dir, conf);
+    int maxDoc = writer.maxDoc();
+    writer.updateDocValues(new Term("id", "1"), new NumericDocValuesField("__soft_delete", 1));
+
+    if (random().nextBoolean()) {
+      writer.commit();
+    }
+    writer.forceMerge(1);
+    writer.commit();
+    assertEquals(maxDoc - 1, writer.maxDoc());
+    writer.close();
+    dir.close();
+  }
+
+  public void testDocValuesUpdatesWithNewField() throws Exception {
+    Path oldIndexDir = createTempDir("dvupdates");
+    TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
+    Directory dir = newFSDirectory(oldIndexDir);
+    verifyUsesDefaultCodec(dir, dvUpdatesIndex);
+
+    // update fields and verify index
+    IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random()));
+    IndexWriter writer = new IndexWriter(dir, conf);
+    // introduce a new field that we later update
+    writer.addDocument(Arrays.asList(new StringField("id", "" + Integer.MAX_VALUE, Field.Store.NO),
+        new NumericDocValuesField("new_numeric", 1),
+        new BinaryDocValuesField("new_binary", toBytes(1))));
+    writer.updateNumericDocValue(new Term("id", "1"), "new_numeric", 1);
+    writer.updateBinaryDocValue(new Term("id", "1"), "new_binary", toBytes(1));
+
+    writer.commit();
+    Callable assertDV = () -> {
+      boolean found = false;
+      try (DirectoryReader reader = DirectoryReader.open(dir)) {
+        for (LeafReaderContext ctx : reader.leaves()) {
+          LeafReader leafReader = ctx.reader();
+          TermsEnum id = leafReader.terms("id").iterator();
+          if (id.seekExact(new BytesRef("1"))) {
+            PostingsEnum postings = id.postings(null, PostingsEnum.NONE);
+            NumericDocValues numericDocValues = leafReader.getNumericDocValues("new_numeric");
+            BinaryDocValues binaryDocValues = leafReader.getBinaryDocValues("new_binary");
+            int doc;
+            while ((doc = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
+              found = true;
+              assertEquals(doc, binaryDocValues.nextDoc());
+              assertEquals(doc, numericDocValues.nextDoc());
+              assertEquals(1, numericDocValues.longValue());
+              assertEquals(toBytes(1), binaryDocValues.binaryValue());
+            }
+          }
--- End diff --

well, this doesn't work: we have another doc with a value in this field, I guess, and it might be merged into a single segment, which we also force
Github user s1monw commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/516#discussion_r239408501

--- Diff: lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java ---
(same hunk as quoted above; the comment is anchored at the end of the `while ((doc = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS)` loop in `testDocValuesUpdatesWithNewField`)
--- End diff --

[emoji]
Github user s1monw commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/516#discussion_r239408275

--- Diff: lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java ---
(same hunk as quoted above; the comment is anchored at the `Callable assertDV = () -> {` line in `testDocValuesUpdatesWithNewField`)
--- End diff --

it throws an exception, hence the Callable
Github user jpountz commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/516#discussion_r239408003

--- Diff: lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java ---
(same hunk as quoted above; the comment is anchored at the `Callable assertDV = () -> {` line in `testDocValuesUpdatesWithNewField`)
--- End diff --

s/Callable/Runnable/ since you don't care about the return value
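The Callable-vs-Runnable exchange above comes down to checked exceptions, which the following self-contained sketch demonstrates (the `mightThrow` helper is a hypothetical stand-in for I/O calls such as `DirectoryReader.open`): `Runnable.run()` declares no checked exceptions, so a lambda body that can throw `IOException` only compiles against a functional interface whose method declares it, such as `Callable.call()`, which declares `throws Exception`.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Why the test used Callable instead of Runnable: Runnable.run() declares no
// checked exceptions, so a lambda that throws IOException will not compile as
// a Runnable, while Callable.call() declares "throws Exception".
public class CallableVsRunnable {

    // Hypothetical stand-in for an I/O call like DirectoryReader.open(dir).
    static void mightThrow() throws IOException {
    }

    public static void main(String[] args) throws Exception {
        // Runnable r = () -> mightThrow();   // does NOT compile: unhandled IOException

        Callable<Void> assertions = () -> {
            mightThrow();                     // fine: call() is declared "throws Exception"
            return null;                      // Callable must return a value
        };
        assertions.call();
        System.out.println("ok");
    }
}
```

The usual alternatives are wrapping the checked exception inside a `Runnable` body, or defining a small `@FunctionalInterface` whose method declares `throws Exception`; `Callable<Void>` avoids both at the cost of a `return null;`.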
Github user jpountz commented on a diff in the pull request: https://github.com/apache/lucene-solr/pull/516#discussion_r239407845

--- Diff: lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java ---
(same hunk as quoted above; the comment is anchored at the end of the `while ((doc = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS)` loop in `testDocValuesUpdatesWithNewField`)
--- End diff --

for completeness, add `assertEquals(DocIdSetIterator.NO_MORE_DOCS, binaryDocValues.nextDoc()); assertEquals(DocIdSetIterator.NO_MORE_DOCS, numericDocValues.nextDoc());`?
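The exhaustion check suggested above is a general iterator-testing idiom, sketched here against a minimal stand-in for Lucene's `DocIdSetIterator` (the `Docs` class is hypothetical, not the real API): after the postings loop consumes every matching doc, advancing each doc-values iterator once more should yield `NO_MORE_DOCS`, proving no doc carries a value that the loop missed — which is exactly the assertion that the merged-segment case in this thread would trip, since the extra doc also has values in the new fields.

```java
// Sketch of the suggested "for completeness" exhaustion check using a toy
// iterator (hypothetical stand-in for Lucene's DocIdSetIterator).
public class ExhaustionCheck {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Minimal iterator over a fixed, ascending set of doc ids.
    static class Docs {
        private final int[] ids;
        private int pos = -1;
        Docs(int... ids) { this.ids = ids; }
        int nextDoc() { return ++pos < ids.length ? ids[pos] : NO_MORE_DOCS; }
    }

    public static void main(String[] args) {
        Docs postings = new Docs(3, 7);   // docs matching the "id" term
        Docs numericDV = new Docs(3, 7);  // docs carrying a value in the field
        int doc;
        while ((doc = postings.nextDoc()) != NO_MORE_DOCS) {
            int dvDoc = numericDV.nextDoc();
            if (dvDoc != doc) throw new AssertionError("dv doc " + dvDoc + " != " + doc);
        }
        // the completeness check: the DV iterator must also be exhausted now
        if (numericDV.nextDoc() != NO_MORE_DOCS) {
            throw new AssertionError("doc-values iterator not exhausted");
        }
        System.out.println("exhausted: ok");
    }
}
```

If `numericDV` covered an extra doc, say `new Docs(3, 7, 9)`, the final check would fail — which is the scenario s1monw describes once segments are force-merged together.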
GitHub user s1monw opened a pull request: https://github.com/apache/lucene-solr/pull/516

LUCENE-8594: DV update are broken for updates on new field

A segment written with Lucene70Codec fails if it tries to update a DV field that didn't exist in the index before it was upgraded to Lucene80Codec. We bake the DV format into the FieldInfo when it's used the first time and therefore never go to the codec if we need to update. Yet, on a field that didn't exist before and was added during an indexing operation, we have to consult the codec and get an exception. This change fixes this issue and adds the relevant bwc tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/s1monw/lucene-solr fix_dv_update_on_old_index

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/lucene-solr/pull/516.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #516

commit 45bb0938105cae662d04f757e1cfe70ecb76
Author: Simon Willnauer
Date: 2018-12-06T10:31:02Z

    LUCENE-8594: DV update are broken for updates on new field

    A segment written with Lucene70Codec fails if it tries to update a DV field that didn't exist in the index before it was upgraded to Lucene80Codec. We bake the DV format into the FieldInfo when it's used the first time and therefore never go to the codec if we need to update. Yet, on a field that didn't exist before and was added during an indexing operation, we have to consult the codec and get an exception. This change fixes this issue and adds the relevant bwc tests.
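The `testSoftDeletes` case in the PR's diff exercises the same code path through soft deletes, since a soft delete *is* a doc-values update on the configured soft-deletes field. The toy model below (hypothetical classes, not the Lucene API) sketches the behavior that test asserts: docs carrying a value in the soft-deletes field are treated as deleted, so after a forced merge the live doc count is `maxDoc - 1`.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of soft deletes via DV updates (hypothetical, not the Lucene API):
// a soft delete is a doc-values update on the soft-deletes field, and a forced
// merge drops soft-deleted docs, shrinking the live doc count.
public class SoftDeleteSketch {
    static final String SOFT_DELETES_FIELD = "__soft_delete";

    static class ToyIndex {
        // docId -> (field -> numeric doc value)
        final Map<Integer, Map<String, Long>> docs = new HashMap<>();

        void addDoc(int id) { docs.put(id, new HashMap<>()); }

        // models IndexWriter.updateDocValues(term, new NumericDocValuesField(...))
        void updateDocValues(int docId, String field, long value) {
            docs.get(docId).put(field, value);
        }

        // models forceMerge(1): soft-deleted docs are not carried over
        int maxDocAfterMerge() {
            int live = 0;
            for (Map<String, Long> fields : docs.values()) {
                if (!fields.containsKey(SOFT_DELETES_FIELD)) live++;
            }
            return live;
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        for (int id = 0; id < 4; id++) index.addDoc(id);     // maxDoc == 4
        index.updateDocValues(1, SOFT_DELETES_FIELD, 1L);    // soft-delete doc 1
        System.out.println(index.maxDocAfterMerge());        // 3 == maxDoc - 1
    }
}
```

On an upgraded index, the soft-deletes field is exactly the "new field that didn't exist before" from the PR description, which is why the bug also broke soft deletes on old segments.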