[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

2018-12-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/lucene-solr/pull/516


---

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

2018-12-06 Thread jpountz
Github user jpountz commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/516#discussion_r239448053
  
--- Diff: 
lucene/backward-codecs/src/java/org/apache/lucene/codecs/lucene70/Lucene70Codec.java
 ---
@@ -66,7 +67,7 @@ public PostingsFormat getPostingsFormatForField(String field) {
   private final DocValuesFormat docValuesFormat = new PerFieldDocValuesFormat() {
 @Override
 public DocValuesFormat getDocValuesFormatForField(String field) {
-  throw new IllegalStateException("This codec should only be used for reading, not writing");
+  return defaultDVFormat;
--- End diff --

I don't think it's a requirement, but I like using the old format better:
 - it's probably easier to debug, as all fields would be using the same format
 - given that PFDVFormat creates different files per format, forcing the new format would increase the number of files for old segments
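As a self-contained sketch of the trade-off described here (toy names, not Lucene's actual `PerFieldDocValuesFormat` API): a per-field wrapper groups fields by the format chosen for them, and each distinct format writes its own set of per-segment files, so mixing the old and new formats in one old segment multiplies the file count.

```java
import java.util.*;
import java.util.function.Function;

// Toy model of per-field format dispatch (hypothetical names, not Lucene's API):
// the wrapper asks formatForField(field) for each field and groups fields by the
// answer; each distinct format produces its own set of files for the segment.
public class PerFieldSketch {
  static Map<String, List<String>> groupByFormat(List<String> fields,
                                                 Function<String, String> formatForField) {
    Map<String, List<String>> byFormat = new TreeMap<>();
    for (String field : fields) {
      byFormat.computeIfAbsent(formatForField.apply(field), k -> new ArrayList<>()).add(field);
    }
    return byFormat;
  }

  public static void main(String[] args) {
    List<String> fields = Arrays.asList("id", "new_numeric", "new_binary");

    // Keeping the old format for everything: one format, one set of files.
    Map<String, List<String>> sameFormat = groupByFormat(fields, f -> "Lucene70");
    System.out.println(sameFormat.size());  // 1

    // Forcing updated fields onto the new format: two formats, and the
    // per-field wrapper would write files for each of them.
    Map<String, List<String>> mixed =
        groupByFormat(fields, f -> f.startsWith("new_") ? "Lucene80" : "Lucene70");
    System.out.println(mixed.size());  // 2
  }
}
```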


---




[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

2018-12-06 Thread mikemccand
Github user mikemccand commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/516#discussion_r239426275
  
--- Diff: 
lucene/backward-codecs/src/java/org/apache/lucene/codecs/lucene70/Lucene70Codec.java
 ---
@@ -66,7 +67,7 @@ public PostingsFormat getPostingsFormatForField(String field) {
   private final DocValuesFormat docValuesFormat = new PerFieldDocValuesFormat() {
 @Override
 public DocValuesFormat getDocValuesFormatForField(String field) {
-  throw new IllegalStateException("This codec should only be used for reading, not writing");
+  return defaultDVFormat;
--- End diff --

Ahh this is because the DV updates must also be written with the old codec?


---




[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

2018-12-06 Thread s1monw
Github user s1monw commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/516#discussion_r239409024
  
--- Diff: 
lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java
 ---
@@ -1587,6 +1590,76 @@ public void testDocValuesUpdates() throws Exception {
 writer.close();
 dir.close();
   }
+
+  public void testSoftDeletes() throws Exception {
+Path oldIndexDir = createTempDir("dvupdates");
+TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
+Directory dir = newFSDirectory(oldIndexDir);
+verifyUsesDefaultCodec(dir, dvUpdatesIndex);
+IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random())).setSoftDeletesField("__soft_delete");
+IndexWriter writer = new IndexWriter(dir, conf);
+int maxDoc = writer.maxDoc();
+writer.updateDocValues(new Term("id", "1"), new NumericDocValuesField("__soft_delete", 1));
+
+if (random().nextBoolean()) {
+  writer.commit();
+}
+writer.forceMerge(1);
+writer.commit();
+assertEquals(maxDoc-1, writer.maxDoc());
+writer.close();
+dir.close();
+  }
+
+  public void testDocValuesUpdatesWithNewField() throws Exception {
+Path oldIndexDir = createTempDir("dvupdates");
+TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
+Directory dir = newFSDirectory(oldIndexDir);
+verifyUsesDefaultCodec(dir, dvUpdatesIndex);
+
+// update fields and verify index
+IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random()));
+IndexWriter writer = new IndexWriter(dir, conf);
+// introduce a new field that we later update
+writer.addDocument(Arrays.asList(new StringField("id", "" + Integer.MAX_VALUE, Field.Store.NO),
+new NumericDocValuesField("new_numeric", 1),
+new BinaryDocValuesField("new_binary", toBytes(1))));
+writer.updateNumericDocValue(new Term("id", "1"), "new_numeric", 1);
+writer.updateBinaryDocValue(new Term("id", "1"), "new_binary", toBytes(1));
+
+writer.commit();
+Callable assertDV = () -> {
+  boolean found = false;
+  try (DirectoryReader reader = DirectoryReader.open(dir)) {
+for (LeafReaderContext ctx : reader.leaves()) {
+  LeafReader leafReader = ctx.reader();
+  TermsEnum id = leafReader.terms("id").iterator();
+  if (id.seekExact(new BytesRef("1"))) {
+PostingsEnum postings = id.postings(null, PostingsEnum.NONE);
+NumericDocValues numericDocValues = leafReader.getNumericDocValues("new_numeric");
+BinaryDocValues binaryDocValues = leafReader.getBinaryDocValues("new_binary");
+int doc;
+while ((doc = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
+  found = true;
+  assertEquals(doc, binaryDocValues.nextDoc());
+  assertEquals(doc, numericDocValues.nextDoc());
+  assertEquals(1, numericDocValues.longValue());
+  assertEquals(toBytes(1), binaryDocValues.binaryValue());
+}
+  }
--- End diff --

well, this doesn't work, since we have another doc with a value in this field, I guess, and it might be merged into a single segment, which we also force


---




[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

2018-12-06 Thread s1monw
Github user s1monw commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/516#discussion_r239408501
  
--- Diff: 
lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java
 ---
@@ -1587,6 +1590,76 @@ public void testDocValuesUpdates() throws Exception {
 writer.close();
 dir.close();
   }
+
+  public void testSoftDeletes() throws Exception {
+Path oldIndexDir = createTempDir("dvupdates");
+TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
+Directory dir = newFSDirectory(oldIndexDir);
+verifyUsesDefaultCodec(dir, dvUpdatesIndex);
+IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random())).setSoftDeletesField("__soft_delete");
+IndexWriter writer = new IndexWriter(dir, conf);
+int maxDoc = writer.maxDoc();
+writer.updateDocValues(new Term("id", "1"), new NumericDocValuesField("__soft_delete", 1));
+
+if (random().nextBoolean()) {
+  writer.commit();
+}
+writer.forceMerge(1);
+writer.commit();
+assertEquals(maxDoc-1, writer.maxDoc());
+writer.close();
+dir.close();
+  }
+
+  public void testDocValuesUpdatesWithNewField() throws Exception {
+Path oldIndexDir = createTempDir("dvupdates");
+TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
+Directory dir = newFSDirectory(oldIndexDir);
+verifyUsesDefaultCodec(dir, dvUpdatesIndex);
+
+// update fields and verify index
+IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random()));
+IndexWriter writer = new IndexWriter(dir, conf);
+// introduce a new field that we later update
+writer.addDocument(Arrays.asList(new StringField("id", "" + Integer.MAX_VALUE, Field.Store.NO),
+new NumericDocValuesField("new_numeric", 1),
+new BinaryDocValuesField("new_binary", toBytes(1))));
+writer.updateNumericDocValue(new Term("id", "1"), "new_numeric", 1);
+writer.updateBinaryDocValue(new Term("id", "1"), "new_binary", toBytes(1));
+
+writer.commit();
+Callable assertDV = () -> {
+  boolean found = false;
+  try (DirectoryReader reader = DirectoryReader.open(dir)) {
+for (LeafReaderContext ctx : reader.leaves()) {
+  LeafReader leafReader = ctx.reader();
+  TermsEnum id = leafReader.terms("id").iterator();
+  if (id.seekExact(new BytesRef("1"))) {
+PostingsEnum postings = id.postings(null, PostingsEnum.NONE);
+NumericDocValues numericDocValues = leafReader.getNumericDocValues("new_numeric");
+BinaryDocValues binaryDocValues = leafReader.getBinaryDocValues("new_binary");
+int doc;
+while ((doc = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
+  found = true;
+  assertEquals(doc, binaryDocValues.nextDoc());
+  assertEquals(doc, numericDocValues.nextDoc());
+  assertEquals(1, numericDocValues.longValue());
+  assertEquals(toBytes(1), binaryDocValues.binaryValue());
+}
+  }
--- End diff --

👍 


---




[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

2018-12-06 Thread s1monw
Github user s1monw commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/516#discussion_r239408275
  
--- Diff: 
lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java
 ---
@@ -1587,6 +1590,76 @@ public void testDocValuesUpdates() throws Exception {
 writer.close();
 dir.close();
   }
+
+  public void testSoftDeletes() throws Exception {
+Path oldIndexDir = createTempDir("dvupdates");
+TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
+Directory dir = newFSDirectory(oldIndexDir);
+verifyUsesDefaultCodec(dir, dvUpdatesIndex);
+IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random())).setSoftDeletesField("__soft_delete");
+IndexWriter writer = new IndexWriter(dir, conf);
+int maxDoc = writer.maxDoc();
+writer.updateDocValues(new Term("id", "1"), new NumericDocValuesField("__soft_delete", 1));
+
+if (random().nextBoolean()) {
+  writer.commit();
+}
+writer.forceMerge(1);
+writer.commit();
+assertEquals(maxDoc-1, writer.maxDoc());
+writer.close();
+dir.close();
+  }
+
+  public void testDocValuesUpdatesWithNewField() throws Exception {
+Path oldIndexDir = createTempDir("dvupdates");
+TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
+Directory dir = newFSDirectory(oldIndexDir);
+verifyUsesDefaultCodec(dir, dvUpdatesIndex);
+
+// update fields and verify index
+IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random()));
+IndexWriter writer = new IndexWriter(dir, conf);
+// introduce a new field that we later update
+writer.addDocument(Arrays.asList(new StringField("id", "" + Integer.MAX_VALUE, Field.Store.NO),
+new NumericDocValuesField("new_numeric", 1),
+new BinaryDocValuesField("new_binary", toBytes(1))));
+writer.updateNumericDocValue(new Term("id", "1"), "new_numeric", 1);
+writer.updateBinaryDocValue(new Term("id", "1"), "new_binary", toBytes(1));
+
+writer.commit();
+Callable assertDV = () -> {
--- End diff --

it throws an exception, hence the Callable
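Background for this exchange, as a self-contained sketch: `Runnable.run()` declares no checked exceptions, while `Callable.call()` declares `throws Exception`, so a lambda body that throws `IOException` (as opening a `DirectoryReader` does) only compiles against `Callable`. The return value is ignored; `Callable` is used purely for its throws clause.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Callable vs. Runnable for lambdas that throw checked exceptions.
public class CallableVsRunnable {

  // Small helper so the behavior can be exercised directly: runs the
  // callable, propagating any checked exception it throws.
  static <T> T runOnce(Callable<T> c) throws Exception {
    return c.call();
  }

  public static void main(String[] args) throws Exception {
    Callable<Void> mayThrow = () -> {
      if (args.length > 100) {  // never true here; shows the throw is legal
        throw new IOException("checked exceptions are allowed in call()");
      }
      return null;  // Callable must return something; Void + null is the idiom
    };
    runOnce(mayThrow);

    // The Runnable equivalent would not compile:
    // Runnable r = () -> { throw new IOException("boom"); };
    //   -> error: unhandled exception type IOException
    System.out.println("ok");
  }
}
```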


---




[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

2018-12-06 Thread jpountz
Github user jpountz commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/516#discussion_r239408003
  
--- Diff: 
lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java
 ---
@@ -1587,6 +1590,76 @@ public void testDocValuesUpdates() throws Exception {
 writer.close();
 dir.close();
   }
+
+  public void testSoftDeletes() throws Exception {
+Path oldIndexDir = createTempDir("dvupdates");
+TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
+Directory dir = newFSDirectory(oldIndexDir);
+verifyUsesDefaultCodec(dir, dvUpdatesIndex);
+IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random())).setSoftDeletesField("__soft_delete");
+IndexWriter writer = new IndexWriter(dir, conf);
+int maxDoc = writer.maxDoc();
+writer.updateDocValues(new Term("id", "1"), new NumericDocValuesField("__soft_delete", 1));
+
+if (random().nextBoolean()) {
+  writer.commit();
+}
+writer.forceMerge(1);
+writer.commit();
+assertEquals(maxDoc-1, writer.maxDoc());
+writer.close();
+dir.close();
+  }
+
+  public void testDocValuesUpdatesWithNewField() throws Exception {
+Path oldIndexDir = createTempDir("dvupdates");
+TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
+Directory dir = newFSDirectory(oldIndexDir);
+verifyUsesDefaultCodec(dir, dvUpdatesIndex);
+
+// update fields and verify index
+IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random()));
+IndexWriter writer = new IndexWriter(dir, conf);
+// introduce a new field that we later update
+writer.addDocument(Arrays.asList(new StringField("id", "" + Integer.MAX_VALUE, Field.Store.NO),
+new NumericDocValuesField("new_numeric", 1),
+new BinaryDocValuesField("new_binary", toBytes(1))));
+writer.updateNumericDocValue(new Term("id", "1"), "new_numeric", 1);
+writer.updateBinaryDocValue(new Term("id", "1"), "new_binary", toBytes(1));
+
+writer.commit();
+Callable assertDV = () -> {
--- End diff --

s/Callable/Runnable/ since you don't care about the return value


---




[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

2018-12-06 Thread jpountz
Github user jpountz commented on a diff in the pull request:

https://github.com/apache/lucene-solr/pull/516#discussion_r239407845
  
--- Diff: 
lucene/backward-codecs/src/test/org/apache/lucene/index/TestBackwardsCompatibility.java
 ---
@@ -1587,6 +1590,76 @@ public void testDocValuesUpdates() throws Exception {
 writer.close();
 dir.close();
   }
+
+  public void testSoftDeletes() throws Exception {
+Path oldIndexDir = createTempDir("dvupdates");
+TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
+Directory dir = newFSDirectory(oldIndexDir);
+verifyUsesDefaultCodec(dir, dvUpdatesIndex);
+IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random())).setSoftDeletesField("__soft_delete");
+IndexWriter writer = new IndexWriter(dir, conf);
+int maxDoc = writer.maxDoc();
+writer.updateDocValues(new Term("id", "1"), new NumericDocValuesField("__soft_delete", 1));
+
+if (random().nextBoolean()) {
+  writer.commit();
+}
+writer.forceMerge(1);
+writer.commit();
+assertEquals(maxDoc-1, writer.maxDoc());
+writer.close();
+dir.close();
+  }
+
+  public void testDocValuesUpdatesWithNewField() throws Exception {
+Path oldIndexDir = createTempDir("dvupdates");
+TestUtil.unzip(getDataInputStream(dvUpdatesIndex), oldIndexDir);
+Directory dir = newFSDirectory(oldIndexDir);
+verifyUsesDefaultCodec(dir, dvUpdatesIndex);
+
+// update fields and verify index
+IndexWriterConfig conf = new IndexWriterConfig(new MockAnalyzer(random()));
+IndexWriter writer = new IndexWriter(dir, conf);
+// introduce a new field that we later update
+writer.addDocument(Arrays.asList(new StringField("id", "" + Integer.MAX_VALUE, Field.Store.NO),
+new NumericDocValuesField("new_numeric", 1),
+new BinaryDocValuesField("new_binary", toBytes(1))));
+writer.updateNumericDocValue(new Term("id", "1"), "new_numeric", 1);
+writer.updateBinaryDocValue(new Term("id", "1"), "new_binary", toBytes(1));
+
+writer.commit();
+Callable assertDV = () -> {
+  boolean found = false;
+  try (DirectoryReader reader = DirectoryReader.open(dir)) {
+for (LeafReaderContext ctx : reader.leaves()) {
+  LeafReader leafReader = ctx.reader();
+  TermsEnum id = leafReader.terms("id").iterator();
+  if (id.seekExact(new BytesRef("1"))) {
+PostingsEnum postings = id.postings(null, PostingsEnum.NONE);
+NumericDocValues numericDocValues = leafReader.getNumericDocValues("new_numeric");
+BinaryDocValues binaryDocValues = leafReader.getBinaryDocValues("new_binary");
+int doc;
+while ((doc = postings.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
+  found = true;
+  assertEquals(doc, binaryDocValues.nextDoc());
+  assertEquals(doc, numericDocValues.nextDoc());
+  assertEquals(1, numericDocValues.longValue());
+  assertEquals(toBytes(1), binaryDocValues.binaryValue());
+}
+  }
--- End diff --

for completeness, add `assertEquals(DocIdSetIterator.NO_MORE_DOCS, binaryDocValues.nextDoc()); assertEquals(DocIdSetIterator.NO_MORE_DOCS, numericDocValues.nextDoc());`?
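The pattern behind this suggestion, modeled with a toy iterator (the `NO_MORE_DOCS` sentinel mirrors Lucene's `DocIdSetIterator`; the rest is a hypothetical stand-in, not the real API): after co-advancing two iterators in lockstep, also assert that the second one is exhausted, otherwise it carried docs the first one never visited.

```java
// Toy lockstep-iteration check with a NO_MORE_DOCS sentinel.
public class ExhaustionCheck {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  // Minimal stand-in for a doc-id iterator: returns ascending ids,
  // then NO_MORE_DOCS forever once exhausted.
  static class IntIterator {
    private final int[] docs;
    private int pos = -1;
    IntIterator(int... docs) { this.docs = docs; }
    int nextDoc() { pos++; return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }
  }

  public static void main(String[] args) {
    IntIterator postings = new IntIterator(3, 7);
    IntIterator values = new IntIterator(3, 7);
    int doc;
    while ((doc = postings.nextDoc()) != NO_MORE_DOCS) {
      // Lockstep check: the values iterator must land on the same doc.
      if (values.nextDoc() != doc) throw new AssertionError("iterators out of sync");
    }
    // The completeness assertion being proposed: after the loop the values
    // iterator must be exhausted too.
    if (values.nextDoc() != NO_MORE_DOCS) throw new AssertionError("values not exhausted");
    System.out.println("exhausted");
  }
}
```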


---




[GitHub] lucene-solr pull request #516: LUCENE-8594: DV update are broken for updates...

2018-12-06 Thread s1monw
GitHub user s1monw opened a pull request:

https://github.com/apache/lucene-solr/pull/516

LUCENE-8594: DV update are broken for updates on new field

A segment written with the Lucene70Codec fails if it tries to update
a DV field that didn't exist in the index before it was upgraded to
the Lucene80Codec. We bake the DV format into the FieldInfo when it's used
for the first time and therefore never go to the codec when we need to update.
Yet on a field that didn't exist before and was added during an indexing
operation, we have to consult the codec and get an exception.
This change fixes this issue and adds the relevant bwc tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/s1monw/lucene-solr fix_dv_update_on_old_index

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/lucene-solr/pull/516.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #516


commit 45bb0938105cae662d04f757e1cfe70ecb76
Author: Simon Willnauer 
Date:   2018-12-06T10:31:02Z

LUCENE-8594: DV update are broken for updates on new field

A segment written with the Lucene70Codec fails if it tries to update
a DV field that didn't exist in the index before it was upgraded to
the Lucene80Codec. We bake the DV format into the FieldInfo when it's used
for the first time and therefore never go to the codec when we need to update.
Yet on a field that didn't exist before and was added during an indexing
operation, we have to consult the codec and get an exception.
This change fixes this issue and adds the relevant bwc tests.
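The failure mode described in the commit message can be sketched in a few lines of self-contained Java (hypothetical names, not Lucene's internals): the per-field format is cached ("baked") the first time a field is written, and updates read the cache; a field first introduced by an update has no cached entry, so the writer falls through to the codec, and a read-only codec that throws from `getDocValuesFormatForField` breaks the update.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the bug: cached per-field format with a codec fallback.
public class BakedFormatSketch {
  private final Map<String, String> bakedFormatPerField = new HashMap<>();
  private final boolean codecThrowsOnWrite;  // pre-fix backward codec behavior

  BakedFormatSketch(boolean codecThrowsOnWrite) {
    this.codecThrowsOnWrite = codecThrowsOnWrite;
  }

  String formatForUpdate(String field) {
    String baked = bakedFormatPerField.get(field);
    if (baked != null) return baked;           // existing field: cached format
    if (codecThrowsOnWrite) {                  // new field: must ask the codec
      throw new IllegalStateException("This codec should only be used for reading, not writing");
    }
    String fresh = "defaultDVFormat";          // the fix: hand back a writable format
    bakedFormatPerField.put(field, fresh);
    return fresh;
  }

  public static void main(String[] args) {
    BakedFormatSketch before = new BakedFormatSketch(true);
    try {
      before.formatForUpdate("new_numeric");   // field never written pre-upgrade
    } catch (IllegalStateException e) {
      System.out.println("broken: " + e.getMessage());
    }

    BakedFormatSketch after = new BakedFormatSketch(false);
    System.out.println("fixed: " + after.formatForUpdate("new_numeric"));
  }
}
```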




---
