Re: [PR] Core: Set lastAddedSchemaId in case the same view version is being added as part of a concurrent update [iceberg]
nastra commented on code in PR #14997:
URL: https://github.com/apache/iceberg/pull/14997#discussion_r2676518191
##
core/src/main/java/org/apache/iceberg/view/ViewMetadata.java:
##
@@ -358,7 +367,9 @@ private boolean sameViewVersion(ViewVersion one, ViewVersion two) {
         && Objects.equals(one.representations(), two.representations())
         && Objects.equals(one.defaultCatalog(), two.defaultCatalog())
         && Objects.equals(one.defaultNamespace(), two.defaultNamespace())
-        && one.schemaId() == two.schemaId();
+        && (one.schemaId() == two.schemaId()
+            || (two.schemaId() == LAST_ADDED
+                && Objects.equals(lastSeenExistingSchemaId, one.schemaId())));
Review Comment:
Yes, I agree that this should be made symmetrical. I was mostly exploring
this part to do proper deduplication for the edge case that we're testing for.
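
One way a symmetrical comparison could look, as a minimal sketch and not the fix that was eventually chosen: resolve the `LAST_ADDED` placeholder on both sides before comparing. `SchemaIdMatch`, `resolve`, and `sameSchemaId` are hypothetical names for illustration; only `LAST_ADDED` and `lastSeenExistingSchemaId` come from the diff above.

```java
import java.util.Objects;

// Hypothetical helper, not part of ViewMetadata: a symmetric schema ID check.
final class SchemaIdMatch {
  static final int LAST_ADDED = -1;

  private SchemaIdMatch() {}

  // Replaces the LAST_ADDED placeholder with the last seen existing schema ID (may be null).
  static Integer resolve(int schemaId, Integer lastSeenExistingSchemaId) {
    return schemaId == LAST_ADDED ? lastSeenExistingSchemaId : Integer.valueOf(schemaId);
  }

  // Treats both arguments the same way, so swapping them never changes the result.
  static boolean sameSchemaId(int one, int two, Integer lastSeenExistingSchemaId) {
    if (one == two) {
      return true;
    }
    Integer left = resolve(one, lastSeenExistingSchemaId);
    Integer right = resolve(two, lastSeenExistingSchemaId);
    return left != null && Objects.equals(left, right);
  }
}
```

With this shape, `sameSchemaId(0, -1, 0)` and `sameSchemaId(-1, 0, 0)` give the same answer, which is the property the method name promises.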
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Core: Set lastAddedSchemaId in case the same view version is being added as part of a concurrent update [iceberg]
nastra commented on code in PR #14997:
URL: https://github.com/apache/iceberg/pull/14997#discussion_r2676510829
##
core/src/main/java/org/apache/iceberg/view/ViewMetadata.java:
##
@@ -369,6 +380,7 @@ public Builder addSchema(Schema schema) {
     private int addSchemaInternal(Schema schema) {
       int newSchemaId = reuseOrCreateNewSchemaId(schema);
       if (schemasById.containsKey(newSchemaId)) {
+        this.lastSeenExistingSchemaId = newSchemaId;
Review Comment:
Setting `lastAddedSchemaId` has other implications. See also
https://github.com/apache/iceberg/pull/14997#discussion_r2672797167
--
Re: [PR] Core: Set lastAddedSchemaId in case the same view version is being added as part of a concurrent update [iceberg]
pvary commented on code in PR #14997:
URL: https://github.com/apache/iceberg/pull/14997#discussion_r2676300592
##
core/src/main/java/org/apache/iceberg/view/ViewMetadata.java:
##
@@ -358,7 +367,9 @@ private boolean sameViewVersion(ViewVersion one, ViewVersion two) {
         && Objects.equals(one.representations(), two.representations())
         && Objects.equals(one.defaultCatalog(), two.defaultCatalog())
         && Objects.equals(one.defaultNamespace(), two.defaultNamespace())
-        && one.schemaId() == two.schemaId();
+        && (one.schemaId() == two.schemaId()
+            || (two.schemaId() == LAST_ADDED
+                && Objects.equals(lastSeenExistingSchemaId, one.schemaId())));
Review Comment:
The method name suggests that the comparison is symmetrical, but it is not.
--
Re: [PR] Core: Set lastAddedSchemaId in case the same view version is being added as part of a concurrent update [iceberg]
pvary commented on code in PR #14997:
URL: https://github.com/apache/iceberg/pull/14997#discussion_r2676287672
##
core/src/main/java/org/apache/iceberg/view/ViewMetadata.java:
##
@@ -358,7 +367,9 @@ private boolean sameViewVersion(ViewVersion one, ViewVersion two) {
         && Objects.equals(one.representations(), two.representations())
         && Objects.equals(one.defaultCatalog(), two.defaultCatalog())
         && Objects.equals(one.defaultNamespace(), two.defaultNamespace())
-        && one.schemaId() == two.schemaId();
+        && (one.schemaId() == two.schemaId()
+            || (two.schemaId() == LAST_ADDED
+                && Objects.equals(lastSeenExistingSchemaId, one.schemaId())));
Review Comment:
This is a bit ugly: `one` and `two` are treated differently.
--
Re: [PR] Core: Set lastAddedSchemaId in case the same view version is being added as part of a concurrent update [iceberg]
pvary commented on code in PR #14997:
URL: https://github.com/apache/iceberg/pull/14997#discussion_r2676263932
##
core/src/main/java/org/apache/iceberg/view/ViewMetadata.java:
##
@@ -369,6 +380,7 @@ public Builder addSchema(Schema schema) {
     private int addSchemaInternal(Schema schema) {
       int newSchemaId = reuseOrCreateNewSchemaId(schema);
       if (schemasById.containsKey(newSchemaId)) {
+        this.lastSeenExistingSchemaId = newSchemaId;
Review Comment:
OK, I see two failing tests:
org.apache.iceberg.view.ViewCatalogTests#replaceViewVersionByUpdatingSQLForDialect
org.apache.iceberg.view.ViewCatalogTests#concurrentReplaceViewVersion
--
Re: [PR] Core: Set lastAddedSchemaId in case the same view version is being added as part of a concurrent update [iceberg]
pvary commented on code in PR #14997:
URL: https://github.com/apache/iceberg/pull/14997#discussion_r2676249855
##
core/src/main/java/org/apache/iceberg/view/ViewMetadata.java:
##
@@ -369,6 +380,7 @@ public Builder addSchema(Schema schema) {
     private int addSchemaInternal(Schema schema) {
       int newSchemaId = reuseOrCreateNewSchemaId(schema);
       if (schemasById.containsKey(newSchemaId)) {
+        this.lastSeenExistingSchemaId = newSchemaId;
Review Comment:
Why not just:
```
this.lastAddedSchemaId = newSchemaId;
```
--
Re: [PR] Core: Set lastAddedSchemaId in case the same view version is being added as part of a concurrent update [iceberg]
nastra commented on code in PR #14997:
URL: https://github.com/apache/iceberg/pull/14997#discussion_r2672797167
##
core/src/main/java/org/apache/iceberg/view/ViewMetadata.java:
##
@@ -369,6 +369,12 @@ public Builder addSchema(Schema schema) {
     private int addSchemaInternal(Schema schema) {
       int newSchemaId = reuseOrCreateNewSchemaId(schema);
       if (schemasById.containsKey(newSchemaId)) {
+        if (null == lastAddedSchemaId) {
Review Comment:
Just FYI: this is most likely wrong as it stands, because setting
`lastAddedSchemaId` here implies that the metadata update will contain `-1`
as the schema ID in `addVersionInternal()`.
The internal state tracking is quite delicate here, so I'm currently
exploring a few other options for keeping a concurrent replace operation
from failing due to internal state tracking in `ViewMetadata`.
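
To make the `-1` implication concrete, here is a simplified model of how the schema ID written into the recorded change could be chosen. It assumes, per the comment above, that a version whose schema ID equals `lastAddedSchemaId` is recorded with the `LAST_ADDED` placeholder instead of its concrete ID. `RecordedSchemaIdSketch` and `recordedSchemaId` are hypothetical names and not part of `ViewMetadata`.

```java
import java.util.Objects;

// Hypothetical model of the bookkeeping; the real logic lives in ViewMetadata.Builder.
final class RecordedSchemaIdSketch {
  static final int LAST_ADDED = -1;

  private RecordedSchemaIdSketch() {}

  // Returns the schema ID that would end up in the recorded "add view version" change.
  // lastAddedSchemaId is null when no schema was added through this builder.
  static int recordedSchemaId(int versionSchemaId, Integer lastAddedSchemaId) {
    return Objects.equals(lastAddedSchemaId, versionSchemaId) ? LAST_ADDED : versionSchemaId;
  }
}
```

Under this assumption, marking an already existing schema as "last added" turns a concrete ID such as `0` into `-1` in the recorded change, even though no new schema accompanies that change.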
--
Re: [PR] Core: Set lastAddedSchemaId in case the same view version is being added as part of a concurrent update [iceberg]
nastra commented on code in PR #14997:
URL: https://github.com/apache/iceberg/pull/14997#discussion_r2672490061
##
core/src/test/java/org/apache/iceberg/view/TestViewMetadata.java:
##
@@ -995,6 +995,40 @@ public void deduplicatingViewVersionByIdAndAssigningSchemaId() {
     assertThat(metadata.currentVersion().schemaId()).isEqualTo(1);
   }
+  @Test
+  public void applySameViewVersionAndSchemaUpdateWithSchemaIdAssignment() {
+    Schema schema = new Schema(Types.NestedField.required(-1, "x", Types.LongType.get()));
+    ViewVersion viewVersion = newViewVersion(1, -1, "select * from ns.tbl");
+    ViewMetadata metadata =
+        ViewMetadata.builder()
+            .setLocation("custom-location")
+            .addSchema(schema)
+            .addVersion(viewVersion)
+            .setCurrentVersionId(1)
+            .build();
+    assertThat(metadata.versions()).hasSize(1);
+    assertThat(metadata.currentVersion().versionId()).isEqualTo(1);
+    assertThat(metadata.currentVersion().schemaId()).isEqualTo(0);
+
+    // simulates a case where the same view update is applied twice and the schema ID is set to -1
+    // (indicating that the schema ID should be automatically assigned)
+    // this scenario can happen with concurrent updates in REST cases where the same update is
+    // applied twice. The view version gets a new ID assigned because
+    // ViewMetadata#sameViewVersion(current, updated) isn't true, because the current's schemaId was
Review Comment:
We should probably deduplicate this properly. I'm working on a fix for this.
--
[PR] Core: Set lastAddedSchemaId in case the same view version is being added as part of a concurrent update [iceberg]
nastra opened a new pull request, #14997:
URL: https://github.com/apache/iceberg/pull/14997
This fixes an issue that @haizhou-zhao brought up in
https://github.com/apache/iceberg/pull/14334.
Basically, the test added in https://github.com/apache/iceberg/pull/14334
performs a concurrent update of the same view version, but fails with:
```
org.apache.iceberg.exceptions.ValidationException: Cannot set last added schema: no schema has been added
    at org.apache.iceberg.exceptions.ValidationException.check(ValidationException.java:49)
    at org.apache.iceberg.view.ViewMetadata$Builder.addVersionInternal(ViewMetadata.java:297)
    at org.apache.iceberg.view.ViewMetadata$Builder.addVersion(ViewMetadata.java:277)
    at org.apache.iceberg.MetadataUpdate$AddViewVersion.applyTo(MetadataUpdate.java:508)
    at org.apache.iceberg.rest.CatalogHandlers.lambda$commit$11(CatalogHandlers.java:624)
    at java.base/java.util.ArrayList.forEach(ArrayList.java:1511)
    at org.apache.iceberg.rest.CatalogHandlers.lambda$commit$12(CatalogHandlers.java:624)
```
This is due to our internal state tracking of `lastAddedSchemaId`, which is
assumed to be set when adding the view version and running this check:
```
if (version.schemaId() == LAST_ADDED) {
  ValidationException.check(
      lastAddedSchemaId != null, "Cannot set last added schema: no schema has been added");
  version = ImmutableViewVersion.builder().from(version).schemaId(lastAddedSchemaId).build();
}
```
I added a reproducible test to `TestViewMetadata` where the schema ID is set
to `-1`, indicating that the schema ID can be re-assigned.
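
The failure mode can be illustrated with a small self-contained model. This is only a sketch of the state tracking described above, not the actual `ViewMetadata.Builder`: `LastAddedTrackingSketch` is a hypothetical class, schemas are represented as plain strings, and `IllegalStateException` stands in for `ValidationException`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the "last added schema" tracking; not Iceberg code.
final class LastAddedTrackingSketch {
  static final int LAST_ADDED = -1;

  private final List<String> schemas = new ArrayList<>();
  // Stays null until a schema is newly added through this instance.
  private Integer lastAddedSchemaId = null;

  LastAddedTrackingSketch(List<String> existingSchemas) {
    schemas.addAll(existingSchemas);
  }

  // Reuses the ID of an identical existing schema; only a new schema updates lastAddedSchemaId.
  int addSchema(String schema) {
    int existing = schemas.indexOf(schema);
    if (existing >= 0) {
      return existing;
    }
    schemas.add(schema);
    lastAddedSchemaId = schemas.size() - 1;
    return lastAddedSchemaId;
  }

  // Resolves the LAST_ADDED placeholder and fails if no schema was added in this session.
  int addVersion(int schemaId) {
    if (schemaId == LAST_ADDED) {
      if (lastAddedSchemaId == null) {
        throw new IllegalStateException("Cannot set last added schema: no schema has been added");
      }
      return lastAddedSchemaId;
    }
    return schemaId;
  }
}
```

Applying "add schema, then add version with `-1`" to an empty instance resolves the version to schema ID `0`. Replaying the same two steps against an instance that already contains that schema reuses ID `0` without touching `lastAddedSchemaId`, so resolving `-1` throws, which mirrors the stack trace above.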
Once we get this change in, we should also get
https://github.com/apache/iceberg/pull/14334 in, as that reproduces the issue
and has a good test for it.
@huaxingao, @singhpk234, @amogh-jahagirdar since you guys reviewed
https://github.com/apache/iceberg/pull/14434 already, could you please review
this one as well?
