[GitHub] [incubator-iceberg] manishmalhotrawork edited a comment on issue #280: Add persistent IDs to partition fields

2019-08-25 Thread GitBox
manishmalhotrawork edited a comment on issue #280: Add persistent IDs to 
partition fields
URL: 
https://github.com/apache/incubator-iceberg/issues/280#issuecomment-524733183
 
 
   @rdblue thanks for the answer.
   Curios on the flow and as how fieldId are maintained in PartitionSpec vs 
Table Schema.
   
   Here [PartitionSpec is 
initialized](https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L63)
   And fields are copied to partition spec, which also included original IDs 
assigned to fields.
   
   and in [partitionType 
method](https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L113)
   where new fieldID is assigned ( 1000+) for the partitionFields and returns 
StructType with those fields.
   
   Not sure if Im missing something.
   
   So, why original fieldId and ids assigned in `partitionSpec` are different ?
   How the `partitionSpec` returned `StructType` with IDs as 1000+ is used?
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] manishmalhotrawork commented on issue #280: Add persistent IDs to partition fields

2019-08-25 Thread GitBox
manishmalhotrawork commented on issue #280: Add persistent IDs to partition 
fields
URL: 
https://github.com/apache/incubator-iceberg/issues/280#issuecomment-524733183
 
 
   @rdblue thanks for the answer.
   Curios on the flow and as how fieldId are maintained in PartitionSpec vs 
Table Schema.
   
   Here [PartitionSpec is 
initialized](https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L63)
   And fields are copied to partition spec, which also included original IDs 
assigned to fields.
   
   and in [partitionType 
method](https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/PartitionSpec.java#L113)
   where new fieldID is assigned ( 1000+) for the partitionFields and returns 
StructType with those fields.
   
   Not sure if Im missing something.
   
   So, why original fieldId and ids assigned in `partitionSpec` are different ?
   
   How the `partitionSpec` returned `StructType` with IDs as 1000+ is used?
   
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] waterlx commented on issue #394: Re-deploy web page to fix #333

2019-08-25 Thread GitBox
waterlx commented on issue #394: Re-deploy web page to fix #333
URL: 
https://github.com/apache/incubator-iceberg/issues/394#issuecomment-524706332
 
 
   The web page is updated, thanks!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] jbapple commented on issue #348: Avoid floating point number ordering NaN semantics

2019-08-25 Thread GitBox
jbapple commented on issue #348: Avoid floating point number ordering NaN 
semantics
URL: https://github.com/apache/incubator-iceberg/pull/348#issuecomment-524681562
 
 
   OK, pushed a new version. I didn't make any changes to Java or Python code 
in order to keep this commit focused on doing one thing.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #362: Support create and replace transactions in Catalog

2019-08-25 Thread GitBox
aokolnychyi commented on a change in pull request #362: Support create and 
replace transactions in Catalog
URL: https://github.com/apache/incubator-iceberg/pull/362#discussion_r317420150
 
 

 ##
 File path: 
hive/src/test/java/org/apache/iceberg/hive/HiveCreateReplaceTableTest.java
 ##
 @@ -0,0 +1,238 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.hive;
+
+import com.google.common.collect.Maps;
+import java.io.IOException;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.Transaction;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.exceptions.AlreadyExistsException;
+import org.apache.iceberg.exceptions.NoSuchTableException;
+import org.apache.iceberg.types.Types;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+import org.junit.rules.TemporaryFolder;
+
+import static org.apache.iceberg.PartitionSpec.builderFor;
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+public class HiveCreateReplaceTableTest extends HiveMetastoreTest {
+
+  private static final String TABLE_NAME = "tbl";
+  private static final TableIdentifier TABLE_IDENTIFIER = 
TableIdentifier.of(DB_NAME, TABLE_NAME);
+  private static final Schema SCHEMA = new Schema(
+  required(3, "id", Types.IntegerType.get()),
+  required(4, "data", Types.StringType.get())
+  );
+  private static final PartitionSpec SPEC = builderFor(SCHEMA)
+  .identity("id")
+  .build();
+
+  @Rule
+  public ExpectedException exceptionRule = ExpectedException.none();
+  @Rule
+  public TemporaryFolder temp = new TemporaryFolder();
+
+  private String tableLocation;
+
+  @Before
+  public void createTableLocation() throws IOException {
+tableLocation = temp.newFolder("hive-").getPath();
+  }
+
+  @After
+  public void cleanup() {
+catalog.dropTable(TABLE_IDENTIFIER);
+  }
+
+  @Test
+  public void testCreateTableTxn() {
+Assert.assertFalse("Table should not exist", 
catalog.tableExists(TABLE_IDENTIFIER));
+
+Transaction txn = catalog.newCreateTableTransaction(
+TABLE_IDENTIFIER, SCHEMA, SPEC, tableLocation, Maps.newHashMap());
+txn.updateProperties()
+.set("prop", "value")
+.commit();
+
+// verify the table is still not visible before the transaction is 
committed
+Assert.assertFalse(catalog.tableExists(TABLE_IDENTIFIER));
+
+txn.commitTransaction();
+
+Table table = catalog.loadTable(TABLE_IDENTIFIER);
+Assert.assertEquals("Table props should match", "value", 
table.properties().get("prop"));
+  }
+
+  @Test
+  public void testCreateTableTxnTableCreatedConcurrently() {
+exceptionRule.expect(RuntimeException.class);
+exceptionRule.expectMessage("Metastore operation failed");
+
+Assert.assertFalse("Table should not exist", 
catalog.tableExists(TABLE_IDENTIFIER));
+
+Transaction txn = catalog.newCreateTableTransaction(
+TABLE_IDENTIFIER, SCHEMA, SPEC, tableLocation, Maps.newHashMap());
+
+// create the table concurrently
+catalog.createTable(TABLE_IDENTIFIER, SCHEMA, SPEC);
+Assert.assertTrue("Table should be created", 
catalog.tableExists(TABLE_IDENTIFIER));
+
+// expect the transaction to fail
+txn.commitTransaction();
+  }
+
+  @Test
+  public void testCreateTableTxnTableAlreadyExists() {
+exceptionRule.expect(AlreadyExistsException.class);
+exceptionRule.expectMessage("Table already exists: hivedb.tbl");
+
+Assert.assertFalse("Table should not exist", 
catalog.tableExists(TABLE_IDENTIFIER));
+
+// create a table before starting a transaction
+catalog.createTable(TABLE_IDENTIFIER, SCHEMA, SPEC);
+Assert.assertTrue("Table should be created", 
catalog.tableExists(TABLE_IDENTIFIER));
+
+// should not be possible to start a new create table transaction
+catalog.newCreateTableTransaction(
+TABLE_IDENTIFIER, SCHEMA, SPEC, tableLocation, Maps.newHashMap());
+  }
+
+  @Test
+  public void testReplaceTableTxn() {
+catalog.createTable(TABLE_IDENTIF

[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #362: Support create and replace transactions in Catalog

2019-08-25 Thread GitBox
aokolnychyi commented on a change in pull request #362: Support create and 
replace transactions in Catalog
URL: https://github.com/apache/incubator-iceberg/pull/362#discussion_r317420104
 
 

 ##
 File path: 
hive/src/test/java/org/apache/iceberg/hive/HiveCreateReplaceTableTest.java
 ##
 @@ -0,0 +1,238 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.hive;
+
+import com.google.common.collect.Maps;
+import java.io.IOException;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.Table;
+import org.apache.iceberg.Transaction;
+import org.apache.iceberg.catalog.TableIdentifier;
+import org.apache.iceberg.exceptions.AlreadyExistsException;
+import org.apache.iceberg.exceptions.NoSuchTableException;
+import org.apache.iceberg.types.Types;
+import org.junit.After;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+import org.junit.rules.TemporaryFolder;
+
+import static org.apache.iceberg.PartitionSpec.builderFor;
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+public class HiveCreateReplaceTableTest extends HiveMetastoreTest {
+
+  private static final String TABLE_NAME = "tbl";
+  private static final TableIdentifier TABLE_IDENTIFIER = 
TableIdentifier.of(DB_NAME, TABLE_NAME);
+  private static final Schema SCHEMA = new Schema(
+  required(3, "id", Types.IntegerType.get()),
+  required(4, "data", Types.StringType.get())
+  );
+  private static final PartitionSpec SPEC = builderFor(SCHEMA)
+  .identity("id")
+  .build();
+
+  @Rule
+  public ExpectedException exceptionRule = ExpectedException.none();
+  @Rule
+  public TemporaryFolder temp = new TemporaryFolder();
+
+  private String tableLocation;
+
+  @Before
+  public void createTableLocation() throws IOException {
+tableLocation = temp.newFolder("hive-").getPath();
+  }
+
+  @After
+  public void cleanup() {
+catalog.dropTable(TABLE_IDENTIFIER);
+  }
+
+  @Test
+  public void testCreateTableTxn() {
+Assert.assertFalse("Table should not exist", 
catalog.tableExists(TABLE_IDENTIFIER));
+
+Transaction txn = catalog.newCreateTableTransaction(
+TABLE_IDENTIFIER, SCHEMA, SPEC, tableLocation, Maps.newHashMap());
+txn.updateProperties()
+.set("prop", "value")
+.commit();
+
+// verify the table is still not visible before the transaction is 
committed
+Assert.assertFalse(catalog.tableExists(TABLE_IDENTIFIER));
+
+txn.commitTransaction();
+
+Table table = catalog.loadTable(TABLE_IDENTIFIER);
+Assert.assertEquals("Table props should match", "value", 
table.properties().get("prop"));
+  }
+
+  @Test
+  public void testCreateTableTxnTableCreatedConcurrently() {
+exceptionRule.expect(RuntimeException.class);
+exceptionRule.expectMessage("Metastore operation failed");
+
+Assert.assertFalse("Table should not exist", 
catalog.tableExists(TABLE_IDENTIFIER));
+
+Transaction txn = catalog.newCreateTableTransaction(
+TABLE_IDENTIFIER, SCHEMA, SPEC, tableLocation, Maps.newHashMap());
+
+// create the table concurrently
+catalog.createTable(TABLE_IDENTIFIER, SCHEMA, SPEC);
+Assert.assertTrue("Table should be created", 
catalog.tableExists(TABLE_IDENTIFIER));
+
+// expect the transaction to fail
+txn.commitTransaction();
+  }
+
+  @Test
+  public void testCreateTableTxnTableAlreadyExists() {
+exceptionRule.expect(AlreadyExistsException.class);
+exceptionRule.expectMessage("Table already exists: hivedb.tbl");
+
+Assert.assertFalse("Table should not exist", 
catalog.tableExists(TABLE_IDENTIFIER));
+
+// create a table before starting a transaction
+catalog.createTable(TABLE_IDENTIFIER, SCHEMA, SPEC);
+Assert.assertTrue("Table should be created", 
catalog.tableExists(TABLE_IDENTIFIER));
+
+// should not be possible to start a new create table transaction
+catalog.newCreateTableTransaction(
+TABLE_IDENTIFIER, SCHEMA, SPEC, tableLocation, Maps.newHashMap());
+  }
+
+  @Test
+  public void testReplaceTableTxn() {
+catalog.createTable(TABLE_IDENTIF

[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #362: Support create and replace transactions in Catalog

2019-08-25 Thread GitBox
aokolnychyi commented on a change in pull request #362: Support create and 
replace transactions in Catalog
URL: https://github.com/apache/incubator-iceberg/pull/362#discussion_r317420055
 
 

 ##
 File path: hive/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
 ##
 @@ -184,6 +183,9 @@ public void commit(TableMetadata base, TableMetadata 
metadata) {
 });
   }
   threw = false;
+} catch (org.apache.hadoop.hive.metastore.api.AlreadyExistsException e) {
 
 Review comment:
   This part requires attention. This change was needed to always throw 
`AlreadyExistsException` instead of `RuntimeException` when tables are created 
concurrently.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #362: Support create and replace transactions in Catalog

2019-08-25 Thread GitBox
aokolnychyi commented on a change in pull request #362: Support create and 
replace transactions in Catalog
URL: https://github.com/apache/incubator-iceberg/pull/362#discussion_r317419683
 
 

 ##
 File path: core/src/main/java/org/apache/iceberg/BaseTransaction.java
 ##
 @@ -241,7 +234,14 @@ private void commitCreateTransaction() {
 }
   }
 
-  private void commitReplaceTransaction() {
+  private void commitReplaceTransaction(boolean orCreate) {
+if (base == null && !orCreate) {
+  throw new NoSuchTableException("Table doesn't exist");
+} else if (base == null) {
+  commitCreateTransaction();
 
 Review comment:
   Done


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #398: Push down StringStartsWith in Spark IcebergSource

2019-08-25 Thread GitBox
aokolnychyi commented on a change in pull request #398: Push down 
StringStartsWith in Spark IcebergSource
URL: https://github.com/apache/incubator-iceberg/pull/398#discussion_r317416149
 
 

 ##
 File path: 
api/src/main/java/org/apache/iceberg/expressions/InclusiveMetricsEvaluator.java
 ##
 @@ -254,5 +258,43 @@ public Boolean or(Boolean leftResult, Boolean 
rightResult) {
 public  Boolean notIn(BoundReference ref, Literal lit) {
   return ROWS_MIGHT_MATCH;
 }
+
+@Override
+public  Boolean startsWith(BoundReference ref, Literal lit) {
+  int id = ref.fieldId();
+
+  String prefix = (String) lit.value();
+  ByteBuffer prefixAsBytes = 
ByteBuffer.wrap(prefix.getBytes(StandardCharsets.UTF_8));
 
 Review comment:
   I've created #414 separately to address this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] aokolnychyi opened a new pull request #414: Add toByteBuffer to Literal

2019-08-25 Thread GitBox
aokolnychyi opened a new pull request #414: Add toByteBuffer to Literal
URL: https://github.com/apache/incubator-iceberg/pull/414
 
 
   This PR extends `Literal` with `toByteBuffer` to allow us to get binary 
representations of literals.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #348: Avoid floating point number ordering NaN semantics

2019-08-25 Thread GitBox
rdblue commented on a change in pull request #348: Avoid floating point number 
ordering NaN semantics
URL: https://github.com/apache/incubator-iceberg/pull/348#discussion_r317413767
 
 

 ##
 File path: site/docs/spec.md
 ##
 @@ -206,19 +206,22 @@ The schema of a manifest file is a struct called 
`manifest_entry` with the follo
 | **`104  file_size_in_bytes`** | `long`| 
Total file size in bytes

 |
 | ~~**`105 block_size_in_bytes`**~~ | `long`| 
**Deprecated. Always write a default value and do not read.**   

 |
 | **`106  file_ordinal`**   | `optional int`| 
Ordinal of the file w.r.t files with the same partition tuple and snapshot id   

 |
-| **`107  sort_columns`**   | `optional list`   | 
Columns the file is sorted by   

 |
+| **`107  sort_columns`**   | `optional list`   | 
Columns the file is sorted by [2]. If a column has type `float` or `double` and 
contains `NaN`, it must not be in `sort_columns`.   
 |
 | **`108  column_sizes`**   | `optional map`| 
Map from column id to the total size on disk of all regions that store the 
column. Does not include bytes necessary to read other columns, like footers. 
Leave null for row-oriented formats (Avro). |
 | **`109  value_counts`**   | `optional map`| 
Map from column id to number of values in the column (including null values)

 |
 | **`110  null_value_counts`**  | `optional map`| 
Map from column id to number of null values in the column   

 |
 | ~~**`111 distinct_counts`**~~ | `optional map`| 
**Deprecated. Do not use.** 

 |
-| **`125  lower_bounds`**   | `optional map<126: int, 127: binary>` | 
Map from column id to lower bound in the column serialized as binary [1]. Each 
value must be less than or equal to all values in the column for the file.  
  |
-| **`128  upper_bounds`**   | `optional map<129: int, 130: binary>` | 
Map from column id to upper bound in the column serialized as binary [1]. Each 
value must be greater than or equal to all values in the column for the file.   
  |
+| **`112  nan_value_counts`**   | `optional map`| 
Map from column id to number of NaN values in the column

 |
 
 Review comment:
   We will need to assign a new ID for this, as well as the key and value: 
https://github.com/apache/incubator-iceberg/blob/master/api/src/main/java/org/apache/iceberg/DataFile.java#L63


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #348: Avoid floating point number ordering NaN semantics

2019-08-25 Thread GitBox
rdblue commented on a change in pull request #348: Avoid floating point number 
ordering NaN semantics
URL: https://github.com/apache/incubator-iceberg/pull/348#discussion_r317413690
 
 

 ##
 File path: site/docs/spec.md
 ##
 @@ -206,19 +206,22 @@ The schema of a manifest file is a struct called 
`manifest_entry` with the follo
 | **`104  file_size_in_bytes`** | `long`| 
Total file size in bytes

 |
 | ~~**`105 block_size_in_bytes`**~~ | `long`| 
**Deprecated. Always write a default value and do not read.**   

 |
 | **`106  file_ordinal`**   | `optional int`| 
Ordinal of the file w.r.t files with the same partition tuple and snapshot id   

 |
-| **`107  sort_columns`**   | `optional list`   | 
Columns the file is sorted by   

 |
+| **`107  sort_columns`**   | `optional list`   | 
Columns the file is sorted by [2]. If a column has type `float` or `double` and 
contains `NaN`, it must not be in `sort_columns`.   
 |
 
 Review comment:
   Sort columns is currently not used and we intend to remove it. It sounded 
like a good idea at first, but we will need direction and null handling rules. 
What we are planning to do instead is to define sort orders in table metadata 
and attach them to files by ID. So don't worry about this, we'll remove it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #362: Support create and replace transactions in Catalog

2019-08-25 Thread GitBox
rdblue commented on a change in pull request #362: Support create and replace 
transactions in Catalog
URL: https://github.com/apache/incubator-iceberg/pull/362#discussion_r317413447
 
 

 ##
 File path: core/src/main/java/org/apache/iceberg/BaseTransaction.java
 ##
 @@ -241,7 +234,14 @@ private void commitCreateTransaction() {
 }
   }
 
-  private void commitReplaceTransaction() {
+  private void commitReplaceTransaction(boolean orCreate) {
+if (base == null && !orCreate) {
+  throw new NoSuchTableException("Table doesn't exist");
+} else if (base == null) {
+  commitCreateTransaction();
 
 Review comment:
   In that case, I think we should use the settings from the current metadata 
instead of base.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #351: Extend Iceberg with a way to overwrite files for eager updates/deletes

2019-08-25 Thread GitBox
aokolnychyi commented on a change in pull request #351: Extend Iceberg with a 
way to overwrite files for eager updates/deletes
URL: https://github.com/apache/incubator-iceberg/pull/351#discussion_r317398734
 
 

 ##
 File path: core/src/main/java/org/apache/iceberg/OverwriteData.java
 ##
 @@ -88,6 +119,42 @@ public OverwriteFiles validateAddedFiles() {
   }
 }
 
+if (conflictDetectionFilter != null) {
+  PartitionSpec spec = writeSpec();
+  Expression inclusiveExpr = 
Projections.inclusive(spec).project(conflictDetectionFilter);
+  Evaluator inclusive = new Evaluator(spec.partitionType(), inclusiveExpr);
+
+  InclusiveMetricsEvaluator metrics = new 
InclusiveMetricsEvaluator(base.schema(), conflictDetectionFilter);
+
+  List newFiles = collectNewFiles(base);
+  for (DataFile newFile : newFiles) {
+ValidationException.check(
+!inclusive.eval(newFile.partition()) || !metrics.eval(newFile),
+"A conflicting file was appended that matches filter '%s': %s",
 
 Review comment:
   You are right, that's exactly the behavior we wanted to achieve. If the file 
can potentially contain data that matches the filter, we should fail. I should 
update the error message.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #351: Extend Iceberg with a way to overwrite files for eager updates/deletes

2019-08-25 Thread GitBox
aokolnychyi commented on a change in pull request #351: Extend Iceberg with a 
way to overwrite files for eager updates/deletes
URL: https://github.com/apache/incubator-iceberg/pull/351#discussion_r317398659
 
 

 ##
 File path: core/src/main/java/org/apache/iceberg/OverwriteData.java
 ##
 @@ -88,6 +119,42 @@ public OverwriteFiles validateAddedFiles() {
   }
 }
 
+if (conflictDetectionFilter != null) {
+  PartitionSpec spec = writeSpec();
+  Expression inclusiveExpr = 
Projections.inclusive(spec).project(conflictDetectionFilter);
+  Evaluator inclusive = new Evaluator(spec.partitionType(), inclusiveExpr);
+
+  InclusiveMetricsEvaluator metrics = new 
InclusiveMetricsEvaluator(base.schema(), conflictDetectionFilter);
+
+  List newFiles = collectNewFiles(base);
+  for (DataFile newFile : newFiles) {
+ValidationException.check(
+!inclusive.eval(newFile.partition()) || !metrics.eval(newFile),
+"A conflicting file was appended that matches filter '%s': %s",
+conflictDetectionFilter, newFile.path());
+  }
+}
+
 return super.apply(base);
   }
+
+  private List collectNewFiles(TableMetadata meta) {
+List newFiles = new ArrayList<>();
+
+Long currentSnapshotId = meta.currentSnapshot() == null ? null : 
meta.currentSnapshot().snapshotId();
 
 Review comment:
   Not sure I got. If the current snapshot is null, it means the table is still 
empty and shouldn't have any new files, right?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #351: Extend Iceberg with a way to overwrite files for eager updates/deletes

2019-08-25 Thread GitBox
aokolnychyi commented on a change in pull request #351: Extend Iceberg with a 
way to overwrite files for eager updates/deletes
URL: https://github.com/apache/incubator-iceberg/pull/351#discussion_r317398607
 
 

 ##
 File path: core/src/main/java/org/apache/iceberg/OverwriteData.java
 ##
 @@ -88,6 +119,42 @@ public OverwriteFiles validateAddedFiles() {
   }
 }
 
+if (conflictDetectionFilter != null) {
+  PartitionSpec spec = writeSpec();
+  Expression inclusiveExpr = 
Projections.inclusive(spec).project(conflictDetectionFilter);
+  Evaluator inclusive = new Evaluator(spec.partitionType(), inclusiveExpr);
+
+  InclusiveMetricsEvaluator metrics = new 
InclusiveMetricsEvaluator(base.schema(), conflictDetectionFilter);
+
+  List newFiles = collectNewFiles(base);
+  for (DataFile newFile : newFiles) {
+ValidationException.check(
+!inclusive.eval(newFile.partition()) || !metrics.eval(newFile),
+"A conflicting file was appended that matches filter '%s': %s",
+conflictDetectionFilter, newFile.path());
+  }
+}
+
 return super.apply(base);
   }
+
+  private List collectNewFiles(TableMetadata meta) {
+List newFiles = new ArrayList<>();
+
+Long currentSnapshotId = meta.currentSnapshot() == null ? null : 
meta.currentSnapshot().snapshotId();
+while (currentSnapshotId != null && 
!currentSnapshotId.equals(readSnapshotId)) {
 
 Review comment:
   This part is only invoked if we call `validateNoConflictingAppends`, which 
always accepts a valid `readSnapshotId` and an expression for detecting 
conflicts. If `validateNoConflictingAppends` is not called, there will be no 
validation and the behavior would be as it was before this change. 
   
   We will call `validateNoConflictingAppends(null, expr)` only if we start 
with an empty table, meaning that we have to scan the entire history.
   
   We can consider setting `readSnapshotId` and `conflictDetectionFilter` 
independently, but I am not sure whether it gives us any benefits. Also, it 
will lead to correctness errors if we won't set `readSnapshotId` in 
transactions correctly.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] aokolnychyi opened a new issue #413: Support case-sensitivity in all operations with metrics evaluators

2019-08-25 Thread GitBox
aokolnychyi opened a new issue #413: Support case-sensitivity in all operations 
with metrics evaluators
URL: https://github.com/apache/incubator-iceberg/issues/413
 
 
   We respect case-sensitivity in a couple of places such as 
`FilteredManifest`. However, `MergingSnapshotProducer` and `OverwriteData` use 
`StrictMetricsEvaluator` with `caseSensitive` always set to true. We need to 
properly support case-sensitivity in all operations with metrics evaluators.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] aokolnychyi opened a new pull request #412: Use null counts in metrics evaluators

2019-08-25 Thread GitBox
aokolnychyi opened a new pull request #412: Use null counts in metrics 
evaluators
URL: https://github.com/apache/incubator-iceberg/pull/412
 
 
   As discussed in #398, we need to take into account null counts in metrics 
evaluators.
   
   This PR resolves #408.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] aokolnychyi commented on a change in pull request #362: Support create and replace transactions in Catalog

2019-08-25 Thread GitBox
aokolnychyi commented on a change in pull request #362: Support create and 
replace transactions in Catalog
URL: https://github.com/apache/incubator-iceberg/pull/362#discussion_r317387008
 
 

 ##
 File path: core/src/main/java/org/apache/iceberg/BaseTransaction.java
 ##
 @@ -241,7 +234,14 @@ private void commitCreateTransaction() {
 }
   }
 
-  private void commitReplaceTransaction() {
+  private void commitReplaceTransaction(boolean orCreate) {
+if (base == null && !orCreate) {
+  throw new NoSuchTableException("Table doesn't exist");
+} else if (base == null) {
+  commitCreateTransaction();
 
 Review comment:
   We need to know how many retries to use, etc. We cannot configure `Tasks` as 
`base` is null.
   ```
   Tasks.foreach(ops)
.retry(base.propertyAsInt(COMMIT_NUM_RETRIES, 
COMMIT_NUM_RETRIES_DEFAULT))
.exponentialBackoff(
base.propertyAsInt(COMMIT_MIN_RETRY_WAIT_MS, 
COMMIT_MIN_RETRY_WAIT_MS_DEFAULT),
base.propertyAsInt(COMMIT_MAX_RETRY_WAIT_MS, 
COMMIT_MAX_RETRY_WAIT_MS_DEFAULT),
base.propertyAsInt(COMMIT_TOTAL_RETRY_TIME_MS, 
COMMIT_TOTAL_RETRY_TIME_MS_DEFAULT),
2.0 /* exponential */)
.onlyRetryOn(CommitFailedException.class)
.run(underlyingOps -> {
  ...
}
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org