[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)
rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336281218

## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java

@@ -0,0 +1,144 @@
```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.iceberg.hadoop;

import com.google.common.base.Preconditions;
import java.io.Closeable;
import java.io.IOException;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.iceberg.BaseMetastoreCatalog;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.TableMetadata;
import org.apache.iceberg.TableOperations;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.exceptions.AlreadyExistsException;
import org.apache.iceberg.exceptions.RuntimeIOException;


public class HadoopCatalog extends BaseMetastoreCatalog implements Closeable {
  private static final String ICEBERG_HADOOP_WAREHOUSE_BASE = "iceberg/warehouse";
  private final Configuration conf;
  private String warehouseUri;

  public HadoopCatalog(Configuration conf, String warehouseUri) {
    this.conf = conf;

    if (warehouseUri != null) {
      this.warehouseUri = warehouseUri;
    } else {
      String fsRoot = conf.get("fs.defaultFS");
      Path warehousePath = new Path(fsRoot, ICEBERG_HADOOP_WAREHOUSE_BASE);
      try {
        FileSystem fs = Util.getFs(warehousePath, conf);
        if (!fs.isDirectory(warehousePath)) {
          if (!fs.mkdirs(warehousePath)) {
            throw new IOException("failed to create warehouse for hadoop catalog");
          }
        }
        this.warehouseUri = fsRoot + "/" + ICEBERG_HADOOP_WAREHOUSE_BASE;
      } catch (IOException e) {
        throw new RuntimeIOException("failed to create directory for warehouse", e);
      }
    }
  }

  public HadoopCatalog(Configuration conf) {
    this(conf, null);
  }

  @Override
  public org.apache.iceberg.Table createTable(
      TableIdentifier identifier, Schema schema, PartitionSpec spec, Map properties) {
    Preconditions.checkArgument(identifier.namespace().levels().length == 1,
        "Missing database in table identifier: %s", identifier);
    Path tablePath = new Path(defaultWarehouseLocation(identifier));
    try {
      FileSystem fs = Util.getFs(tablePath, conf);
      if (!fs.isDirectory(tablePath)) {
        fs.mkdirs(tablePath);
      } else {
        throw new AlreadyExistsException("the table already exists: " + identifier);
      }
    } catch (IOException e) {
      throw new RuntimeIOException("failed to create directory", e);
    }
    return super.createTable(identifier, schema, spec, null, properties);
  }

  public org.apache.iceberg.Table createTable(
      TableIdentifier identifier, Schema schema, PartitionSpec spec) {
    Preconditions.checkArgument(identifier.namespace().levels().length == 1,
        "Missing database in table identifier: %s", identifier);
    return createTable(identifier, schema, spec, null, null);
  }

  @Override
  protected TableOperations newTableOps(TableIdentifier identifier) {
    Preconditions.checkArgument(identifier.namespace().levels().length == 1,
        "Missing database in table identifier: %s", identifier);
    return new HadoopTableOperations(new Path(defaultWarehouseLocation(identifier)), conf);
  }

  @Override
  protected String defaultWarehouseLocation(TableIdentifier tableIdentifier) {
    String dbName = tableIdentifier.namespace().level(0);
    String tableName = tableIdentifier.name();
    return this.warehouseUri + "/" + dbName + ".db" + "/" + tableName;
  }

  @Override
  public boolean dropTable(TableIdentifier identifier, boolean purge) {
    Preconditions.checkArgument(identifier.namespace().levels().length == 1,
        "Missing database in table identifier: %s", identifier);

    Path tablePath = new Path(defaultWarehouseLocation(identifier));
    TableOperations ops =
```
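The quoted `defaultWarehouseLocation` implies a fixed on-disk layout of `<warehouseUri>/<db>.db/<table>`. The following is an illustrative sketch of just that string-building logic, pulled out of the catalog; the class name `WarehousePaths` is hypothetical and not part of the Iceberg API.

```java
public class WarehousePaths {
    // Mirrors the quoted defaultWarehouseLocation: <warehouseUri>/<db>.db/<table>
    public static String defaultWarehouseLocation(String warehouseUri, String dbName, String tableName) {
        return warehouseUri + "/" + dbName + ".db" + "/" + tableName;
    }

    public static void main(String[] args) {
        // Example location for table "events" in database "logs"
        System.out.println(defaultWarehouseLocation("hdfs://nn:8020/iceberg/warehouse", "logs", "events"));
        // -> hdfs://nn:8020/iceberg/warehouse/logs.db/events
    }
}
```

Because the location is derived entirely from the warehouse URI and the identifier, no metastore lookup is needed to find a table's directory.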
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)
rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336280986

## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)
rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336280691

## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java

```java
  @Override
  protected TableOperations newTableOps(TableIdentifier identifier) {
    Preconditions.checkArgument(identifier.namespace().levels().length == 1,
        "Missing database in table identifier: %s", identifier);
```

Review comment: Why restrict namespaces to 1 level?

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)
rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336280440

## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java

```java
      if (!fs.isDirectory(tablePath)) {
        fs.mkdirs(tablePath);
      } else {
        throw new AlreadyExistsException("the table already exists: " + identifier);
```

Review comment: I don't think this is correct. The table exists if its metadata exists, not if the directory is present.
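The review comment above can be sketched concretely: an existence check should look for Iceberg metadata files, not just the table directory. This illustration uses local `java.nio.file` in place of the Hadoop `FileSystem` API, and the `metadata/*.metadata.json` layout is an assumption about how a Hadoop table stores its metadata; `MetadataCheck` is a hypothetical name.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class MetadataCheck {
    // A table "exists" only if its metadata directory holds at least one *.metadata.json file.
    public static boolean tableExists(Path tableDir) throws IOException {
        Path metadataDir = tableDir.resolve("metadata");
        if (!Files.isDirectory(metadataDir)) {
            return false;
        }
        try (Stream<Path> files = Files.list(metadataDir)) {
            return files.anyMatch(p -> p.getFileName().toString().endsWith(".metadata.json"));
        }
    }

    // Demonstrates the distinction: [0] = directory exists but no metadata,
    // [1] = after a metadata file is written.
    public static boolean[] demo() {
        try {
            Path tableDir = Files.createTempDirectory("iceberg-demo");
            boolean beforeMetadata = tableExists(tableDir);
            Files.createDirectories(tableDir.resolve("metadata"));
            Files.createFile(tableDir.resolve("metadata").resolve("v1.metadata.json"));
            boolean afterMetadata = tableExists(tableDir);
            return new boolean[] { beforeMetadata, afterMetadata };
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("exists before metadata: " + result[0] + ", after: " + result[1]);
    }
}
```

With a metadata-based check, a leftover empty directory would not block `createTable`, and a directory with metadata would correctly be reported as an existing table.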
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)
rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336280536

## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java

```java
    return super.createTable(identifier, schema, spec, null, properties);
```

Review comment: `defaultWarehouseLocation` is overridden below. This only depends on `warehouseUri`.
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)
rdblue commented on a change in pull request #529: Add hadoop table catalog (WIP)
URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336279982

## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java

```java
public class HadoopCatalog extends BaseMetastoreCatalog implements Closeable {
```

Review comment: Can you add documentation for how this catalog works? I believe that it creates Hadoop tables that require a file system with atomic rename. That should be stated in docs. I would also like to see a description of how this class is configured, where the tables are created, and what is implemented (no renameTable, is dropTable supported?).
[GitHub] [incubator-iceberg] feng-tao commented on issue #551: [python] First add to docs, addresses #323 and #363
feng-tao commented on issue #551: [python] First add to docs, addresses #323 and #363
URL: https://github.com/apache/incubator-iceberg/pull/551#issuecomment-543422736

@TGooch44 do you know if we will have a pypi package to try it out?
[GitHub] [incubator-iceberg] rdblue commented on issue #537: Docs: Fix typos
rdblue commented on issue #537: Docs: Fix typos
URL: https://github.com/apache/incubator-iceberg/pull/537#issuecomment-543422604

I'm closing this since I think the typo was actually correct and I haven't heard back. Feel free to reopen if you think it still needs to be fixed.
[GitHub] [incubator-iceberg] rdblue closed pull request #537: Docs: Fix typos
rdblue closed pull request #537: Docs: Fix typos
URL: https://github.com/apache/incubator-iceberg/pull/537
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336278823

## File path: site/docs/python-quickstart.md

# Examples

## Inspect Table Metadata

Review comment: The new wording sounds good. Thanks!
[GitHub] [incubator-iceberg] rdblue merged pull request #551: [python] First add to docs, addresses #323 and #363
rdblue merged pull request #551: [python] First add to docs, addresses #323 and #363
URL: https://github.com/apache/incubator-iceberg/pull/551
[GitHub] [incubator-iceberg] rdblue commented on issue #556: Fix Kryo serialization in ParquetUtil.getSplitOffsets
rdblue commented on issue #556: Fix Kryo serialization in ParquetUtil.getSplitOffsets
URL: https://github.com/apache/incubator-iceberg/pull/556#issuecomment-543421851

Looks like the failure is checkstyle:

```
[ant:checkstyle] [ERROR] /home/travis/build/apache/incubator-iceberg/spark/src/test/java/org/apache/iceberg/TestKryoSerialization.java:27:8: Unused import - org.apache.avro.generic.GenericData. [UnusedImports]
[ant:checkstyle] [ERROR] /home/travis/build/apache/incubator-iceberg/spark/src/test/java/org/apache/iceberg/TestKryoSerialization.java:41: Extra separation in import group before 'java.io.File' [ImportOrder]
[ant:checkstyle] [ERROR] /home/travis/build/apache/incubator-iceberg/spark/src/test/java/org/apache/iceberg/TestKryoSerialization.java:41: Wrong order for 'java.io.File' import. [ImportOrder]
[ant:checkstyle] [ERROR] /home/travis/build/apache/incubator-iceberg/spark/src/test/java/org/apache/iceberg/TestKryoSerialization.java:47:8: Unused import - java.util.List. [UnusedImports]
```
[GitHub] [incubator-iceberg] rdblue commented on issue #553: Spark ReadTask is expensive to serialize
rdblue commented on issue #553: Spark ReadTask is expensive to serialize
URL: https://github.com/apache/incubator-iceberg/issues/553#issuecomment-543421665

Using a broadcast sounds good to me for now. Can you open a PR for this?
[GitHub] [incubator-iceberg] rdblue closed issue #555: Iceberg tables should allow for automatic table creation when writing if table not exists already
rdblue closed issue #555: Iceberg tables should allow for automatic table creation when writing if table not exists already
URL: https://github.com/apache/incubator-iceberg/issues/555
[GitHub] [incubator-iceberg] rdblue commented on issue #555: Iceberg tables should allow for automatic table creation when writing if table not exists already
rdblue commented on issue #555: Iceberg tables should allow for automatic table creation when writing if table not exists already
URL: https://github.com/apache/incubator-iceberg/issues/555#issuecomment-543421500

We are planning on adding support for the new logical plans in Spark 3.0. That will include support for common SQL statements, like `CREATE TABLE ... AS SELECT ...` as well as `REPLACE TABLE ... AS SELECT ...`. It will also include support for the new [DataFrameWriterV2](https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriterV2.scala) API that can be used for the same operations. The new API looks like this:

```scala
df.writeTo("db.table").append()
df.writeTo("db.table").partitionBy(hours($"ts")).create()
df.writeTo("db.table").partitionBy(hours($"ts")).createOrReplace()
```
[GitHub] [incubator-iceberg] TGooch44 commented on a change in pull request #530: [python] adding Hive package to wrap BaseMetastoreTables/TableOperations
TGooch44 commented on a change in pull request #530: [python] adding Hive package to wrap BaseMetastoreTables/TableOperations URL: https://github.com/apache/incubator-iceberg/pull/530#discussion_r336266729 ## File path: python/iceberg/hive/hive_table_operations.py ## @@ -0,0 +1,59 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + + +from iceberg.core import BaseMetastoreTableOperations + + +class HiveTableOperations(BaseMetastoreTableOperations): + +def __init__(self, conf, client, database, table): +super(HiveTableOperations, self).__init__(conf) +self._client = client +self.database = database +self.table = table +self.refresh() + +def refresh(self): +with self._client as open_client: +tbl_info = open_client.get_table(self.database, self.table) + +table_type = tbl_info.parameters.get(BaseMetastoreTableOperations.TABLE_TYPE_PROP) + +if table_type is None or table_type.lower() != BaseMetastoreTableOperations.ICEBERG_TABLE_TYPE_VALUE: +raise RuntimeError("Invalid table, not Iceberg: %s.%s.%s" % (self.database, Review comment: Trying to be too fast...let me add some tests here to catch some of this kind of stuff.
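A minimal sketch of the kind of test the author mentions, with the Hive client mocked so the "not an Iceberg table" error path can be exercised without a metastore. Everything here is a self-contained stand-in written for illustration — `check_iceberg_table` and the constants mirror the names quoted in the diff but are not the real `iceberg` package.

```python
from unittest.mock import MagicMock

# Stand-ins for the constants on BaseMetastoreTableOperations (assumed values).
TABLE_TYPE_PROP = "table_type"
ICEBERG_TABLE_TYPE_VALUE = "iceberg"


def check_iceberg_table(client, database, table):
    """Re-implementation of the validation in refresh(), for test purposes."""
    tbl_info = client.get_table(database, table)
    table_type = tbl_info.parameters.get(TABLE_TYPE_PROP)
    if table_type is None or table_type.lower() != ICEBERG_TABLE_TYPE_VALUE:
        raise RuntimeError("Invalid table, not Iceberg: %s.%s" % (database, table))
    return tbl_info


# A mocked client whose table metadata lacks the Iceberg marker property.
client = MagicMock()
client.get_table.return_value.parameters = {TABLE_TYPE_PROP: "EXTERNAL"}
try:
    check_iceberg_table(client, "db", "tbl")
except RuntimeError as e:
    print(e)  # Invalid table, not Iceberg: db.tbl
```

The same mock pattern extends to asserting the happy path: set `parameters` to `{TABLE_TYPE_PROP: "ICEBERG"}` and check that no exception is raised.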
[GitHub] [incubator-iceberg] jzhuge opened a new pull request #556: Fix Kryo serialization in ParquetUtil.getSplitOffsets
jzhuge opened a new pull request #556: Fix Kryo serialization in ParquetUtil.getSplitOffsets URL: https://github.com/apache/incubator-iceberg/pull/556 Found it during integration with downstream Spark 2.3 branch. Added a unit test.
[GitHub] [incubator-iceberg] manishmalhotrawork commented on a change in pull request #529: Add hadoop table catalog (WIP)
manishmalhotrawork commented on a change in pull request #529: Add hadoop table catalog (WIP) URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336260283 ## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java ## @@ -0,0 +1,142 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.hadoop; + +import com.google.common.base.Preconditions; +import java.io.Closeable; +import java.io.IOException; +import java.util.Map; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.iceberg.BaseMetastoreCatalog; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.TableMetadata; +import org.apache.iceberg.TableOperations; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.exceptions.AlreadyExistsException; +import org.apache.iceberg.exceptions.RuntimeIOException; + + +public class HadoopCatalog extends BaseMetastoreCatalog implements Closeable { + private static final String ICEBERG_HADOOP_WAREHOUSE_BASE = "iceberg/warehouse"; + private final Configuration conf; + private String hdfsRoot; + + public HadoopCatalog(Configuration conf) { +this.conf = conf; +hdfsRoot = conf.get("fs.defaultFS"); +Path warehousePath = new Path(hdfsRoot + ICEBERG_HADOOP_WAREHOUSE_BASE); +try { + FileSystem fs = Util.getFs(warehousePath, conf); + if (!fs.isDirectory(warehousePath)) { +if (!fs.mkdirs(warehousePath)) { + throw new IOException("failed to create warehouse for hadoop catalog"); +} + } + this.hdfsRoot = hdfsRoot + "/" + ICEBERG_HADOOP_WAREHOUSE_BASE; +} catch (IOException e) { + throw new RuntimeIOException("failed to create directory for warehouse", e); +} + } + + @Override + public org.apache.iceberg.Table createTable( + TableIdentifier identifier, Schema schema, PartitionSpec spec, String location, Map properties) { +Preconditions.checkArgument(identifier.namespace().levels().length == 1, +"Missing database in table identifier: %s", identifier); +Path tablePath = new Path(defaultWarehouseLocation(identifier)); +try { + FileSystem fs = Util.getFs(tablePath, conf); + if (!fs.isDirectory(tablePath)) { +fs.mkdirs(tablePath); + } else { +throw new AlreadyExistsException("the 
table already exists: " + identifier); + } +} catch (IOException e) { + throw new RuntimeIOException("failed to create directory", e); +} +return super.createTable(identifier, schema, spec, null, properties); + } + + public org.apache.iceberg.Table createTable( + TableIdentifier identifier, Schema schema, PartitionSpec spec, Map properties) { +Preconditions.checkArgument(identifier.namespace().levels().length == 1, +"Missing database in table identifier: %s", identifier); +return createTable(identifier, schema, spec, null, properties); Review comment: I'm sorry for this. When I think more about it, it should be OK, as the parent class signature would not change based on the child class behavior. Also, the parent method expects `location` and `properties` to be null.
[GitHub] [incubator-iceberg] manishmalhotrawork commented on a change in pull request #529: Add hadoop table catalog (WIP)
manishmalhotrawork commented on a change in pull request #529: Add hadoop table catalog (WIP) URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336257210 ## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java ## @@ -0,0 +1,144 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.hadoop; + +import com.google.common.base.Preconditions; +import java.io.Closeable; +import java.io.IOException; +import java.util.Map; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.iceberg.BaseMetastoreCatalog; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.TableMetadata; +import org.apache.iceberg.TableOperations; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.exceptions.AlreadyExistsException; +import org.apache.iceberg.exceptions.RuntimeIOException; + + +public class HadoopCatalog extends BaseMetastoreCatalog implements Closeable { + private static final String ICEBERG_HADOOP_WAREHOUSE_BASE = "iceberg/warehouse"; + private final Configuration conf; + private String warehouseUri; + + public HadoopCatalog(Configuration conf, String warehouseUri) { +this.conf = conf; + +if (warehouseUri != null) { + this.warehouseUri = warehouseUri; +} else { + String fsRoot = conf.get("fs.defaultFS"); + Path warehousePath = new Path(fsRoot, ICEBERG_HADOOP_WAREHOUSE_BASE); + try { +FileSystem fs = Util.getFs(warehousePath, conf); +if (!fs.isDirectory(warehousePath)) { + if (!fs.mkdirs(warehousePath)) { +throw new IOException("failed to create warehouse for hadoop catalog"); + } +} +this.warehouseUri = fsRoot + "/" + ICEBERG_HADOOP_WAREHOUSE_BASE; + } catch (IOException e) { +throw new RuntimeIOException("failed to create directory for warehouse", e); + } +} + } + + public HadoopCatalog(Configuration conf) { +this(conf, null); + } + + @Override + public org.apache.iceberg.Table createTable( + TableIdentifier identifier, Schema schema, PartitionSpec spec, Map properties) { +Preconditions.checkArgument(identifier.namespace().levels().length == 1, +"Missing database in table identifier: %s", identifier); +Path tablePath = new Path(defaultWarehouseLocation(identifier)); 
+try { + FileSystem fs = Util.getFs(tablePath, conf); + if (!fs.isDirectory(tablePath)) { +fs.mkdirs(tablePath); + } else { +throw new AlreadyExistsException("the table already exists: " + identifier); + } +} catch (IOException e) { + throw new RuntimeIOException("failed to create directory", e); +} +return super.createTable(identifier, schema, spec, null, properties); Review comment: @chenjunjiedada thanks for taking care of this [comment](https://github.com/apache/incubator-iceberg/pull/529/files/11e4993b0d60d676b09124bea65bf4adc2fe3c21#r334631040) Maybe my understanding is not perfect, so please correct me, but it looks like per this flow we are expecting that a HadoopTable will always be under the Hive warehouse directory, since this call to `BaseMetastoreCatalog` uses `defaultWarehouseLocation`, which uses `hive.metastore.warehouse.dir` to form the final location. Also, for HadoopTables, do we need to set `hive.metastore.warehouse.dir`?
[GitHub] [incubator-iceberg] manishmalhotrawork commented on a change in pull request #529: Add hadoop table catalog (WIP)
manishmalhotrawork commented on a change in pull request #529: Add hadoop table catalog (WIP) URL: https://github.com/apache/incubator-iceberg/pull/529#discussion_r336255270 ## File path: core/src/main/java/org/apache/iceberg/hadoop/HadoopCatalog.java ## @@ -0,0 +1,144 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.iceberg.hadoop; + +import com.google.common.base.Preconditions; +import java.io.Closeable; +import java.io.IOException; +import java.util.Map; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.iceberg.BaseMetastoreCatalog; +import org.apache.iceberg.PartitionSpec; +import org.apache.iceberg.Schema; +import org.apache.iceberg.TableMetadata; +import org.apache.iceberg.TableOperations; +import org.apache.iceberg.catalog.TableIdentifier; +import org.apache.iceberg.exceptions.AlreadyExistsException; +import org.apache.iceberg.exceptions.RuntimeIOException; + + +public class HadoopCatalog extends BaseMetastoreCatalog implements Closeable { + private static final String ICEBERG_HADOOP_WAREHOUSE_BASE = "iceberg/warehouse"; + private final Configuration conf; + private String warehouseUri; + + public HadoopCatalog(Configuration conf, String warehouseUri) { +this.conf = conf; + +if (warehouseUri != null) { + this.warehouseUri = warehouseUri; +} else { + String fsRoot = conf.get("fs.defaultFS"); + Path warehousePath = new Path(fsRoot, ICEBERG_HADOOP_WAREHOUSE_BASE); + try { +FileSystem fs = Util.getFs(warehousePath, conf); +if (!fs.isDirectory(warehousePath)) { + if (!fs.mkdirs(warehousePath)) { +throw new IOException("failed to create warehouse for hadoop catalog"); + } +} +this.warehouseUri = fsRoot + "/" + ICEBERG_HADOOP_WAREHOUSE_BASE; + } catch (IOException e) { +throw new RuntimeIOException("failed to create directory for warehouse", e); + } +} + } + + public HadoopCatalog(Configuration conf) { +this(conf, null); + } + + @Override + public org.apache.iceberg.Table createTable( Review comment: @chenjunjiedada thanks for taking care. I see the `public org.apache.iceberg.Table createTable( TableIdentifier identifier, Schema schema, PartitionSpec spec, String location, Map properties)` is removed. 
Wondering if the parent class method will still be callable using a `HadoopCatalog` object, and what the behavior would be in that case.
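The behavior being asked about can be illustrated outside of Java. Python has no method overloading, so this is only an analogy, but the key point carries over: removing an overload from a subclass does not hide the parent's method, which remains reachable through a subclass instance. The class and method names here are hypothetical stand-ins, not the Iceberg code.

```python
class BaseCatalog:
    # Plays the role of BaseMetastoreCatalog.createTable(..., location, properties).
    def create_table(self, identifier, schema, spec, location=None, properties=None):
        return ("base", identifier, location)


class HadoopishCatalog(BaseCatalog):
    # The subclass only adds its own no-location form; it does not (and in
    # Java, cannot by omission) remove the inherited signature above.
    def create_table_default(self, identifier, schema, spec, properties=None):
        return super().create_table(identifier, schema, spec, None, properties)


cat = HadoopishCatalog()
# The parent method is still callable -- including with an explicit location,
# which is exactly the case the reviewer is worried about.
print(cat.create_table("db.t", None, None, location="/custom"))  # → ('base', 'db.t', '/custom')
```

In the Java code this means callers holding a `HadoopCatalog` can still invoke the location-taking `createTable` inherited from `BaseMetastoreCatalog`, unless the subclass overrides it to reject a non-null location.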
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #530: [python] adding Hive package to wrap BaseMetastoreTables/TableOperations
rdblue commented on a change in pull request #530: [python] adding Hive package to wrap BaseMetastoreTables/TableOperations URL: https://github.com/apache/incubator-iceberg/pull/530#discussion_r336240323 ## File path: python/iceberg/hive/hive_table_operations.py ## @@ -0,0 +1,59 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + + +from iceberg.core import BaseMetastoreTableOperations + + +class HiveTableOperations(BaseMetastoreTableOperations): + +def __init__(self, conf, client, database, table): +super(HiveTableOperations, self).__init__(conf) +self._client = client +self.database = database +self.table = table +self.refresh() + +def refresh(self): +with self._client as open_client: +tbl_info = open_client.get_table(self.database, self.table) + +table_type = tbl_info.parameters.get(BaseMetastoreTableOperations.TABLE_TYPE_PROP) + +if table_type is None or table_type.lower() != BaseMetastoreTableOperations.ICEBERG_TABLE_TYPE_VALUE: +raise RuntimeError("Invalid table, not Iceberg: %s.%s.%s" % (self.database, Review comment: Looks like the format string wasn't updated. It still has 3 `%s`.
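A short sketch of the bug the reviewer is pointing at: with printf-style `%` formatting, a placeholder count that no longer matches the argument tuple raises a `TypeError` at the moment the error message is built, which would mask the intended "not Iceberg" error. The string below mirrors the one in the diff; the variable names are illustrative.

```python
database, table = "db", "events"

# Three %s placeholders but only two arguments: formatting itself fails.
try:
    msg = "Invalid table, not Iceberg: %s.%s.%s" % (database, table)
except TypeError as e:
    print(e)  # not enough arguments for format string

# Keeping placeholders and arguments in sync produces the intended message.
msg = "Invalid table, not Iceberg: %s.%s" % (database, table)
print(msg)  # Invalid table, not Iceberg: db.events
```

This is why a unit test over the error path (rather than only the happy path) catches the mismatch: the test has to force the `raise` to execute before the formatting bug shows up.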
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336238243 ## File path: python/README.md ## @@ -15,6 +15,26 @@ - limitations under the License. --> -# Iceberg -A python implementation of the Iceberg table format. -See the project level README for more details: https://github.com/apache/incubator-iceberg +# Iceberg Python + +Iceberg is a python library for programatic access to iceberg table metadata as well as data access. The intention is to provide a functional subset of the java library. + +## Getting Started + +We are not currently publishing to PyPi so the best way to install the library is to clone the git repo and do a pip install -e + +``` +git clone https://github.com/apache/incubator-iceberg.git +cd incubator-iceberg/python +pip install -e . Review comment: Did the other changes to this file make it? Looks like the empty line is still there and I don't see the test instructions.
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336238079 ## File path: site/docs/python-quickstart.md ## @@ -0,0 +1,40 @@ + + +# Examples + +## Inspect Table Metadata Review comment: Sounds a little scary to me. We just want to make it clear that this isn't how to use an official release.
[GitHub] [incubator-iceberg] goldentriangle opened a new issue #555: Iceberg tables should allow for automatic table creation when writing if table not exists already
goldentriangle opened a new issue #555: Iceberg tables should allow for automatic table creation when writing if table not exists already URL: https://github.com/apache/incubator-iceberg/issues/555 I think this is a special case of https://github.com/apache/incubator-iceberg/issues/540. When writing a Spark dataframe into an Iceberg table, if the table doesn't exist, Iceberg should create the table/schema automatically/implicitly.
[GitHub] [incubator-iceberg] TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336134669 ## File path: site/docs/python-quickstart.md ## @@ -0,0 +1,40 @@ + + +# Examples + +## Inspect Table Metadata Review comment: added the following text: > Iceberg python is currently in development, and as such, should __only__ be used for development and testing purposes until an official release has been made. > > As such, we are not currently publishing to PyPi so the best way to install the library is to perform the following steps: Let me know if that sounds ok.
[GitHub] [incubator-iceberg] TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336131362 ## File path: python/README.md ## @@ -15,6 +15,26 @@ - limitations under the License. --> -# Iceberg -A python implementation of the Iceberg table format. -See the project level README for more details: https://github.com/apache/incubator-iceberg +# Iceberg Python + +Iceberg is a python library for programatic access to iceberg table metadata as well as data access. The intention is to provide a functional subset of the java library. + +## Getting Started + +We are not currently publishing to PyPi so the best way to install the library is to clone the git repo and do a pip install -e + +``` +git clone https://github.com/apache/incubator-iceberg.git +cd incubator-iceberg/python +pip install -e . Review comment: added tox instructions, let me know if that looks ok.
[GitHub] [incubator-iceberg] TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336131555 ## File path: site/docs/python-api-intro.md ## @@ -0,0 +1,143 @@ + + +# Iceberg Python API + +Much of the python api conforms to the java api. You can get more info about the java api [here](https://iceberg.apache.org/api/). + + +## Tables + +The Table interface provides access to table metadata + ++ schema returns the current table schema ++ spec returns the current table partition spec ++ properties returns a map of key-value properties ++ currentSnapshot returns the current table snapshot ++ snapshots returns all valid snapshots for the table ++ snapshot(id) returns a specific snapshot by ID ++ location returns the table’s base location + +Tables also provide refresh to update the table to the latest version. + +### Scanning +Iceberg table scans start by creating a TableScan object with newScan. + +``` python +scan = table.new_scan(); +``` + +To configure a scan, call filter and select on the TableScan to get a new TableScan with those changes. + +``` python +filtered_scan = scan.filter(Expressions.equal("id", 5)) +``` + +String expressions can also be passed to the filter method. + +``` python +filtered_scan = scan.filter("id=5") +``` + +Schema projections can be applied against a TableScan by passing a list of column names. + +``` python +filtered_scan = scan.select(["col_1", "col_2", "col_3"]) +``` + +Because some data types cannot be read using the python library, a convenience method for excluding columns from projection is provided. + +``` python +filtered_scan = scan.select_except(["unsupported_col_1", "unsupported_col_2"]) +``` + + +Calls to configuration methods create a new TableScan so that each TableScan is immutable. + +When a scan is configured, planFiles, planTasks, and schema are used to return files, tasks, and the read projection. 
+ +``` python +scan = table.new_scan() \ +.filter("id=5") \ +.select(["id", "data"]) + +projection = scan.schema +for task in scan.plan_tasks(): +print(task) +``` + +## Types + +Iceberg data types are located in iceberg.api.types.types + +### Primitives + +Primitive type instances are available from static methods in each type class. Types without parameters use get, and types like __decimal__ use factory methods: + +```python +IntegerType.get()# int +DoubleType.get() # double +DecimalType.of(9, 2) # decimal(9, 2) +``` + +### Nested types +Structs, maps, and lists are created using factory methods in type classes. + +Like struct fields, map keys or values and list elements are tracked as nested fields. Nested fields track [field IDs](https://iceberg.apache.org/evolution/#correctness) and nullability. + +Struct fields are created using __NestedField.optional__ or __NestedField.required__. Map value and list element nullability is set in the map and list factory methods. Review comment: tried to match this up. let me know if it looks better.
[GitHub] [incubator-iceberg] TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336131620 ## File path: site/docs/python-api-intro.md ## @@ -0,0 +1,143 @@ + + +# Iceberg Python API + +Much of the python api conforms to the java api. You can get more info about the java api [here](https://iceberg.apache.org/api/). + + +## Tables + +The Table interface provides access to table metadata + ++ schema returns the current table schema ++ spec returns the current table partition spec ++ properties returns a map of key-value properties ++ currentSnapshot returns the current table snapshot ++ snapshots returns all valid snapshots for the table ++ snapshot(id) returns a specific snapshot by ID ++ location returns the table’s base location + +Tables also provide refresh to update the table to the latest version. + +### Scanning +Iceberg table scans start by creating a TableScan object with newScan. + +``` python +scan = table.new_scan(); +``` + +To configure a scan, call filter and select on the TableScan to get a new TableScan with those changes. + +``` python +filtered_scan = scan.filter(Expressions.equal("id", 5)) +``` + +String expressions can also be passed to the filter method. + +``` python +filtered_scan = scan.filter("id=5") +``` + +Schema projections can be applied against a TableScan by passing a list of column names. + +``` python +filtered_scan = scan.select(["col_1", "col_2", "col_3"]) +``` + +Because some data types cannot be read using the python library, a convenience method for excluding columns from projection is provided. + +``` python +filtered_scan = scan.select_except(["unsupported_col_1", "unsupported_col_2"]) +``` + + +Calls to configuration methods create a new TableScan so that each TableScan is immutable. + +When a scan is configured, planFiles, planTasks, and schema are used to return files, tasks, and the read projection. 
+ +``` python +scan = table.new_scan() \ +.filter("id=5") \ +.select(["id", "data"]) + +projection = scan.schema +for task in scan.plan_tasks(): +print(task) +``` + +## Types + +Iceberg data types are located in iceberg.api.types.types + +### Primitives + +Primitive type instances are available from static methods in each type class. Types without parameters use get, and types like __decimal__ use factory methods: + +```python +IntegerType.get()# int +DoubleType.get() # double +DecimalType.of(9, 2) # decimal(9, 2) +``` + +### Nested types +Structs, maps, and lists are created using factory methods in type classes. + +Like struct fields, map keys or values and list elements are tracked as nested fields. Nested fields track [field IDs](https://iceberg.apache.org/evolution/#correctness) and nullability. + +Struct fields are created using __NestedField.optional__ or __NestedField.required__. Map value and list element nullability is set in the map and list factory methods. + +```python +# struct<1 id: int, 2 data: optional string> +struct = StructType.of([NestedField.required(1, "id", IntegerType.get()), +NestedField.optional(2, "data", StringType.get()]) + ) +``` +```python +# map<1 key: int, 2 value: optional string> +map_var = MapType.of_optional(1, IntegerType.get(), + 2, StringType.get()) +``` +```python +# array<1 element: int> +list_var = ListType.of_required(1, IntegerType.get()); +``` + +## Expressions +Iceberg’s expressions are used to configure table scans. To create expressions, use the factory methods in Expressions. + +Supported predicate expressions are: + ++ __is_null__ Review comment: ditto for the above comment This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
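The TableScan immutability described in the quoted doc — each configuration call returns a new scan and never mutates the receiver — can be sketched with a toy stand-in class (this is not the real iceberg library, just an illustration of the pattern; method names follow the doc):

```python
class TableScan:
    """Toy immutable scan: configuration methods return new instances."""

    def __init__(self, filters=(), columns=None):
        self._filters = tuple(filters)
        self._columns = columns

    def filter(self, expr):
        # Return a new scan with the extra predicate; self is untouched.
        return TableScan(self._filters + (expr,), self._columns)

    def select(self, columns):
        # Return a new scan with the projection; self is untouched.
        return TableScan(self._filters, tuple(columns))

    def __repr__(self):
        return f"TableScan(filters={self._filters}, columns={self._columns})"


scan = TableScan()
filtered = scan.filter("id=5").select(["id", "data"])
print(scan)      # the original scan is unchanged
print(filtered)
```

Because every call returns a fresh object, partially configured scans can be shared safely across callers.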
[GitHub] [incubator-iceberg] TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
TGooch44 commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336131188 ## File path: python/README.md ## @@ -15,6 +15,26 @@

 - limitations under the License.
 -->
-# Iceberg
-A python implementation of the Iceberg table format.
-See the project level README for more details: https://github.com/apache/incubator-iceberg
+# Iceberg Python
+
+Iceberg is a Python library for programmatic access to Iceberg table metadata as well as data access. The intention is to provide a functional subset of the Java library.
+
+## Getting Started

Review comment: added
[GitHub] [incubator-iceberg] jzhuge commented on issue #446: KryoException when writing Iceberg tables in Spark
jzhuge commented on issue #446: KryoException when writing Iceberg tables in Spark URL: https://github.com/apache/incubator-iceberg/issues/446#issuecomment-543276340 @aokolnychyi @shardulm94 @rdsr please take a look at a custom Spark Kryo registrator for Iceberg in #549.
[GitHub] [incubator-iceberg] rdblue commented on issue #550: Bump ORC from 1.5.5 to 1.5.6
rdblue commented on issue #550: Bump ORC from 1.5.5 to 1.5.6 URL: https://github.com/apache/incubator-iceberg/pull/550#issuecomment-543261091 Thanks, @Fokko!
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #543: Avoid NullPointerException in FindFiles when there is no snapshot
rdblue commented on a change in pull request #543: Avoid NullPointerException in FindFiles when there is no snapshot URL: https://github.com/apache/incubator-iceberg/pull/543#discussion_r336111263 ## File path: core/src/main/java/org/apache/iceberg/FindFiles.java ## @@ -191,7 +191,10 @@ public Builder inPartitions(PartitionSpec spec, List partitions) {

   Snapshot snapshot = snapshotId != null ?
       ops.current().snapshot(snapshotId) : ops.current().currentSnapshot();
-  CloseableIterable entries = new ManifestGroup(ops, snapshot.manifests())
+  // snapshot could be null when the table just gets created
+  Iterable manifests = (snapshot != null) ? snapshot.manifests() : CloseableIterable.empty();
+
+  CloseableIterable entries = new ManifestGroup(ops, manifests)

Review comment: If there are no manifests, then entries should be `CloseableIterable.empty()`, not the manifest iterable. That doesn't need to be closeable.
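The pattern rdblue asks for — fall back to an empty iterable when there is no snapshot, so downstream code needs no null check — can be sketched in Python (hypothetical names; the real fix is in Java's `FindFiles` using `CloseableIterable.empty()`):

```python
from typing import Iterable, Optional


def manifests_for(snapshot: Optional[object]) -> Iterable[str]:
    """Return the snapshot's manifests, or an empty iterable when the
    table was just created and has no snapshot yet."""
    # Falling back to an empty tuple mirrors CloseableIterable.empty():
    # callers can iterate unconditionally instead of testing for None.
    return snapshot.manifests() if snapshot is not None else ()


class FakeSnapshot:
    """Stand-in for an Iceberg Snapshot in this sketch."""

    def manifests(self):
        return ["m1.avro", "m2.avro"]


print(list(manifests_for(FakeSnapshot())))
print(list(manifests_for(None)))
```

Returning an empty collection rather than `None` keeps the "no snapshot" case on the same code path as the normal one.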
[GitHub] [incubator-iceberg] rdblue merged pull request #550: Bump ORC from 1.5.5 to 1.5.6
rdblue merged pull request #550: Bump ORC from 1.5.5 to 1.5.6 URL: https://github.com/apache/incubator-iceberg/pull/550
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336110220 ## File path: site/docs/python-quickstart.md ## @@ -0,0 +1,40 @@

+# Examples
+
+## Inspect Table Metadata

Review comment: It would be good to have the information on how to install the library in a section here. In user-facing docs like this, we need to be clear that installing from master is for development and testing purposes. We can't recommend using code unless it is a released version. That means the wording should be something like "Iceberg for Python is not yet released and published to PyPI. To try out the python library, you can install it using `pip -e`: ..."
[GitHub] [incubator-iceberg] rdblue commented on issue #551: [python] First add to docs, addresses #323 and #363
rdblue commented on issue #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#issuecomment-543259905 Thanks, @TGooch44! Great to see Python docs!
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336108392 ## File path: site/docs/python-api-intro.md ## @@ -0,0 +1,143 @@

+Supported predicate expressions are:
+
++ __is_null__

Review comment: Could you use fixed-width here instead of bold?
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336108513 ## File path: site/docs/python-api-intro.md ## @@ -0,0 +1,143 @@

+The Table interface provides access to table metadata
+
++ schema returns the current table schema

Review comment: Using a fixed-width font here for method names would assist readability.
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336108312 ## File path: site/docs/python-api-intro.md ## @@ -0,0 +1,143 @@

+Struct fields are created using __NestedField.optional__ or __NestedField.required__. Map value and list element nullability is set in the map and list factory methods.

Review comment: For method names, we typically use fixed-width font, like this: ``` ... using `NestedField.optional` or `NestedField.required`. Map value ... ```
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336107609 ## File path: python/README.md ## @@ -15,6 +15,26 @@

+git clone https://github.com/apache/incubator-iceberg.git
+cd incubator-iceberg/python
+pip install -e .

Review comment: This doesn't quite resolve #323 because it doesn't document how to run python tests. Could you add a section for that?
[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363
rdblue commented on a change in pull request #551: [python] First add to docs, addresses #323 and #363 URL: https://github.com/apache/incubator-iceberg/pull/551#discussion_r336107337 ## File path: python/README.md ## @@ -15,6 +15,26 @@

+git clone https://github.com/apache/incubator-iceberg.git
+cd incubator-iceberg/python
+pip install -e .
+

Review comment: Nit: empty line.
[GitHub] [incubator-iceberg] rdblue merged pull request #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath
rdblue merged pull request #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath URL: https://github.com/apache/incubator-iceberg/pull/554
[GitHub] [incubator-iceberg] jzhuge edited a comment on issue #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath
jzhuge edited a comment on issue #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath URL: https://github.com/apache/incubator-iceberg/pull/554#issuecomment-543249742 @rdblue When you merged #57 into "rblue/iceberg" branch in commit 22d802aca84f27be4e95bda2030ca7f423e854fc on Mar 13th, did you add the changes to DataFiles.Builder.withPartitionPath? I have the suspicion because they were not in @aokolnychyi's commit 234f49ffdbae82566ef8971679576d8702571fd6 merged into master.
[GitHub] [incubator-iceberg] jzhuge commented on issue #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath
jzhuge commented on issue #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath URL: https://github.com/apache/incubator-iceberg/pull/554#issuecomment-543249742 @rdblue When you merged #57 into "rblue/iceberg" branch in commit 22d802aca84f27be4e95bda2030ca7f423e854fc on Mar 13th, did you add the changes to DataFiles.Builder.withPartitionPath? I have the suspicion because they were not in @aokolnychyi's original commit for #57.
[GitHub] [incubator-iceberg] jzhuge commented on issue #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath
jzhuge commented on issue #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath URL: https://github.com/apache/incubator-iceberg/pull/554#issuecomment-543248490 @rdblue this PR is probably no longer necessary because of #507, right?
[GitHub] [incubator-iceberg] jzhuge commented on a change in pull request #549: Add Spark custom Kryo registrator
jzhuge commented on a change in pull request #549: Add Spark custom Kryo registrator URL: https://github.com/apache/incubator-iceberg/pull/549#discussion_r336087118 ## File path: build.gradle ## @@ -429,6 +429,8 @@ project(':iceberg-spark') {

   compile project(':iceberg-parquet')
   compile project(':iceberg-hive')
+  compile 'de.javakaffee:kryo-serializers'

Review comment: Added additional LICENSE and NOTICE.
[GitHub] [incubator-iceberg] aokolnychyi edited a comment on issue #553: Spark ReadTask is expensive to serialize
aokolnychyi edited a comment on issue #553: Spark ReadTask is expensive to serialize URL: https://github.com/apache/incubator-iceberg/issues/553#issuecomment-543199975 As a short-term solution, we can broadcast `EncryptionManager` and `FileIO` in `IcebergSource`. Then `Reader` and `ReadTask` can store references to the broadcasted values and fetch actual ones in `createPartitionReader` while creating `TaskDataReader`. This seems to solve the scheduler delay issue. @rdblue thoughts?
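The broadcast approach aokolnychyi sketches — have each task carry only a cheap handle and resolve the heavy object (stand-ins for `FileIO`/`EncryptionManager`) when the reader is created — can be illustrated outside Spark with a toy registry. All names below are hypothetical; this is not Spark's or Iceberg's API:

```python
import pickle

# Registry standing in for Spark's broadcast machinery: tasks carry only
# a small integer id, never the heavy object itself.
_BROADCASTS = {}


class Broadcast:
    def __init__(self, broadcast_id):
        self.broadcast_id = broadcast_id  # the only state that gets pickled

    @property
    def value(self):
        # Resolved lazily, e.g. inside createPartitionReader on the executor.
        return _BROADCASTS[self.broadcast_id]


def broadcast(obj):
    bid = len(_BROADCASTS)
    _BROADCASTS[bid] = obj
    return Broadcast(bid)


class ReadTask:
    def __init__(self, file_path, io_broadcast):
        self.file_path = file_path
        self.io = io_broadcast  # cheap to serialize: just the handle


heavy_io = {"config": "x" * 10_000}  # stand-in for an expensive FileIO
task = ReadTask("part-0.parquet", broadcast(heavy_io))

# The serialized task stays small because only the handle is pickled.
print(len(pickle.dumps(task)) < len(pickle.dumps(heavy_io)))
```

In real Spark the broadcast value is shipped to executors once and cached there, so every `ReadTask` avoids re-serializing it.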
[GitHub] [incubator-iceberg] andrei-ionescu commented on issue #510: Cannot update an Iceberg dataset from a Parquet file due to "field should be required, but is optional"
andrei-ionescu commented on issue #510: Cannot update an Iceberg dataset from a Parquet file due to "field should be required, but is optional" URL: https://github.com/apache/incubator-iceberg/issues/510#issuecomment-543192332 @rdsr Given two different locations of data (`hdfs://host_1/input/data/` and `hdfs://host_2/input/data/`), how would you move the `day=2019-06-01` partition from **host_1** to **host_2** applying some transformations (host_1 data is parquet format, host_2 data is iceberg format)?
[GitHub] [incubator-iceberg] aokolnychyi commented on issue #553: Spark ReadTask is expensive to serialize
aokolnychyi commented on issue #553: Spark ReadTask is expensive to serialize URL: https://github.com/apache/incubator-iceberg/issues/553#issuecomment-543149968 I can confirm the issue is resolved if we avoid serializing `FileIO`. The main question is how to achieve that with minimum changes.
[GitHub] [incubator-iceberg] jzhuge opened a new pull request #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath
jzhuge opened a new pull request #554: Fix IllegalArgumentException in DataFiles.Builder.withPartitionPath URL: https://github.com/apache/incubator-iceberg/pull/554 DataFiles.fillFromPath threw "Invalid partition data, too many fields (expecting 0)" when the path is empty. The fix was in Anton's #57 but somehow got lost. The ugly `var` code can be removed from SparkDataFile.toDataFile.
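The failure mode jzhuge describes — an empty partition path parsed as one bogus field, triggering "too many fields (expecting 0)" — can be illustrated with a small Python sketch of `fillFromPath`-style parsing (hypothetical helper; the actual fix is in Java's `DataFiles`):

```python
def parse_partition_path(path: str) -> dict:
    """Parse 'day=2019-06-01/hour=12' into {'day': ..., 'hour': ...}.

    Guarding the empty string matters: ''.split('/') yields [''], one
    bogus field, which is the "too many fields (expecting 0)" failure
    mode for unpartitioned tables.
    """
    if not path:
        return {}
    fields = {}
    for part in path.split("/"):
        key, _, value = part.partition("=")
        fields[key] = value
    return fields


print(parse_partition_path("day=2019-06-01/hour=12"))
print(parse_partition_path(""))  # unpartitioned table: no fields
```

Without the empty-string guard, an unpartitioned table's empty path would produce one field where the spec expects zero.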