date:20190614

[GitHub] [incubator-iceberg] rdsr opened a new pull request #220: Support parsing of special characters in TypeToSchema visitor

2019-06-14 Thread GitBox

rdsr opened a new pull request #220: Support parsing of special characters in 
TypeToSchema visitor
URL: https://github.com/apache/incubator-iceberg/pull/220
 
 
   This addresses #216


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] edgarRd commented on issue #213: Incomplete Iceberg to ORC column mapping

2019-06-14 Thread GitBox

edgarRd commented on issue #213: Incomplete Iceberg to ORC column mapping
URL: 
https://github.com/apache/incubator-iceberg/issues/213#issuecomment-502318792
 
 
   I have a preliminary fix for this issue in branch: 
https://github.com/apache/incubator-iceberg/compare/master...edgarRd:orc-column-map-fix
 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #219: Fix concurrent access to lazy PartitionSpec methods

2019-06-14 Thread GitBox

rdblue commented on a change in pull request #219: Fix concurrent access to 
lazy PartitionSpec methods
URL: https://github.com/apache/incubator-iceberg/pull/219#discussion_r294026110
 
 

 ##
 File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
 ##
 @@ -118,12 +118,18 @@ public int specId() {
 
   public Class[] javaClasses() {
 if (lazyJavaClasses == null) {
-  this.lazyJavaClasses = new Class[fields.length];
-  for (int i = 0; i < fields.length; i += 1) {
-PartitionField field = fields[i];
-Type sourceType = schema.findType(field.sourceId());
-Type result = field.transform().getResultType(sourceType);
-lazyJavaClasses[i] = result.typeId().javaClass();
+  synchronized (this) {
+if (lazyJavaClasses == null) {
 
 Review comment:
   You're right. It's still safe, but two threads may initialize the variable.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #219: Fix concurrent access to lazy PartitionSpec methods

2019-06-14 Thread GitBox

rdsr commented on a change in pull request #219: Fix concurrent access to lazy 
PartitionSpec methods
URL: https://github.com/apache/incubator-iceberg/pull/219#discussion_r294024626
 
 

 ##
 File path: api/src/main/java/org/apache/iceberg/PartitionSpec.java
 ##
 @@ -118,12 +118,18 @@ public int specId() {
 
   public Class[] javaClasses() {
 if (lazyJavaClasses == null) {
-  this.lazyJavaClasses = new Class[fields.length];
-  for (int i = 0; i < fields.length; i += 1) {
-PartitionField field = fields[i];
-Type sourceType = schema.findType(field.sourceId());
-Type result = field.transform().getResultType(sourceType);
-lazyJavaClasses[i] = result.typeId().javaClass();
+  synchronized (this) {
+if (lazyJavaClasses == null) {
 
 Review comment:
   I believe  for the double-check idiom to reliably work these fields need to 
be `volatile`
   http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdblue opened a new pull request #219: Fix concurrent access to lazy PartitionSpec methods

2019-06-14 Thread GitBox

rdblue opened a new pull request #219: Fix concurrent access to lazy 
PartitionSpec methods
URL: https://github.com/apache/incubator-iceberg/pull/219
 
 
   This was causing `NullPointerException` in `PartitionData.get`. When two 
Spark tasks started at the same time on an executor, both would load the same 
partition spec instance because of the spec parser's cache. One thread would 
create `lazyJavaClasses` and another would access it before it was fully 
initialized, causing Spark's converter to partition to cache null classes for 
some partitions, leading to the NPE from `PartitionData.get`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdblue opened a new pull request #218: Fix uncommitted file clean-up in transactions

2019-06-14 Thread GitBox

rdblue opened a new pull request #218: Fix uncommitted file clean-up in 
transactions
URL: https://github.com/apache/incubator-iceberg/pull/218
 
 
   This adds a callback for deleting files to `SnapshotProducer` and uses that 
callback in `Transaction`
   to track deletes instead of running them. When a transaction is committed, 
the last set of deletes are run so that the actual deletes are for the last 
commit of each operation in the transaction.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdsr edited a comment on issue #216: PartitionData cannot handle nested field names in PartitionSpec schema

2019-06-14 Thread GitBox

rdsr edited a comment on issue #216: PartitionData cannot handle nested field 
names in PartitionSpec schema
URL: 
https://github.com/apache/incubator-iceberg/issues/216#issuecomment-502308596
 
 
   The special char can only be DOT though right?
   Also it seems like you are suggesting that we should handle the sanitization 
in TypeToSchema visitor. That makes sense to me.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdblue commented on issue #216: PartitionData cannot handle nested field names in PartitionSpec schema

2019-06-14 Thread GitBox

rdblue commented on issue #216: PartitionData cannot handle nested field names 
in PartitionSpec schema
URL: 
https://github.com/apache/incubator-iceberg/issues/216#issuecomment-502309454
 
 
   I think it could be anything since we allow users to pass strings into 
`UpdateSchema`. We should probably sanitize any character that isn't allowed by 
Avro, assuming that they might creep in.
   
   By the way, because we project based on ID and not by name, the names used 
in Avro files don't really matter. So this should be safe to change.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdsr edited a comment on issue #216: PartitionData cannot handle nested field names in PartitionSpec schema

2019-06-14 Thread GitBox

rdsr edited a comment on issue #216: PartitionData cannot handle nested field 
names in PartitionSpec schema
URL: 
https://github.com/apache/incubator-iceberg/issues/216#issuecomment-502308596
 
 
   The special char can only be DOT though right?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdsr commented on issue #216: PartitionData cannot handle nested field names in PartitionSpec schema

2019-06-14 Thread GitBox

rdsr commented on issue #216: PartitionData cannot handle nested field names in 
PartitionSpec schema
URL: 
https://github.com/apache/incubator-iceberg/issues/216#issuecomment-502308596
 
 
   Today the special char can only be DOT though right?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdsr edited a comment on issue #216: PartitionData cannot handle nested field names in PartitionSpec schema

2019-06-14 Thread GitBox

rdsr edited a comment on issue #216: PartitionData cannot handle nested field 
names in PartitionSpec schema
URL: 
https://github.com/apache/incubator-iceberg/issues/216#issuecomment-502306413
 
 
   @rdblue ,  as per the spec 
   > The source column, selected by id, must be a primitive type and cannot be 
contained in a map or list, but may be nested in a struct
   
   So we can possibly have 2 ways to address this. 
   1. Automatically replace the dot with underscore
   2. Allow user to specify the target column name in the partition field 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdsr edited a comment on issue #216: PartitionData cannot handle nested field names in PartitionSpec schema

2019-06-14 Thread GitBox

rdsr edited a comment on issue #216: PartitionData cannot handle nested field 
names in PartitionSpec schema
URL: 
https://github.com/apache/incubator-iceberg/issues/216#issuecomment-502306413
 
 
   @rdblue ,  as per the spec 
   > The source column, selected by id, must be a primitive type and cannot be 
contained in a map or list, but may be nested in a struct
   So we can possibly have 2 ways to address this. 
   
   1. Automatically replace the dot with underscore
   2. Allow user to specify the target column name in the partition field 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdblue commented on issue #216: PartitionData cannot handle nested field names in PartitionSpec schema

2019-06-14 Thread GitBox

rdblue commented on issue #216: PartitionData cannot handle nested field names 
in PartitionSpec schema
URL: 
https://github.com/apache/incubator-iceberg/issues/216#issuecomment-502307787
 
 
   Iceberg supports special characters in names, looks like the failure is in 
Avro. I think Iceberg should sanitize special characters from the names when 
creating an Avro schema.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdsr opened a new issue #217: IcebergRuntime should not bundle hadoop-hdfs

2019-06-14 Thread GitBox

rdsr opened a new issue #217: IcebergRuntime should not bundle hadoop-hdfs
URL: https://github.com/apache/incubator-iceberg/issues/217
 
 
   The dep chain is 
   iceberg-runtime -> iceberg-spark -> iceberg-orc -> orc -> hadoop-hdfs
   
   ORC is bringing in an old version (2.2) of hadoop-hdfs 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdsr commented on issue #216: PartitionData cannot handle nested field names in PartitionSpec schema

2019-06-14 Thread GitBox

rdsr commented on issue #216: PartitionData cannot handle nested field names in 
PartitionSpec schema
URL: 
https://github.com/apache/incubator-iceberg/issues/216#issuecomment-502306413
 
 
   @rdblue ,  as per the spec 
   > The source column, selected by id, must be a primitive type and cannot be 
contained in a map or list, but may be nested in a struct
   So we can possibly have 2 ways to address this. 
   1. Automatically replace the dot with underscore
   2. Allow user to specify the target column name in the partition field 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdsr opened a new issue #216: `PartitionData` cannot handle nested field names in `PartitionSpec` schema

2019-06-14 Thread GitBox

rdsr opened a new issue #216: `PartitionData` cannot handle nested field names 
in `PartitionSpec` schema
URL: https://github.com/apache/incubator-iceberg/issues/216
 
 
   The following test will throw an exception when parsing the partition's 
Iceberg schema to Avro schema 
   
   Test: 
https://github.com/rdsr/incubator-iceberg/commit/95b95b0eed3b7fa074a65422aa26ab70c7e22444


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdblue commented on issue #215: Remove uncessary precondition checks

2019-06-14 Thread GitBox

rdblue commented on issue #215: Remove uncessary precondition checks
URL: https://github.com/apache/incubator-iceberg/pull/215#issuecomment-502303647
 
 
   Here's the problem:
   
   ```
   [ant:checkstyle] [ERROR] 
/home/travis/build/apache/incubator-iceberg/core/src/main/java/org/apache/iceberg/avro/AvroFileAppender.java:22:8:
 Unused import - com.google.common.base.Preconditions. [UnusedImports]
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] danielcweeks commented on issue #215: Remove uncessary precondition checks

2019-06-14 Thread GitBox

danielcweeks commented on issue #215: Remove uncessary precondition checks
URL: https://github.com/apache/incubator-iceberg/pull/215#issuecomment-502251379
 
 
   > I think this also needs to update the `length` javadoc because the 
requirement that the appender has been closed is no longer needed.
   
   Good spot.  I've updated the javadoc.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdblue commented on issue #215: Remove uncessary precondition checks

2019-06-14 Thread GitBox

rdblue commented on issue #215: Remove uncessary precondition checks
URL: https://github.com/apache/incubator-iceberg/pull/215#issuecomment-502230983
 
 
   I think this also needs to update the `length` javadoc because the 
requirement that the appender has been closed is no longer needed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] danielcweeks opened a new pull request #215: Remove uncessary precondition checks

2019-06-14 Thread GitBox

danielcweeks opened a new pull request #215: Remove uncessary precondition 
checks
URL: https://github.com/apache/incubator-iceberg/pull/215
 
 
   These checks unnecessarily prevent checking how much data has been written.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] edgarRd commented on a change in pull request #199: ORC metrics

2019-06-14 Thread GitBox

edgarRd commented on a change in pull request #199: ORC metrics
URL: https://github.com/apache/incubator-iceberg/pull/199#discussion_r293859195
 
 

 ##
 File path: orc/src/main/java/org/apache/iceberg/orc/OrcMetrics.java
 ##
 @@ -45,16 +61,82 @@ public static Metrics fromInputFile(InputFile file, 
Configuration config) {
 try {
   final Reader orcReader = OrcFile.createReader(new Path(file.location()),
   OrcFile.readerOptions(config));
+  final Schema schema = TypeConversion.fromOrc(orcReader.getSchema());
+
+  ColumnStatistics[] colStats = orcReader.getStatistics();
+  Map columSizes = 
Maps.newHashMapWithExpectedSize(colStats.length);
+  Map valueCounts = 
Maps.newHashMapWithExpectedSize(colStats.length);
+  Map lowerBounds = Maps.newHashMap();
+  Map upperBounds = Maps.newHashMap();
+
+  for(Types.NestedField col : schema.columns()) {
+final int i = col.fieldId();
+columSizes.put(i, colStats[i].getBytesOnDisk());
+valueCounts.put(i, colStats[i].getNumberOfValues());
+
+Optional orcMin = fromOrcMin(col, colStats[i]);
+orcMin.ifPresent(byteBuffer -> lowerBounds.put(i, byteBuffer));
+Optional orcMax = fromOrcMax(col, colStats[i]);
+orcMax.ifPresent(byteBuffer -> upperBounds.put(i, byteBuffer));
+  }
 
-  // TODO: implement rest of the methods for ORC metrics
   return new Metrics(orcReader.getNumberOfRows(),
-  null,
-  null,
+  columSizes,
+  valueCounts,
   Collections.emptyMap(),
-  null,
-  null);
+  lowerBounds,
+  upperBounds);
 } catch (IOException ioe) {
   throw new RuntimeIOException(ioe, "Failed to read footer of file: %s", 
file);
 }
   }
+
+  private static Optional fromOrcMin(Types.NestedField column,
+ ColumnStatistics columnStats) 
{
+ByteBuffer min = null;
+if (columnStats instanceof IntegerColumnStatistics) {
+  IntegerColumnStatistics intColStats = (IntegerColumnStatistics) 
columnStats;
+  if (column.type().typeId() == Type.TypeID.INTEGER) {
+min = toByteBuffer(column.type(), (int) intColStats.getMinimum());
+  } else {
+min = toByteBuffer(column.type(), intColStats.getMinimum());
+  }
+} else if (columnStats instanceof DoubleColumnStatistics) {
+  min = toByteBuffer(column.type(), ((DoubleColumnStatistics) 
columnStats).getMinimum());
+} else if (columnStats instanceof StringColumnStatistics) {
+  min = toByteBuffer(column.type(), ((StringColumnStatistics) 
columnStats).getMinimum());
 
 Review comment:
   Yes, I’ll push an update for this in a bit. Thanks. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #199: ORC metrics

2019-06-14 Thread GitBox

rdsr commented on a change in pull request #199: ORC metrics
URL: https://github.com/apache/incubator-iceberg/pull/199#discussion_r293854867
 
 

 ##
 File path: orc/src/main/java/org/apache/iceberg/orc/OrcMetrics.java
 ##
 @@ -45,16 +61,82 @@ public static Metrics fromInputFile(InputFile file, 
Configuration config) {
 try {
   final Reader orcReader = OrcFile.createReader(new Path(file.location()),
   OrcFile.readerOptions(config));
+  final Schema schema = TypeConversion.fromOrc(orcReader.getSchema());
+
+  ColumnStatistics[] colStats = orcReader.getStatistics();
+  Map columSizes = 
Maps.newHashMapWithExpectedSize(colStats.length);
+  Map valueCounts = 
Maps.newHashMapWithExpectedSize(colStats.length);
+  Map lowerBounds = Maps.newHashMap();
+  Map upperBounds = Maps.newHashMap();
+
+  for(Types.NestedField col : schema.columns()) {
+final int i = col.fieldId();
+columSizes.put(i, colStats[i].getBytesOnDisk());
+valueCounts.put(i, colStats[i].getNumberOfValues());
+
+Optional orcMin = fromOrcMin(col, colStats[i]);
+orcMin.ifPresent(byteBuffer -> lowerBounds.put(i, byteBuffer));
+Optional orcMax = fromOrcMax(col, colStats[i]);
+orcMax.ifPresent(byteBuffer -> upperBounds.put(i, byteBuffer));
+  }
 
-  // TODO: implement rest of the methods for ORC metrics
   return new Metrics(orcReader.getNumberOfRows(),
-  null,
-  null,
+  columSizes,
+  valueCounts,
   Collections.emptyMap(),
-  null,
-  null);
+  lowerBounds,
+  upperBounds);
 } catch (IOException ioe) {
   throw new RuntimeIOException(ioe, "Failed to read footer of file: %s", 
file);
 }
   }
+
+  private static Optional fromOrcMin(Types.NestedField column,
+ ColumnStatistics columnStats) 
{
+ByteBuffer min = null;
+if (columnStats instanceof IntegerColumnStatistics) {
+  IntegerColumnStatistics intColStats = (IntegerColumnStatistics) 
columnStats;
+  if (column.type().typeId() == Type.TypeID.INTEGER) {
+min = toByteBuffer(column.type(), (int) intColStats.getMinimum());
+  } else {
+min = toByteBuffer(column.type(), intColStats.getMinimum());
+  }
+} else if (columnStats instanceof DoubleColumnStatistics) {
+  min = toByteBuffer(column.type(), ((DoubleColumnStatistics) 
columnStats).getMinimum());
+} else if (columnStats instanceof StringColumnStatistics) {
+  min = toByteBuffer(column.type(), ((StringColumnStatistics) 
columnStats).getMinimum());
 
 Review comment:
   Seems like DecimalColumnStatistics, DateColumnStatistics and 
TimestampColumnStatistics all can return null for min/max


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

[GitHub] [incubator-iceberg] rdsr opened a new pull request #220: Support parsing of special characters in TypeToSchema visitor

[GitHub] [incubator-iceberg] edgarRd commented on issue #213: Incomplete Iceberg to ORC column mapping

[GitHub] [incubator-iceberg] rdblue commented on a change in pull request #219: Fix concurrent access to lazy PartitionSpec methods

[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #219: Fix concurrent access to lazy PartitionSpec methods

[GitHub] [incubator-iceberg] rdblue opened a new pull request #219: Fix concurrent access to lazy PartitionSpec methods

[GitHub] [incubator-iceberg] rdblue opened a new pull request #218: Fix uncommitted file clean-up in transactions

[GitHub] [incubator-iceberg] rdsr edited a comment on issue #216: PartitionData cannot handle nested field names in PartitionSpec schema

[GitHub] [incubator-iceberg] rdblue commented on issue #216: PartitionData cannot handle nested field names in PartitionSpec schema

[GitHub] [incubator-iceberg] rdsr edited a comment on issue #216: PartitionData cannot handle nested field names in PartitionSpec schema

[GitHub] [incubator-iceberg] rdsr commented on issue #216: PartitionData cannot handle nested field names in PartitionSpec schema

[GitHub] [incubator-iceberg] rdsr edited a comment on issue #216: PartitionData cannot handle nested field names in PartitionSpec schema

[GitHub] [incubator-iceberg] rdsr edited a comment on issue #216: PartitionData cannot handle nested field names in PartitionSpec schema

[GitHub] [incubator-iceberg] rdblue commented on issue #216: PartitionData cannot handle nested field names in PartitionSpec schema

[GitHub] [incubator-iceberg] rdsr opened a new issue #217: IcebergRuntime should not bundle hadoop-hdfs

[GitHub] [incubator-iceberg] rdsr commented on issue #216: PartitionData cannot handle nested field names in PartitionSpec schema

[GitHub] [incubator-iceberg] rdsr opened a new issue #216: `PartitionData` cannot handle nested field names in `PartitionSpec` schema

[GitHub] [incubator-iceberg] rdblue commented on issue #215: Remove uncessary precondition checks

[GitHub] [incubator-iceberg] danielcweeks commented on issue #215: Remove uncessary precondition checks

[GitHub] [incubator-iceberg] rdblue commented on issue #215: Remove uncessary precondition checks

[GitHub] [incubator-iceberg] danielcweeks opened a new pull request #215: Remove uncessary precondition checks

[GitHub] [incubator-iceberg] edgarRd commented on a change in pull request #199: ORC metrics

[GitHub] [incubator-iceberg] rdsr commented on a change in pull request #199: ORC metrics

22 matches

Site Navigation

Mail list logo

Footer information