[GitHub] [incubator-iceberg] jun-he commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

2020-02-02 Thread GitBox
jun-he commented on a change in pull request #749: Convert Spark In filter to 
iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r373900954
 
 

 ##
 File path: spark/src/test/java/org/apache/iceberg/spark/source/TestFilteredScan.java
 ##
 @@ -543,11 +579,11 @@ private File buildPartitionedTable(String desc, PartitionSpec spec, String udf,
 
   private List testRecords(org.apache.avro.Schema avroSchema) {
     return Lists.newArrayList(
-        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294658+00:00"), "junction"),
+        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294+00:00"), "junction"),
 
 Review comment:
   The test fails because each partition picked in the tests contains only one 
value (so its lower and upper bounds are equal), which means the Timestamp must 
match exactly.
   
   To avoid changing those values, I will update the test to use the partition 
of `2017-12-21T15`, which contains two records, so any Timestamp between them 
will match.
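
   A minimal sketch of why a single-record partition forces an exact match. This is a simplified model of a min/max bounds check, not Iceberg's actual evaluator; the class `BoundsCheck`, the helper `mightContain`, and the two `2017-12-21T15` timestamps are made up for illustration.

```java
import java.time.Instant;

/** Simplified model of min/max pruning (hypothetical, not Iceberg's evaluator). */
final class BoundsCheck {
  private BoundsCheck() {}

  /** Returns true if a file whose values span [lower, upper] might contain value. */
  static boolean mightContain(Instant lower, Instant upper, Instant value) {
    return !value.isBefore(lower) && !value.isAfter(upper);
  }

  public static void main(String[] args) {
    // Single-record partition: the lower and upper bounds are the same value.
    Instant only = Instant.parse("2017-12-22T09:20:44.294658Z");
    Instant truncated = Instant.parse("2017-12-22T09:20:44.294Z");
    System.out.println(mightContain(only, only, only));      // exact value is inside
    System.out.println(mightContain(only, only, truncated)); // truncated value is outside

    // Two-record partition: any value between the two records is inside the bounds.
    // These two timestamps are illustrative, not the ones from the test data.
    Instant first = Instant.parse("2017-12-21T15:10:00Z");
    Instant second = Instant.parse("2017-12-21T15:40:00Z");
    Instant between = Instant.parse("2017-12-21T15:25:00.123Z");
    System.out.println(mightContain(first, second, between));
  }
}
```

   With two records in the partition, the filter value no longer has to equal a stored value down to the microsecond; it only has to fall within the range.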
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] jun-he commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

2020-02-02 Thread GitBox
jun-he commented on a change in pull request #749: Convert Spark In filter to 
iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r373880462
 
 

 ##
 File path: spark/src/test/java/org/apache/iceberg/spark/source/TestFilteredScan.java
 ##
 @@ -543,11 +579,11 @@ private File buildPartitionedTable(String desc, PartitionSpec spec, String udf,
 
   private List testRecords(org.apache.avro.Schema avroSchema) {
     return Lists.newArrayList(
-        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294658+00:00"), "junction"),
+        record(avroSchema, 0L, timestamp("2017-12-22T09:20:44.294+00:00"), "junction"),
 
 Review comment:
   @rdblue This is because the `java.sql.Timestamp` constructor takes a time 
value in milliseconds. 
   There is a deprecated `java.sql.Timestamp` constructor that takes year, month, 
date, hour, minute, second, and nano, but then we would also need to take care of 
the time zone issue (a Java timestamp is always UTC). 
   
   So, to avoid using a deprecated method and to keep the test straightforward, I 
just updated the two records to millisecond precision.
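
   A small sketch of the precision loss described above. It only uses standard JDK calls; the class and method names (`TimestampPrecision`, `fromMillis`, `fromInstant`) are hypothetical. `Timestamp.from(Instant)` is shown as a non-deprecated way to keep sub-millisecond digits, which is not what this PR does.

```java
import java.sql.Timestamp;
import java.time.Instant;
import java.time.OffsetDateTime;

/** Illustrative only: compares two ways of building a java.sql.Timestamp. */
final class TimestampPrecision {
  private TimestampPrecision() {}

  /** Goes through epoch milliseconds, so digits below a millisecond are dropped. */
  static Timestamp fromMillis(String iso) {
    Instant instant = OffsetDateTime.parse(iso).toInstant();
    return new Timestamp(instant.toEpochMilli());
  }

  /** Goes through an Instant, which carries the full nanosecond field. */
  static Timestamp fromInstant(String iso) {
    return Timestamp.from(OffsetDateTime.parse(iso).toInstant());
  }

  public static void main(String[] args) {
    String micros = "2017-12-22T09:20:44.294658+00:00";
    // getNanos() is independent of the JVM's default time zone.
    System.out.println(fromMillis(micros).getNanos());  // 294000000
    System.out.println(fromInstant(micros).getNanos()); // 294658000
  }
}
```

   Under the millisecond-based path, `...44.294658` and `...44.294` produce the same `Timestamp`, which is why a filter built that way cannot exactly match a record stored with microsecond precision.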
   





[GitHub] [incubator-iceberg] jun-he commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

2020-01-30 Thread GitBox
jun-he commented on a change in pull request #749: Convert Spark In filter to 
iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r372840052
 
 

 ##
 File path: spark/src/test/java/org/apache/iceberg/spark/source/TestFilteredScan.java
 ##
 @@ -425,6 +426,21 @@ public void testFilterByNonProjectedColumn() {
     }
   }
 
+  @Test
+  public void testInFilter() {
+    File location = buildPartitionedTable("partitioned_by_data", PARTITION_BY_DATA, "data_ident", "data");
+
+    DataSourceOptions options = new DataSourceOptions(ImmutableMap.of(
+        "path", location.toString())
+    );
+
+    IcebergSource source = new IcebergSource();
+    DataSourceReader reader = source.createReader(options);
+    pushFilters(reader, new In("data", new String[]{"foo", "junction", "brush"}));
 
 Review comment:
    





[GitHub] [incubator-iceberg] jun-he commented on a change in pull request #749: Convert Spark In filter to iceberg IN Expression

2020-01-27 Thread GitBox
jun-he commented on a change in pull request #749: Convert Spark In filter to 
iceberg IN Expression
URL: https://github.com/apache/incubator-iceberg/pull/749#discussion_r371617351
 
 

 ##
 File path: spark/src/main/java/org/apache/iceberg/spark/SparkFilters.java
 ##
 @@ -122,11 +122,7 @@ public static Expression convert(Filter filter) {
 
         case IN:
           In inFilter = (In) filter;
-          Expression in = alwaysFalse();
-          for (Object value : inFilter.values()) {
-            in = or(in, equal(inFilter.attribute(), convertLiteral(value)));
-          }
-          return in;
+          return in(inFilter.attribute(), inFilter.values());
 
 Review comment:
   Thanks @aokolnychyi for the comments. I will add tests for those cases. 
   @rdblue I am wondering whether we can add this transformation `(in -> 
or(isNull, in))` to Iceberg's `Expressions` so that callers do not each have to 
reimplement this logic.
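
   A rough sketch of the two shapes being discussed, modeled with plain `java.util.function.Predicate` rather than Iceberg's `Expression` classes. All names here (`InFilterSketch`, `orOfEquals`, `in`, `orIsNull`) are hypothetical, and `orIsNull` is only one possible reading of `(in -> or(isNull, in))`; the thread does not spell out the exact semantics.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.function.Predicate;

/** Plain-Java model of the IN conversion (hypothetical, not Iceberg's API). */
final class InFilterSketch {
  private InFilterSketch() {}

  /** Old shape: start from "always false" and OR one equality test per value. */
  static Predicate<Object> orOfEquals(Object... values) {
    Predicate<Object> result = v -> false;
    for (Object expected : values) {
      result = result.or(v -> Objects.equals(v, expected));
    }
    return result;
  }

  /** New shape: a single membership test against the whole value set. */
  static Predicate<Object> in(Object... values) {
    Set<Object> set = new HashSet<>(Arrays.asList(values));
    return set::contains;
  }

  /** One possible reading of (in -> or(isNull, in)): also accept null inputs. */
  static Predicate<Object> orIsNull(Predicate<Object> in) {
    Predicate<Object> isNull = Objects::isNull;
    return isNull.or(in);
  }

  public static void main(String[] args) {
    Predicate<Object> chained = orOfEquals("foo", "junction", "brush");
    Predicate<Object> single = in("foo", "junction", "brush");
    for (Object v : new Object[] {"junction", "missing"}) {
      System.out.println(v + ": " + chained.test(v) + " / " + single.test(v));
    }
    System.out.println(orIsNull(single).test(null));
  }
}
```

   Keeping the wrapper in one shared helper, as suggested, would mean each caller composes it once instead of rebuilding the `or(isNull, ...)` step by hand.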
   

