[GitHub] [incubator-iceberg] rdblue commented on pull request #983: Convert date and timestamp values in generics

2020-05-26 Thread GitBox


rdblue commented on pull request #983:
URL: https://github.com/apache/incubator-iceberg/pull/983#issuecomment-633650626







This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on pull request #983: Convert date and timestamp values in generics

2020-05-22 Thread GitBox


rdblue commented on pull request #983:
URL: https://github.com/apache/incubator-iceberg/pull/983#issuecomment-632781762


   Also, to address your comment that the accessors approach is cleaner, we 
already have two copies of the accessors (in Spark and in API) and running 
evaluators on another object model would require another copy. Moving the 
conversion responsibility to a wrapper means we can use the same accessors in 
other places. And a wrapper also fits well with the need to adapt other record 
models to `StructLike`. We can build a `StructLike` wrapper for Avro records, 
for example.
   
   This doesn't support lists and maps because those can't be used in 
expressions. If we start supporting expressions that operate on collections, we 
would need to build a visitor like you suggested.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on pull request #983: Convert date and timestamp values in generics

2020-05-22 Thread GitBox


rdblue commented on pull request #983:
URL: https://github.com/apache/incubator-iceberg/pull/983#issuecomment-632778769


   You're right that it's more similar to your original commit than to my 
suggestion to use accessors. The other issue with accessors made me think that 
conversion should not be tightly coupled there.
   
   This is also a bit different from your original version: the `GenericRecord` 
class isn't modified. Instead, it is wrapped to produce the internal 
representations. The original version detected when it needed to convert based 
on the requested class, which is what I wanted to avoid. This converts values 
using a list of transforms determined when the wrapper is created. For example, 
if position 5 is a date, then it is always transformed when it is returned. 
This cuts down on the amount of work done fetching each value.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on pull request #983: Convert date and timestamp values in generics

2020-05-21 Thread GitBox


rdblue commented on pull request #983:
URL: https://github.com/apache/incubator-iceberg/pull/983#issuecomment-632331000


   @chenjunjiedada, I think what you have here works, but right now we have two 
sets of accessors (Spark and internal) and we would need more for other data 
models.
   
   I think what we need to do instead is to separate the concerns. Accessors 
should understand structure and rows should return the correct value types. 
That way we can eventually move to use the same accessors and we will just need 
rows that correctly translate for StructLike.
   
   How about using a wrapper class like this one when passing records? Would 
that work?
   
   ```java
   class InternalRecordWrapper implements StructLike {
 private final Function[] transforms;
 private StructLike wrapped = null;
   
 @SuppressWarnings("unchecked")
 InternalRecordWrapper(Types.StructType struct) {
   this.transforms = struct.fields().stream()
   .map(field -> converter(field.type()))
   .toArray(length -> (Function[]) 
Array.newInstance(Function.class, length));
 }
   
 private static Function converter(Type type) {
   switch (type.typeId()) {
 case DATE:
   return date -> DateTimeUtil.daysFromDate((LocalDate) date);
 case TIME:
   return time -> DateTimeUtil.microsFromTime((LocalTime) time);
 case TIMESTAMP:
   if (((Types.TimestampType) type).shouldAdjustToUTC()) {
 return timestamp -> 
DateTimeUtil.microsFromTimestamptz((OffsetDateTime) timestamp);
   } else {
 return timestamp -> 
DateTimeUtil.microsFromTimestamp((LocalDateTime) timestamp);
   }
 case FIXED:
   return bytes -> ByteBuffer.wrap((byte[]) bytes);
 case STRUCT:
   InternalRecordWrapper wrapper = new 
InternalRecordWrapper(type.asStructType());
   return struct -> wrapper.wrap((StructLike) struct);
 default:
   }
   return null;
 }
   
 public InternalRecordWrapper wrap(StructLike record) {
   this.wrapped = record;
   return this;
 }
   
 @Override
 public int size() {
   return wrapped.size();
 }
   
 @Override
 public  T get(int pos, Class javaClass) {
   if (transforms[pos] != null) {
 return javaClass.cast(transforms[pos].apply(wrapped.get(pos, 
Object.class)));
   }
   return wrapped.get(pos, javaClass);
 }
   
 @Override
 public  void set(int pos, T value) {
   throw new UnsupportedOperationException("Cannot update 
InternalRecordWrapper");
 }
   }
   ```



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org



[GitHub] [incubator-iceberg] rdblue commented on pull request #983: Convert date and timestamp values in generics

2020-04-28 Thread GitBox


rdblue commented on pull request #983:
URL: https://github.com/apache/incubator-iceberg/pull/983#issuecomment-620722327


   Thanks, @chenjunjiedada, but I don't think this is the right solution to the 
problem. This adds quite a bit of logic to get and changes the class argument 
so that it is a type request -- so the record will convert a value to the 
requested type. But get is used in a tight loop and we don't want it to do that 
additional work to not only return a value, but detect what it is supposed to 
convert to and do that conversion. Also, this changes the contract of a public 
method and we don't want to make additional guarantees here.
   
   Instead, the accessors that are build when binding the message to an 
expression should be used. This happens in Spark, where string values are 
converted from UTF8String to a CharSequence using a 
[StringAccessor](https://github.com/apache/incubator-iceberg/blob/master/spark/src/main/java/org/apache/iceberg/spark/source/PartitionKey.java#L260).
 This should be similar, where a DateAccessor, or TimestampAccessor is used to 
return the right internal representation from a record that stores LocalDate or 
OffsetDateTime.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org