[GitHub] [incubator-iceberg] rdblue commented on pull request #983: Convert date and timestamp values in generics
rdblue commented on pull request #983: URL: https://github.com/apache/incubator-iceberg/pull/983#issuecomment-633650626 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] rdblue commented on pull request #983: Convert date and timestamp values in generics
rdblue commented on pull request #983: URL: https://github.com/apache/incubator-iceberg/pull/983#issuecomment-632781762 Also, to address your comment that the accessors approach is cleaner, we already have two copies of the accessors (in Spark and in API) and running evaluators on another object model would require another copy. Moving the conversion responsibility to a wrapper means we can use the same accessors in other places. And a wrapper also fits well with the need to adapt other record models to `StructLike`. We can build a `StructLike` wrapper for Avro records, for example. This doesn't support lists and maps because those can't be used in expressions. If we start supporting expressions that operate on collections, we would need to build a visitor like you suggested. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] rdblue commented on pull request #983: Convert date and timestamp values in generics
rdblue commented on pull request #983: URL: https://github.com/apache/incubator-iceberg/pull/983#issuecomment-632778769 You're right that it's more similar to your original commit than to my suggestion to use accessors. The other issue with accessors made me think that conversion should not be tightly coupled there. This is also a bit different from your original version: the `GenericRecord` class isn't modified. Instead, it is wrapped to produce the internal representations. The original version detected when it needed to convert based on the requested class, which is what I wanted to avoid. This converts values using a list of transforms determined when the wrapper is created. For example, if position 5 is a date, then it is always transformed when it is returned. This cuts down on the amount of work done fetching each value. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] rdblue commented on pull request #983: Convert date and timestamp values in generics
rdblue commented on pull request #983: URL: https://github.com/apache/incubator-iceberg/pull/983#issuecomment-632331000 @chenjunjiedada, I think what you have here works, but right now we have two sets of accessors (Spark and internal) and we would need more for other data models. I think what we need to do instead is to separate the concerns. Accessors should understand structure and rows should return the correct value types. That way we can eventually move to use the same accessors and we will just need rows that correctly translate for StructLike. How about using a wrapper class like this one when passing records? Would that work? ```java class InternalRecordWrapper implements StructLike { private final Function[] transforms; private StructLike wrapped = null; @SuppressWarnings("unchecked") InternalRecordWrapper(Types.StructType struct) { this.transforms = struct.fields().stream() .map(field -> converter(field.type())) .toArray(length -> (Function[]) Array.newInstance(Function.class, length)); } private static Function converter(Type type) { switch (type.typeId()) { case DATE: return date -> DateTimeUtil.daysFromDate((LocalDate) date); case TIME: return time -> DateTimeUtil.microsFromTime((LocalTime) time); case TIMESTAMP: if (((Types.TimestampType) type).shouldAdjustToUTC()) { return timestamp -> DateTimeUtil.microsFromTimestamptz((OffsetDateTime) timestamp); } else { return timestamp -> DateTimeUtil.microsFromTimestamp((LocalDateTime) timestamp); } case FIXED: return bytes -> ByteBuffer.wrap((byte[]) bytes); case STRUCT: InternalRecordWrapper wrapper = new InternalRecordWrapper(type.asStructType()); return struct -> wrapper.wrap((StructLike) struct); default: } return null; } public InternalRecordWrapper wrap(StructLike record) { this.wrapped = record; return this; } @Override public int size() { return wrapped.size(); } @Override public T get(int pos, Class javaClass) { if (transforms[pos] != null) { return javaClass.cast(transforms[pos].apply(wrapped.get(pos, Object.class))); } return wrapped.get(pos, javaClass); } @Override public void set(int pos, T value) { throw new UnsupportedOperationException("Cannot update InternalRecordWrapper"); } } ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org
[GitHub] [incubator-iceberg] rdblue commented on pull request #983: Convert date and timestamp values in generics
rdblue commented on pull request #983: URL: https://github.com/apache/incubator-iceberg/pull/983#issuecomment-620722327 Thanks, @chenjunjiedada, but I don't think this is the right solution to the problem. This adds quite a bit of logic to get and changes the class argument so that it is a type request -- so the record will convert a value to the requested type. But get is used in a tight loop and we don't want it to do that additional work to not only return a value, but detect what it is supposed to convert to and do that conversion. Also, this changes the contract of a public method and we don't want to make additional guarantees here. Instead, the accessors that are build when binding the message to an expression should be used. This happens in Spark, where string values are converted from UTF8String to a CharSequence using a [StringAccessor](https://github.com/apache/incubator-iceberg/blob/master/spark/src/main/java/org/apache/iceberg/spark/source/PartitionKey.java#L260). This should be similar, where a DateAccessor, or TimestampAccessor is used to return the right internal representation from a record that stores LocalDate or OffsetDateTime. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org