[jira] [Commented] (PARQUET-2134) Incorrect type checking in HadoopStreams.wrap

2022-07-04 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/PARQUET-2134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562229#comment-17562229 ]

ASF GitHub Bot commented on PARQUET-2134:
-----------------------------------------

7c00 commented on PR #951:
URL: https://github.com/apache/parquet-mr/pull/951#issuecomment-1173982766

   @shangxinli Thank you for reminding me. I have squashed the PR and added 
@steveloughran as the co-author.




> Incorrect type checking in HadoopStreams.wrap
> ---------------------------------------------
>
> Key: PARQUET-2134
> URL: https://issues.apache.org/jira/browse/PARQUET-2134
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.8.3, 1.10.1, 1.11.2, 1.12.2
>Reporter: Todd Gao
>Priority: Minor
>
> The method 
> [HadoopStreams.wrap|https://github.com/apache/parquet-mr/blob/4d062dc37577e719dcecc666f8e837843e44a9be/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopStreams.java#L51]
>  wraps an FSDataInputStream as a SeekableInputStream. 
> It checks whether the underlying stream of the passed FSDataInputStream 
> implements ByteBufferReadable: if so, it wraps the FSDataInputStream in an 
> H2SeekableInputStream; otherwise, in an H1SeekableInputStream.
> In some cases, we may add another wrapper over FSDataInputStream. For 
> example, 
> {code:java}
> class CustomDataInputStream extends FSDataInputStream {
>     public CustomDataInputStream(FSDataInputStream original) {
>         super(original);
>     }
> }
> {code}
> Suppose we create an FSDataInputStream whose underlying stream does not 
> implement ByteBufferReadable, and then wrap it in a CustomDataInputStream. 
> If we use HadoopStreams.wrap to create a SeekableInputStream, we may get 
> an error like 
> {quote}java.lang.UnsupportedOperationException: Byte-buffer read unsupported 
> by input stream{quote}
> We can fix this by recursively checking the underlying stream of the 
> FSDataInputStream.
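
A minimal sketch of the recursive check described above, for illustration only; the helper class and method names here are hypothetical and not the actual change in PR #951:

{code:java}
import java.io.InputStream;

import org.apache.hadoop.fs.ByteBufferReadable;
import org.apache.hadoop.fs.FSDataInputStream;

class ByteBufferReadableCheck {
  // Walk nested FSDataInputStream wrappers until we reach a stream that is not
  // itself an FSDataInputStream, then test that innermost stream for ByteBufferReadable.
  static boolean isByteBufferReadable(FSDataInputStream stream) {
    InputStream wrapped = stream.getWrappedStream();
    if (wrapped instanceof FSDataInputStream) {
      return isByteBufferReadable((FSDataInputStream) wrapped);
    }
    return wrapped instanceof ByteBufferReadable;
  }
}
{code}

With a check like this, HadoopStreams.wrap would pick H2SeekableInputStream only when the innermost stream actually supports byte-buffer reads, rather than testing only the first wrapper level.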



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [parquet-mr] 7c00 commented on pull request #951: PARQUET-2134: Fix type checking in HadoopStreams.wrap

2022-07-04 Thread GitBox


7c00 commented on PR #951:
URL: https://github.com/apache/parquet-mr/pull/951#issuecomment-1173982766

   @shangxinli Thank you for reminding me. I have squashed the PR and added 
@steveloughran as the co-author.





[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay

2022-07-04 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17562101#comment-17562101 ]

ASF GitHub Bot commented on PARQUET-2042:
-----------------------------------------

mwong38 commented on code in PR #900:
URL: https://github.com/apache/parquet-mr/pull/900#discussion_r912817641


##########
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java:
##########
@@ -97,6 +127,46 @@ public MessageType convert(Class<? extends Message> protobufClass) {
 
   private <T> Builder<? extends Builder<?, GroupBuilder<T>>, GroupBuilder<T>> addField(FieldDescriptor descriptor, final GroupBuilder<T> builder) {
     if (descriptor.getJavaType() == JavaType.MESSAGE) {
+      if (unwrapProtoWrappers) {
+        String typeName = descriptor.getMessageType().getFullName();
+        if (typeName.equals(PROTOBUF_TIMESTAMP_TYPE)) {
+          return builder.primitive(INT64, getRepetition(descriptor)).as(timestampType(true, TimeUnit.NANOS));

Review Comment:
   Isn't that outside the scope of ProtoSchemaConverter/ParquetWriter? I don't 
think it should go into the business of doing transformations. If the point of 
`ProtoSchemaConverter` is to give the closest representation of the Protobuf 
object in Parquet, then it should be NANOS and nothing more.
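
For context, the unwrapping under discussion maps a google.protobuf.Timestamp onto a single INT64 annotated as timestampType(true, TimeUnit.NANOS). A minimal sketch of that value conversion, assuming a plain seconds-plus-nanos collapse (not necessarily the PR's actual write path):

{code:java}
import com.google.protobuf.Timestamp;

class TimestampNanos {
  // Collapse a protobuf Timestamp (seconds + nanos since the epoch) into one
  // INT64 value of nanoseconds, the representation implied by TimeUnit.NANOS.
  static long toNanos(Timestamp ts) {
    return Math.addExact(Math.multiplyExact(ts.getSeconds(), 1_000_000_000L), ts.getNanos());
  }
}
{code}

Anything coarser than NANOS would require an extra transformation, which is the concern raised above.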





> Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
> -----------------------------------------------------------------------
>
> Key: PARQUET-2042
> URL: https://issues.apache.org/jira/browse/PARQUET-2042
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-protobuf
>Reporter: Michael Wong
>Priority: Major
>
> Related to https://issues.apache.org/jira/browse/PARQUET-1595



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [parquet-mr] mwong38 commented on a diff in pull request #900: PARQUET-2042: Add support for unwrapping common Protobuf wrappers and…

2022-07-04 Thread GitBox


mwong38 commented on code in PR #900:
URL: https://github.com/apache/parquet-mr/pull/900#discussion_r912817641


##########
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java:
##########
@@ -97,6 +127,46 @@ public MessageType convert(Class<? extends Message> protobufClass) {
 
   private <T> Builder<? extends Builder<?, GroupBuilder<T>>, GroupBuilder<T>> addField(FieldDescriptor descriptor, final GroupBuilder<T> builder) {
     if (descriptor.getJavaType() == JavaType.MESSAGE) {
+      if (unwrapProtoWrappers) {
+        String typeName = descriptor.getMessageType().getFullName();
+        if (typeName.equals(PROTOBUF_TIMESTAMP_TYPE)) {
+          return builder.primitive(INT64, getRepetition(descriptor)).as(timestampType(true, TimeUnit.NANOS));

Review Comment:
   Isn't that outside the scope of ProtoSchemaConverter/ParquetWriter? I don't 
think it should go into the business of doing transformations. If the point of 
`ProtoSchemaConverter` is to give the closest representation of the Protobuf 
object in Parquet, then it should be NANOS and nothing more.


