Re: Parquet vs. other Open Source Columnar Formats

2019-05-09 Thread Uwe L. Korn
Hello,

Be aware that Avro and Protobuf are general-purpose serialization formats, not 
columnar ones such as Parquet or ORC. They are good for RPC or row-wise 
streaming, whereas the latter two are well suited to analytical workloads.

Uwe

> On 09.05.2019 at 20:33, David Mollitor wrote:
> 
> I'm sure there are many different opinions on the matter, but with regard to
> Avro, I would say it is becoming more and more of a niche player.
> 
> Many folks are choosing to go with Google Protobufs for RPC and Parquet/ORC
> for analytic workloads.
> 
>> On Thu, May 9, 2019 at 2:30 PM Brian Bowman  wrote:
>> 
>> All,
>> 
>> Is it fair to say that Parquet is fast becoming the dominant open source
>> columnar storage format? How do those of you with long-term Hadoop
>> experience see this? For example, is Parquet overtaking ORC and Avro?
>> 
>> Thanks,
>> 
>> Brian
>> 
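Uwe's distinction between row-wise serialization formats (Avro, Protobuf) and columnar formats (Parquet, ORC) comes down to how records are laid out. The following is a toy Python sketch, not the real encoding of any of these formats; the records and the `to_columns` helper are hypothetical and only illustrate the access-pattern difference.

```python
# Toy illustration of row-wise vs. columnar layout. This is NOT the real
# encoding used by Avro, Protobuf, Parquet, or ORC; the records are made up.
from typing import Any, Dict, List

Row = Dict[str, Any]


def to_columns(rows: List[Row]) -> Dict[str, List[Any]]:
    """Pivot row-wise records into one list of values per field.

    Assumes every record has the same fields; otherwise the per-field lists
    would no longer line up by position.
    """
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


rows: List[Row] = [
    {"user_id": 1, "country": "DE", "amount": 9.5},
    {"user_id": 2, "country": "US", "amount": 20.0},
    {"user_id": 3, "country": "DE", "amount": 4.5},
]

# Row-wise: each complete record is self-contained, so it can be sent as one
# RPC message or appended to a stream without touching earlier records.
stream: List[Row] = list(rows)
stream.append({"user_id": 4, "country": "FR", "amount": 1.0})

# Columnar: an aggregate over one field reads only that field's values,
# never touching "user_id" or "country".
columns = to_columns(rows)
total_amount = sum(columns["amount"])

print(columns["country"])  # ['DE', 'US', 'DE']
print(total_amount)        # 34.0
```

The trade-off is the one Uwe describes: appending a whole record is cheap in the row-wise form, while scanning a single field across many records is cheap in the columnar form.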



Re: Parquet vs. other Open Source Columnar Formats

2019-05-09 Thread David Mollitor
I'm sure there are many different opinions on the matter, but with regard to
Avro, I would say it is becoming more and more of a niche player.

Many folks are choosing to go with Google Protobufs for RPC and Parquet/ORC
for analytic workloads.

On Thu, May 9, 2019 at 2:30 PM Brian Bowman  wrote:

> All,
>
> Is it fair to say that Parquet is fast becoming the dominant open source
> columnar storage format? How do those of you with long-term Hadoop
> experience see this? For example, is Parquet overtaking ORC and Avro?
>
> Thanks,
>
> Brian
>


Parquet vs. other Open Source Columnar Formats

2019-05-09 Thread Brian Bowman
All,

Is it fair to say that Parquet is fast becoming the dominant open source 
columnar storage format? How do those of you with long-term Hadoop experience 
see this? For example, is Parquet overtaking ORC and Avro?

Thanks,

Brian


[jira] [Commented] (PARQUET-1572) Clarify the definition of timestamp types

2019-05-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836470#comment-16836470
 ] 

ASF GitHub Bot commented on PARQUET-1572:
-

zivanfi commented on pull request #130: PARQUET-1572: Clarify the definition of 
timestamp types
URL: https://github.com/apache/parquet-format/pull/130

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Clarify the definition of timestamp types
> -
>
> Key: PARQUET-1572
> URL: https://issues.apache.org/jira/browse/PARQUET-1572
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-format
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Major
>  Labels: pull-request-available
>
> The current definition only makes sense for the isAdjustedToUTC=true case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (PARQUET-1572) Clarify the definition of timestamp types

2019-05-09 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated PARQUET-1572:

Labels: pull-request-available  (was: )

> Clarify the definition of timestamp types
> -
>
> Key: PARQUET-1572
> URL: https://issues.apache.org/jira/browse/PARQUET-1572
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-format
>Reporter: Zoltan Ivanfi
>Assignee: Zoltan Ivanfi
>Priority: Major
>  Labels: pull-request-available
>
> The current definition only makes sense for the isAdjustedToUTC=true case.





[jira] [Created] (PARQUET-1572) Clarify the definition of timestamp types

2019-05-09 Thread Zoltan Ivanfi (JIRA)
Zoltan Ivanfi created PARQUET-1572:
--

 Summary: Clarify the definition of timestamp types
 Key: PARQUET-1572
 URL: https://issues.apache.org/jira/browse/PARQUET-1572
 Project: Parquet
  Issue Type: Task
  Components: parquet-format
Reporter: Zoltan Ivanfi
Assignee: Zoltan Ivanfi


The current definition only makes sense for the isAdjustedToUTC=true case.
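To make the issue concrete: Parquet's TimestampType carries a boolean flag, `isAdjustedToUTC` in parquet.thrift, that selects between two readings of the stored integer. The Python sketch here is hypothetical and assumes a microsecond unit, plus the interpretation that a non-adjusted value encodes the local wall-clock reading as if it were UTC. The helper names are invented for illustration and are not part of any Parquet library.

```python
# Hypothetical sketch of the two readings of a Parquet TIMESTAMP integer,
# assuming a MICROS unit. Helper names are illustrative only, not a Parquet API.
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)  # reference for instants
EPOCH_LOCAL = datetime(1970, 1, 1)                      # naive wall-clock reference
ONE_MICRO = timedelta(microseconds=1)


def encode_adjusted(moment: datetime) -> int:
    """isAdjustedToUTC=true: microseconds since 1970-01-01T00:00:00Z.

    `moment` must be timezone-aware; its offset is normalized away, so the
    value identifies an instant regardless of the writer's zone.
    """
    return (moment - EPOCH_UTC) // ONE_MICRO


def encode_local(wall_clock: datetime) -> int:
    """isAdjustedToUTC=false (assumed interpretation): the naive wall-clock
    reading encoded as if it were UTC; no zone information is kept."""
    return (wall_clock - EPOCH_LOCAL) // ONE_MICRO


def decode_adjusted(value: int) -> datetime:
    return EPOCH_UTC + value * ONE_MICRO


def decode_local(value: int) -> datetime:
    return EPOCH_LOCAL + value * ONE_MICRO


# The same reading, 2019-05-09 20:33 at a fixed UTC+02:00 offset, stored both ways.
utc_plus_2 = timezone(timedelta(hours=2))
moment = datetime(2019, 5, 9, 20, 33, tzinfo=utc_plus_2)

adjusted = encode_adjusted(moment)
local = encode_local(moment.replace(tzinfo=None))

# The integers differ by exactly the two-hour offset that "adjusting" removed.
print(local - adjusted)           # 7200000000
print(decode_adjusted(adjusted))  # 2019-05-09 18:33:00+00:00
print(decode_local(local))        # 2019-05-09 20:33:00
```

Under this reading, a value written with isAdjustedToUTC=false cannot be mapped back to an instant without knowing the writer's time zone, so a definition phrased only in terms of UTC instants does not cover that case.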





[jira] [Commented] (PARQUET-1555) Bump snappy-java to 1.1.7.3

2019-05-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836280#comment-16836280
 ] 

ASF GitHub Bot commented on PARQUET-1555:
-

zivanfi commented on pull request #632: PARQUET-1555: Bump snappy-java to 
1.1.7.3
URL: https://github.com/apache/parquet-mr/pull/632



> Bump snappy-java to 1.1.7.3
> ---
>
> Key: PARQUET-1555
> URL: https://issues.apache.org/jira/browse/PARQUET-1555
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.0
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.11.0
>
>
> Just to make sure that it compiles against the latest snappy-java release, 
> 1.1.7.3, for Java 9 compatibility.





[jira] [Commented] (PARQUET-1557) Replace deprecated Apache Avro methods

2019-05-09 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836114#comment-16836114
 ] 

ASF GitHub Bot commented on PARQUET-1557:
-

zivanfi commented on pull request #636: PARQUET-1557 Replace deprecated Avro 
methods
URL: https://github.com/apache/parquet-mr/pull/636



> Replace deprecated Apache Avro methods
> --
>
> Key: PARQUET-1557
> URL: https://issues.apache.org/jira/browse/PARQUET-1557
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Fokko Driesprong
>Assignee: Fokko Driesprong
>Priority: Major
>  Labels: pull-request-available
>



