Re: [Draft report] Apache Parquet

2016-10-13 Thread Ryan Blue
+1

On Wed, Oct 12, 2016 at 11:40 PM, Uwe Korn wrote:

> +1
>
>
>
> On 13.10.16 02:43, Julien Le Dem wrote:
>
>> Report from the Apache Parquet committee [Julien Le Dem]
>>
>> ## Description:
>> Parquet is a standard and interoperable columnar file format for
>> efficient analytics.
>>
>> ## Issues:
>> There are no issues requiring board attention at this time.
>>
>> ## Activity:
>> The community has been converging toward a 1.9 release. The vote will
>> start in the coming days. Discussions about better encoding and
>> vectorization APIs are ongoing.
>> The parquet-cpp repo has reached a stable state and should release soon.
>> Integration with arrow-cpp is now in the parquet-cpp repo.
>>
>> ## Health report:
>> The PMC and committer lists are growing. Discussion happens on the
>> mailing list, in JIRA, and in regular Hangout sync-ups. Notes are sent
>> to the mailing list.
>>
>> ## PMC changes:
>>
>>   - Currently 22 PMC members.
>>   - Wes McKinney was added to the PMC on Thu Sep 01 2016
>>
>> ## Committer base changes:
>>
>>   - Currently 25 committers.
>>   - Uwe Korn was added as a committer on Sun Sep 04 2016
>>
>> ## Releases:
>>
>>   - Last release was Format 2.3.1 on Thu Dec 17 2015
>>
>> ## Mailing list activity:
>>
>>   - Activity on the mailing list remains about the same.
>>   - JIRAs are resolved at about the same pace as they are opened.
>>
>>   - dev@parquet.apache.org:
>>     - 172 subscribers (up 9 in the last 3 months)
>>     - 486 emails sent to list (394 in previous quarter)
>>
>>
>> ## JIRA activity:
>>
>>   - 85 JIRA tickets created in the last 3 months
>>   - 74 JIRA tickets closed/resolved in the last 3 months
>>
>>
>


-- 
Ryan Blue
Software Engineer
Netflix


[VOTE] Release Apache Parquet 1.9.0 RC1

2016-10-13 Thread Ryan Blue
Hi everyone,

I propose the following RC to be released as the official Apache Parquet
1.9.0 release.

The commit id is 2a99abf784cb6e76160d49506ea87581a2256021
* This corresponds to the tag: apache-parquet-1.9.0
* https://github.com/apache/parquet-mr/tree/2a99abf7
* https://git-wip-us.apache.org/repos/asf/projects/repo?p=parquet-mr.git&a=commit&h=2a99abf7

The release tarball, signature, and checksums are here:
* https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-1.9.0-rc1

You can find the KEYS file here:
* https://dist.apache.org/repos/dist/dev/parquet/KEYS
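
As a rough illustration of the checksum part of verifying a candidate, the Python sketch below hashes a downloaded tarball and compares the result with the digest stored in a checksum file. The function names, the default of SHA-512, and the assumption that the checksum file begins with the hex digest are all illustrative and are not taken from this release. The sketch does not cover checking the GPG signature against the KEYS file.

```python
import hashlib


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Return the hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs are not loaded into memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(artifact_path, checksum_path, algorithm="sha512"):
    """Compare an artifact's digest with the one in a checksum file.

    Assumption: the first whitespace-separated token of the checksum file
    is the hex digest (the common "<hex>  <filename>" layout). Other
    layouts would need different parsing.
    """
    with open(checksum_path, "r", encoding="utf-8") as handle:
        tokens = handle.read().split()
    if not tokens:
        return False
    expected = tokens[0].lower()
    return file_digest(artifact_path, algorithm) == expected
```

With hypothetical local file names, a call might look like checksum_matches("candidate.tar.gz", "candidate.tar.gz.sha512"); a False result means the download or the checksum file should not be trusted.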

Binary artifacts are staged in Nexus here:
*
https://repository.apache.org/content/groups/staging/org/apache/parquet/parquet/

This release includes:
* Support for Hadoop's ByteBuffer read API
* Dictionary-based row group filters
* Cascading3 integration
* Predicate push-down for Pig
* Decimal, timestamp, and date support in Avro
* Numerous bug fixes

Please verify, test, and vote by Monday, 17 October 2016.

[ ] +1 Release this as Apache Parquet 1.9.0
[ ] +0
[ ] -1 Do not release this because...


-- 
Ryan Blue


[jira] [Commented] (PARQUET-392) Release Parquet-mr 1.9.0

2016-10-13 Thread Ryan Blue (JIRA)

[ https://issues.apache.org/jira/browse/PARQUET-392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572834#comment-15572834 ]

Ryan Blue commented on PARQUET-392:
---

The [vote thread|http://mail-archives.apache.org/mod_mbox/incubator-parquet-dev/201610.mbox/%3CCAO4re1kic7NnAhtGRK%3D2QOZ1D%3Dyk5xMcKSg8%3DwGqe6OX4QnwLg%40mail.gmail.com%3E]
for RC1 is open. Please take a look at the candidate and vote!

> Release Parquet-mr 1.9.0
> 
>
>                 Key: PARQUET-392
>                 URL: https://issues.apache.org/jira/browse/PARQUET-392
>             Project: Parquet
>          Issue Type: Task
>            Reporter: Julien Le Dem
>            Assignee: Julien Le Dem
>             Fix For: 1.9.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: [Draft report] Apache Parquet

2016-10-13 Thread Wes McKinney
+1

On Thu, Oct 13, 2016 at 11:15 AM, Ryan Blue wrote:
> +1


Re: [Draft report] Apache Parquet

2016-10-13 Thread Jake Farrell
+1

-Jake



[jira] [Created] (PARQUET-749) Schema building ParquetFileWriter

2016-10-13 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created PARQUET-749:
---

             Summary: Schema building ParquetFileWriter
                 Key: PARQUET-749
                 URL: https://issues.apache.org/jira/browse/PARQUET-749
             Project: Parquet
          Issue Type: New Feature
          Components: parquet-cpp
            Reporter: Uwe L. Korn
            Assignee: Uwe L. Korn


Sometimes you want to write a Parquet file column by column without knowing
the whole schema up front. Format-wise, this should be possible as long as the
file has a single RowGroup: the user writes one column after another, and the
whole schema is assembled only at the end, when the metadata needs to be
written.

Currently, ParquetFileWriter only supports writing files when the schema is
known up front.

Initial limitations:
 * A single RowGroup is written
 * No nested columns
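
The ordering described above can be sketched in a few lines. The following Python toy is not the parquet-cpp API and does not produce a real Parquet file; the class name SchemaBuildingWriter, its methods, and the JSON-based layout are hypothetical. It only illustrates the idea: column data is written first, and the schema is assembled into a footer when the writer is closed.

```python
import io
import json
import struct


class SchemaBuildingWriter:
    """Toy writer: columns are appended one by one, schema is built at close."""

    def __init__(self, sink):
        self._sink = sink
        self._columns = []     # per-column metadata, in write order
        self._num_rows = None  # fixed by the first column (single row group)
        self._closed = False

    def write_column(self, name, type_name, values):
        if self._closed:
            raise ValueError("writer is closed")
        if "." in name:
            # Limitation from the proposal: flat schemas only.
            raise ValueError("nested columns are not supported")
        if any(col["name"] == name for col in self._columns):
            raise ValueError(f"duplicate column: {name}")
        if self._num_rows is None:
            self._num_rows = len(values)
        elif len(values) != self._num_rows:
            # One row group means every column has the same row count.
            raise ValueError("all columns must have the same number of rows")
        payload = json.dumps(list(values)).encode("utf-8")
        offset = self._sink.tell()
        self._sink.write(payload)
        self._columns.append(
            {"name": name, "type": type_name,
             "offset": offset, "length": len(payload)}
        )

    def close(self):
        """Write the footer (schema + row count) and return the schema."""
        if self._closed:
            raise ValueError("writer is closed")
        footer = json.dumps(
            {"num_rows": self._num_rows or 0, "schema": self._columns}
        ).encode("utf-8")
        self._sink.write(footer)
        # Trailing 4-byte length lets a reader locate the footer from the end.
        self._sink.write(struct.pack("<I", len(footer)))
        self._closed = True
        return [(col["name"], col["type"]) for col in self._columns]
```

A caller would create the writer over a seekable sink such as io.BytesIO(), call write_column once per column in whatever order the columns become available, and call close() to obtain the assembled schema.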





Re: [Draft report] Apache Parquet

2016-10-13 Thread Uwe Korn

+1

