[jira] [Commented] (PARQUET-1432) ACID support

2018-10-01 Thread Ryan Blue (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634298#comment-16634298
 ] 

Ryan Blue commented on PARQUET-1432:


[~yumwang], ACID guarantees are a feature of the table layout, not the file 
format. I don't think Parquet needs to do anything differently to support this. 
What are you proposing to change in Parquet?

> ACID support
> 
>
> Key: PARQUET-1432
> URL: https://issues.apache.org/jira/browse/PARQUET-1432
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-format, parquet-mr
>Affects Versions: 1.10.1
>Reporter: Yuming Wang
>Priority: Major
>
> https://orc.apache.org/docs/acid.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1432) ACID support

2018-10-01 Thread Xinli Shang (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16634187#comment-16634187
 ] 

Xinli Shang commented on PARQUET-1432:
--

I had the same thought earlier. Looking forward to the design.



Re: [VOTE] Release Apache Parquet format 2.6.0 RC0

2018-10-01 Thread Nandor Kollar
Hi All,

The vote for this parquet-format release has passed with
3   "+1" votes (and 1 non-binding)
0   "0" votes
0   "-1" votes

With 3 binding “+1” votes this vote PASSES. We’ll release the
artifacts and send an announcement soon.

Regards,
Nandor
On Sun, Sep 30, 2018 at 11:28 PM Ryan Blue  wrote:
>
> +1 (binding)
>
> On Sat, Sep 29, 2018 at 2:11 AM Wes McKinney  wrote:
>
> > +1 (binding)
> >
> > * Checked checksums, signature
> > * Ran unit tests
> >
> > Note that `mvn test` fails if Apache Thrift 0.10.0 or higher is
> > installed. It looks like this is a problem with the Maven Thrift
> > plugin and not a problem with parquet-format, but definitely a rough
> > edge that will affect users
> >
> > [ERROR] thrift failed output:
> >
> > [WARNING:/home/wesm/Downloads/apache-parquet-format-2.6.0/src/main/thrift/parquet.thrift:295]
> > The "byte" type is a compatibility alias for "i8". Use "i8" to
> > emphasize the signedness of this type.
> >
> > [ERROR] thrift failed error: [FAILURE:generation:1] Error: unknown
> > option java:hashcode
> >
> > - Wes
> > On Fri, Sep 28, 2018 at 2:52 AM Gabor Szadovszky wrote:
> > >
> > > +1 (non-binding)
> > >
> > > - Checked source tarball content
> > > - Checked checksums, signature
> > >
> > > Cheers,
> > > Gabor
> > >
> > > On Thu, Sep 27, 2018 at 5:10 PM Zoltan Ivanfi wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > - contents look good
> > > > - unit tests pass
> > > > - checksums match
> > > > - signature matches
> > > >
> > > > Thanks,
> > > >
> > > > Zoltan
> > > >
> > > > On Thu, Sep 27, 2018 at 5:02 PM Nandor Kollar wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > I propose the following RC to be released as the official Apache
> > > > > Parquet Format 2.6.0 release.
> > > > >
> > > > > The commit id is df6132b94f273521a418a74442085fdd5a0aa009
> > > > > * This corresponds to the tag: apache-parquet-format-2.6.0
> > > > > * https://github.com/apache/parquet-format/tree/df6132b94f273521a418a74442085fdd5a0aa009
> > > > > * https://gitbox.apache.org/repos/asf?p=parquet-format.git;a=commit;h=df6132b94f273521a418a74442085fdd5a0aa009
> > > > >
> > > > > The release tarball, signature, and checksums are here:
> > > > > * https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-format-2.6.0-rc0
> > > > >
> > > > > You can find the KEYS file here:
> > > > > * https://dist.apache.org/repos/dist/dev/parquet/KEYS
> > > > >
> > > > > Binary artifacts are staged in Nexus here:
> > > > > * https://repository.apache.org/content/groups/staging/org/apache/parquet/parquet-format/2.6.0
> > > > >
> > > > > This release includes the following changes:
> > > > >
> > > > > PARQUET-1266 - LogicalTypes union in parquet-format doesn't include UUID
> > > > > PARQUET-1290 - Clarify maximum run lengths for RLE encoding
> > > > > PARQUET-1387 - Nanosecond precision time and timestamp - parquet-format
> > > > > PARQUET-1400 - Deprecate parquet-mr related code in parquet-format
> > > > > PARQUET-1429 - Turn off DocLint on parquet-format
> > > > >
> > > > > Please download, verify, and test.
> > > > >
> > > > > The voting will be open for at least 72 hours from now.
> > > > >
> > > > > [ ] +1 Release this as Apache Parquet Format 2.6.0
> > > > > [ ] +0
> > > > > [ ] -1 Do not release this because...
> > > > >
> > > > > Thanks,
> > > > > Nandor
> > > > >
> > > >
> >
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix


Row group layout anomalies

2018-10-01 Thread Zoltan Ivanfi
Hi,

PARQUET-1337 describes the problem of ending up with a drastically
different (and worse) row group layout than intended under certain
circumstances.

A few weeks ago I started tweaking the logic that controls this in a
test-driven fashion. I have found that fixing one problem repeatedly leads
to the discovery of another one. After playing this whack-a-mole for a
while, I ended up with a much more fundamental change than I originally
intended, and there is still room (and need) for improvement.

Due to the potential impact of these changes, I have put together a design
doc that describes all the problems I could identify and some possible
fixes for them:

https://docs.google.com/document/d/1FJAVwzszZGkxZa8FtKtSbgBKm7qkS4cXuNW8hl4YKwU/edit#

If you are interested, please review and comment on the document.

Thanks,

Zoltan


[jira] [Assigned] (PARQUET-1433) Parquet-format doesn't compile with Thrift 0.10.0

2018-10-01 Thread Nandor Kollar (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nandor Kollar reassigned PARQUET-1433:
--

Assignee: Nandor Kollar

> Parquet-format doesn't compile with Thrift 0.10.0
> -
>
> Key: PARQUET-1433
> URL: https://issues.apache.org/jira/browse/PARQUET-1433
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-format
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
>  Labels: pull-request-available
>
> Compilation of parquet-format fails with Thrift 0.10.0:
> [ERROR] thrift failed error: [FAILURE:generation:1] Error: unknown
> option java:hashcode





[jira] [Updated] (PARQUET-1433) Parquet-format doesn't compile with Thrift 0.10.0

2018-10-01 Thread ASF GitHub Bot (JIRA)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated PARQUET-1433:

Labels: pull-request-available  (was: )

> Parquet-format doesn't compile with Thrift 0.10.0
> -
>
> Key: PARQUET-1433
> URL: https://issues.apache.org/jira/browse/PARQUET-1433
> Project: Parquet
>  Issue Type: Task
>  Components: parquet-format
>Reporter: Nandor Kollar
>Priority: Major
>  Labels: pull-request-available
>
> Compilation of parquet-format fails with Thrift 0.10.0:
> [ERROR] thrift failed error: [FAILURE:generation:1] Error: unknown
> option java:hashcode





[jira] [Commented] (PARQUET-1433) Parquet-format doesn't compile with Thrift 0.10.0

2018-10-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633787#comment-16633787
 ] 

ASF GitHub Bot commented on PARQUET-1433:
-

nandorKollar opened a new pull request #111: PARQUET-1433: Parquet-format 
doesn't compile with Thrift 0.10.0
URL: https://github.com/apache/parquet-format/pull/111
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Re: [VOTE] Release Apache Parquet format 2.6.0 RC0

2018-10-01 Thread Nandor Kollar
Wes, I created a Jira for the failure with Thrift 0.10.0: PARQUET-1433.
We should address it in the upcoming format release.
On Sat, Sep 29, 2018 at 11:11 AM Wes McKinney  wrote:
>
> +1 (binding)
>
> * Checked checksums, signature
> * Ran unit tests
>
> Note that `mvn test` fails if Apache Thrift 0.10.0 or higher is
> installed. It looks like this is a problem with the Maven Thrift
> plugin and not a problem with parquet-format, but definitely a rough
> edge that will affect users
>
> [ERROR] thrift failed output:
> [WARNING:/home/wesm/Downloads/apache-parquet-format-2.6.0/src/main/thrift/parquet.thrift:295]
> The "byte" type is a compatibility alias for "i8". Use "i8" to
> emphasize the signedness of this type.
>
> [ERROR] thrift failed error: [FAILURE:generation:1] Error: unknown
> option java:hashcode
>
> - Wes
> On Fri, Sep 28, 2018 at 2:52 AM Gabor Szadovszky wrote:
> >
> > +1 (non-binding)
> >
> > - Checked source tarball content
> > - Checked checksums, signature
> >
> > Cheers,
> > Gabor
> >
> > On Thu, Sep 27, 2018 at 5:10 PM Zoltan Ivanfi wrote:
> >
> > > +1 (binding)
> > >
> > > - contents look good
> > > - unit tests pass
> > > - checksums match
> > > - signature matches
> > >
> > > Thanks,
> > >
> > > Zoltan
> > >
> > > On Thu, Sep 27, 2018 at 5:02 PM Nandor Kollar wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > I propose the following RC to be released as the official Apache
> > > > Parquet Format 2.6.0 release.
> > > >
> > > > The commit id is df6132b94f273521a418a74442085fdd5a0aa009
> > > > * This corresponds to the tag: apache-parquet-format-2.6.0
> > > > * https://github.com/apache/parquet-format/tree/df6132b94f273521a418a74442085fdd5a0aa009
> > > > * https://gitbox.apache.org/repos/asf?p=parquet-format.git;a=commit;h=df6132b94f273521a418a74442085fdd5a0aa009
> > > >
> > > > The release tarball, signature, and checksums are here:
> > > > * https://dist.apache.org/repos/dist/dev/parquet/apache-parquet-format-2.6.0-rc0
> > > >
> > > > You can find the KEYS file here:
> > > > * https://dist.apache.org/repos/dist/dev/parquet/KEYS
> > > >
> > > > Binary artifacts are staged in Nexus here:
> > > > * https://repository.apache.org/content/groups/staging/org/apache/parquet/parquet-format/2.6.0
> > > >
> > > > This release includes the following changes:
> > > >
> > > > PARQUET-1266 - LogicalTypes union in parquet-format doesn't include UUID
> > > > PARQUET-1290 - Clarify maximum run lengths for RLE encoding
> > > > PARQUET-1387 - Nanosecond precision time and timestamp - parquet-format
> > > > PARQUET-1400 - Deprecate parquet-mr related code in parquet-format
> > > > PARQUET-1429 - Turn off DocLint on parquet-format
> > > >
> > > > Please download, verify, and test.
> > > >
> > > > The voting will be open for at least 72 hours from now.
> > > >
> > > > [ ] +1 Release this as Apache Parquet Format 2.6.0
> > > > [ ] +0
> > > > [ ] -1 Do not release this because...
> > > >
> > > > Thanks,
> > > > Nandor
> > > >
> > >


[jira] [Created] (PARQUET-1433) Parquet-format doesn't compile with Thrift 0.10.0

2018-10-01 Thread Nandor Kollar (JIRA)
Nandor Kollar created PARQUET-1433:
--

 Summary: Parquet-format doesn't compile with Thrift 0.10.0
 Key: PARQUET-1433
 URL: https://issues.apache.org/jira/browse/PARQUET-1433
 Project: Parquet
  Issue Type: Task
  Components: parquet-format
Reporter: Nandor Kollar


Compilation of parquet-format fails with Thrift 0.10.0:

[ERROR] thrift failed error: [FAILURE:generation:1] Error: unknown
option java:hashcode





[jira] [Commented] (PARQUET-1402) incorrect calculation column start offset for files created by parquet-mr 1.8.1

2018-10-01 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/PARQUET-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16633697#comment-16633697
 ] 

ASF GitHub Bot commented on PARQUET-1402:
-

wesm closed pull request #494: PARQUET-1402: [C++] parquet-mr writes 
dictionary_page_offset == 0 when it is (supposed?) …
URL: https://github.com/apache/parquet-cpp/pull/494
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/src/parquet/file_reader.cc b/src/parquet/file_reader.cc
index c5a0f342..499101a4 100644
--- a/src/parquet/file_reader.cc
+++ b/src/parquet/file_reader.cc
@@ -103,7 +103,8 @@ class SerializedRowGroup : public RowGroupReader::Contents {
 auto col = row_group_metadata_->ColumnChunk(i);
 
 int64_t col_start = col->data_page_offset();
-if (col->has_dictionary_page() && col_start > col->dictionary_page_offset()) {
+if (col->has_dictionary_page() &&
+col->dictionary_page_offset() > 0 && col_start > col->dictionary_page_offset()) {
   col_start = col->dictionary_page_offset();
 }
 


 




> incorrect calculation column start offset for files created by parquet-mr 
> 1.8.1
> ---
>
> Key: PARQUET-1402
> URL: https://issues.apache.org/jira/browse/PARQUET-1402
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-cpp
>Reporter: Renat Valiullin
>Assignee: Renat Valiullin
>Priority: Major
>  Labels: pull-request-available
> Attachments: test.parquet
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> parquet-mr (at least version 1.8.1-fast-201712141648170019-ab0622b)
> writes dictionary_page_offset == 0 to the ColumnChunk's metadata when it is
> (supposed?) equal to data_page_offset.
> The calculation of col_start in std::unique_ptr<PageReader>
> GetColumnPageReader(int i)
> works incorrectly in this case.
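
To make the effect of the patched condition easier to see, here is a minimal
standalone sketch of the col_start selection. It is not the parquet-cpp code
itself: ColumnChunkOffsets and ColumnStart are hypothetical names introduced
only for illustration, and the three fields stand in for the
has_dictionary_page(), dictionary_page_offset() and data_page_offset()
accessors that appear in the diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the column chunk metadata accessors used in the
// diff (has_dictionary_page(), dictionary_page_offset(), data_page_offset()).
// This is an illustration only, not the parquet-cpp metadata class.
struct ColumnChunkOffsets {
  bool has_dictionary_page;
  int64_t dictionary_page_offset;
  int64_t data_page_offset;
};

// Returns the file offset at which reading of the column chunk should begin.
// A dictionary_page_offset of 0 is treated as "not set", mirroring the
// patched condition: only a positive dictionary offset that lies before the
// first data page can move the start of the chunk.
inline int64_t ColumnStart(const ColumnChunkOffsets& col) {
  int64_t col_start = col.data_page_offset;
  if (col.has_dictionary_page && col.dictionary_page_offset > 0 &&
      col_start > col.dictionary_page_offset) {
    col_start = col.dictionary_page_offset;
  }
  // By construction the chunk never starts after its first data page.
  assert(col_start <= col.data_page_offset);
  return col_start;
}
```

For a chunk with has_dictionary_page set, dictionary_page_offset == 0 and
data_page_offset == 4, the unpatched condition (col_start >
dictionary_page_offset) would move col_start to 0, which is the very start of
the file; the sketch above keeps it at 4.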





[jira] [Created] (PARQUET-1432) ACID support

2018-10-01 Thread Yuming Wang (JIRA)
Yuming Wang created PARQUET-1432:


 Summary: ACID support
 Key: PARQUET-1432
 URL: https://issues.apache.org/jira/browse/PARQUET-1432
 Project: Parquet
  Issue Type: New Feature
  Components: parquet-format, parquet-mr
Affects Versions: 1.10.1
Reporter: Yuming Wang


https://orc.apache.org/docs/acid.html


