Re: Next parquet sync

2017-02-02 Thread Zoltan Ivanfi
Hi, Is this Monday timeslot a one-time change or a regular one? Sadly this timeslot does not work for me and I would be sad if I had to miss all future syncs. Zoltan On Thu, Feb 2, 2017 at 5:36 PM Julien Le Dem wrote: > As Uwe mentioned everybody is welcome. > > The next

Timestamp type [was: parquet sync starting now]

2017-02-28 Thread Zoltan Ivanfi
009-05-14 12:00:00'); > >> > >> show timezone; > >> name | setting > >> --+ > >> timezone | US/Eastern > >> > >> select * from ts1; > >> t > >> - > >> 200

Re: Timestamp type [was: parquet sync starting now]

2017-02-28 Thread Zoltan Ivanfi
I just realized that my example with the DECIMAL type was not a particularly good one as unlike the timezone of a timestamp, the denominator of a DECIMAL is the same for all values of a single column. Zoltan On Tue, Feb 28, 2017 at 4:22 PM Zoltan Ivanfi <z...@cloudera.com> wrote:

Re: parquet sync starting now

2017-02-27 Thread Zoltan Ivanfi
Hi, Although the draft of SQL-92[1] does not explicitly state that the time zone offset has to be stored, the following excerpts strongly suggest that the time zone has to be stored with each individual value of TIMESTAMP WITH TIME ZONE: The length of a TIMESTAMP is 19 positions [...] The

Re: parquet sync starting now

2017-02-27 Thread Zoltan Ivanfi
of the time does not change because it is effectively maintained in UTC. Zoltan On Mon, Feb 27, 2017 at 7:34 PM Marcel Kornacker <marc...@gmail.com> wrote: > On Mon, Feb 27, 2017 at 8:47 AM, Zoltan Ivanfi <z...@cloudera.com> wrote: > > Hi, > > > > Although the draft

Re: Day of Sync-up

2017-02-27 Thread Zoltan Ivanfi
I'm busy on Mondays and Tuesdays, the rest of the week is fine by me. Zoltan On Mon, Feb 27, 2017 at 8:28 AM Uwe L. Korn wrote: > All weekdays except Friday work for me. > > -- > Uwe L. Korn > uw...@xhochy.com > > On Sun, Feb 26, 2017, at 12:49 AM, Deepak Majeti wrote: >

What happened to parquet-tools?

2016-11-03 Thread Zoltan Ivanfi
Dear Parquet Experts, I tried running some parquet-tools commands using version 1.9.1 and version 1.6.1 and was quite surprised to see that the newer parquet-tools doesn't work for me at all: $ java -jar parquet-tools-*1.6.1*-SNAPSHOT.jar schema /tmp/test.par message hive_schema { optional

What are summary files?

2017-07-27 Thread Zoltan Ivanfi
Hi, I came across some references to so-called "summary files" in ParquetFileReader.java. I wanted to find out what they are, but could hardly find any information on

Re: What are summary files?

2017-08-01 Thread Zoltan Ivanfi
ummary file can provide > the schema for a group of files. This can also get out of date and cause > problems. The solution is to use a metastore to maintain the canonical > schema for a table. > > rb > > On Thu, Jul 27, 2017 at 7:03 AM, Zoltan Ivanfi <z...@cloudera.com>

Re: merging several parquet into one

2017-10-13 Thread Zoltan Ivanfi
Hi, What is your motivation for merging the files? My guess is that you want to achieve good performance and historically that has been associated with large Parquet files. However, merging Parquet files by placing the same row groups one after the other won't really improve the performance of

Re: Pros/cons of setting parquet.writer.version=v2

2017-11-15 Thread Zoltan Ivanfi
Hi, In my opinion, compatibility is the main thing to consider here. Some applications (Impala being a notable example) only support v1 at the moment. You should carefully consider what applications you might want to use in the future to process the data and check whether they all support v2.

Re: PARQUET-1025: Support new min-max statistics in parquet-mr

2017-11-14 Thread Zoltan Ivanfi
writing Parquet files. That's what we're working on > > adding to the data source v2 API in Spark. Tables should be able to > specify > > the expected clustering (for partitioning) and sort order for rows, then > > the query plan is automatically rewritten to make that happe

JIRA thinks parquet-format-2.4.0 is not released yet

2017-11-13 Thread Zoltan Ivanfi
Hi, When resolving JIRAs using the dev/merge_parquet_pr.py script, one has to specify a fix version, otherwise the script will use format-2.4.0, which is already released. There is no good choice for the fix version as format-2.4.0 is the largest version number JIRA knows about (and in fact it

Re: JIRA thinks parquet-format-2.4.0 is not released yet

2017-11-13 Thread Zoltan Ivanfi
d take care of this through the web UI. > > rb > > On Mon, Nov 13, 2017 at 5:04 AM, Zoltan Ivanfi <z...@cloudera.com> wrote: > > > Hi, > > > > When resolving JIRAs using the dev/merge_parquet_pr.py script, one has to > > specify a fix version, oth

Re: PARQUET-1025: Support new min-max statistics in parquet-mr

2017-11-14 Thread Zoltan Ivanfi
issues.apache.org/jira/browse/PARQUET-686>> > >> What still stands against my proposal (except that it does not support > the > >> proper comparison for UINT values) is that it breaks the backward > >> compatibility: Binary.compareTo(Binary) works in different

Are whitespace changes in unrelated lines acceptable?

2017-11-06 Thread Zoltan Ivanfi
Hi, The last three pull requests I reviewed included removal of trailing spaces in unrelated lines of the affected files. I was wondering whether we are fine with such changes. These trailing spaces should not have been committed in the first place, but removing them later adds unnecessary

Re: PARQUET-1025: Support new min-max statistics in parquet-mr

2017-11-08 Thread Zoltan Ivanfi
Hi, I don't know the solution, just adding my thoughts. In my opinion the underlying problem is that min-max ranges have to be small for the best filtering results. In order to achieve this, the data has to be sorted, but calculating min-max statistics is the responsibility of the library, while
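
The point about sorted data and narrow min-max ranges can be illustrated with a minimal Python sketch. It is not parquet-mr code: the chunk size, the data and the helper names (chunk_stats, chunks_to_read) are made up for illustration, and each chunk merely stands in for a page or row group that carries min/max statistics.

```python
import random


def chunk_stats(values, chunk_size):
    """Return a (min, max) pair for each consecutive chunk of values."""
    return [
        (min(values[i:i + chunk_size]), max(values[i:i + chunk_size]))
        for i in range(0, len(values), chunk_size)
    ]


def chunks_to_read(stats, target):
    """Count the chunks whose [min, max] range may contain target."""
    return sum(1 for lo, hi in stats if lo <= target <= hi)


random.seed(0)  # fixed seed so the run is repeatable
data = list(range(10_000))
shuffled = data[:]
random.shuffle(shuffled)

sorted_stats = chunk_stats(data, 1_000)
shuffled_stats = chunk_stats(shuffled, 1_000)

# Sorted input: ranges do not overlap, so a single chunk qualifies.
print(chunks_to_read(sorted_stats, 4_242))
# Shuffled input: every chunk spans almost the whole value range,
# so min/max statistics can rule out few chunks, if any.
print(chunks_to_read(shuffled_stats, 4_242))
```

With unsorted input the statistics are still correct, they just stop being useful for skipping data.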

Re: Parquet Sync timing

2017-12-04 Thread Zoltan Ivanfi
Hi, I would suggest voting for the weekdays and choosing the best one or choosing the two best ones and alternating between them. We could repeat this process every 3-6 months. I created a poll for this purpose, please vote here: https://goo.gl/forms/Pr8U1wsRmpEZhdHy1 Thanks, Zoltan On Fri,

Re: Rowgroup to hdfs block mapping / data locality

2017-12-07 Thread Zoltan Ivanfi
Hi Eric, The row group size is supposed to be an upper bound, but occasionally may be exceeded, because the checks for reaching the row group size only happen every once in a while. Based on the first few records the code makes an estimation for how much uncompressed data will result in the

Re: [VOTE] Release Apache Parquet MR 1.8.3 RC0

2018-05-09 Thread Zoltan Ivanfi
nd there should not be a md5 checksum: > http://www.apache.org/dev/release-distribution#sigs-and-sums > > Could you guys create a sha512 file and delete the other two checksums? > That would change my vote to a +1. > > rb > > On Tue, May 8, 2018 at 7:26 AM, Zoltan Ivanf

Re: [VOTE] Release Apache Parquet MR 1.8.3 RC0

2018-05-10 Thread Zoltan Ivanfi
oudera.com>> > wrote: > > > > > Created PARQUET-1294 < > https://issues.apache.org/jira/browse/PARQUET-1294 < > https://issues.apache.org/jira/browse/PARQUET-1294>> to > > > track it. > > > > > > Gabor > > > > > >

Re: [VOTE] Release Apache Parquet MR 1.8.3 RC0

2018-05-08 Thread Zoltan Ivanfi
+1 (binding) built and tested verified signature I agree with Uwe that a verification script would be useful. Zoltan On Mon, May 7, 2018 at 5:37 PM Uwe L. Korn wrote: > +1 (binding) > > * Built and tested on Debian 8 > * verified sha1 > * verified signature > > was quite a
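
A minimal Python sketch of the checksum part of such a verification script follows. It rests on assumptions: it uses SHA-512, it expects the checksum file to start with the hex digest (the layout produced by the sha512sum tool), and the function names are hypothetical. Signature verification is not covered.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def checksum_matches(artifact_path, checksum_path):
    """Compare an artifact with the first token of its checksum file.

    Assumes a sha512sum-style file: "<hex digest>  <file name>".
    """
    with open(checksum_path, "r", encoding="utf-8") as f:
        expected = f.read().split()[0].lower()
    return sha512_of(artifact_path) == expected
```

Checksum files written in another layout (for example with the file name first) would need a different parsing step.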

Re: parquet-mr exposing public API

2018-05-15 Thread Zoltan Ivanfi
Hi, I think we should formalize this shortly for discussion. Based on your approach, I would suggest the following rules: _New methods/classes for internal use:_ 1. When possible, new internal methods/classes must have private or package private visibility. 2. When #1 is not possible, new

Feature branch for column indexes

2018-05-15 Thread Zoltan Ivanfi
Hi, We would like to create a feature branch for column indexes. Related commits would need to be reviewed by the community just the same, but would be merged to the feature branch instead of master. Once the feature works end-to-end, we would merge it into master in a single merge commit (also

Re: Feature branch for column indexes

2018-05-17 Thread Zoltan Ivanfi
Hi Uwe, Thanks, I just created the feature branch a few hours ago and Travis works indeed. The feature branch is in the apache repo as we would like the PR-s to be public and open to any committers and contributors so that we won't have to post the changes for review again when we would like to

More breaking changes in the Java API and how to deal with them

2018-05-18 Thread Zoltan Ivanfi
Hi, In recent weeks several breaking changes have been discovered in minor Parquet releases: * https://issues.apache.org/jira/browse/PARQUET-1295 * https://issues.apache.org/jira/browse/PARQUET-1304 * https://issues.apache.org/jira/browse/PARQUET-1305 Parquet uses semantic versioning. As a

Re: Move Dremel paper to parquet-format

2018-05-29 Thread Zoltan Ivanfi
Hi, Taking a step back, are we satisfied with the current web page mechanism? I find its dependence on subversion a real pain (checking it out, making patches for review, and the reviews themselves are a lot more complicated than with github). I think that's one of the main reasons it's so

Re: Too verbose GitHub bot comments

2018-06-05 Thread Zoltan Ivanfi
ged this in the Arrow project to be saved as Work Log > instead of comments in JIRA. Just open a ticket with INFRA, they can switch > this (+1 for this from me). > > On Wed, Apr 25, 2018, at 5:02 PM, Ryan Blue wrote: > > +1. I'd rather not have them. > > > > On Wed, Apr 2

Re: Recommended page size controversy

2018-01-10 Thread Zoltan Ivanfi
aries, but consume more memory > and may be less cache-efficient. I guess if you have 8KB pages you can fit > several pages in the L1 cache of a typical Intel processor (64kb), which > may help with performance. > > I'd be interested to know how the parquet-mr value was arrived at too. >

Re: Next parquet sync

2018-01-04 Thread Zoltan Ivanfi
Hi, According to the latest results of the availability poll, Tuesdays seem to work for slightly more people than Wednesdays. I'll try to post the chart below, let's see whether the mailing list allows it or removes it: [image: pasted1] I would suggest to either use Tuesdays or alternate

Recommended page size controversy

2018-01-08 Thread Zoltan Ivanfi
Hi, I noticed the following note regarding page sizes in the Parquet Format documentation: "We recommend 8KB for page sizes." In the Java implementation

Re: Recommended rowgroup size, and number of row groups for large table

2018-01-15 Thread Zoltan Ivanfi
Hi, If you use HDFS, then the row group size should match the HDFS block size, otherwise data locality (thus performance) will suffer. Regarding page size, in general larger pages lead to smaller files. On the other hand, the page-level metadata may include min and max values that can be used

Re: parquet-mr build fail with jdk7; move to jdk8?

2018-01-19 Thread Zoltan Ivanfi
+1 for moving to Java 8 Zoltan On Fri, Jan 19, 2018 at 5:48 PM Ryan Blue wrote: > We should probably make sure we have agreement from the community on this > before we move forward; either through replies to this thread or in > discussion at the next sync. Thanks,

Re: How to determine whether a given Parquet file uses v1 or v2 pages?

2018-01-25 Thread Zoltan Ivanfi
we should just add an optional record count to v1 and > stop using v2 pages. > > rb > > On Wed, Jan 24, 2018 at 8:07 AM, Zoltan Ivanfi <z...@cloudera.com> wrote: > > > Hi, > > > > We were looking for information in parquet-tools's output that would tell > > us

Re: Moving parquet-mr and parquet-format to Apache's GitBox service

2018-01-25 Thread Zoltan Ivanfi
+1, thanks! On Wed, Jan 24, 2018 at 6:02 PM Lars Volker wrote: > +1 > > Thank you Uwe! > > On Jan 24, 2018 08:48, "Ryan Blue" wrote: > > > +1 > > > > On Wed, Jan 24, 2018 at 5:13 AM, Uwe L. Korn wrote: > > > > > Hello all, > > >

Breaking changes in parquet-format without a major version bump

2018-01-29 Thread Zoltan Ivanfi
es may not have been actually used for the specific file (for example, one can write a PARQUET_2_3_2 file without using the new compressions, in which case older readers will be able to consume it). What do you think of this problem/proposal? Thanks, Zoltan On Thu, Jan 25, 2018 at 5:25

Re: Date and time for next parquet sync

2018-01-29 Thread Zoltan Ivanfi
+1 for Tuesday, this week I can't attend on Wednesday. Zoltan On Mon, Jan 29, 2018 at 7:29 AM Lars Volker wrote: > I'm good with either day. Does anyone prefer Wednesday over Tuesday? > > On Tue, Jan 23, 2018 at 11:27 PM, Gabor Szadovszky < > gabor.szadovs...@cloudera.com>

Re: Breaking changes in parquet-format without a major version bump

2018-02-12 Thread Zoltan Ivanfi
r thoughts. Thanks, Zoltan On Mon, Jan 29, 2018 at 6:56 PM Zoltan Ivanfi <z...@cloudera.com> wrote: > Hi, > > I have noticed that the recent addition of new compressions to > parquet-format happened in a patch version (in Semantic Versioning > terminology). I think

Re: Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-16 Thread Zoltan Ivanfi
handle similar > > issues around NaNs/infinity (or infinities, in the case of IEEE-754). > > > > Thanks, > > > > - LaszloG > > > > > > On Thu, Feb 15, 2018 at 5:10 PM, Zoltan Ivanfi <z...@cloudera.com> wrote: > > > > > Dea

Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-15 Thread Zoltan Ivanfi
Dear Parquet and Impala Developers, We have exposed min/max statistics to extensive compatibility testing and found troubling inconsistencies regarding float and double values. Under certain (fortunately rather extreme) circumstances, this can lead to predicate pushdown incorrectly discarding row
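
The replies in this thread point at NaN handling. A minimal Python sketch (not parquet-mr or Impala code; naive_min is a made-up helper) shows why ordered comparisons alone leave min/max statistics for floats ambiguous:

```python
import math

nan = float("nan")

# Every ordered comparison involving NaN is false, and NaN != NaN.
print(nan < 1.0, nan > 1.0, nan == nan)   # False False False


def naive_min(values):
    """Running minimum based only on the < operator."""
    result = values[0]
    for v in values[1:]:
        if v < result:
            result = v
    return result


first = naive_min([nan, 1.0, 2.0])   # NaN sticks: 1.0 < NaN is false
later = naive_min([1.0, nan, 2.0])   # NaN is skipped: NaN < 1.0 is false
print(math.isnan(first), later)      # True 1.0

# Signed zeros compare equal although their bit patterns differ.
print(0.0 == -0.0, math.copysign(1.0, -0.0))   # True -1.0
```

Two writers that see the same values in a different order, or that compare bit patterns instead of numeric values, can therefore record different statistics for the same data.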

parquet-mr review request

2018-02-21 Thread Zoltan Ivanfi
Dear All, Our users encountered a concerning bug in parquet-mr that causes partial statistics to trigger an NPE when using predicate push-down. We would like to solve this issue with urgency and Gabor already uploaded a fix in PR #458. I reviewed and

Re: [VOTE] Release Apache Parquet C++ 1.4.0 RC0

2018-02-19 Thread Zoltan Ivanfi
Hi, I wonder whether the fix for PARQUET-1225 should be included in the next release, even if it causes a delay. Br, Zoltan On Sun, Feb 18, 2018 at 10:10 PM Uwe L. Korn wrote: > +1 (binding) > > verified on Ubuntu 16.04 >

Re: Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-20 Thread Zoltan Ivanfi
that the > whole row group can be included. The addition of NaNs doesn't change > that. > > OTOH, if b <= a <= c, then we have to check the whole row group, and > the addition of NaNs doesn't change that. > > On Tue, Feb 20, 2018 at 9:14 AM, Alexander Behm <alex.b...@cloude

Re: Inconsistent float/double sort order in spec and implementations can lead to incorrect results

2018-02-19 Thread Zoltan Ivanfi
Hi, Tim, I added your suggestion to introduce a new ColumnOrder to PARQUET-1222 as the preferred solution. Alex, not writing min/max if there is a NaN is indeed a feasible quick-fix, but I think it would be better to just ignore NaN-s for the

Re: Parquet sync meeting minutes

2018-08-17 Thread Zoltan Ivanfi
Hi, Sorry, that was an error on my side, I suggested that Nandor add a TLDR section with this title. I agree with your comment, Wes, outcome would have been a better choice of word than decision. Br, Zoltan On Fri, Aug 17, 2018 at 6:36 PM Wes McKinney wrote: > hi Nandor, > > A fine detail, and

Re: Same Travis failure for several parquet-format PR

2018-07-23 Thread Zoltan Ivanfi
You forgot to mention that you already fixed it. :) Travis for parquet-format PRs should now pass (except if the PR itself breaks it). Zoltan On Mon, Jul 23, 2018 at 2:40 PM Nandor Kollar wrote: > Looks like this was a problem with Java/TLS version, see details in > PARQUET-1351

Re: JVM gets killed when loading parquet file

2018-09-10 Thread Zoltan Ivanfi
Hi Nicolas, Have you tried increasing the maximum Java heap size? https://stackoverflow.com/a/15517399/5613485 Br, Zoltan On Wed, Aug 29, 2018 at 8:39 PM Nicolas Troncoso wrote: > > I'm clearly not understanding something; it's a 960MB file. Even if it got > fully loaded into memory it should

Re: page and record boundaries

2018-07-12 Thread Zoltan Ivanfi
Hi Zoltan, If I remember correctly, this is what we had in mind in this question: Although page boundaries for v1 pages do not have to be record boundaries, nothing prevents us from implementing a writer that does align pages to record boundaries. (Of course, on the read path, we have to be able
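
A minimal Python sketch of a writer-side split that only breaks pages at record boundaries. It assumes the usual Parquet convention that a repetition level of 0 marks the first value of a new record; the function name and the soft value limit are made up for illustration, and this is not the parquet-mr writer.

```python
def split_into_pages(rep_levels, target_values_per_page):
    """Split value positions into (start, end) pages.

    A page is closed only at a position whose repetition level is 0,
    so a record never straddles two pages. The target is a soft limit:
    a page may exceed it while waiting for the next record start.
    """
    pages = []
    start = 0
    for i, level in enumerate(rep_levels):
        at_record_start = level == 0
        if at_record_start and i - start >= target_values_per_page:
            pages.append((start, i))
            start = i
    if start < len(rep_levels):
        pages.append((start, len(rep_levels)))
    return pages


levels = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
pages = split_into_pages(levels, 3)
print(pages)  # [(0, 3), (3, 6), (6, 10), (10, 11)]
```

The third page holds four values because the record that starts at position 6 is kept whole.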

Travis considers my one-line change unworthy of being tested

2018-01-24 Thread Zoltan Ivanfi
Hi, I noticed that Travis did not test my PR #436, although it has tested all other PR-s before and after mine. I tried rebasing and force pushing my change to the PR branch, still no reaction from Travis. I uploaded a new PR

Re: Travis considers my one-line change unworthy of being tested

2018-01-24 Thread Zoltan Ivanfi
ast 24 hours or so, so it might be worth opening an INFRA > ticket > > On Wed, Jan 24, 2018 at 8:00 AM, Zoltan Ivanfi <z...@cloudera.com> wrote: > > Hi, > > > > I noticed that Travis did not test my PR #436 > > <https://github.com/apache/parquet-mr/pull/43

How to determine whether a given Parquet file uses v1 or v2 pages?

2018-01-24 Thread Zoltan Ivanfi
Hi, We were looking for information in parquet-tools's output that would tell us whether a given Parquet file uses v1 or v2 pages but haven't found any. We also checked parquet-cli and haven't found anything specifically for this purpose there either, but we noticed in the source code that the

Re: [VOTE] Release Apache Parquet Java 1.10.0 RC0

2018-04-06 Thread Zoltan Ivanfi
I would have preferred waiting for the parquet-format release (which unfortunately failed the vote due to lack of interest) before making an RC for parquet-mr so that it would refer to the latest format. On the other hand, all relevant updates of parquet-format are in the documentation only, so

Re: [VOTE] Release Apache Parquet Format 2.5.0 RC0

2018-04-18 Thread Zoltan Ivanfi
+1 (binding) Checked sigs, built and tested. On Tue, Apr 17, 2018 at 1:39 PM Gabor Szadovszky < gabor.szadovs...@cloudera.com> wrote: > Hi everyone, > > We reached the required 3 binding +1 votes. As there was no deadline > defined for this vote we’ll wait another 24 hours and close it

Re: Date and time for the next Parquet sync

2018-04-18 Thread Zoltan Ivanfi
+1, thanks Lars! On Wed, Apr 18, 2018 at 6:20 PM Lars Volker wrote: > Hi All, > > It has been 3 weeks since our last Parquet community sync and I think it > would be great to have one next week. Last time we met on a Wednesday, so > this time it should be Tuesday. > > I'd

Re: Too verbose GitHub bot comments

2018-04-25 Thread Zoltan Ivanfi
+1. Since the comment contains the full diff, it really clutters the JIRA history. If making it less verbose is not possible, I would vote for turning it off completely. Zoltan On Wed, Apr 25, 2018 at 4:08 PM Nandor Kollar wrote: > Hi All, > > While working on a Parquet

Re: Merging changes in the GitBox era

2018-03-26 Thread Zoltan Ivanfi
To answer my second question: Just tried "Squash and Merge" and I could edit the commit message. Looks good. Zoltan On Mon, Mar 26, 2018 at 2:26 PM Zoltan Ivanfi <z...@cloudera.com> wrote: > +1 and thanks for taking care of this. I would like to have 2 questions > thoug

Re: Merging changes in the GitBox era

2018-03-26 Thread Zoltan Ivanfi
ovided by the > > > dev/merge.py > > > >> > script. > > > >> > > > > >> > Since 2016 Github supports "Rebase and Merge" as a second option > when > > > >> > submitting PRs: > > > >> > > htt

Re: Merging changes in the GitBox era

2018-03-26 Thread Zoltan Ivanfi
"Rebase and Merge" does prevent the kind of > merge commit that dirties the git log so I figured we might keep it. I'm > open to disabling it, too, if you feel we should. > > On Mon, Mar 26, 2018 at 6:01 AM, Zoltan Ivanfi <z...@cloudera.com> wrote: > > > To answer my sec

Re: Merging changes in the GitBox era

2018-03-26 Thread Zoltan Ivanfi
s > that will dirty the commit log. If you have to select "Rebase and Merge" to > use it, I think it will probably be fine. And it is nice to have the > option. > > rb > > On Mon, Mar 26, 2018 at 9:26 AM, Zoltan Ivanfi <z...@cloudera.com> wrote: > > > I th

Merging changes in the GitBox era

2018-03-23 Thread Zoltan Ivanfi
Hi, Parquet was recently migrated to GitBox (thanks, Uwe!). The dev/merge.py script can be still used by setting PUSH_REMOTE_NAME=apache-github before invoking it. There is also a new option of merging changes using the GitHub web UI. I personally continued using the merge script, while others

Re: [VOTE] Modular Encryption design sign-off

2018-10-16 Thread Zoltan Ivanfi
+1 (binding) Cheers, Zoltan On Tue, Oct 16, 2018 at 10:11 AM Gidon Gershinsky wrote: > Hello Parquet developers, > > Per the last sync discussion, it is time to call for a vote on the Parquet > Modular Encryption design sign-off. The design doc can be found at the > encryption branch of the

Re: [VOTE] Modular Encryption design sign-off

2018-10-22 Thread Zoltan Ivanfi
are mentioned is after the > footer > > >and encryption metadata, but the diagram shows that the first bytes > > in the > > >file are updated as well. This is also only in the encrypted footer > > mode. > > >Should PARE magic byt

Date and time of the next Parquet sync

2018-10-30 Thread Zoltan Ivanfi
Hi, I have sent an invitation for the next Parquet Sync for next Tuesday (November 6) at 6 PM CET / 9 AM PT. The meeting is open to anybody interested in Parquet. If you have not received the invitation but would like to attend, please send me a mail in private and I will add you. Thanks,

Row group layout anomalies

2018-10-01 Thread Zoltan Ivanfi
Hi, PARQUET-1337 describes the problem of ending up with a drastically different (and worse) row group layout than intended under certain circumstances. A few weeks ago I started tweaking the logic that controls this in a test-driven fashion. I have found that fixing one problem repeatedly leads

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-26 Thread Zoltan Ivanfi
change > was minor and done for a good reason. > I hope there won't be new disturbances from now on. > > Cheers, Gidon. > > > On Wed, Sep 26, 2018 at 3:10 PM Zoltan Ivanfi > wrote: > > > Hi, > > > > It seems that I spoke too early. I just not

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-26 Thread Zoltan Ivanfi
, Zoltan On Wed, Sep 26, 2018 at 1:41 PM Zoltan Ivanfi wrote: > Hi, > > If the encryption code release in parquet-cpp is unused at this moment > then I think we are fine. It means that we are still free to decide any way > about the data structures without the risk of incom

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-26 Thread Zoltan Ivanfi
ersion of the Parquet format, so let's do that ASAP. > > On Tue, Sep 25, 2018 at 1:30 PM Gidon Gershinsky > wrote: > > > > > > Yep! (sent in parallel :) > > > > > > On Tue, Sep 25, 2018 at 8:19 PM Zoltan Ivanfi > > > > wrote: > > > >

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-26 Thread Zoltan Ivanfi
feature are unlikely to change. I solely suggest these measures to remedy the situation of parquet.thrift having been released in parquet-cpp before officially getting released in parquet-format. Thanks, Zoltan On Wed, Sep 26, 2018 at 2:29 PM Zoltan Ivanfi wrote: > Hi, > > I think i

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-26 Thread Zoltan Ivanfi
to worry about breaking changes that would make existing data files unreadable. Based on this, I would [once again :)] suggest moving forward as planned. Thanks, Zoltan On Wed, Sep 26, 2018 at 4:56 PM Zoltan Ivanfi wrote: > Hi, > > Please let me know your opinions as well. So far all

Re: parquet sync notes

2018-09-26 Thread Zoltan Ivanfi
Hi, It seems to me that PR #99 does not supersede PR #62, as the latter affects 16 files but the former only modifies a single one. Or has the rest of the changes been already merged to the codebase from another PR? I checked the history and I don't see anything related. Thanks, Zoltan On Wed,

Re: [VOTE] Release Apache Parquet format 2.6.0 RC0

2018-09-27 Thread Zoltan Ivanfi
+1 (binding) - contents look good - unit tests pass - checksums match - signature matches Thanks, Zoltan On Thu, Sep 27, 2018 at 5:02 PM Nandor Kollar wrote: > Hi everyone, > > I propose the following RC to be released as official Apache Parquet > Format 2.6.0 release. > > The commit id is

Re: parquet sync notes

2018-09-27 Thread Zoltan Ivanfi
r document file later. > > Zoltan Ivanfi 于2018年9月26日周三 下午3:19写道: > > > Hi, > > > > It seems to me that PR #99 does not supersede PR #62, as the latter > affects > > 16 files but the former only modifies a single one. Or has the rest of > the > > changes be

Re: Parquet-cpp has already released unofficial thrift changes

2018-09-25 Thread Zoltan Ivanfi
Hi, As a short update, I just checked the PR for PARQUET-1419 and although in its current form it is a breaking change, it can be easily rewritten to become backwards-compatible so this part of the problem does not apply any more. Br, Zoltan On Tue, Sep 25, 2018 at 7:10 PM Zoltan Ivanfi wrote

Parquet-cpp has already released unofficial thrift changes

2018-09-25 Thread Zoltan Ivanfi
Hi, On the Parquet sync we discussed that the practice of maintaining a copy of parquet.thrift in parquet-cpp is dangerous and that we must take care to not release parquet-format changes in parquet-cpp before we officially release them in parquet-format. As I got back to my computer and started

Re: [VOTE] Release Apache Parquet 1.11.0 RC2

2019-01-03 Thread Zoltan Ivanfi
able to vote, but probably not until after the holidays. I > > presume many others got very busy with end of year and are in the same > > boat. > > > > - Wes > > > > On Mon, Dec 17, 2018 at 6:07 AM Zoltan Ivanfi > > wrote: > > > > > > Hi, > >

Re: [VOTE] Release Apache Parquet 1.11.0 RC3

2019-01-17 Thread Zoltan Ivanfi
Hi, Friendly reminder to please vote for the release. We need 2 more binding +1 votes. Thanks, Zoltan On Sat, Jan 12, 2019 at 3:07 AM 俊杰陈 wrote: > +1 (non-binding) > * contents looks good > * unit tests passed > > > Zoltan Ivanfi 于2019年1月11日周五 下午9:31写道:

Adding more timestamp types to on-disk storage formats

2019-01-17 Thread Zoltan Ivanfi
Hi, There is an ongoing effort amongst the SQL engines of the Hadoop stack to support different timestamp semantics. This development has some implications for the low-level timestamp types as well. The new timestamp types added to the different SQL engines will rely on the decisions of the lower
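
As an illustration of what "different timestamp semantics" can mean, here is a minimal Python sketch contrasting a value that denotes an instant with one that denotes a wall-clock reading. The labels "instant" and "local" and the fixed UTC offsets are assumptions made for the example, not terminology or data taken from this message.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets stand in for real time zones to keep the sketch
# self-contained (assumed: UTC-4 and UTC+2, as in northern summer).
eastern_dst = timezone(timedelta(hours=-4))
cet_dst = timezone(timedelta(hours=2))
fmt = "%Y-%m-%d %H:%M:%S"

# Instant semantics: the stored value identifies a point in time,
# and its rendering depends on the reader's time zone.
instant = datetime(2009, 5, 14, 16, 0, 0, tzinfo=timezone.utc)
print(instant.astimezone(eastern_dst).strftime(fmt))  # 2009-05-14 12:00:00
print(instant.astimezone(cet_dst).strftime(fmt))      # 2009-05-14 18:00:00

# Local semantics: the stored value is a wall-clock reading that
# shows the same digits to every reader, whatever their time zone.
local = datetime(2009, 5, 14, 12, 0, 0)
print(local.isoformat(sep=" "))                       # 2009-05-14 12:00:00
```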

Re: [Discussion] How to build bloom filter in parquet

2019-01-17 Thread Zoltan Ivanfi
Hi, I like the idea of specifying the maximum acceptable size of the bloom filter bit vector. I think it would be much better than specifying the expected number of distinct values (which we can not expect from the API consumer in my opinion). The desired false positives probability could still
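
The trade-off between the size of the bit vector, the number of distinct values and the false positive probability can be sketched with the textbook relation for a classic Bloom filter that uses the optimal number of hash functions. This is an assumption for illustration: the filter layout discussed in the thread may differ, in which case the numbers are only approximate, and the function names are hypothetical.

```python
import math


def max_distinct_values(num_bits, fpp):
    """Distinct values a classic Bloom filter of num_bits bits can hold
    while staying at or below the false positive probability fpp."""
    return int(-num_bits * math.log(2) ** 2 / math.log(fpp))


def required_bits(num_distinct, fpp):
    """Inverse relation: bits needed for num_distinct values at fpp."""
    return math.ceil(-num_distinct * math.log(fpp) / math.log(2) ** 2)


one_mib_bits = 8 * 1024 * 1024
capacity = max_distinct_values(one_mib_bits, 0.01)
print(capacity)
print(required_bits(capacity, 0.01) <= one_mib_bits)
```

Given a size cap and a desired false positive probability, a writer could use such a relation to decide whether a filter is still worth keeping once the number of distinct values seen exceeds the capacity.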

Re: [VOTE] Release Apache Parquet 1.11.0 RC2

2018-12-17 Thread Zoltan Ivanfi
Hi, I would like to ask/encourage PMC members to vote on the release candidate. We have one binding +1 so far and need 2 more for the vote to pass. Thanks, Zoltan On Fri, Dec 14, 2018 at 2:10 PM Zoltan Ivanfi wrote: > +1 (binding) > > - contents look good > - unit tests pass >

Re: [VOTE] Release Apache Parquet 1.11.0 RC2

2018-12-14 Thread Zoltan Ivanfi
> > +1 (non-binding) > > On Thu, Dec 13, 2018 at 9:17 PM Zoltan Ivanfi > wrote: > > > Dear Parquet Users and Developers, > > > > I propose the following RC to be released as the official Apache > > Parquet 1.11.0 release: > > > > The commit id

Re: [VOTE] Modular Encryption design sign-off

2018-12-15 Thread Zoltan Ivanfi
+1 (binding) Thanks for your efforts, Gidon! On Fri, Dec 14, 2018 at 6:55 PM Ryan Blue wrote: > +1 (binding) > > Thanks for all your work on this, Gidon! > > On Fri, Dec 14, 2018 at 9:48 AM Gidon Gershinsky wrote: > > > Hello Parquet developers, > > > > After a productive round of

Summary of Column Index Testing Efforts

2018-12-13 Thread Zoltan Ivanfi
Hi, In the last Parquet sync we were asked to share details of the testing we had done to validate the correctness of column indexes. We were also asked to validate the column index structures of a randomly generated data set against the contracts. Please see my summary of our efforts below:

Re: [VOTE] Release Apache Parquet 1.11.0 RC1

2018-11-30 Thread Zoltan Ivanfi
r > On Mon, Nov 26, 2018 at 8:34 AM Gabor Szadovszky wrote: > > > > Checked source tarball content and the related checksum/signature. All > are > > correct. Unit tests pass. > > +1 (non-binding) > > > > Cheers, > > Gabor > > > > On Fri,

[VOTE] Release Apache Parquet 1.11.0 RC1

2018-11-23 Thread Zoltan Ivanfi
Dear Parquet Users and Developers, I propose the following RC to be released as the official Apache Parquet 1.11.0 release: The commit id is 10e63437cd69a8bb98edf30f5b299ab9b1a98fe7 * This corresponds to the tag: apache-parquet-1.11.0 *

Re: [VOTE] Release Apache Parquet 1.11.0 RC0

2018-11-22 Thread Zoltan Ivanfi
ng) > > > > Cheers, > > Gabor > > > > On Wed, Nov 21, 2018 at 7:11 PM Zoltan Ivanfi > > wrote: > > > > > Dear Parquet Users and Developers, > > > > > > I propose the following RC to be released as the official Apache > >

[VOTE] Release Apache Parquet 1.11.0 RC0

2018-11-21 Thread Zoltan Ivanfi
Dear Parquet Users and Developers, I propose the following RC to be released as the official Apache Parquet 1.11.0 release: The commit id is b873a0ab31da570bb615ab2253cf90a2f451b0e4 * This corresponds to the tag: apache-parquet-1.11.0 *

Re: Proposal for new LogicalType: QuantizedFloat

2018-11-20 Thread Zoltan Ivanfi
Hi, If we introduced such a type, personally I would prefer restricting its range to regular numbers. I would leave -0, ±inf and the various NaNs to the real float and double types. NULL will always be a possibility of course, which already provides some flexibility. Br, Zoltan On Tue, Nov 20,

Date and time of the next Parquet sync: December 5 at 9AM PT (6PM CET)

2018-11-27 Thread Zoltan Ivanfi
Hi, I have sent an invitation for the next Parquet Sync for next Wednesday (December 5) at 9AM PT (6PM CET). The meeting is open to anybody interested in Parquet. If you have not received the invitation but would like to attend, please send me a mail in private and I will add you. Thanks,

Re: Deploy parquet-format snapshot to maven repo

2019-01-10 Thread Zoltan Ivanfi
ne' and' mvn install' to > before_install section in .travis.yml the build move forward to build. Also > I agree with you about never depend on SNAPSHOT jar. > > > Zoltan Ivanfi 于2019年1月7日周一 下午10:38写道: >> >> Hi Junjie, >> >> There seems to be some pr

Re: [VOTE] Release Apache Parquet 1.11.0 RC3

2019-01-11 Thread Zoltan Ivanfi
> > +1 (non-binding) > > Cheers, > Gabor > > On Wed, Jan 9, 2019 at 4:51 PM Zoltan Ivanfi > wrote: > > > Dear Parquet Users and Developers, > > > > I propose the following RC to be released as the official Apache > > Parquet 1.11.0 release: >

[VOTE] Release Apache Parquet 1.11.0 RC3

2019-01-09 Thread Zoltan Ivanfi
Dear Parquet Users and Developers, I propose the following RC to be released as the official Apache Parquet 1.11.0 release: The commit id is 8be767d12cca295cf9858a521725fc440b0c6f93 * This corresponds to the tag: apache-parquet-1.11.0 *

Re: Old readers & encrypted files

2018-09-18 Thread Zoltan Ivanfi
Hi, Just to clarify: PF~ allows older readers to read data as long as they only try to access unencrypted columns. What happens when older readers do try to access encrypted columns? Also, by older readers do you specifically mean the current Java library or all existing language bindings?

Re: Date and time for next Parquet sync

2018-09-18 Thread Zoltan Ivanfi
Hi, It seems that I won't be able to attend after all, sorry for the late decline. Zoltan On Mon, Sep 10, 2018 at 7:21 PM Ryan Blue wrote: > Sorry, looks like I was wrong on the dates. Thanks, Nandor. > > On Mon, Sep 10, 2018 at 5:15 AM Nandor Kollar > wrote: > > > Ryan, I was aware of

***UNCHECKED*** Re: Old readers & encrypted files

2018-09-19 Thread Zoltan Ivanfi
. > > On Tue, Sep 18, 2018 at 5:07 PM Zoltan Ivanfi > wrote: > > > Hi, > > > > I'm a little bit worried that the misleading error message could lead to > > serious confusion. For this reason, I would slightly prefer a truly > > incompatible format

Re: Old readers & encrypted files

2018-09-18 Thread Zoltan Ivanfi
laintext > columns in PF~ files. > > Cheers, Gidon. > > > On Tue, Sep 18, 2018 at 11:19 AM Zoltan Ivanfi > wrote: > > > Hi, > > > > Just to clarify: PF~ allows older readers to read data as long as they > only > > try to access unencrypted colu

Re: [VOTE] Release Apache Parquet 1.11.0 RC6

2019-04-03 Thread Zoltan Ivanfi
orking on setting up my m2 settings to be able to read from > there, but this is something that really needs to be documented. > > Once I figure it out, I will create a JIRA + PR to update the README. > > Thanks. > > On 4/3/19, 8:55 AM, "Zoltan Ivanfi"

Re: [VOTE] Release Apache Parquet 1.11.0 RC6

2019-04-03 Thread Zoltan Ivanfi
to this jar in maven, so would appreciate some guidance. > > > > Thanks, > > > > Andy, > > > > > > On 3/21/19, 3:40 PM, "Zoltan Ivanfi" wrote: > > > > CAUTION – UNVERIFIED EXTERNAL EMAIL > > > > > > Hi Wes

Re: [VOTE] Release Apache Parquet 1.11.0 RC6

2019-03-20 Thread Zoltan Ivanfi
+1 (binding) signature matches git hash matches the git tag source tarball matches the git tag unit tests and integration tests pass On Tue, Mar 19, 2019 at 3:00 PM Gabor Szadovszky wrote: > Dear Parquet Users and Developers, > > I propose the following RC to be released as the official Apache

Re: [VOTE] Release Apache Parquet 1.11.0 RC6

2019-03-21 Thread Zoltan Ivanfi
the end (followed by a few extra lines). Br, Zoltan On Thu, Mar 21, 2019 at 7:58 PM Wes McKinney wrote: > Are there any instructions written down about how to verify this release? > > On Wed, Mar 20, 2019 at 8:50 AM Zoltan Ivanfi > wrote: > > > > +1 (binding) >

Re: [VOTE] Release Apache Parquet 1.11.0 RC6

2019-03-21 Thread Zoltan Ivanfi
PS: Oh, and Java 11 is not supported, only Java 8. You also need to have mvn installed. Zoltan On Thu, Mar 21, 2019 at 10:40 PM Zoltan Ivanfi wrote: > Hi Wes, > > Here is a list of steps (the first part is probably the same as for > parquet-cpp): > > 1. Download the
