Description:
Improper Input Validation vulnerability in Parquet-MR of Apache Parquet allows
an attacker to cause a denial of service (DoS) with malicious Parquet files. This issue affects Apache
Parquet-MR version 1.9.0 and later versions.
This issue is being tracked as PARQUET-2094
Mitigation:
1.12.x users should
+1
About the naming. We already use INT_8, INT_16 etc. for logical types for
integer values. What do you think about FLOAT_16 to be consistent?
Cheers,
Gabor
On 2023/10/05 22:17:13 Ryan Blue wrote:
> +1
>
> I'm all for adding a 2-byte floating point representation since even 4-byte
> floats
+1 (binding)
Cheers,
Gabor
On 2023/11/07 02:46:37 Xinli shang wrote:
> +1 (binding)
>
> On Mon, Nov 6, 2023 at 4:56 PM Gang Wu wrote:
>
> > +1 (non-binding)
> >
> > Best,
> > Gang
> >
> > On Tue, Nov 7, 2023 at 3:57 AM Ed Seidl wrote:
> >
> > > +1 (non-binding)
> > >
> > > Thanks!
> > > Ed
>
Sorry for the late response.
Verified checksum and signature, diffed tarball and repo content, build/unit
tests pass.
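For readers unfamiliar with the checksum step mentioned above, here is a minimal sketch of comparing a downloaded artifact against a published SHA-512 digest. It is only an illustration: the helper names (sha512_of, checksum_matches) and the throwaway demo file are assumptions, not part of any Parquet release tooling, and signature verification (the .asc check) is a separate step that this sketch does not cover.

```python
import hashlib
import os
import tempfile


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex):
    """Compare a file's digest with a published one (hypothetical helper).

    Surrounding whitespace and letter case in the published value are ignored.
    """
    return sha512_of(path) == expected_hex.strip().lower()


# Demo with a throwaway file standing in for a release tarball (assumption).
payload = b"stand-in for a release tarball"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    demo_path = tmp.name

expected = hashlib.sha512(payload).hexdigest()
demo_ok = checksum_matches(demo_path, expected.upper() + "\n")
demo_bad = checksum_matches(demo_path, "0" * 128)
os.remove(demo_path)
```

In practice the expected digest would come from the checksum file published next to the release candidate rather than being computed locally.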
I'm a bit confused about our current branching, though. We have a branch for
parquet-1.12.x. Every 1.12 release should be built/tagged there. Meanwhile we
have a separate branch for
l new commits after v1.12.3 release are in the master branch.
>- I did check that commits in the v1.12.2 are included in the v1.12.3
>release (as well as the master branch)
>
> So I think we are good.
>
> Best,
> Gang
>
> On Fri, Mar 31, 2023 at 9:49 PM Gábor
Verified checksum and signature, diffed tarball and repo content, build/unit
tests pass.
+1 (binding) for releasing this content as 1.13.0
NOTE: It is completely fine or even a good practice to release the first minor
release from its separate branch (instead of master). Do not forget to merge
> it takes to cherry-pick them to the 1.12.x branch. I would prefer the
> option one.
>
> WDYT?
>
> Best,
> Gang
>
>
> On Fri, Mar 31, 2023 at 11:17 PM Gábor Szádovszky wrote:
>
> > I think we are about to release under the wrong number then. We s
to release a 1.12.4 version until we have received sufficient
> feedback and requests from users.
>
>
> On Sat, Apr 1, 2023 at 2:39 PM Gábor Szádovszky wrote:
>
> > In the past we did not backport every bugfix to previous branches, only
> > the serious ones that have no workarou
Thanks a lot for volunteering, Gang!
Although it has indeed been more than 2 years since the last release, I think the
actual changes since then are more important. There are lots of
additions/corrections in the spec docs and the thrift file comments which are
very important but not tightly attached
Thanks for bringing this up, Fokko.
Unfortunately, I won't be able to join next week. (Hopefully I will be
there at the one after.)
So, let me write my thoughts here.
I agree it is time to start preparing the next parquet-mr release. I have
some thoughts:
- We should check that parquet-mr
Hey Gang, Kaili,
I think the easiest way to solve this issue is to completely remove the
spec from the site and add a reference to the parquet-format repo instead.
We should probably add the release tag links when we make a release of
parquet-format with a "latest" link. This way we would also
Thanks a lot Gang, for dealing with the release!
Checked checksum and signature; content of the tarball looks good; unit tests
pass
+1 (binding)
Cheers,
Gabor
On 2023/11/19 16:37:51 Gidon Gershinsky wrote:
> +1 (binding).
>
> Thanks Gang.
>
> Cheers, Gidon
>
>
> On Fri, Nov 17, 2023 at
Hi Claire,
I think you read it correctly. Your proposal sounds good to me but you need
to make it a separate way of reading instead of rewriting the current
behavior. The current implementation figures out the consecutive parts in
the file (multiple pages or even column chunks written after each
for how many pages, or page bytes, to buffer at a time, so that users can
> balance IO speed with memory usage. I'll try out a few approaches and aim
> to update this thread when I have something.
>
> Best,
> Claire
>
>
>
> On Tue, Mar 5, 2024 at 2:55 AM Gábor Szádovszky
Thank you, Bryce, for working on this!
Let me forward this to the private channel as well.
@Xinli, @Julien, do you have access to the twitter account to spread this?
Bryce Mecum wrote (on Tue, Mar 5, 2024,
20:38):
> Hi all, the Parquet format now has an official IANA media type:
>
+1 (binding) - Not sure if "binding" matters for this case
Thanks, Antoine, for working on this!
Antoine Pitrou wrote (on Thu, Mar 7, 2024,
14:18):
>
> Hello,
>
> As discussed previously on this ML [1], I am proposing to expand
> the types supported by the BYTE_STREAM_SPLIT encoding.
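As background for the proposal quoted above, here is a minimal sketch of the byte-stream-split idea for fixed-width values: byte i of every value goes into stream i, and the streams are then concatenated. The function names and the use of Python's struct module are illustrative assumptions; this is not the parquet-mr or Arrow implementation, and it says nothing about which types the proposal adds.

```python
import struct


def byte_stream_split_encode(values, fmt="<f"):
    """Scatter the bytes of fixed-width values into K streams and concatenate them."""
    width = struct.calcsize(fmt)
    raw = [struct.pack(fmt, v) for v in values]
    # Stream i holds byte i of every value, in value order.
    streams = [bytes(r[i] for r in raw) for i in range(width)]
    return b"".join(streams)


def byte_stream_split_decode(data, fmt="<f"):
    """Gather byte j of each stream back into the j-th value."""
    width = struct.calcsize(fmt)
    count = len(data) // width
    values = []
    for j in range(count):
        raw = bytes(data[i * count + j] for i in range(width))
        values.append(struct.unpack(fmt, raw)[0])
    return values


encoded = byte_stream_split_encode([1.0, 2.0, 0.5])
decoded = byte_stream_split_decode(encoded)
```

The encoding does not shrink the data by itself; grouping bytes of similar significance tends to help the compressor that runs afterwards.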
There is a big difference between the repos of Arrow, Avro, Iceberg etc.
and Parquet. The mentioned projects have everything in one repo including
the different language bindings etc. so it is natural to have the specs
there as well and having universal releases.
Meanwhile Parquet has different
ween Parquet files written thru V2 or V1, no
> one in the community has a clear idea about this, which is a bit
> astonishing.
>
> If anyone is aware, it would be highly appreciated.
>
>
>
> On Thu, Apr 25, 2024 at 10:32 AM Gábor Szádovszky
> wrote:
>
> > I am n
I am not sure what "Parquet community V2 is not final yet" means. We are
now at parquet-format 2.10.0. The current parquet-mr supports most (if not
all) of its features. I agree the current mechanism in parquet-mr of
setting the writer version to PARQUET_1_0 or PARQUET_2_0 is unclear and
misleading.
y Spark +
> Dremio).
>
>
> In the last Parquet meeting, I brought up discussing / planning for a
> parquet-mr 2.0 release which I think should at least establish a parquet-mr
> release as the "formal implementation" of the standard (even if it's mostly
> a vanity
Hey,
I don't think we should call Parquet v2.x features unstable. Since they
were released officially, we maintain backward compatibility. So, from the
Parquet format's point of view, these features are stable.
It is another question whether a Parquet implementation supports all of
these features or
Sorry, I was not able to attend the meeting. Let me put some notes here:
2. We have been fighting with compatibility issues for a while now. That's
why we introduced japicmp. I can see many exclusions in the master pom. I
think we should investigate if these exclusions cause any issues before the
Hi Gang,
Thank you for taking care of the release!
Unfortunately, the .asc check fails for me even after importing the KEYS
file. Could you double check if you signed it with the correct key?
No other issues were discovered, so no RC1 is required for now if you can
change the .asc file for the
nse to add my new key to the KEYS file instead?
>
> Best,
> Gang
>
> On Tue, Apr 30, 2024 at 3:11 PM Gábor Szádovszky wrote:
>
> > Hi Gang,
> >
> > Thank you for taking care of the release!
> >
> > Unfortunately, the .asc check fails for me even af
lease/parquet/KEYS
>
> On Tue, Apr 30, 2024 at 3:45 PM Gábor Szádovszky wrote:
>
> > Sure, please add your new public key to the referenced KEYS file then we
> > should be good. (The previous one would still be required to check the
> > previous releases, so do not remove
Thanks Fokko, Gang for working on this.
I have some findings:
* nit correction in the original mail: tag is apache-parquet-1.14.0-rc1
(not apache-parquet-1.4.0-rc1)
* The CHANGES.md should have been updated with the one fix you've mentioned
(PARQUET-2465)
Since I've never used CHANGES.md to
Thanks a lot Weston for bringing this up.
Last time we discussed a potential Java upgrade, Hadoop was the one not
allowing us to do so. Hadoop is still on Java 8.
If we want to keep Arrow on the latest version, we will need to upgrade to
Java 11. In this case we won't be able to support Hadoop
Hi Antoine,
One quick note about this. Parquet min/max statistics need a total ordering
for each logical type. Without that we either use some default based on the
primitive type (that might not be suitable for the related extension type)
or we won't store min/max statistics for the related
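To illustrate why min/max statistics need a total ordering, here is a small sketch using floating-point values as a stand-in (an assumption chosen for illustration; the message above concerns extension types). With the ordinary comparison, NaN makes the result of min depend on the order of the values. The key function below is a hypothetical helper that maps the IEEE 754 bit pattern to an integer so that every pair of values is comparable.

```python
import math
import struct


def total_order_key(x):
    """Map a double to an integer whose natural order is total over all doubles.

    Negative values have all bits inverted; non-negative values get the sign
    bit set. This follows the usual bit-pattern trick, not any Parquet rule.
    """
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    if bits >> 63:
        return bits ^ 0xFFFFFFFFFFFFFFFF
    return bits | 0x8000000000000000


values_a = [math.nan, 1.0, -2.0]
values_b = [-2.0, 1.0, math.nan]

# Ordinary comparison: every comparison with NaN is False, so the answer
# depends on where NaN sits in the input.
naive_a = min(values_a)
naive_b = min(values_b)

# With a total order, both inputs give the same minimum.
total_a = min(values_a, key=total_order_key)
total_b = min(values_b, key=total_order_key)
```

A writer that computed statistics with the ordinary comparison could record different minimums for the same set of values, which is why a well-defined ordering per logical type matters.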