e in if you have a strong
> > > preference for HALF or HALF_FLOAT over FLOAT16!
> > >
> > >
> > > This vote will be open for at least 72 hours.
> > >
> > > [ ] +1 Add this type to the format specification
> > > [ ] +0
> > > [ ] -1 Do not add this type to the format specification because...
> > >
> > > Thanks!
> > >
> > > Ben
> > >
> > > [1]:
> https://en.wikipedia.org/wiki/Half-precision_floating-point_format
> > >
> > >
> >
> >
> >
> >
>
> --
> Xinli Shang
>
--
Ryan Blue
Tabular
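For context on the FLOAT16 vote above: the type maps to IEEE 754 binary16, which is easy to poke at from Python, whose struct module supports that layout via the "e" format code. This is illustrative only and not tied to any Parquet API:

```python
import struct

def to_half_and_back(x):
    # Round-trip a Python double through a 2-byte IEEE 754 half float.
    return struct.unpack('<e', struct.pack('<e', x))[0]

# 1.5 and the maximum finite half value (65504) are exactly representable;
# 0.1 is not, so it comes back slightly perturbed.
assert to_half_and_back(1.5) == 1.5
assert to_half_and_back(65504.0) == 65504.0
assert to_half_and_back(0.1) != 0.1
```

The perturbation on 0.1 is tiny (about 2.4e-5), but it illustrates why readers and writers must agree on the exact 16-bit layout rather than each converting through a wider type ad hoc.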
> > > This release includes important changes:
> > >
> > > * https://github.com/apache/parquet-mr/commits/parquet-1.13.x
> > >
> > >
> > > Handy commands for verifying the release:
> > >
> > > *
> > >
> >
> https://iceberg.apache.org/how-to-release/#validating-a-source-release-candidate
> > >
> > > Replace Iceberg with Parquet :)
> > >
> > >
> > > Please download, verify, and test.
> > >
> > >
> > > Please vote in the next 72 hours.
> > >
> > >
> > > [ ] +1 Release this as Apache Parquet 1.13.1
> > >
> > > [ ] +0
> > >
> > > [ ] -1 Do not release this because...
> > >
> >
>
--
Ryan Blue
Tabular
[
https://issues.apache.org/jira/browse/PARQUET-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276548#comment-17276548
]
Ryan Blue commented on PARQUET-1968:
Thank you! I'm not sure why it was no longer on my calendar. I
[
https://issues.apache.org/jira/browse/PARQUET-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276526#comment-17276526
]
Ryan Blue commented on PARQUET-1968:
I would really like to see a new Parquet API that can support
to discuss for the next community meeting.
> 3.
>
>Parquet 1.12.0
>
> a. Will cut RC release soon
>
> Please let me know if you have any questions.
>
> Xinli Shang | Tech Lead Manager @ Uber Data Infra
>
>
> --
> Xinli Shang
>
--
Ryan Blue
Software Engineer
Netflix
> >>> decoding typically dominates total read processing time, on average
> > I've
> > >>> seen 5-10x per cell cpu cost increase for variable reads over scalar
> > >>> reads). AFAIK, there is still no option for that in V1.
> > >>>
use them!
On Thu, Oct 8, 2020 at 12:44 PM Micah Kornfield
wrote:
> What is the current status of support for Data Page V2? Is it recommended
> for production workloads?
>
> Thanks,
> Micah
>
--
Ryan Blue
Software Engineer
Netflix
ad of the
> small tasks.
>
> I actually would like to have a design that would do the "fall-back" using
> the driver side pruning and uniform split planning for any footers missing
> from the summary file, but I thought that might add extra complexity to the
> discussion.
filename, the summary would
> need to contain file length info.
>
> There is also the possibility that parquet files could be deleted and
> rewritten in the same filenames, but this isn't common in any Hadoop/Spark
> ecosystem projects I know of; they all generate unique filenames
://github.com/apache/parquet-mr/pull/429
>> > > >
>> > > > There are other members of the broader parquet community that are
>> also
>> > > > confused by this deprecation, see this discussion in an arrow PR.
>> > > > https://github.com/apache/arrow/pull/4166
>> > > >
>> > > > In the course of making my small prototype I got an extra
>> performance
>> > > > boost by making spark write out metadata summary files, rather than
>> > > having
>> > > > to read all footers on the driver. This effect would be even more
>> > > > pronounced on a completely remote storage system like S3. Writing
>> these
>> > > > summary files was disabled by default in SPARK-15719, because of the
>> > > > performance impact of appending a small number of new files to an
>> > > existing
>> > > > dataset with many files.
>> > > >
>> > > > https://issues.apache.org/jira/browse/SPARK-15719
>> > > >
>> > > > This Spark JIRA does make decent points considering how Spark
>> operates
>> > > > today, but I think that there is a performance optimization
>> opportunity
>> > > > that is missed because the row group pruning is deferred to a bunch
>> of
>> > > > separate short-lived tasks rather than done upfront; currently Spark
>> > only
>> > > > uses footers on the driver for schema merging.
>> > > >
>> > > > Thanks for the help!
>> > > > Jason Altekruse
>> > > >
>> > >
>> >
>>
>
--
Ryan Blue
Software Engineer
Netflix
[
https://issues.apache.org/jira/browse/PARQUET-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183481#comment-17183481
]
Ryan Blue commented on PARQUET-1901:
It isn't clear to me how a filter implementation would handle
e are using thrift 0.12.0 on master since 1.5yrs and I
> > haven't experienced any issues with it in my environment (Linux) nor
> have I
> > met one in Travis builds.
> > Has anyone else experienced similar issues?
> >
> > Thanks,
> > Gabor
> >
> > O
> > > Binary artifacts are staged in Nexus here:
> > > *
> > https://repository.apache.org/content/groups/staging/org/apache/parquet/
> > >
> > > This release includes changes listed at
> > >
> >
> https://github.com/apache/parquet-mr
> Please vote in the next 72 hours.
>
> [ ] +1 Release this as Apache Parquet 1.11.1
> [ ] +0
> [ ] -1 Do not release this because...
>
--
Ryan Blue
Software Engineer
Netflix
However, in the current parquet-mr code, the codec implementation can't be
> customized to leverage accelerators. We would like to propose a pluggable
> API to support customized compression codecs.
> I've opened a JIRA https://issues.apache.org/jira/browse/PARQUET-1804 for
> this issue. What are your thoughts on it?
> Best Regards,
> Xin Dong
>
--
Ryan Blue
Software Engineer
Netflix
[
https://issues.apache.org/jira/browse/PARQUET-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051585#comment-17051585
]
Ryan Blue commented on PARQUET-1809:
I think it should be fine to allow this. While there may
character in the
> keys but I guess it should be fine.
>
> What do you think?
>
> Cheers,
> Gabor
>
--
Ryan Blue
Software Engineer
Netflix
d would only be useful (if at all)
> to someone that really knows the library, not something that would be
> helpful to the higher level application developer.
>
> Thanks.
>
>
>
> On Fri, Jan 24, 2020 at 6:48 PM Ryan Blue wrote:
>
>> It sounds like we see log
> Parquet file", then I'm going to turn on DEBUG logging and try to reproduce
> the error.
>
> Thanks,
> David
>
> On Fri, Jan 24, 2020 at 12:01 PM Ryan Blue
> wrote:
>
>> I don't agree with the idea to convert all of Parquet's logs to DEBUG
>> level,
and write my own small example application of using the library
>> > directly.
>> >
>> > Is there some quick way that I can write a Parquet file to the local
>> file
>> > system using java.nio.Path (i.e., with no Hadoop dependencies?)
>> >
>> > Thanks!
>> >
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>
--
Ryan Blue
Software Engineer
Netflix
uet through Hive or Spark, but I wanted to sit
> down and write my own small example application of using the library
> directly.
>
> Is there some quick way that I can write a Parquet file to the local file
> system using java.nio.Path (i.e., with no Hadoop dependencies?)
>
> T
ent (DEBUG level logging). If things are going wrong, it should
> > throw
> > > > an Exception.
> > > >
> > > > If an operator suspects Parquet is the issue (and that's rarely the
> > first
> > > > thing to check), they can set the logging for all of the Loggers in
> the
> > > > entire Parquet package (org.apache.parquet) to DEBUG to get the
> > required
> > > > information. Not to mention, the less logging it does, the faster it
> > > will
> > > > be.
> > > >
> > > > I've opened this discussion because I've got two PRs related to this
> > > topic
> > > > ready to go:
> > > >
> > > > PARQUET-1758
> > > > PARQUET-1761
> > > >
> > > > Thanks,
> > > > David
> > >
> > >
> >
>
--
Ryan Blue
Software Engineer
Netflix
of the version control,
> because a lot of lines will be changed:
> https://github.com/apache/parquet-mr/pull/730/
>
> WDYT?
>
> Cheers, Fokko
>
--
Ryan Blue
Software Engineer
Netflix
raries
> anymore in C++ so anyone building the project would need to use a
> newer version. I don't see it as a major issue
>
> On Tue, Jan 7, 2020 at 12:21 PM Ryan Blue
> wrote:
> >
> > Looks like [this commit](
> >
> https://github.com/apache/parquet-format/commi
> > >> > You can find the KEYS file here:
> > >> > * https://apache.org/dist/parquet/KEYS
> > >> >
> > >> > Binary artifacts are staged in Nexus here:
> > >> > *
> > >> >
> > >>
> >
> https://repository.apache.org/content/groups/staging/org/apache/parquet/parquet-format/2.8.0
> > >> >
> > >> > This release includes changes listed here:
> > >> > *
> > >> >
> > >>
> >
> https://github.com/apache/parquet-format/blob/apache-parquet-format-2.8.0-rc0/CHANGES.md
> > >> >
> > >> > Please download, verify, and test.
> > >> >
> > >> > Please vote in the next 72 hours.
> > >> >
> > >> > [ ] +1 Release this as Apache Parquet Format 2.8.0
> > >> > [ ] +0
> > >> > [ ] -1 Do not release this because...
> > >>
> > >
> >
>
--
Ryan Blue
Software Engineer
Netflix
uet-mr) and dump as parquet .This works for
> > > primitive types without any issues, but for nested types it will be
> a little
> > > complicated, so I wanted to know if anything like this already exists or
> > > is planned in the near future.
> > >
> > > Please let me know if some other information is required from my side.
> > >
> > > Thanks in advance.
> > >
> > >
>
--
Ryan Blue
Software Engineer
Netflix
specified git hash matches the specified git tag.
> > > > - The contents of the source tarball match the contents of the git
> repo
> > > at
> > > > the specified tag.
> > > >
> > > > Br,
> > > >
> > > > Zoltan
> >
> > > >
> > > > >> I'm not sure that this is a binary compatibility issue. The
> missing
> > > > builder
> > > > >> method was recently added in 1.11.0 with the introduction of the
> new
> > > > >> logical type API, while t
> > __
> > / __/__ ___ _/ /__
> > _\ \/ _ \/ _ `/ __/ '_/
> >/___/ .__/\_,_/_/ /_/\_\ version 2.4.4
> > /_/
> >
> > Using Scala version 2.11.12, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_191
>
the Spark project
> > for a release after 2.4.4 and before 3.0 in which to bump the Parquet
> > dependency version to 1.11.x.
> >
> >michael
> >
> >
> > > On Nov 21, 2019, at 11:01 AM, Ryan Blue
> > wrote:
> > >
> > > Gabor,
> > > Caused by: java.lang.ClassNotFoundException:
> > org.apache.parquet.schema.LogicalTypeAnnotation
> > > at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
> > > at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >
ight be decreased because of truncating the min/max values.
>
> Regards,
> Gabor
>
> On Mon, Nov 18, 2019 at 6:46 PM Ryan Blue
> wrote:
>
> > Gabor, do we have an idea of the additional overhead for a non-test data
> > file? It should be easy to validate that this doesn't
> > > > Binary artifacts are staged in Nexus here:
> > > > *
> > >
> >
> https://repository.apache.org/content/groups/staging/org/apache/parquet/
> > > >
> > > > This release includes the changes listed at:
> > > >
> > >
> >
> https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.0-rc7/CHANGES.md
> > > >
> > > > Please download, verify, and test.
> > > >
> > > > Please vote in the next 72 hours.
> > > >
> > > > [ ] +1 Release this as Apache Parquet 1.11.0
> > > > [ ] +0
> > > > [ ] -1 Do not release this because...
> > > >
> > > >
> > >
> >
>
--
Ryan Blue
Software Engineer
Netflix
[
https://issues.apache.org/jira/browse/PARQUET-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969493#comment-16969493
]
Ryan Blue commented on PARQUET-1681:
Looks like it might be https://issues.apache.org/jira/browse
[
https://issues.apache.org/jira/browse/PARQUET-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969491#comment-16969491
]
Ryan Blue commented on PARQUET-1681:
I think we should be able to work around this instead
[
https://issues.apache.org/jira/browse/PARQUET-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969489#comment-16969489
]
Ryan Blue commented on PARQUET-1681:
The Avro check should ignore record names if the record
> >
> >
> > Regards,
> >
> > Martin
> >
> >
> > From: Radev, Martin
> > Sent: Thursday, October 10, 2019 2:34:15 PM
> > To: Parquet Dev
> > Cc: Raoofy, Amir; Karlstetter, Roman
> > Subject: Re:
will create RC tags (e.g. apache-parquet-1.11.0-rc6) first and add the
> > final release tag (e.g. apache-parquet-1.11.0) after the vote passes.
> >
> > Regards,
> > Gabor
>
--
Ryan Blue
Software Engineer
Netflix
[
https://issues.apache.org/jira/browse/PARQUET-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961187#comment-16961187
]
Ryan Blue commented on PARQUET-1685:
Looks like Gabor is right. The stats fields used for each
.
> > >
> > > Cheers, Fokko
> > >
> > >
> > >
> > > On Tue, Oct 15, 2019 at 11:45 AM, Manik Singla wrote:
> > >
> > > > Hi Guys
> > > >
> > > > I was looking for tasks list or blockers which are required to
> support
> > > > multi-threaded writer( java specifically).
> > > > I did not find anything in JIRA or forums.
> > > >
> > > > Could someone help me to point some doc/link if exists
> > > >
> > > >
> > > > Regards
> > > > Manik Singla
> > > > +91-9996008893
> > > > +91-9665639677
> > > >
> > > > "Life doesn't consist in holding good cards but playing those you
> hold
> > > > well."
> > > >
> > >
> >
>
--
Ryan Blue
Software Engineer
Netflix
wse/PARQUET-1675> for moving the
> > > existing svn repo to git.
> > >
> > > If there are no objections I will create an infra ticket to move the
> svn
> > > repo https://svn.apache.org/repos/asf/parquet to the new git
> repository
> > > https://github.com/apache/parquet.
> > >
> > > Regards,
> > > Gabor
> > >
> >
>
--
Ryan Blue
Software Engineer
Netflix
> > >
> > > > > The commit id is ee5cae066ed602bd969024eb308c5262c451b6cd
> > > > > * This corresponds to the tag: apache-parquet-format-2.7.0
> > > > > *
> > > >
> > >
> >
> https://github.com/apache/parquet-forma
>
> Regards,
>
> Martin
> ------
> *From:* Ryan Blue
> *Sent:* Saturday, September 14, 2019 2:23:20 AM
> *To:* Radev, Martin
> *Cc:* Parquet Dev; Raoofy, Amir; Karlstetter, Roman
> *Subject:* Re: [VOTE] Add BYTE_STREAM_SPLIT encoding
should be fast.
> There's an extra compression step so preferably there's very little
> latency before it.
>
> @Wes, can you have a look?
>
> More opinions are welcome.
>
> If you have floating point data available, I would be very happy to
> examine whether this approach o
te!
> >
> >
> > On Wed, Jul 24, 2019 at 9:30 PM 俊杰陈 wrote:
> > >
> > > Hi @Ryan Blue @Wes McKinney
> > >
> > > We need your valuable vote, any feedback is welcome as well.
> > >
> > > On Tue, Jul 23, 2019 at 1:24 PM 俊
> > >
> > >
> > > An earlier report which examines other FP compressors (fpzip, spdp,
> fpc,
> > > zfp, sz) and new potential encodings is available here:
> > >
> >
> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view?usp=sharing
> > > The report also covers lossy compression but the BYTE_STREAM_SPLIT
> > > encoding only has the focus of lossless compression.
> > >
> > >
> > > Can we have a vote?
> > >
> > >
> > > Regards,
> > >
> > > Martin
> > >
> > >
> >
>
--
Ryan Blue
Software Engineer
Netflix
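The BYTE_STREAM_SPLIT encoding under discussion is simple to sketch: for float32 data, byte i of every value is gathered into stream i, so the highly repetitive sign/exponent bytes sit together and compress better under a later general-purpose compressor. A minimal Python model, not the production codepath:

```python
import struct

def byte_stream_split(values):
    # Serialize float32 values, then gather byte i of every value
    # into stream i; output is stream0 + stream1 + stream2 + stream3.
    raw = struct.pack('<%df' % len(values), *values)
    return b''.join(raw[i::4] for i in range(4))

def byte_stream_join(encoded, n):
    # Inverse transform: re-interleave the four streams of n bytes each
    # back into n little-endian float32 values.
    raw = bytes(encoded[s * n + j] for j in range(n) for s in range(4))
    return list(struct.unpack('<%df' % n, raw))

vals = [1.0, -2.5, 3.25, 0.0]
assert byte_stream_join(byte_stream_split(vals), len(vals)) == vals
```

The transform itself does not shrink the data (output is the same 4n bytes); the benefit shows up only after the subsequent compression step, which matches the lossless-only focus stated above.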
a JIRA and I'd like to do some
> clean-up :
>
> https://issues.apache.org/jira/browse/PARQUET-1644
>
> Do I need to be assigned the JIRA, or do I just create the PR?
>
> I'm on the ASF slack #parquet channel, don't hesitate to say hi!
>
> All my best, Ryan
>
--
Ryan Blue
Software Engineer
Netflix
[
https://issues.apache.org/jira/browse/PARQUET-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911797#comment-16911797
]
Ryan Blue commented on PARQUET-722:
---
Looks like this was fixed when cascading3 support updated
[
https://issues.apache.org/jira/browse/PARQUET-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911797#comment-16911797
]
Ryan Blue edited comment on PARQUET-722 at 8/20/19 10:59 PM:
-
Looks like
Thanks for working on this, Jim! I merged the current PR.
On Tue, Aug 6, 2019 at 8:39 AM Jim Apple wrote:
> On 2019/08/05 18:05:53, Ryan Blue wrote:
> > At least getting a compression union into the bloom filter header
> > will help us with compatibility later if we choose to
choose to add compression
schemes. I think it may also be worth the overhead of naive compression in
some cases, though I didn't thoroughly read through that reference yet.
On Sun, Aug 4, 2019 at 7:56 PM Jim Apple wrote:
> On 2019/08/03 20:42:10, Ryan Blue wrote:
> >- Should the bloo
been trying to read such files produced by Spark. More
> > > comprehensive integration testing would help ensure that the libraries
> > > remain compatible.
> > >
> > > On Tue, Jul 30, 2019 at 9:17 PM 俊杰陈 wrote:
> > > >
> > > &g
> We still need your vote!
> >
> >
> > On Wed, Jul 24, 2019 at 9:30 PM 俊杰陈 wrote:
> > >
> > > Hi @Ryan Blue @Wes McKinney
> > >
> > > We need your valuable vote, any feedback is welcome as well.
> > >
> > > On
[
https://issues.apache.org/jira/browse/PARQUET-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891462#comment-16891462
]
Ryan Blue commented on PARQUET-1434:
My concern is that it has not been reviewed well enough
mn type from int32 to int64 in file metadata
> and
> > column (chunk) metadata directly, can compressed data be read correctly?
> If
> > not, what's problem?
> >
> > Thank you so much for your time and we would be appreciated if you could
> > reply.
> >
> > Best Regards,
> > Ronnie
> >
> >
> >
>
--
Ryan Blue
Software Engineer
Netflix
[
https://issues.apache.org/jira/browse/PARQUET-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884204#comment-16884204
]
Ryan Blue commented on PARQUET-1488:
We discussed this on SPARK-28371.
Previously, Parquet did
[
https://issues.apache.org/jira/browse/PARQUET-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Blue reassigned PARQUET-1488:
--
Assignee: Yuming Wang (was: Gabor Szadovszky)
> UserDefinedPredicate th
[
https://issues.apache.org/jira/browse/PARQUET-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Blue reopened PARQUET-1488:
> UserDefinedPredicate throw NullPointerExcept
Ryan Blue created PARQUET-1624:
--
Summary: ParquetFileReader.open ignores Hadoop configuration
options
Key: PARQUET-1624
URL: https://issues.apache.org/jira/browse/PARQUET-1624
Project: Parquet
; > temporary
> > > > agreement that the commit should make it into a future release. I
> > > > understand you to be saying that in parquet-format, a vote on format
> > > > additions is standard, whether or not a commit made it into HEAD.
> > > >
> > > > There have been previous discussions of Bloom filters in the pull
> > > > requests, on this list, and in live videochat meetups (from quite a
> > while
> > > > ago). In your opinion, should we start a new discussion, or start a
> > > [VOTE]
> > > > thread with pointers to the old discussions, or some third option?
> > > >
> > >
> >
> >
> > --
> > Thanks & Best Regards
> >
>
--
Ryan Blue
Software Engineer
Netflix
> > > Hi,
> > > > >
> > > > > The Project Management Committee (PMC) for Apache Parquet has
> invited
> > > > Gabor
> > > > > Szadovszky to become a member of the PMC and we are pleased to
> > announce
> > > > > that he has accepted.
> > > > >
> > > > > Congratulations, Gabor!
> > > > >
> > > > > Br,
> > > > >
> > > > > Zoltan
> > > > >
> > > >
> > >
> >
>
--
Ryan Blue
Software Engineer
Netflix
ppears to require a committer to do some
> > prep work first.
> >
> > https://parquet.apache.org/documentation/how-to-release/
> >
> > Any committer volunteers?
> >
>
--
Ryan Blue
Software Engineer
Netflix
I think we can add that one.
On Fri, May 31, 2019 at 9:18 AM Michael Heuer wrote:
> Might
>
> https://github.com/apache/parquet-mr/pull/560
>
> be included in the next 1.11.0 release candidate?
>
>michael
>
>
> On May 31, 2019, at 11:09 AM, Ryan Blue wrote:
: TestSnappy() throws OOM exception with Parquet-1485
> change
> > - PARQUET-1531: Page row count limit causes empty pages to be written
> from
> > MessageColumnIO
> > - PARQUET-1544: Possible over-shading of modules
> >
> > The following change has been reverted so it is not
timizations we should do, such as
> > xxhash, folding bloom filters and etc., I think we can handle
> optimization
> > further on the master.
> >
> > Please help to vote on this.
> >
> >
> >
> > Thanks & Best Regards
>
--
Ryan Blue
Software Engineer
Netflix
ote here.
> Welcome to provide advise or vote.
>
>
> Thanks & Best Regards
>
--
Ryan Blue
Software Engineer
Netflix
--
Ryan Blue
Software Engineer
Netflix
’t know if there is a local
FS that supports .crc files in C++.
--
Ryan Blue
Software Engineer
Netflix
th fields all over the
> place.
>
> External file references to BLOBS is doable but not the elegant,
> integrated solution I was hoping for.
>
> -Brian
>
> On Apr 5, 2019, at 1:53 PM, Ryan Blue wrote:
>
> *EXTERNAL*
> Looks like we will need a new encoding for this:
thrift.h
>
> inline void DeserializeThriftMsg(const uint8_t* buf, uint32_t* len, T*
> deserialized_msg) {
> inline int64_t SerializeThriftMsg(T* obj, uint32_t len, OutputStream* out)
>
> -Brian
>
> On 4/5/19, 1:32 PM, "Ryan Blue" wrote:
>
> EXTERNAL
>
>
at
> require file format versioning changes?
>
> I realize this a non-trivial ask. Thanks for considering it.
>
> -Brian
>
--
Ryan Blue
Software Engineer
Netflix
uses the
> > HeapByteBuffer and DirectByteBuffer as its ByteBuffer. In particular,
> > neither of them support lazy evaluation. So when you read data into them,
> > it actually reads the data right away.
> >
> > So, Is it possible to configure the ParquetFileReader to read pages in
> the
> > row-group lazily, and at each step read only the relevant pages for each
> > column?
> >
> > Reagrds,
> > Tomer Solomon
> >
>
--
Ryan Blue
Software Engineer
Netflix
timeline for the publication of 1.11.0 to Maven central
> repository?
> I see it was released on the repo 18 days ago.
>
> Is there any other maven repository that hosts the artifacts?
>
> Many thanks,
> Masih
--
Ryan Blue
Software Engineer
Netflix
e case.
> It also needs some changes in ValidTypeMap.java &
> SchemaCompatibilityValidator.java for Filter predicate.
>
> Can Parquet support this type-upcasting feature? I came across such a
> scenario in one of my use cases.
>
> Thanks,
> Swapnil
>
--
Ryan Blue
Software Engineer
Netflix
[
https://issues.apache.org/jira/browse/PARQUET-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775448#comment-16775448
]
Ryan Blue commented on PARQUET-1142:
The next steps for this are to get compression working without
at are actually already leveraging the CRC
> field? And if not, should we have a discussion on refining the spec to
> remove the ambiguity?
>
> Thank you,
> Boudewijn
>
--
Ryan Blue
Software Engineer
Netflix
rrent release
> >> I think the easiest and most painless solution is to revert it. Created
> the
> >> PR #620 for it.
> >>
> >> We usually don't do reverts especially for commits that are sitting in
> >> master for a while. I would like to ask your opini
ted
> jackson package in the code is kind of confusing.
>
> Qinghui
>
> On Mon, Feb 18, 2019 at 6:14 PM, Ryan Blue wrote:
>
>> Qinghui,
>>
>> Parquet source uses the unshaded dependencies, but those dependencies are
>> rewritten in every module's build. Tha
[
https://issues.apache.org/jira/browse/PARQUET-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Blue resolved PARQUET-1281.
Resolution: Not A Problem
> Jackson dependency
> --
>
>
Jackson.
> >
> > Spark 2.x is at 2.6, Spark 3.0 at 2.9.6, Hadoop at 2.9.x, Flink at 2.7.9,
> > but that one is shaded anyway :-) One problem might be Apache Avro which
> is
> > still using Jackson 1.x (codehause), until we release Avro 1.9.
> >
> > What are the t
>> snapshots.forEach(snapshot -> {
>>     logger.info("Thread_id: {}, after committing to table, " +
>>         "snapshot.addedFiles() : {}",
>>         Thread.currentThread().getId(), snapshot.addedFiles());
>>
>>     snapshot.addedFiles().forEach(dataFile -> {
>>         logger.info("Thread_id: {}, after committing to table, " +
>>             "snapshot.dataFile() : {}",
>>             Thread.currentThread().getId(), dataFile);
>>     });
>> });
>> }
>> });
>>
>> }
>>
>> --
> You received this message because you are subscribed to the Google Groups
> "Iceberg Developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to iceberg-devel+unsubscr...@googlegroups.com.
> To post to this group, send email to iceberg-de...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/iceberg-devel/d2b0e038-d62a-4022-9af4-21775441ac94%40googlegroups.com
> <https://groups.google.com/d/msgid/iceberg-devel/d2b0e038-d62a-4022-9af4-21775441ac94%40googlegroups.com?utm_medium=email_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
--
Ryan Blue
Software Engineer
Netflix
ll values. (#603)
>
> Please download, verify, and test. The vote will be open for at least 72
> hours.
>
> Thanks,
> Gabor
>
--
Ryan Blue
Software Engineer
Netflix
Feb 6, 2019 at 12:10 PM Ryan Blue
> wrote:
> >
> > I disagree with Wes. He's right that you *could* just use binary and keep
> > extra metadata somewhere, but it is very unlikely that Parquet would ever
> > support such a scheme. And it is bad for the community when people
&
encoding hint when saving ByteBuffer.
> > > >
> > > > I don't find way to use any thing other than UTF-8.
> > > > https://github.com/apache/parquet-format/blob/master/LogicalTypes.md
> > > says
> > > > we can extend primitive types to solve cases.
> > > >
> > > > Other thing I want to mention is I am only the producer of parquet
> file
> > > but
> > > > not consumer.
> > > >
> > > > Could you guide me which examples I can look into or which will be
> right
> > > way
> > > >
> > > >
> > > > Regards
> > > > Manik Singla
> > > > +91-9996008893
> > > > +91-9665639677
> > > >
> > > > "Life doesn't consist in holding good cards but playing those you
> hold
> > > > well."
> > >
>
--
Ryan Blue
Software Engineer
Netflix
[
https://issues.apache.org/jira/browse/PARQUET-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Blue resolved PARQUET-1512.
Resolution: Fixed
> Release Parquet Java 1.1
[
https://issues.apache.org/jira/browse/PARQUET-138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Blue reassigned PARQUET-138:
-
Assignee: Nicolas Trinquier (was: Ryan Blue)
> Parquet should allow a merge between requi
[
https://issues.apache.org/jira/browse/PARQUET-138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Blue reassigned PARQUET-138:
-
Assignee: Nicolas Trinquier (was: Nicolas Trinquier)
> Parquet should allow a merge betw
[
https://issues.apache.org/jira/browse/PARQUET-138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Blue reassigned PARQUET-138:
-
Assignee: Ryan Blue
> Parquet should allow a merge between required and optional sche
n, Jan 28, 2019 at 2:08 PM Ryan Blue
> wrote:
>
>> Hi everyone,
>>
>> I propose the following RC to be released as official Apache Parquet Java
>> 1.10.1 release.
>>
>> The commit id is a89df8f9932b6ef6633d06069e50c9b7970bebd1
>>
>>- This cor
ks!
>
> For future, it would be good to include one as we have in Arrow that also
> checks the signature. We have that in the main tree and the script also
> downloads the source tarball. Then the script is simply in git and not part
> of the release.
>
> Uwe
>
>
> On Thu, Ja
[
https://issues.apache.org/jira/browse/PARQUET-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757679#comment-16757679
]
Ryan Blue commented on PARQUET-1520:
Thanks for contributing!
> Update README to use correct bu
[
https://issues.apache.org/jira/browse/PARQUET-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Blue reassigned PARQUET-1520:
--
Assignee: Dongjoon Hyun
> Update README to use correct build and version i
[
https://issues.apache.org/jira/browse/PARQUET-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Blue resolved PARQUET-1520.
Resolution: Fixed
Fix Version/s: 1.10.2
> Update README to use correct build and vers
19 at 5:07 PM Dongjoon Hyun
> wrote:
>
>> Sure! I'll make a PR for that.
>>
>> Bests,
>> Dongjoon.
>>
>> On Wed, Jan 30, 2019 at 3:11 PM Ryan Blue wrote:
>>
>>> Looks like the README is out of date. I don't think we should fail this
>>>
E says `The current release is version 1.8.1` instead of
> `1.10.1`. Is it worth to fix?
>
> Bests,
> Dongjoon.
>
>
> On Wed, Jan 30, 2019 at 10:45 AM Ryan Blue
> wrote:
>
>> +1 (binding)
>>
>> Validated source signature, checksum. Ran uni
rect
> > based on the release tag. Unit tests pass.
> >
> > +1 (non-binding)
> >
> > Cheers,
> > Gabor
> >
> >
> > On Mon, Jan 28, 2019 at 11:08 PM Ryan Blue
> > wrote:
> >
> > > Hi everyone,
> > >
ng parquet-hive-* is a great idea, the code in Parquet is not
> > > maintained any more, it is just a burden there.
> > >
> > > As of parquet-pig, I'd prefer moving it to Pig (if Pig community
> accepts
> > it
> > > as it is) instead of dropping it or moving to a separate project.
-1510: Dictionary filter bug skips null for notEq with
dictionary of one value
Please download, verify, and test.
Please vote in the next 72 hours:
[ ] +1 Release this as Apache Parquet Java 1.10.1
[ ] +0
[ ] -1 Do not release this because…
--
Ryan Blue
Software Engineer
Netflix
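The dictionary-filter fix in the release notes above (notEq skipping null with a one-value dictionary) comes down to this: nulls never appear in the dictionary, yet a null cell satisfies notEq(v), so a row group containing nulls can never be dropped by that predicate. A hedged sketch of the decision; the names are illustrative, not the parquet-mr API:

```python
def can_drop_not_eq(dictionary, value, has_nulls):
    # notEq(col, value) may drop a row group only if every possible cell
    # fails it. A null cell satisfies notEq(value), so the presence of
    # nulls blocks the drop; this was the bug being fixed.
    if has_nulls:
        return False
    # Without nulls, the group can be dropped only when the dictionary
    # contains nothing but the rejected value.
    return set(dictionary) == {value}

assert can_drop_not_eq([7], 7, has_nulls=False) is True
assert can_drop_not_eq([7], 7, has_nulls=True) is False   # the fixed case
assert can_drop_not_eq([7, 8], 7, has_nulls=False) is False
```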
, we’ve moved more to a model where processing
frameworks and engines maintain their own integration. Spark, Presto,
Iceberg, and Hive fall into this category. So I would prefer to drop Pig
and Cascading3. I’m fine keeping thrift if people think it is useful.
Thoughts?
rb
--
Ryan Blue
Software
[
https://issues.apache.org/jira/browse/PARQUET-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ryan Blue resolved PARQUET-1510.
Resolution: Fixed
> Dictionary filter skips null values when evaluating not-equ