Re: [VOTE][Format] Add Float16 type to specification

2023-10-05 Thread Ryan Blue
e in if you have a strong > > > preference for HALF or HALF_FLOAT over FLOAT16! > > > > > > > > > This vote will be open for at least 72 hours. > > > > > > [ ] +1 Add this type to the format specification > > > [ ] +0 > > > [ ] -1 Do not add this type to the format specification because... > > > > > > Thanks! > > > > > > Ben > > > > > > [1]: > https://en.wikipedia.org/wiki/Half-precision_floating-point_format > > > > > > > > > > > > > > > > -- > Xinli Shang > -- Ryan Blue Tabular

Re: [VOTE] Release Apache Parquet 1.13.1 RC0

2023-05-17 Thread Ryan Blue
> > > This release includes important changes: > > > > > > * https://github.com/apache/parquet-mr/commits/parquet-1.13.x > > > > > > > > > Handy commands for verifying the release: > > > > > > * > > > > > > https://iceberg.apache.org/how-to-release/#validating-a-source-release-candidate > > > > > > Replace Iceberg with Parquet :) > > > > > > > > > Please download, verify, and test. > > > > > > > > > Please vote in the next 72 hours. > > > > > > > > > [ ] +1 Release this as Apache Parquet 1.13.1 > > > > > > [ ] +0 > > > > > > [ ] -1 Do not release this because... > > > > > > -- Ryan Blue Tabular

[jira] [Commented] (PARQUET-1968) FilterApi support In predicate

2021-02-01 Thread Ryan Blue (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276548#comment-17276548 ] Ryan Blue commented on PARQUET-1968: Thank you! I'm not sure why it was no longer on my calendar. I

[jira] [Commented] (PARQUET-1968) FilterApi support In predicate

2021-02-01 Thread Ryan Blue (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17276526#comment-17276526 ] Ryan Blue commented on PARQUET-1968: I would really like to see a new Parquet API that can support

Re: Parquet sync meeting 11/24/2020

2020-11-24 Thread Ryan Blue
to discuss for the next community meeting. > 3. > >Parquet 1.12.0 > > a. Will cut RC release soon > > Please let me know if you have any questions. > > Xinli Shang | Tech Lead Manager @ Uber Data Infra > > > -- > Xinli Shang > -- Ryan Blue Software Engineer Netflix

Re: Current status of Data Page V2?

2020-10-12 Thread Ryan Blue
> >>> decoding typically dominates total read processing time, on average > > I've > > >>> seen 5-10x per cell cpu cost increase for variable reads over scalar > > >>> reads). AFAIK, there is still no option for that in V1. > > >>> > >

Re: Current status of Data Page V2?

2020-10-08 Thread Ryan Blue
use them! On Thu, Oct 8, 2020 at 12:44 PM Micah Kornfield wrote: > What is the current status of support for Data Page V2? Is it recommended > for production workloads? > > Thanks, > Micah > -- Ryan Blue Software Engineer Netflix

Re: Metadata summary file deprecation

2020-09-30 Thread Ryan Blue
ad of the > small tasks. > > I actually would like to have a design that would do the "fall-back" using > the driver side pruning and uniform split planning for any footers missing > from the summary file, but I thought that might add extra complexity to the > discussion.

Re: Metadata summary file deprecation

2020-09-30 Thread Ryan Blue
filename, the summary would > need to contain file length info. > > There is also the possibility that parquet files could be deleted and > rewritten in the same filenames, but this isn't common in any hadoop/spark > ecosystem projects I know of, they all generate unique filenames

Re: Metadata summary file deprecation

2020-09-29 Thread Ryan Blue
://github.com/apache/parquet-mr/pull/429 >> > > > >> > > > There are other members of the broader parquet community that are >> also >> > > > confused by this deprecation, see this discussion in an arrow PR. >> > > > https://github.com/apache/arrow/pull/4166 >> > > > >> > > > In the course of making my small prototype I got an extra >> performance >> > > > boost by making spark write out metadata summary files, rather than >> > > having >> > > > to read all footers on the driver. This effect would be even more >> > > > pronounced on a completely remote storage system like S3. Writing >> these >> > > > summary files was disabled by default in SPARK-15719, because of the >> > > > performance impact of appending a small number of new files to an >> > > existing >> > > > dataset with many files. >> > > > >> > > > https://issues.apache.org/jira/browse/SPARK-15719 >> > > > >> > > > This spark JIRA does make decent points considering how spark >> operates >> > > > today, but I think that there is a performance optimization >> opportunity >> > > > that is missed because the row group pruning is deferred to a bunch >> of >> > > > separate short lived tasks rather than done upfront, currently spark >> > only >> > > > uses footers on the driver for schema merging. >> > > > >> > > > Thanks for the help! >> > > > Jason Altekruse >> > > > >> > > >> > >> > -- Ryan Blue Software Engineer Netflix
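The upfront pruning described in the quoted message can be sketched roughly as follows. This is a minimal illustration in Python, assuming per-row-group min/max statistics have already been collected (for example from a summary file); `RowGroupStats` and `prune_row_groups` are hypothetical names, not part of parquet-mr or Spark.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group statistics for one column."""
    path: str
    row_group: int
    min_value: int
    max_value: int


def prune_row_groups(stats, lower, upper):
    """Keep only row groups whose [min, max] range can overlap [lower, upper]."""
    return [s for s in stats if s.max_value >= lower and s.min_value <= upper]


# Illustrative data: three row groups spread over two files.
all_stats = [
    RowGroupStats("part-0.parquet", 0, 0, 99),
    RowGroupStats("part-0.parquet", 1, 100, 199),
    RowGroupStats("part-1.parquet", 0, 200, 299),
]

# A filter such as 150 <= x <= 250 only needs the last two row groups.
selected = prune_row_groups(all_stats, 150, 250)
```

With the statistics in one place, the planner could skip scheduling tasks for row groups that cannot match, instead of discovering that inside many short-lived tasks.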

[jira] [Commented] (PARQUET-1901) Add filter null check for ColumnIndex

2020-08-24 Thread Ryan Blue (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183481#comment-17183481 ] Ryan Blue commented on PARQUET-1901: It isn't clear to me how a filter implementation would handle

Re: [VOTE] Release Apache Parquet 1.11.1 RC1

2020-08-05 Thread Ryan Blue
e are using thrift 0.12.0 on master since 1.5yrs and I > > haven't experienced any issues with it in my environment (Linux) nor > have I > > met one in Travis builds. > > Has anyone else experienced similar issues? > > > > Thanks, > > Gabor > > > > O

Re: [VOTE] Release Apache Parquet 1.11.1 RC1

2020-07-30 Thread Ryan Blue
> > > Binary artifacts are staged in Nexus here: > > > * > > https://repository.apache.org/content/groups/staging/org/apache/parquet/ > > > > > > This release includes changes listed at > > > > > > https://github.com/apache/parquet-mr

Re: Subject: [VOTE] Release Apache Parquet 1.11.1 RC0

2020-07-06 Thread Ryan Blue
> Please vote in the next 72 hours. > > [ ] +1 Release this as Apache Parquet 1.11.1 > [ ] +0 > [ ] -1 Do not release this because... > -- Ryan Blue Software Engineer Netflix

Re: Provide pluggable APIs to support user customized compression codec

2020-03-06 Thread Ryan Blue
However, in the current parquet-mr code, the codec implementation can't be > customized to leverage accelerators. We would like to propose a pluggable > API to support customized compression codecs. > I've opened a JIRA https://issues.apache.org/jira/browse/PARQUET-1804 for > this issue. What are your thoughts on this issue? > Best Regards, > Xin Dong > -- Ryan Blue Software Engineer Netflix

[jira] [Commented] (PARQUET-1809) Add new APIs for nested predicate pushdown

2020-03-04 Thread Ryan Blue (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17051585#comment-17051585 ] Ryan Blue commented on PARQUET-1809: I think it should be fine to allow this. While there may

Re: Allow users to fine-tune parquet writing

2020-02-03 Thread Ryan Blue
character in the > keys but I guess it should be fine. > > What do you think? > > Cheers, > Gabor > -- Ryan Blue Software Engineer Netflix

Re: Parquet Verbose Logging

2020-01-24 Thread Ryan Blue
d would only be useful (if at all) > to someone that really knows the library, not something that would be > helpful to the higher level application developer. > > Thanks. > > > > On Fri, Jan 24, 2020 at 6:48 PM Ryan Blue wrote: > >> It sounds like we see log

Re: Parquet Verbose Logging

2020-01-24 Thread Ryan Blue
> Parquet file", then I'm going to turn on DEBUG logging and try to reproduce > the error. > > Thanks, > David > > On Fri, Jan 24, 2020 at 12:01 PM Ryan Blue > wrote: > >> I don't agree with the idea to convert all of Parquet's logs to DEBUG >> level,

Re: Writing to Local File

2020-01-24 Thread Ryan Blue
and write my own small example application of using the library >> > directly. >> > >> > Is there some quick way that I can write a Parquet file to the local >> file >> > system using java.nio.Path (i.e., with no Hadoop dependencies?) >> > >> > Thanks! >> > >> >> >> -- >> Ryan Blue >> Software Engineer >> Netflix >> > -- Ryan Blue Software Engineer Netflix

Re: Writing to Local File

2020-01-24 Thread Ryan Blue
uet through Hive or Spark, but I wanted to sit > down and write my own small example application of using the library > directly. > > Is there some quick way that I can write a Parquet file to the local file > system using java.nio.Path (i.e., with no Hadoop dependencies?) > > T

Re: Parquet Verbose Logging

2020-01-24 Thread Ryan Blue
ent (DEBUG level logging). If things are going wrong, it should > > throw > > > > an Exception. > > > > > > > > If an operator suspects Parquet is the issue (and that's rarely the > > first > > > > thing to check), they can set the logging for all of the Loggers in > the > > > > entire Parquet package (org.apache.parquet) to DEBUG to get the > > required > > > > information. Not to mention, the less logging it does, the faster it > > > will > > > > be. > > > > > > > > I've opened this discussion because I've got two PRs related to this > > > topic > > > > ready to go: > > > > > > > > PARQUET-1758 > > > > PARQUET-1761 > > > > > > > > Thanks, > > > > David > > > > > > > > > -- Ryan Blue Software Engineer Netflix

Re: Spotless

2020-01-08 Thread Ryan Blue
of the version control, > because a lot of lines will be changed: > https://github.com/apache/parquet-mr/pull/730/ > > WDYT? > > Cheers, Fokko > -- Ryan Blue Software Engineer Netflix

Re: [VOTE] Release Apache Parquet Format 2.8.0 RC0

2020-01-07 Thread Ryan Blue
raries > anymore in C++ so anyone building the project would need to use a > newer version. I don't see it as a major issue > > On Tue, Jan 7, 2020 at 12:21 PM Ryan Blue > wrote: > > > > Looks like [this commit]( > > > https://github.com/apache/parquet-format/commi

Re: [VOTE] Release Apache Parquet Format 2.8.0 RC0

2020-01-07 Thread Ryan Blue
> > >> > You can find the KEYS file here: > > >> > * https://apache.org/dist/parquet/KEYS > > >> > > > >> > Binary artifacts are staged in Nexus here: > > >> > * > > >> > > > >> > > > https://repository.apache.org/content/groups/staging/org/apache/parquet/parquet-format/2.8.0 > > >> > > > >> > This release includes changes listed here: > > >> > * > > >> > > > >> > > > https://github.com/apache/parquet-format/blob/apache-parquet-format-2.8.0-rc0/CHANGES.md > > >> > > > >> > Please download, verify, and test. > > >> > > > >> > Please vote in the next 72 hours. > > >> > > > >> > [ ] +1 Release this as Apache Parquet Format 2.8.0 > > >> > [ ] +0 > > >> > [ ] -1 Do not release this because... > > >> > > > > > > -- Ryan Blue Software Engineer Netflix

Re: Implementation of Arrow table to Parquet File Writer

2020-01-06 Thread Ryan Blue
uet-mr) and dump as parquet .This works for > > > primitive types without any issues but for nested types it will be > little > > > complicated so wanted to know if anything like this already exists or > > > planned in near future . > > > > > > Please let me know if some other information is required from my side. > > > > > > Thanks in advance. > > > > > > > -- Ryan Blue Software Engineer Netflix

Re: [RESULT] Release Apache Parquet 1.11.0 RC7

2019-12-06 Thread Ryan Blue
specified git hash matches the specified git tag. > > > > - The contents of the source tarball match the contents of the git > repo > > > at > > > > the specified tag. > > > > > > > > Br, > > > > > > > > Zoltan > >

Re: [VOTE] Release Apache Parquet 1.11.0 RC7

2019-11-25 Thread Ryan Blue
> > > > > > > > >> I'm not sure that this is a binary compatibility issue. The > missing > > > > builder > > > > >> method was recently added in 1.11.0 with the introduction of the > new > > > > >> logical type API, while t

Re: [VOTE] Release Apache Parquet 1.11.0 RC7

2019-11-22 Thread Ryan Blue
> > __ > > / __/__ ___ _/ /__ > > _\ \/ _ \/ _ `/ __/ '_/ > >/___/ .__/\_,_/_/ /_/\_\ version 2.4.4 > > /_/ > > > > Using Scala version 2.11.12, Java HotSpot(TM) 64-Bit Server VM, 1.8.0_191 >

Re: [VOTE] Release Apache Parquet 1.11.0 RC7

2019-11-21 Thread Ryan Blue
the Spark project > > for a release after 2.4.4 and before 3.0 in which to bump the Parquet > > dependency version to 1.11.x. > > > >michael > > > > > > > On Nov 21, 2019, at 11:01 AM, Ryan Blue > > wrote: > > > > > > Gabor,

Re: [VOTE] Release Apache Parquet 1.11.0 RC7

2019-11-21 Thread Ryan Blue
> > > Caused by: java.lang.ClassNotFoundException: > > org.apache.parquet.schema.LogicalTypeAnnotation > > > at java.net.URLClassLoader.findClass(URLClassLoader.java:382) > > > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > >

Re: [VOTE] Release Apache Parquet 1.11.0 RC7

2019-11-19 Thread Ryan Blue
ight be decreased because of truncating the min/max values. > > Regards, > Gabor > > On Mon, Nov 18, 2019 at 6:46 PM Ryan Blue > wrote: > > > Gabor, do we have an idea of the additional overhead for a non-test data > > file? It should be easy to validate that this doesn't

Re: [VOTE] Release Apache Parquet 1.11.0 RC7

2019-11-18 Thread Ryan Blue
> > > > > Binary artifacts are staged in Nexus here: > > > > * > > > > > > https://repository.apache.org/content/groups/staging/org/apache/parquet/ > > > > > > > > This release includes the changes listed at: > > > > > > > > > > https://github.com/apache/parquet-mr/blob/apache-parquet-1.11.0-rc7/CHANGES.md > > > > > > > > Please download, verify, and test. > > > > > > > > Please vote in the next 72 hours. > > > > > > > > [ ] +1 Release this as Apache Parquet 1.11.0 > > > > [ ] +0 > > > > [ ] -1 Do not release this because... > > > > > > > > > > > > > > -- Ryan Blue Software Engineer Netflix

[jira] [Commented] (PARQUET-1681) Avro's isElementType() change breaks the reading of some parquet(1.8.1) files

2019-11-07 Thread Ryan Blue (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969493#comment-16969493 ] Ryan Blue commented on PARQUET-1681: Looks like it might be https://issues.apache.org/jira/browse

[jira] [Commented] (PARQUET-1681) Avro's isElementType() change breaks the reading of some parquet(1.8.1) files

2019-11-07 Thread Ryan Blue (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969491#comment-16969491 ] Ryan Blue commented on PARQUET-1681: I think we should be able to work around this instead

[jira] [Commented] (PARQUET-1681) Avro's isElementType() change breaks the reading of some parquet(1.8.1) files

2019-11-07 Thread Ryan Blue (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16969489#comment-16969489 ] Ryan Blue commented on PARQUET-1681: The Avro check should ignore record names if the record

Re: [VOTE] Add BYTE_STREAM_SPLIT encoding to Apache Parquet

2019-11-01 Thread Ryan Blue
> > > > > > Regards, > > > > Martin > > > > > > From: Radev, Martin > > Sent: Thursday, October 10, 2019 2:34:15 PM > > To: Parquet Dev > > Cc: Raoofy, Amir; Karlstetter, Roman > > Subject: Re:

Re: release process - using rc tags

2019-10-30 Thread Ryan Blue
will create RC tags (e.g. apache-parquet-1.11.0-rc6) first and add the > > final release tag (e.g. apache-parquet-1.11.0) after the vote passes. > > > > Regards, > > Gabor > -- Ryan Blue Software Engineer Netflix

[jira] [Commented] (PARQUET-1685) Truncate the stored min and max for String statistics to reduce the footer size

2019-10-28 Thread Ryan Blue (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16961187#comment-16961187 ] Ryan Blue commented on PARQUET-1685: Looks like Gabor is right. The stats fields used for each

Re: multi threading support

2019-10-21 Thread Ryan Blue
. > > > > > > Cheers, Fokko > > > > > > > > > > > > Op di 15 okt. 2019 om 11:45 schreef Manik Singla >: > > > > > > > Hi Guys > > > > > > > > I was looking for tasks list or blockers which are required to > support > > > > multi-threaded writer( java specifically). > > > > I did not find anything in JIRA or forums. > > > > > > > > Could someone help me to point some doc/link if exists > > > > > > > > > > > > Regards > > > > Manik Singla > > > > +91-9996008893 > > > > +91-9665639677 > > > > > > > > "Life doesn't consist in holding good cards but playing those you > hold > > > > well." > > > > > > > > > > -- Ryan Blue Software Engineer Netflix

Re: Updating parquet web site

2019-10-18 Thread Ryan Blue
wse/PARQUET-1675> for moving the > > > existing svn repo to git. > > > > > > If there are no objections I will create an infra ticket to move the > svn > > > repo https://svn.apache.org/repos/asf/parquet to the new git > repository > > > https://github.com/apache/parquet. > > > > > > Regards, > > > Gabor > > > > > > -- Ryan Blue Software Engineer Netflix

Re: [VOTE] Release Apache Parquet Format 2.7.0 RC0

2019-09-26 Thread Ryan Blue
> > > > > > > > The commit id is ee5cae066ed602bd969024eb308c5262c451b6cd > > > > > * This corresponds to the tag: apache-parquet-format-2.7.0 > > > > > * > > > > > > > > > > https://github.com/apache/parquet-forma

Re: [VOTE] Add BYTE_STREAM_SPLIT encoding to Apache Parquet

2019-09-19 Thread Ryan Blue
> > > Regards, > > Martin > ------ > *From:* Ryan Blue > *Sent:* Saturday, September 14, 2019 2:23:20 AM > *To:* Radev, Martin > *Cc:* Parquet Dev; Raoofy, Amir; Karlstetter, Roman > *Subject:* Re: [VOTE] Add BYTE_STREAM_SPLIT encoding

Re: [VOTE] Add BYTE_STREAM_SPLIT encoding to Apache Parquet

2019-09-13 Thread Ryan Blue
should be fast. > There's an extra compression step so preferably there's very little > latency before it. > > @Wes, can you have a look? > > More opinions are welcome. > > If you have floating point data available, I would be very happy to > examine whether this approach o

Re: [VOTE] Parquet Bloom filter spec sign-off

2019-09-06 Thread Ryan Blue
te! > > > > > > On Wed, Jul 24, 2019 at 9:30 PM 俊杰陈 wrote: > > > > > > Hi @Ryan Blue @Wes McKinney > > > > > > We need your valuable vote, any feedback is welcome as well. > > > > > > On Tue, Jul 23, 2019 at 1:24 PM 俊

Re: [VOTE] Add BYTE_STREAM_SPLIT encoding to Apache Parquet

2019-09-03 Thread Ryan Blue
> > > > > > > > > An earlier report which examines other FP compressors (fpzip, spdp, > fpc, > > > zfp, sz) and new potential encodings is available here: > > > > > > https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view?usp=sharing > > > The report also covers lossy compression but the BYTE_STREAM_SPLIT > > > encoding only has the focus of lossless compression. > > > > > > > > > Can we have a vote? > > > > > > > > > Regards, > > > > > > Martin > > > > > > > > > -- Ryan Blue Software Engineer Netflix
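For readers new to the encoding under vote in this thread: BYTE_STREAM_SPLIT does not compress anything itself. It regroups the bytes of fixed-width floating-point values so that a general-purpose compressor applied afterwards sees more regular input. Below is a minimal Python sketch of that idea for 4-byte floats. The function names are made up for illustration, and this is not the parquet-mr or Arrow implementation.

```python
import struct


def byte_stream_split_encode(values, width=4, fmt="<f"):
    """Scatter byte i of every value into stream i, then concatenate the streams."""
    raw = [struct.pack(fmt, v) for v in values]
    streams = [bytes(r[i] for r in raw) for i in range(width)]
    return b"".join(streams)


def byte_stream_split_decode(data, width=4, fmt="<f"):
    """Gather byte j of each stream back into the j-th value."""
    n = len(data) // width
    return [
        struct.unpack(fmt, bytes(data[i * n + j] for i in range(width)))[0]
        for j in range(n)
    ]


encoded = byte_stream_split_encode([1.0, 2.0, 0.5])
decoded = byte_stream_split_decode(encoded)
```

The encoded size equals the input size. Any gain comes from the compression step that follows, since bytes with similar roles (sign and exponent bytes, for instance) end up next to each other.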

Re: parquet-benchmarks cleanup

2019-08-29 Thread Ryan Blue
a JIRA and I'd like to do some > clean-up : > > https://issues.apache.org/jira/browse/PARQUET-1644 > > Do I need to be assigned the JIRA, or do I just create the PR? > > I'm on the ASF slack #parquet channel, don't hesitate to say hi! > > All my best, Ryan > -- Ryan Blue Software Engineer Netflix

[jira] [Commented] (PARQUET-722) Building with JDK 8 fails over a maven bug

2019-08-20 Thread Ryan Blue (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911797#comment-16911797 ] Ryan Blue commented on PARQUET-722: Looks like this was fixed when cascading3 support updated

[jira] [Comment Edited] (PARQUET-722) Building with JDK 8 fails over a maven bug

2019-08-20 Thread Ryan Blue (Jira)
[ https://issues.apache.org/jira/browse/PARQUET-722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16911797#comment-16911797 ] Ryan Blue edited comment on PARQUET-722 at 8/20/19 10:59 PM: Looks like

Re: [VOTE] Parquet Bloom filter spec sign-off

2019-08-08 Thread Ryan Blue
Thanks for working on this, Jim! I merged the current PR. On Tue, Aug 6, 2019 at 8:39 AM Jim Apple wrote: > On 2019/08/05 18:05:53, Ryan Blue wrote: > > At least getting a compression union into the bloom filter header > > will help us with compatibility later if we choose to

Re: [VOTE] Parquet Bloom filter spec sign-off

2019-08-05 Thread Ryan Blue
choose to add compression schemes. It think it may also be worth the overhead of naive compression in some cases, though I didn't thoroughly read through that reference yet. On Sun, Aug 4, 2019 at 7:56 PM Jim Apple wrote: > On 2019/08/03 20:42:10, Ryan Blue wrote: > >- Should the bloo

Re: [VOTE] Parquet Bloom filter spec sign-off

2019-08-03 Thread Ryan Blue
been trying to read such files produced by Spark. More > > > comprehensive integration testing would help ensure that the libraries > > > remain compatible. > > > > > > On Tue, Jul 30, 2019 at 9:17 PM 俊杰陈 wrote: > > > > > > >

Re: [VOTE] Parquet Bloom filter spec sign-off

2019-07-31 Thread Ryan Blue
> We still need your vote! > > > > > > On Wed, Jul 24, 2019 at 9:30 PM 俊杰陈 wrote: > > > > > > Hi @Ryan Blue @Wes McKinney > > > > > > We need your valuable vote, any feedback is welcome as well. > > > > > > On

[jira] [Commented] (PARQUET-1434) Release parquet-mr 1.11.0

2019-07-23 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16891462#comment-16891462 ] Ryan Blue commented on PARQUET-1434: My concern is that it has not been reviewed well enough

Re: [Question] Change Column Type in Parquet File

2019-07-17 Thread Ryan Blue
mn type from int32 to int64 in file metadata > and > > column (chunk) metadata directly, can compressed data be read correctly? > If > > not, what's the problem? > > > > Thank you so much for your time and we would appreciate it if you could > > reply. > > > > Best Regards, > > Ronnie > > > > > > > -- Ryan Blue Software Engineer Netflix
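A short sketch of why editing only the metadata is unlikely to work, assuming the column is PLAIN-encoded (INT32 values stored as 4-byte little-endian integers, INT64 as 8-byte). Other encodings differ in detail, but the stored bytes still reflect the original physical type. The variable names are illustrative only.

```python
import struct

# Four INT32 values as they would be laid out in a PLAIN-encoded page body.
values = [1, 2, 3, 4]
page = struct.pack("<4i", *values)  # 4 values x 4 bytes = 16 bytes

# A reader told (via edited metadata) that the column is INT64 would consume
# 8 bytes per value, fusing each pair of neighbours into one number.
misread = list(struct.unpack("<2q", page))
```

Here `misread` holds two large numbers instead of the four originals, and the value count recorded in the page header no longer matches the bytes available. The data pages would have to be rewritten, not just the footer.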

[jira] [Commented] (PARQUET-1488) UserDefinedPredicate throw NullPointerException

2019-07-12 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16884204#comment-16884204 ] Ryan Blue commented on PARQUET-1488: We discussed this on SPARK-28371. Previously, Parquet did

[jira] [Assigned] (PARQUET-1488) UserDefinedPredicate throw NullPointerException

2019-07-12 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue reassigned PARQUET-1488: Assignee: Yuming Wang (was: Gabor Szadovszky) > UserDefinedPredicate th

[jira] [Reopened] (PARQUET-1488) UserDefinedPredicate throw NullPointerException

2019-07-12 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue reopened PARQUET-1488: > UserDefinedPredicate throw NullPointerExcept

[jira] [Created] (PARQUET-1624) ParquetFileReader.open ignores Hadoop configuration options

2019-07-11 Thread Ryan Blue (JIRA)
Ryan Blue created PARQUET-1624: Summary: ParquetFileReader.open ignores Hadoop configuration options Key: PARQUET-1624 URL: https://issues.apache.org/jira/browse/PARQUET-1624 Project: Parquet

Re: [DISCUSS] Prepare release for parquet-format 2.7.0?

2019-06-28 Thread Ryan Blue
> > temporary > > > > agreement that the commit should make it into a future release. I > > > > understand you to be saying that in parquet-format, a vote on format > > > > additions is standard, whether or not a commit made it into HEAD. > > > > > > > > There have been previous discussions of Bloom filters in the pull > > > > requests, on this list, and in live videochat meetups (from quite a > > while > > > > ago). In your opinion, should we start a new discussion, or start a > > > [VOTE] > > > > thread with pointers to the old discussions, or some third option? > > > > > > > > > > > > > -- > > Thanks & Best Regards > > > -- Ryan Blue Software Engineer Netflix

Re: New PMC member: Gabor Szadovszky

2019-06-28 Thread Ryan Blue
> > > Hi, > > > > > > > > > > The Project Management Committee (PMC) for Apache Parquet has > invited > > > > Gabor > > > > > Szadovszky to become a member of the PMC and we are pleased to > > announce > > > > > that he has accepted. > > > > > > > > > > Congratulations, Gabor! > > > > > > > > > > Br, > > > > > > > > > > Zoltan > > > > > > > > > > > > > > > -- Ryan Blue Software Engineer Netflix

Re: [DISCUSS] Prepare release for parquet-format 2.7.0?

2019-06-27 Thread Ryan Blue
ppears to require a committer to do some > > prep work first. > > > > https://parquet.apache.org/documentation/how-to-release/ > > > > Any committer volunteers? > > > -- Ryan Blue Software Engineer Netflix

Re: [VOTE] Release Apache Parquet 1.11.0 RC6

2019-05-31 Thread Ryan Blue
I think we can add that one. On Fri, May 31, 2019 at 9:18 AM Michael Heuer wrote: > Might > > https://github.com/apache/parquet-mr/pull/560 > > be included in the next 1.11.0 release candidate? > >michael > > > On May 31, 2019, at 11:09 AM, Ryan Blue wrote:

Re: [VOTE] Release Apache Parquet 1.11.0 RC6

2019-05-31 Thread Ryan Blue
: TestSnappy() throws OOM exception with Parquet-1485 > change > > - PARQUET-1531: Page row count limit causes empty pages to be written > from > > MessageColumnIO > > - PARQUET-1544: Possible over-shading of modules > > > > The following change has been reverted so it is not

Re: [vote] Merge bloom-filter branch to master

2019-05-31 Thread Ryan Blue
timizations we should do, such as > > xxhash, folding bloom filters and etc., I think we can handle > optimization > > further on the master. > > > > Please help to vote on this. > > > > > > > > Thanks & Best Regards > -- Ryan Blue Software Engineer Netflix

Re: Plan to merge bloom filter branch

2019-05-30 Thread Ryan Blue
ote here. > Welcome to provide advice or vote. > > > Thanks & Best Regards > -- Ryan Blue Software Engineer Netflix

Re: Proper use of the ColumnChunk `file_path` attribute

2019-05-24 Thread Ryan Blue
you are not the intended recipient, please contact the > sender by > reply email and destroy all copies of the original message. > > ------- > -- Ryan Blue Software Engineer Netflix

Re: Parquet File Naming Convention Standards

2019-05-22 Thread Ryan Blue
’t know if there is a local FS that supports .crc files in C++. -- Ryan Blue Software Engineer Netflix

Re: Need 64-bit Integer length for Parquet ByteArray Type

2019-04-05 Thread Ryan Blue
th fields all over the > place. > > External file references to BLOBS is doable but not the elegant, > integrated solution I was hoping for. > > -Brian > > On Apr 5, 2019, at 1:53 PM, Ryan Blue wrote: > > *EXTERNAL* > Looks like we will need a new encoding for this:

Re: Need 64-bit Integer length for Parquet ByteArray Type

2019-04-05 Thread Ryan Blue
thrift.h > > inline void DeserializeThriftMsg(const uint8_t* buf, uint32_t* len, T* > deserialized_msg) { > inline int64_t SerializeThriftMsg(T* obj, uint32_t len, OutputStream* out) > > -Brian > > On 4/5/19, 1:32 PM, "Ryan Blue" wrote: > > EXTERNAL > >

Re: Need 64-bit Integer length for Parquet ByteArray Type

2019-04-05 Thread Ryan Blue
at > require file format versioning changes? > > I realize this a non-trivial ask. Thanks for considering it. > > -Brian > -- Ryan Blue Software Engineer Netflix

Re: Parquet-mr - ParquetFileReader IO and memory foot-print

2019-03-21 Thread Ryan Blue
uses the > > HeapByteBuffer and DirectByteBuffer as its ByteBuffer. In particular, > > neither of them supports lazy evaluation. So when you read data into them, > > it actually reads the data right away. > > > > So, is it possible to configure the ParquetFileReader to read pages in > the > > row-group lazily, and at each step read only the relevant pages for each > > column? > > > > Regards, > > Tomer Solomon > > > -- Ryan Blue Software Engineer Netflix

Re: Parquet 1.11.0 Release to Maven Central

2019-03-01 Thread Ryan Blue
timeline for the publication of 1.11.0 to Maven central > repository? > I see it was released on the repo 18 days ago. > > Is there any other maven repository that hosts the artifacts? > > Many thanks, > Masih -- Ryan Blue Software Engineer Netflix

Re: Overridden methods of dictionary

2019-02-26 Thread Ryan Blue
e case. > It also needs some changes in ValidTypeMap.java & > SchemaCompatibilityValidator.java for Filter predicate. > > Can parquet support this type upcasting feature? I came across such > scenario in one of my use case. > > Thanks, > Swapnil > -- Ryan Blue Software Engineer Netflix

[jira] [Commented] (PARQUET-1142) Avoid leaking Hadoop API to downstream libraries

2019-02-22 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16775448#comment-16775448 ] Ryan Blue commented on PARQUET-1142: The next steps for this are to get compression working without

Re: Clarification on CRC checksum field

2019-02-21 Thread Ryan Blue
at are actually already leveraging the CRC > field? And if not, should we have a discussion on refining the spec to > remove the ambiguity? > > Thank you, > Boudewijn > -- Ryan Blue Software Engineer Netflix

Re: Reverting the merge blocks command feature

2019-02-21 Thread Ryan Blue
rrent release > >> I think the easiest and most painless solution is to revert it. Created > the > >> PR #620 for it. > >> > >> We usually don't do reverts especially for commits that are sitting in > >> master for a while. I would like to ask your opini

Re: [DISCUSS] Upgrade to Jackson 2.x and remove the shading

2019-02-19 Thread Ryan Blue
ted > jackson package in the code is kind of confusing. > > Qinghui > > Le lun. 18 févr. 2019 à 18:14, Ryan Blue a écrit : > >> Qinghui, >> >> Parquet source uses the unshaded dependencies, but those dependencies are >> rewritten in every module's build. Tha

[jira] [Resolved] (PARQUET-1281) Jackson dependency

2019-02-18 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue resolved PARQUET-1281. Resolution: Not A Problem > Jackson dependency > -- > >

Re: [DISCUSS] Upgrade to Jackson 2.x and remove the shading

2019-02-18 Thread Ryan Blue
Jackson. > > > > Spark 2.x is at 2.6, Spark 3.0 at 2.9.6, Hadoop at 2.9.x, Flink at 2.7.9, > > but that one is shaded anyway :-) One problem might be Apache Avro which > is > > still using Jackson 1.x (codehaus), until we release Avro 1.9. > > > > What are the t

Re: Table.newAppend or newReplacePartitions, local testing with multiple threads facing issue.

2019-02-15 Thread Ryan Blue
>> snapshots.forEach(snapshot -> { >> logger.info("Thread_id: {}, after committing >> to table, " + >> "snapshot.addedFiles() : {} " , >> Thread.currentThread().getId(), snapshot.addedFiles()); >> >> snapshot.addedFiles().forEach(dataFile -> { >> logger.info("Thread_id: {}, >> after committing to table, snapshot.dataFile() : {} " , >> Thread.currentThread().getId(), dataFile); >> } >> ); >> }); >> } >> }); >> >> } >> >> -- > You received this message because you are subscribed to the Google Groups > "Iceberg Developers" group. > To unsubscribe from this group and stop receiving emails from it, send an > email to iceberg-devel+unsubscr...@googlegroups.com. > To post to this group, send email to iceberg-de...@googlegroups.com. > To view this discussion on the web visit > https://groups.google.com/d/msgid/iceberg-devel/d2b0e038-d62a-4022-9af4-21775441ac94%40googlegroups.com > <https://groups.google.com/d/msgid/iceberg-devel/d2b0e038-d62a-4022-9af4-21775441ac94%40googlegroups.com?utm_medium=email_source=footer> > . > For more options, visit https://groups.google.com/d/optout. > -- Ryan Blue Software Engineer Netflix

Re: [VOTE] Release Apache Parquet 1.11.0 RC4

2019-02-14 Thread Ryan Blue
ll values. (#603) > > Please download, verify, and test. The vote will be open for at least 72 > hours. > > Thanks, > Gabor > -- Ryan Blue Software Engineer Netflix

Re: parquet using encoding other than UTF-8

2019-02-06 Thread Ryan Blue
Feb 6, 2019 at 12:10 PM Ryan Blue > wrote: > > > > I disagree with Wes. He's right that you *could* just use binary and keep > > extra metadata somewhere, but it is very unlikely that Parquet would ever > > support such a scheme. And it is bad for the community when people &

Re: parquet using encoding other than UTF-8

2019-02-06 Thread Ryan Blue
encoding hint when saving ByteBuffer. > > > > > > > > I can't find a way to use anything other than UTF-8. > > > > https://github.com/apache/parquet-format/blob/master/LogicalTypes.md > > > > says we can extend primitive types to solve such cases. > > > > > > > > The other thing I want to mention is that I am only the producer of the > > > > Parquet file, not the consumer. > > > > > > > > Could you point me to examples I can look into, or tell me which would be the > > > > right way? > > > > > > > > > > > > Regards > > > > Manik Singla > > > > +91-9996008893 > > > > +91-9665639677 > > > > > > > > "Life doesn't consist in holding good cards but playing those you hold > > > > well." > > > > -- Ryan Blue Software Engineer Netflix
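For readers following this thread: the Parquet STRING logical type is defined as UTF-8, so one route consistent with the reply above is for the producer to transcode text to UTF-8 before writing it, rather than storing bytes in another charset as plain binary. The sketch below only illustrates that transcoding step. It assumes the source data is ISO-8859-1, and the class and method names (`Utf8Transcode`, `toUtf8`) are hypothetical and not part of any Parquet API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Parquet: re-encodes text bytes as UTF-8.
final class Utf8Transcode {
    private Utf8Transcode() {}

    /** Decodes raw bytes using the given source charset and re-encodes them as UTF-8. */
    static byte[] toUtf8(byte[] raw, Charset sourceCharset) {
        String decoded = new String(raw, sourceCharset);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "café" in ISO-8859-1 (assumed source charset): the é is the single byte 0xE9.
        byte[] latin1 = {(byte) 0x63, (byte) 0x61, (byte) 0x66, (byte) 0xE9};
        byte[] utf8 = toUtf8(latin1, StandardCharsets.ISO_8859_1);
        // In UTF-8 the é becomes the two bytes 0xC3 0xA9.
        System.out.println(utf8.length); // prints 5
    }
}
```

The resulting bytes could then be written as an ordinary string column, so a consumer needs no side-channel metadata about the encoding.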

[jira] [Resolved] (PARQUET-1512) Release Parquet Java 1.10.1

2019-02-04 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue resolved PARQUET-1512. Resolution: Fixed > Release Parquet Java 1.1

[jira] [Assigned] (PARQUET-138) Parquet should allow a merge between required and optional schemas

2019-02-01 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue reassigned PARQUET-138: - Assignee: Nicolas Trinquier (was: Ryan Blue) > Parquet should allow a merge between requi

[jira] [Assigned] (PARQUET-138) Parquet should allow a merge between required and optional schemas

2019-02-01 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue reassigned PARQUET-138: - Assignee: Nicolas Trinquier (was: Nicolas Trinquier) > Parquet should allow a merge betw

[jira] [Assigned] (PARQUET-138) Parquet should allow a merge between required and optional schemas

2019-02-01 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue reassigned PARQUET-138: - Assignee: Ryan Blue > Parquet should allow a merge between required and optional sche

Re: [VOTE] Release Apache Parquet 1.10.1 RC0

2019-01-31 Thread Ryan Blue
n, Jan 28, 2019 at 2:08 PM Ryan Blue > wrote: > >> Hi everyone, >> >> I propose the following RC to be released as official Apache Parquet Java >> 1.10.1 release. >> >> The commit id is a89df8f9932b6ef6633d06069e50c9b7970bebd1 >> >>- This cor

Re: [VOTE] Release Apache Parquet 1.10.1 RC0

2019-01-31 Thread Ryan Blue
ks! > > For the future, it would be good to include one like the one we have in Arrow, which also > checks the signature. We have that in the main tree and the script also > downloads the source tarball. Then the script is simply in git and not part > of the release. > > Uwe > > > On Thu, Ja

[jira] [Commented] (PARQUET-1520) Update README to use correct build and version info

2019-01-31 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757679#comment-16757679 ] Ryan Blue commented on PARQUET-1520: Thanks for contributing! > Update README to use correct bu

[jira] [Assigned] (PARQUET-1520) Update README to use correct build and version info

2019-01-31 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue reassigned PARQUET-1520: -- Assignee: Dongjoon Hyun > Update README to use correct build and version i

[jira] [Resolved] (PARQUET-1520) Update README to use correct build and version info

2019-01-31 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue resolved PARQUET-1520. Resolution: Fixed Fix Version/s: 1.10.2 > Update README to use correct build and vers

Re: [VOTE] Release Apache Parquet 1.10.1 RC0

2019-01-31 Thread Ryan Blue
19 at 5:07 PM Dongjoon Hyun > wrote: > >> Sure! I'll make a PR for that. >> >> Bests, >> Dongjoon. >> >> On Wed, Jan 30, 2019 at 3:11 PM Ryan Blue wrote: >> >>> Looks like the README is out of date. I don't think we should fail this >>>

Re: [VOTE] Release Apache Parquet 1.10.1 RC0

2019-01-30 Thread Ryan Blue
E says `The current release is version 1.8.1` instead of > `1.10.1`. Is it worth fixing? > > Bests, > Dongjoon. > > > On Wed, Jan 30, 2019 at 10:45 AM Ryan Blue > wrote: > >> +1 (binding) >> >> Validated source signature, checksum. Ran uni

Re: [VOTE] Release Apache Parquet 1.10.1 RC0

2019-01-30 Thread Ryan Blue
rect > > based on the release tag. Unit tests pass. > > > > +1 (non-binding) > > > > Cheers, > > Gabor > > > > > > On Mon, Jan 28, 2019 at 11:08 PM Ryan Blue > > wrote: > > > > > Hi everyone, > > >

Re: [DISCUSS] Remove old modules?

2019-01-29 Thread Ryan Blue
ng parquet-hive-* is a great idea; the code in Parquet is not > > > maintained any more and is just a burden there. > > > > > > As for parquet-pig, I'd prefer moving it to Pig (if the Pig community > > > accepts it as it is) instead of dropping it or moving it to a separate project.

[VOTE] Release Apache Parquet 1.10.1 RC0

2019-01-28 Thread Ryan Blue
-1510: Dictionary filter bug skips null for notEq with dictionary of one value Please download, verify, and test. Please vote in the next 72 hours: [ ] +1 Release this as Apache Parquet Java 1.10.1 [ ] +0 [ ] -1 Do not release this because… -- Ryan Blue Software Engineer Netflix

[DISCUSS] Remove old modules?

2019-01-28 Thread Ryan Blue
, we’ve moved more to a model where processing frameworks and engines maintain their own integration. Spark, Presto, Iceberg, and Hive fall into this category. So I would prefer to drop Pig and Cascading3. I’m fine keeping thrift if people think it is useful. Thoughts? rb -- Ryan Blue Software

[jira] [Resolved] (PARQUET-1510) Dictionary filter skips null values when evaluating not-equals.

2019-01-28 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/PARQUET-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue resolved PARQUET-1510. Resolution: Fixed > Dictionary filter skips null values when evaluating not-equ
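As a rough illustration of the kind of check this issue title describes, the sketch below shows when a dictionary-based filter could safely skip a column chunk for a not-equals predicate. It is not the parquet-mr code. It assumes, as the title suggests, that rows holding null satisfy `notEq(value)` for a non-null value, so a chunk that may contain nulls cannot be dropped even when its dictionary holds only the compared value. The class name, method name, and the `chunkMayContainNulls` flag are hypothetical.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch, not the parquet-mr implementation.
final class DictionaryNotEqSketch {
    private DictionaryNotEqSketch() {}

    /**
     * Returns true only if no row in the chunk can match notEq(value).
     *
     * @param dictionary           the distinct non-null values in the chunk
     * @param chunkMayContainNulls whether the chunk may hold null rows (assumed input)
     * @param value                the non-null value being compared against
     */
    static <T> boolean canDropNotEq(Set<T> dictionary, boolean chunkMayContainNulls, T value) {
        if (chunkMayContainNulls) {
            // Assumption: null rows satisfy notEq(value), so the chunk must be read.
            return false;
        }
        // Without nulls, the chunk can be skipped only if every row equals value.
        return dictionary.size() == 1 && dictionary.contains(value);
    }

    public static void main(String[] args) {
        Set<String> onlyA = Collections.singleton("a");
        System.out.println(canDropNotEq(onlyA, false, "a")); // prints true
        System.out.println(canDropNotEq(onlyA, true, "a"));  // prints false
    }
}
```

The single-value dictionary is the interesting case because it is the only one where the dictionary alone appears to prove that nothing matches; the null check is what keeps that shortcut from discarding rows.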
