Any chance we can have these on either a different day or time? The Drill
hangout is every Tuesday at 10am so I always have to pick one or the other.
On Tue, Jul 21, 2015 at 10:56 AM, Nezih Yigitbasi
nyigitb...@netflix.com.invalid wrote:
An update to actions, I will create a PR for the
on a Monday.
Does anybody object?
Julien
On Jul 21, 2015, at 17:37, Jacques Nadeau jacq...@apache.org
wrote:
Any chance we can have these on either a different day or time? The Drill
hangout is every Tuesday at 10am so I always have to pick one or the other
I think we should start a design discussion around this. I think there
were early ideas by some of the initial authors. However, I don't think it
has been designed.
On Jul 9, 2015 9:16 AM, Patrick Woody patrick.woo...@gmail.com wrote:
Just wanted to follow up here. Is there any information on
By Wednesday, you mean the day after tomorrow, right? :)
On Mon, Aug 31, 2015 at 10:29 PM, Julien Le Dem wrote:
> Wed, September 2, 10:00 AM PDT
> https://plus.google.com/hangouts/_/event/cob1rrt1spt1f15qbsfeqv51cmc
>
> --
> Julien
>
A non-binding +1 from me on releasing sooner/more often.
On Thu, Sep 8, 2016 at 5:44 PM, Ryan Blue wrote:
> Hey everyone,
>
> I'd like to put together a release candidate for 1.9.0. The other issues
> are done, but the sort order min/max issue, PARQUET-686, is still
> >> So I’m cool with making necessary changes to get this in sooner rather
> >> than later, I’ve mostly been blocking on code reviews. If there’s a
> >> commitment made to releasing 1.9.1
Thanks for sharing these Ryan. Definitely intriguing.
On Wed, Sep 27, 2017 at 5:38 PM, Ryan Blue
wrote:
> For anyone that would also like to test the compression codecs, I’ve
> uploaded a copy of parquet-cli that can read and write zstd, lz4, and
> brotli to my Apache
One of our community members hit an issue where we couldn't parse a Parquet
footer. It looks like the file is missing the Codec field for a column but
the Parquet Thrift spec expects one.
https://community.dremio.com/t/unable-to-read-parquet-footer-with-file-generated-with-turbodbc/474/9
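For anyone debugging footers like this, the fixed Parquet trailer layout can
be checked with a short stdlib-only sketch. This is a rough illustration
only: it stops at the raw footer bytes, since decoding the Thrift
compact-encoded FileMetaData itself (including per-column fields such as the
codec) requires the parquet-format Thrift definitions.

```python
import struct

def read_raw_footer(path):
    """Validate the trailing PAR1 magic and return the raw footer bytes.

    A Parquet file ends with the serialized FileMetaData, followed by a
    4-byte little-endian footer length and the 4-byte magic b"PAR1".
    """
    with open(path, "rb") as f:
        f.seek(0, 2)                 # seek to end to learn the file size
        size = f.tell()
        if size < 12:
            raise ValueError("too small to be a Parquet file")
        f.seek(size - 8)
        footer_len, magic = struct.unpack("<i4s", f.read(8))
        if magic != b"PAR1":
            raise ValueError("missing trailing PAR1 magic")
        f.seek(size - 8 - footer_len)
        return f.read(footer_len)   # Thrift-encoded FileMetaData bytes
```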
Was
r readers can't read the files or metadata because of
> how Thrift handles enums.
>
> rb
>
> On Mon, Nov 20, 2017 at 8:34 AM, Jacques Nadeau <jacq...@apache.org>
> wrote:
>
> > One of our community members hit an issue where we couldn't parse a
> Parquet
> >
Sounds super interesting. Would love to collaborate on this. Do you have a
repo or mailing list where you are working on this?
On Wed, Dec 6, 2017 at 4:20 PM, Ryan Blue wrote:
> Hi everyone,
>
> I mentioned in the sync-up this morning that I’d send out an
+1 (non-binding)
On Tue, Mar 6, 2018 at 12:31 PM, Uwe L. Korn wrote:
> +1
>
> On Tue, Mar 6, 2018, at 9:29 PM, Ryan Blue wrote:
> > +1
> >
> > Thanks for starting a vote, Wes!
> >
> > On Tue, Mar 6, 2018 at 12:24 PM, Wes McKinney
> wrote:
> >
> > > Dear
I haven't looked at the usage, but I wonder whether the core modules truly
need Jackson. I don't think most of the systems that read Parquet use the
Jackson part (?). If so, maybe the code could be refactored to remove the
dependency and move it to an optional component. We want to do the same
The big win in V2 pages (if I remember correctly) is that the variable-length
encoding is no longer interleaved. That would provide a big performance lift
when pulling into Arrow vectors (and variable-length decoding typically
dominates total read processing time; on average I've seen 5-10x per
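The interleaving difference described above can be sketched in plain Python.
This is a simplified comparison, not the real codecs: V1's PLAIN byte-array
layout puts a 4-byte length before every value, while the V2-style
DELTA_LENGTH_BYTE_ARRAY layout stores all lengths up front followed by the
concatenated value bytes (the real encoding delta-encodes the lengths rather
than storing raw 4-byte ints, as the Encodings.md link above describes).

```python
import struct

def decode_interleaved(buf, n):
    """V1 PLAIN byte arrays: a 4-byte length precedes every value."""
    out, pos = [], 0
    for _ in range(n):
        (length,) = struct.unpack_from("<i", buf, pos)
        pos += 4
        out.append(buf[pos:pos + length])
        pos += length
    return out

def decode_split(buf, n):
    """V2-style layout: all lengths first, then concatenated bytes.

    Because the value bytes are contiguous, they can be bulk-copied into
    a vector; only the (small) lengths section needs per-value decoding.
    """
    lengths = struct.unpack_from("<%di" % n, buf, 0)
    data = memoryview(buf)[4 * n:]
    out, pos = [], 0
    for length in lengths:
        out.append(bytes(data[pos:pos + length]))
        pos += length
    return out
```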
> v1 vs v2?
>
> Thanks,
> Micah
>
>
> [1]
> https://github.com/apache/parquet-format/blob/master/Encodings.md#delta-length-byte-array-delta_length_byte_array--6
> [2]
> https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L407
>
> On Fr
Gabor seems to agree that delta is V2 only.
> To summarize, no delta encodings are used for V1 pages. They are available
> for V2 only.
https://www.mail-archive.com/dev@parquet.apache.org/msg11826.html
On Fri, Oct 9, 2020 at 5:06 PM Jacques Nadeau wrote:
> Good point. I had me
Hey Jason,
I'd suggest you look at Apache Iceberg. It is a much more mature way of
handling metadata efficiency issues and provides a substantial superset of
functionality over the old metadata cache files.
On Wed, Sep 23, 2020 at 4:16 PM Jason Altekruse
wrote:
> Hello again,
>
> I took a look
I'd suggest a new write pattern: write the columns page at a time to
separate files, then use a second process to concatenate the columns and
append the footer. Odds are you would do better than OS swapping, and you'd
take memory requirements down to page size times field count.
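The two-pass pattern suggested above can be sketched roughly as follows.
Everything here is illustrative: `assemble`, the 1 MB read size, and the
returned offsets are assumptions, and `footer_bytes` stands in for the real
serialized metadata, which in an actual Parquet writer would need to record
each column chunk's final offset in the combined file.

```python
def assemble(column_paths, footer_bytes, out_path):
    """Concatenate per-column page files, then append a footer.

    Returns the byte offset where each column's data begins in the
    combined file, which the real footer would need to reference.
    """
    offsets = []
    with open(out_path, "wb") as out:
        for path in column_paths:
            offsets.append(out.tell())          # chunk start offset
            with open(path, "rb") as col:
                # Stream in fixed-size chunks so memory stays bounded.
                while chunk := col.read(1 << 20):
                    out.write(chunk)
        out.write(footer_bytes)
    return offsets
```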
In S3 I believe you could
ected to be at least 5mb if I read their docs correctly
>> [1])
>>
>> [1] https://docs.aws.amazon.com/AmazonS3/latest/dev/qfacts.html
>>
>>
>> On Saturday, July 11, 2020, Jacques Nadeau wrote:
>>
>> > I'd suggest a new write pattern. Write the columns page
There is some ambiguity in the discussion and proposals here around
deprecating future writing versus supporting reading of already-written
data, and around what it means to deprecate something in the format
specification. I think it would be a mistake for someone who has written
Hadoop-Lz4 for several
I can take your comment two ways: what is the downside to large pages, or
what is the downside to small row groups?
One of the key considerations I've dealt with is that the page is the unit
of compression, and if I recall correctly, Parquet uses block rather than
stream compression. This means you
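The block-versus-stream distinction can be illustrated with a small sketch.
zlib here is only a stand-in for whatever codec a page actually uses; the
point is that block-style compression makes every page an independently
decompressable unit, which is what ties a reader's memory needs to page size.

```python
import zlib

pages = [b"row data for page %d" % i * 100 for i in range(3)]

# Block-style: each page compressed independently, so a reader can
# decompress page 2 without touching pages 0 and 1.
blocks = [zlib.compress(p) for p in pages]
assert zlib.decompress(blocks[2]) == pages[2]

# Stream-style: one compressor runs across all pages; decoding a later
# page requires feeding the stream from the beginning.
co = zlib.compressobj()
stream = b"".join(co.compress(p) for p in pages) + co.flush()
assert zlib.decompress(stream) == b"".join(pages)
```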
[
https://issues.apache.org/jira/browse/PARQUET-369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059056#comment-15059056
]
Jacques Nadeau commented on PARQUET-369:
+1 for Ryan's suggestion. Not sure how many Java users
Jacques Nadeau created PARQUET-1028:
---
Summary: [JAVA] When reading old Spark-generated files with INT96,
stats are reported as valid when they aren't
Key: PARQUET-1028
URL: https://issues.apache.org/jira
[
https://issues.apache.org/jira/browse/PARQUET-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16239319#comment-16239319
]
Jacques Nadeau commented on PARQUET-1154:
-
As an aside, it would be really nice
Jacques Nadeau created PARQUET-1475:
---
Summary: DirectCodecFactory's ParquetCompressionCodecException
drops a passed in cause in one constructor
Key: PARQUET-1475
URL: https://issues.apache.org/jira/browse
[
https://issues.apache.org/jira/browse/PARQUET-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014777#comment-17014777
]
Jacques Nadeau commented on PARQUET-1698:
-
In our internal work we actually separate this out