Hi,
I was wondering if anyone had any concerns or things they wanted to discuss
about this proposed patch. Would you like some benchmarking results? I'm
currently running the whole TPC-DS suite in Trino, comparing results with and
without this patch.
Also, are there any bugs in the JIRA that
Hi,
I had meant to just discuss my PR on the mailing list, but the mailing list
software evidently detected that the email was associated with my JIRA entry
and posted my email as a comment. I don't want to spam the JIRA. I'd delete the
comment, but I'm not sure that I can.
Sorry about
Can you provide more information about how you're writing the files?
Also, does this help?
https://stackoverflow.com/questions/41700231/spark-parquet-statisticsmin-max-integration
On 4/14/22, 8:42 PM, "p_agar...@yahoo.com.INVALID"
wrote:
Hi,
This reminds me of some similar problems I've seen in the bug tracker. I
suggest creating a JIRA ticket, with some instructions, and attaching a parquet
file for others to look at. Also include how you did the writing. If you're
linking ParquetMR to your own code, please include minimal
Also, using the API is a pain, because you have to use Hadoop. Various people
have found work-arounds for this, such as:
Comments on: https://issues.apache.org/jira/browse/PARQUET-1822
I also assembled a minimal reader myself (from code I found elsewhere on
GitHub, which I should add
not exist and that is not the purpose of
parquet-mr?
Thanks
On Tue, Apr 26, 2022 at 9:37 PM Miller, Tim
wrote:
>
> Also, using the API is a pain, because you have to use Hadoop. Various
people have found work-arounds for this, such as:
> Comments on: https://issues.apa
You might also consider looking for fallback options. For instance, in
https://github.com/apache/parquet-mr/pull/957, I figured out a good spot to
catch the exception and then fall back to a converted schema.
On 5/29/22, 1:53 PM, "Micah Kornfield" wrote:
In my own profiling of ParquetMR (as it is used by Trino), I have also found
these bit-packing methods to be a performance bottleneck. Of the existing ones,
the ones that take an array are faster than the ones that take a ByteBuffer. It
sure would be nice to have even faster ones!
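
For readers unfamiliar with what these methods do, here is a minimal sketch of unpacking eight 3-bit values, once from a byte array and once from a ByteBuffer. It is not ParquetMR's actual packer code: the class name BitUnpackSketch and the method unpack8x3 are hypothetical, and the only assumption taken from Parquet is the least-significant-bit-first layout of its bit-packed runs. The two variants differ only in how the input bytes are fetched.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; not the ParquetMR implementation.
final class BitUnpackSketch {

    // Unpack eight 3-bit values (LSB-first) from three bytes of an array.
    static int[] unpack8x3(byte[] in, int pos) {
        // Assemble the 24 packed bits, lowest-addressed byte in the low bits.
        int bits = (in[pos] & 0xFF)
                 | ((in[pos + 1] & 0xFF) << 8)
                 | ((in[pos + 2] & 0xFF) << 16);
        return split(bits);
    }

    // Same unpacking, reading through ByteBuffer's absolute get(int).
    static int[] unpack8x3(ByteBuffer in, int pos) {
        int bits = (in.get(pos) & 0xFF)
                 | ((in.get(pos + 1) & 0xFF) << 8)
                 | ((in.get(pos + 2) & 0xFF) << 16);
        return split(bits);
    }

    // Slice the 24-bit word into eight 3-bit values, lowest bits first.
    private static int[] split(int bits) {
        int[] out = new int[8];
        for (int i = 0; i < 8; i++) {
            out[i] = (bits >>> (3 * i)) & 0x7;
        }
        return out;
    }
}
```

With the bytes 0x88, 0xC6, 0xFA as input, both variants return the values 0 through 7, which matches the worked example in the Parquet encoding documentation.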
From: "Xie,
I just wanted to bounce an idea off of everyone. One thing I notice is that
there are certain bugs that show up when using the parquet-cli that don't show
up when using it as an SDK in a Java program, even when reading the same files.
There appears to be some duplicated code between the CLI and
Hi, everyone,
I've been working on adding some performance improvements to ParquetMR. During
the last sync meeting, I was asked to write up a design doc that describes my
plans for PRs related to this. Please feel free to email me or add comments to
the Google Doc.
Is this something anyone can join? How?
Thanks.
On 4/27/22, 11:19 AM, "Xinli shang" wrote:
Hi all,
Sorry
[
https://issues.apache.org/jira/browse/PARQUET-2135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17517681#comment-17517681
]
Miller, Tim commented on PARQUET-2135:
--
Hi,
I was wondering if anyone had any concerns or things