Re: How many columns limitation for parquet files?
I don't know of a fixed maximum, but I do know that writing Parquet files with lots of columns is very challenging from a Java memory consumption standpoint. It forces you to reduce the row group size, which has a cascading negative effect on read speed.

Hope this helps,
Eric

Original message
From: big data
Date: 11/1/18 5:21 AM (GMT-06:00)
To: dev@parquet.apache.org
Subject: How many columns limitation for parquet files?

> Hi, Anybody know how many MAX columns for parquet files? And Do huge columns affect query performance? Thanks.
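A rough illustration of the row group trade-off mentioned in the reply above. A row group holds one column chunk per column, so for a fixed row group size the average chunk shrinks as the column count grows, and shrinking the row group makes the chunks smaller still. This is only back-of-the-envelope arithmetic under the assumption that data is spread evenly across columns; the function name and the sizes are made up for illustration and are not Parquet defaults or measurements.

```python
MIB = 1024 * 1024


def avg_column_chunk_bytes(row_group_bytes: int, num_columns: int) -> float:
    """Average column chunk size if a row group is split evenly across columns.

    Hypothetical helper: real column chunks vary with data type and encoding.
    """
    if num_columns <= 0:
        raise ValueError("num_columns must be positive")
    return row_group_bytes / num_columns


if __name__ == "__main__":
    # Illustrative sizes only (assumptions, not recommended settings).
    for row_group_mib in (128, 16):
        for num_columns in (100, 10_000):
            chunk = avg_column_chunk_bytes(row_group_mib * MIB, num_columns)
            print(
                f"{row_group_mib} MiB row group, {num_columns} columns: "
                f"{chunk / 1024:.1f} KiB per column chunk on average"
            )
```

With 10,000 columns, even a 128 MiB row group leaves only about 13 KiB per column chunk on average, so a reader that needs a few columns ends up doing many small reads.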
When the next release with page index release?
Hi,

Recently I’ve been reading Parquet’s page index code on the master branch, and it looks good. I also notice it has been a while since the previous release, so I’m wondering when the next release will be.

--
Regards!
Aron Tao
How many columns limitation for parquet files?
Hi,

Does anybody know the maximum number of columns for Parquet files? And does a huge number of columns affect query performance?

Thanks.
Parquet-cpp schema does not conform to the Thrift definition
Hi all,

When debugging files written by parquet-cpp, I found that the library does not leave the num_children field unset (None) for PrimitiveType when serialising to Thrift. As a result, all primitive fields carry Some(0), although the Thrift definition says that num_children should be set only for GroupType.

Another odd behaviour I found is that parquet-cpp sets or keeps the repetition type for the message type, and has no corresponding check when reading the schema from Thrift. In the example file, the root type has REQUIRED repetition, but according to the Thrift definition it should not be set at all.

I looked at the code, and it seems these inconsistencies still exist in the current master branch. The example file and discussion are here: https://github.com/sunchao/parquet-rs/issues/178.

It may not be an issue at all; I would appreciate any suggestions and feedback. Thanks! It looks like our parsing is not as robust as that of parquet-cpp or parquet-mr, so I am going to update that as well.

Kind regards,
Ivan
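The two rules discussed in the message above can be sketched as a small check over the flattened schema list. This is a minimal sketch, not code from parquet-cpp, parquet-mr, or parquet-rs: `SchemaElement` here is a simplified stand-in for the Thrift struct with only the fields under discussion, and `find_violations` is a hypothetical helper. It assumes the first element is the root and that a primitive type is any element with `type` set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SchemaElement:
    """Simplified stand-in for the Thrift SchemaElement (illustration only)."""
    name: str
    type: Optional[str] = None             # set only for primitive (leaf) types
    repetition_type: Optional[str] = None  # should be unset on the root
    num_children: Optional[int] = None     # should be unset on primitive types


def find_violations(elements: List[SchemaElement]) -> List[str]:
    """Report elements that break the two rules described in the message."""
    problems = []
    for index, element in enumerate(elements):
        is_root = index == 0
        is_primitive = element.type is not None
        if is_primitive and element.num_children is not None:
            problems.append(f"{element.name}: num_children set on a primitive type")
        if is_root and element.repetition_type is not None:
            problems.append(f"{element.name}: repetition_type set on the root")
    return problems


if __name__ == "__main__":
    # Mirrors the reported behaviour: REQUIRED root, num_children = 0 on a leaf.
    reported = [
        SchemaElement("schema", repetition_type="REQUIRED", num_children=1),
        SchemaElement("id", type="INT64", repetition_type="REQUIRED", num_children=0),
    ]
    for problem in find_violations(reported):
        print(problem)
```

A tolerant reader could log such findings and carry on, while a strict one could reject the file; which is appropriate is exactly the question raised in the thread.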
[jira] [Commented] (PARQUET-1454) ld-linux-x86-64.so.2 is missing
[ https://issues.apache.org/jira/browse/PARQUET-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671155#comment-16671155 ]

Stets Alexander commented on PARQUET-1454:
--

I use openjdk:8-jre-alpine in a Docker container. I added the libc6-compat dependency to the Dockerfile to resolve this exception.

> ld-linux-x86-64.so.2 is missing
> ---
>
> Key: PARQUET-1454
> URL: https://issues.apache.org/jira/browse/PARQUET-1454
> Project: Parquet
> Issue Type: Bug
> Components: parquet-avro
> Affects Versions: 1.10.0
> Reporter: Stets Alexander
> Priority: Minor
> Labels: documentation
>
> parquet-avro uses the dependency org.xerial.snappy:snappy-java.
> snappy-java needs to extract a native library, and for this it uses
> ld-linux-x86-64.so.2.
> If your OS doesn't contain ld-linux-x86-64.so.2, you get an exception like this:
> java.lang.UnsatisfiedLinkError:
> /tmp/snappy-1.1.2-b0bbcae9-e398-4a99-ad6d-19c86734be76-libsnappyjava.so:
> Error loading shared library ld-linux-x86-64.so.2: No such file or directory
> (needed by
> /tmp/snappy-1.1.2-b0bbcae9-e398-4a99-ad6d-19c86734be76-libsnappyjava.so)
> But the documentation doesn't contain any information about it.