Re: How many columns limitation for parquet files?

2018-11-01 Thread Eric Owhadi
I don't know about a fixed maximum, but I do know that writing Parquet files
with lots of columns is very challenging from a Java memory-consumption
standpoint. It forces you to reduce the row group size, which in turn hurts
read speed.
Hope this helps
Eric
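
Eric's point about row group size can be put in numbers. All columns share one row group, so the more columns a file has, the smaller each column's share of that row group becomes. The sketch below is a back-of-the-envelope calculation, not parquet-mr code. The helper name avg_column_chunk_bytes is hypothetical, and 128 MiB is assumed as the row group target because it is the commonly cited parquet-mr default block size. Real column chunks differ in size, so the result is only an average.

```python
# Back-of-the-envelope sketch (hypothetical helper, not a Parquet API):
# every column gets its own column chunk inside a row group, so the
# row group budget is shared among all columns.

MIB = 1024 * 1024


def avg_column_chunk_bytes(row_group_bytes: int, num_columns: int) -> float:
    """Average column-chunk size if the row group were split evenly across columns."""
    if row_group_bytes <= 0 or num_columns <= 0:
        raise ValueError("row_group_bytes and num_columns must be positive")
    return row_group_bytes / num_columns


if __name__ == "__main__":
    # 128 MiB is an assumed row group target; 32 MiB stands in for a row group
    # shrunk to keep writer memory down.
    for row_group_mib in (128, 32):
        for columns in (100, 10_000):
            avg = avg_column_chunk_bytes(row_group_mib * MIB, columns)
            print(f"{row_group_mib:>4} MiB row group, {columns:>6} columns: "
                  f"~{avg / 1024:,.1f} KiB per column chunk")
```

With 10,000 columns, a 128 MiB row group averages roughly 13 KiB per column chunk. Shrinking the row group to 32 MiB to save writer memory cuts that to roughly 3 KiB, so a reader fetches each column in much smaller pieces. This is only one plausible mechanism behind the slower reads. Footer metadata also grows with the column count, because each row group records one ColumnChunk entry per column.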



-------- Original message --------
From: big data 
Date: 11/1/18 5:21 AM (GMT-06:00)
To: dev@parquet.apache.org
Subject: How many columns limitation for parquet files?

Hi,

Does anybody know the maximum number of columns a Parquet file can have?

And do huge numbers of columns affect query performance?

Thanks.



When will the next release with the page index be out?

2018-11-01 Thread Tao JiaTao

Hi

Recently I’ve been reading about Parquet’s page index on the master branch, and it looks good.
I also noticed that it has been a while since the previous release, so I’m wondering
when the next release will be.

--
Regards!
Aron Tao



How many columns limitation for parquet files?

2018-11-01 Thread big data
Hi,

Does anybody know the maximum number of columns a Parquet file can have?

And do huge numbers of columns affect query performance?

Thanks.



Parquet-cpp schema does not conform to the Thrift definition

2018-11-01 Thread Ivan Sadikov
Hi all,

When debugging files written by parquet-cpp, I found that the library does
not leave the num_children field unset (None) for a PrimitiveType when
serialising to Thrift. As a result, every primitive field has Some(0), but
the Thrift definition says that num_children should be set only for a
GroupType.

Another odd behaviour I found is that the repetition type is set (or kept)
for the message type, and there is no corresponding check when reading the
schema from Thrift. In the example file, the root type has REQUIRED
repetition, but according to the Thrift definition it should not be set at
all.

I looked at the code and it seems like those inconsistencies still exist in
the current master branch. Example file and discussion are here:
https://github.com/sunchao/parquet-rs/issues/178.

It may not be an issue at all; I would appreciate any suggestions and
feedback. Thanks!

It looks like our parsing in parquet-rs is not as robust as that of
parquet-cpp or parquet-mr; I am going to update it as well.


Kind regards,

Ivan
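
The two Thrift rules discussed above can be written as a small check over the flattened, depth-first list of schema elements that a Parquet footer stores. First, the root element carries no repetition_type, while every other element does. Second, num_children is present only on group (non-leaf) elements. The sketch below models only those fields with a hypothetical SchemaElement dataclass. It is not the parquet-rs, parquet-cpp or parquet-mr API. An element is treated as a leaf when its physical type is set, because the Thrift definition leaves type unset on non-leaf nodes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SchemaElement:
    """Hypothetical, simplified stand-in for the Thrift SchemaElement struct."""
    name: str
    type: Optional[str] = None             # physical type; set only on leaf nodes
    repetition_type: Optional[str] = None  # unset on the root, set on every other node
    num_children: Optional[int] = None     # set only on group (non-leaf) nodes


def schema_problems(elements: List[SchemaElement]) -> List[str]:
    """List deviations from the Thrift rules in a depth-first flattened schema."""
    problems = []
    for index, element in enumerate(elements):
        is_root = index == 0                # the first element is the root
        is_leaf = element.type is not None  # leaves carry a physical type
        if is_root and element.repetition_type is not None:
            problems.append(f"root '{element.name}' sets repetition_type="
                            f"{element.repetition_type}")
        if not is_root and element.repetition_type is None:
            problems.append(f"'{element.name}' is missing repetition_type")
        if is_leaf and element.num_children is not None:
            problems.append(f"leaf '{element.name}' sets num_children="
                            f"{element.num_children}")
    return problems


if __name__ == "__main__":
    # Mirrors the reported file: a REQUIRED root and num_children=0 on a leaf.
    reported = [
        SchemaElement("schema", repetition_type="REQUIRED", num_children=1),
        SchemaElement("id", type="INT64", repetition_type="REQUIRED", num_children=0),
    ]
    for problem in schema_problems(reported):
        print(problem)
```

A reader that needs to stay compatible with files already written this way could report these findings as warnings instead of rejecting the file. That is a design choice the thread leaves open.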


[jira] [Commented] (PARQUET-1454) ld-linux-x86-64.so.2 is missing

2018-11-01 Thread Stets Alexander (JIRA)


[ https://issues.apache.org/jira/browse/PARQUET-1454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16671155#comment-16671155 ]

Stets Alexander commented on PARQUET-1454:
--

I use openjdk:8-jre-alpine in a Docker container.

I added the libc6-compat package to the Dockerfile to resolve this exception.
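
Before adding libc6-compat, it can help to confirm whether the image actually provides the loader that libsnappyjava.so needs. The sketch below only probes file paths. The candidate paths are assumptions: /lib64/ld-linux-x86-64.so.2 is the conventional glibc location on x86-64, and libc6-compat on Alpine is expected to place a compatible file at one of these paths. The helper name find_loader is hypothetical.

```python
import os
from typing import List, Sequence

# Assumed locations of the x86-64 glibc dynamic loader; adjust for your image.
LOADER_CANDIDATES = (
    "/lib64/ld-linux-x86-64.so.2",
    "/lib/ld-linux-x86-64.so.2",
)


def find_loader(candidates: Sequence[str] = LOADER_CANDIDATES) -> List[str]:
    """Return the candidate loader paths that exist on this machine."""
    return [path for path in candidates if os.path.exists(path)]


if __name__ == "__main__":
    found = find_loader()
    if found:
        print("ld-linux-x86-64.so.2 found at: " + ", ".join(found))
    else:
        print("ld-linux-x86-64.so.2 not found; snappy-java's native library "
              "is likely to fail with UnsatisfiedLinkError "
              "(on Alpine, installing libc6-compat is one reported fix)")
```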

> ld-linux-x86-64.so.2 is missing
> ---
>
> Key: PARQUET-1454
> URL: https://issues.apache.org/jira/browse/PARQUET-1454
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-avro
>Affects Versions: 1.10.0
>Reporter: Stets Alexander
>Priority: Minor
>  Labels: documentation
>
> parquet-avro uses the dependency org.xerial.snappy:snappy-java.
> snappy-java needs to extract its native library, and loading that library
> requires ld-linux-x86-64.so.2.
> If your OS doesn't provide ld-linux-x86-64.so.2, you get an exception like this:
> java.lang.UnsatisfiedLinkError: 
> /tmp/snappy-1.1.2-b0bbcae9-e398-4a99-ad6d-19c86734be76-libsnappyjava.so: 
> Error loading shared library ld-linux-x86-64.so.2: No such file or directory 
> (needed by 
> /tmp/snappy-1.1.2-b0bbcae9-e398-4a99-ad6d-19c86734be76-libsnappyjava.so)
> However, the documentation doesn't mention this.


