I have very limited knowledge of Parquet, so I can only answer from the HBase point of view.
Please see this recent thread on the number of columns in a row in HBase:
http://search-hadoop.com/m/YGbb3NN3v1jeL1f

There are a few Spark HBase connectors. See this thread:
http://search-hadoop.com/m/q3RTt4cp9Z4p37s

Sorry, I cannot answer the performance comparison question.

Cheers

On Thu, Jan 21, 2016 at 2:43 PM, Krishna <[email protected]> wrote:

> We are evaluating Parquet and HBase for storing a dense and very, very wide
> matrix (it can have more than 600K columns).
>
> I have the following questions:
>
> - Is there a limit on the # of columns in Parquet or HFile? We expect to
> query [10-100] columns at a time using Spark - what are the performance
> implications in this scenario?
> - HBase can support millions of columns - does anyone have prior
> experience comparing Parquet vs. HFile performance for wide structured
> tables?
> - We want a schema-less solution, since the matrix can get wider over
> time.
> - Is there a way to generate wide, structured, schema-less Parquet files
> using map-reduce (the input files are in a custom binary format)?
>
> What solutions other than Parquet & HBase are useful for this use-case?
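To make the HBase side concrete, here is a minimal sketch of fetching a small set of columns from one very wide row, using the standard HBase 1.x client API. The table name "matrix", family "d", and the qualifiers are made up for illustration:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
    import org.apache.hadoop.hbase.util.Bytes

    val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("matrix"))

    // Ask for only the qualifiers we need; HBase reads just these
    // cells, not the whole 600K-column row.
    val get = new Get(Bytes.toBytes("row-000001"))
    val wanted = Seq("c00042", "c00137", "c58210")
    wanted.foreach(q => get.addColumn(Bytes.toBytes("d"), Bytes.toBytes(q)))

    val result = table.get(get)
    wanted.foreach { q =>
      val v = result.getValue(Bytes.toBytes("d"), Bytes.toBytes(q))
      println(s"$q -> ${if (v == null) "absent" else Bytes.toString(v)}")
    }

    table.close()
    conn.close()

Because HBase stores each cell as a separate KeyValue keyed by (row, family, qualifier), new column qualifiers can appear at any time without a schema change, which matches the schema-less requirement above.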

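On the Parquet side, the key property is that it is a columnar format: a query that selects 10-100 columns out of 600K only touches the column chunks for those columns. Below is a minimal Spark 1.x sketch of writing a wide table to Parquet and reading back a narrow projection; the paths, column names, and toy data are made up, and a real job would parse the custom binary input instead of using parallelize:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("wide-parquet-sketch"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Toy stand-in for the real matrix: rows of (row_id, c1, c2, c3).
    val df = sc.parallelize(Seq(
      ("r1", 1.0, 2.0, 3.0),
      ("r2", 4.0, 5.0, 6.0)
    )).toDF("row_id", "c1", "c2", "c3")

    df.write.parquet("/tmp/matrix.parquet")

    // Selecting a few columns reads only those column chunks from
    // disk (column pruning), no matter how wide the file is.
    val slice = sqlContext.read.parquet("/tmp/matrix.parquet")
      .select("row_id", "c2")
    slice.show()

One caveat: Parquet files do carry a schema, so a matrix that grows wider over time is handled by writing new files with the extended schema and merging schemas at read time (the "mergeSchema" read option in Spark), rather than by being truly schema-less the way HBase is.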