I have very limited knowledge of Parquet, so I can only answer from the HBase point of view.
Please see this recent thread on the number of columns in a row in HBase:
http://search-hadoop.com/m/YGbb3NN3v1jeL1f

There are a few Spark HBase connectors. See this thread:
http://search-hadoop.com/m/q3RTt4cp9Z4p37s

Sorry, I cannot answer the performance comparison question.

Cheers

On Thu, Jan 21, 2016 at 2:43 PM, Krishna <[email protected]> wrote:

> We are evaluating Parquet and HBase for storing a dense and very, very wide
> matrix (it can have more than 600K columns).
>
> I have the following questions:
>
> - Is there a limit on the # of columns in Parquet or HFile? We expect to
> query [10-100] columns at a time using Spark - what are the performance
> implications in this scenario?
> - HBase can support millions of columns - does anyone have prior
> experience comparing Parquet vs. HFile performance for wide structured
> tables?
> - We want a schema-less solution, since the matrix can get wider over
> time.
> - Is there a way to generate wide, structured, schema-less Parquet files
> using map-reduce (the input files are in a custom binary format)?
>
> What solutions other than Parquet & HBase are useful for this use-case?
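To make the HBase side concrete, here is a minimal sketch of fetching a small set of columns from one very wide row, using the standard HBase 1.x client API. The table name "matrix", family "d", and the qualifiers are made up for illustration:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Get}
    import org.apache.hadoop.hbase.util.Bytes

    val conn = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = conn.getTable(TableName.valueOf("matrix"))

    // Ask for only the qualifiers we need; HBase reads just these
    // cells, not the whole 600K-column row.
    val get = new Get(Bytes.toBytes("row-000001"))
    val wanted = Seq("c00042", "c00137", "c58210")
    wanted.foreach(q => get.addColumn(Bytes.toBytes("d"), Bytes.toBytes(q)))

    val result = table.get(get)
    wanted.foreach { q =>
      val v = result.getValue(Bytes.toBytes("d"), Bytes.toBytes(q))
      println(s"$q -> ${if (v == null) "absent" else Bytes.toString(v)}")
    }

    table.close()
    conn.close()

Because HBase stores each cell as a separate KeyValue keyed by (row, family, qualifier), new column qualifiers can appear at any time without a schema change, which matches the schema-less requirement above.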

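On the Parquet side, the key property is that it is a columnar format: a query that selects 10-100 columns out of 600K only touches the column chunks for those columns. Below is a minimal Spark 1.x sketch of writing a wide table to Parquet and reading back a narrow projection; the paths, column names, and toy data are made up, and a real job would parse the custom binary input instead of using parallelize:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("wide-parquet-sketch"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Toy stand-in for the real matrix: rows of (row_id, c1, c2, c3).
    val df = sc.parallelize(Seq(
      ("r1", 1.0, 2.0, 3.0),
      ("r2", 4.0, 5.0, 6.0)
    )).toDF("row_id", "c1", "c2", "c3")

    df.write.parquet("/tmp/matrix.parquet")

    // Selecting a few columns reads only those column chunks from
    // disk (column pruning), no matter how wide the file is.
    val slice = sqlContext.read.parquet("/tmp/matrix.parquet")
      .select("row_id", "c2")
    slice.show()

One caveat: Parquet files do carry a schema, so a matrix that grows wider over time is handled by writing new files with the extended schema and merging schemas at read time (the "mergeSchema" read option in Spark), rather than by being truly schema-less the way HBase is.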