When Parquet came out, it was developed by a community of companies and was designed as a library to be supported by multiple big data projects. nice
ORC, on the other hand, initially only supported Hive. It wasn't even designed as a library that could be reused; even today it brings in the kitchen sink of transitive dependencies. yikes

On Jul 26, 2016 5:09 AM, "Jörn Franke" <jornfra...@gmail.com> wrote:

> I think both are very similar, but with slightly different goals. While
> they work transparently for each Hadoop application, you need to enable
> specific support in the application for predicate push down.
> In the end you have to check which application you are using and do some
> tests (with correct predicate push down configuration). Keep in mind that
> both formats work best if they are sorted on filter columns (which is your
> responsibility) and if their optimizations are correctly configured (min/
> max index, bloom filter, compression etc.).
>
> If you need to ingest sensor data you may want to store it first in HBase
> and then batch process it into large files in ORC or Parquet format.
>
> On 26 Jul 2016, at 04:09, janardhan shetty <janardhan...@gmail.com> wrote:
>
> Just wondering about the advantages and disadvantages of converting data into
> ORC or Parquet.
>
> In the documentation of Spark there are numerous examples of Parquet format.
>
> Any strong reasons to choose Parquet over ORC file format?
>
> Also: current data compression is bzip2.
>
> http://stackoverflow.com/questions/32373460/parquet-vs-orc-vs-orc-with-snappy
> This seems biased.
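For concreteness, here is a minimal Spark (Scala) sketch of the knobs Jörn mentions: enabling predicate pushdown, sorting on the filter column, and setting compression / bloom filter options when writing Parquet or ORC. The DataFrame, paths, and the `sensor_id` / `event_time` columns are placeholders, and the ORC pushdown default varies by Spark version, so treat this as a starting point rather than a recipe.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("parquet-vs-orc-sketch")
  // predicate pushdown configuration; Parquet pushdown is usually on by default,
  // ORC pushdown has to be enabled explicitly on older Spark versions
  .config("spark.sql.parquet.filterPushdown", "true")
  .config("spark.sql.orc.filterPushdown", "true")
  .getOrCreate()

import spark.implicits._

// hypothetical sensor data; sort on the column you filter by most,
// so min/max statistics per row group / stripe can actually skip data
val df = spark.read.json("/data/raw/sensors")
val sorted = df.sortWithinPartitions("sensor_id", "event_time")

// Parquet with snappy compression
sorted.write
  .option("compression", "snappy")
  .parquet("/data/curated/sensors_parquet")

// ORC with zlib compression and a bloom filter on the filter column
sorted.write
  .format("orc")
  .option("compression", "zlib")
  .option("orc.bloom.filter.columns", "sensor_id")
  .save("/data/curated/sensors_orc")

// reads can now push this filter down into the file format
spark.read.parquet("/data/curated/sensors_parquet")
  .filter($"sensor_id" === "s-42")
  .show()
```

Note the use of sortWithinPartitions rather than a global sort: each output file stays sorted on the filter column, which is what keeps the min/max indexes (and the ORC bloom filter) selective without an expensive full shuffle.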