In my tests, Parquet outperformed Avro in just about every case. It really shines when you query a subset of columns in a wide table.
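
To make the column-subset point concrete, here is a toy sketch in plain Python of why a column-oriented layout reads less data than a row-oriented one when a query touches only a few columns. It is not the actual Parquet or Avro on-disk format: JSON stands in for the real encodings, and the table shape, the column names (`c0` ... `c49`), and the helper functions are all hypothetical, chosen only for illustration.

```python
import json

def make_rows(n_rows, n_cols):
    """Build a wide table as a list of dicts (made-up data)."""
    return [{f"c{j}": i * n_cols + j for j in range(n_cols)}
            for i in range(n_rows)]

def row_layout(rows):
    """Row-oriented: one blob per record, all columns stored together."""
    return [json.dumps(r).encode() for r in rows]

def column_layout(rows):
    """Column-oriented: one blob per column, holding that column's values."""
    return {c: json.dumps([r[c] for r in rows]).encode() for c in rows[0]}

def bytes_read_row(blobs):
    # Every record has to be read in full, whichever columns are wanted.
    return sum(len(b) for b in blobs)

def bytes_read_column(chunks, wanted):
    # Only the chunks for the requested columns are touched.
    return sum(len(chunks[c]) for c in wanted)

rows = make_rows(1000, 50)      # 1000 records, 50 columns
wanted = ["c0", "c7"]           # the query needs 2 of the 50 columns

blobs = row_layout(rows)
chunks = column_layout(rows)

row_bytes = bytes_read_row(blobs)
col_bytes = bytes_read_column(chunks, wanted)

print(f"row-oriented scan:    {row_bytes} bytes")
print(f"column-oriented scan: {col_bytes} bytes")
```

The gap grows with the width of the table and shrinks as the query selects more columns, so it is worth running this kind of comparison against your own schema and query mix rather than relying on the toy numbers.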
-Don

On Wed, Mar 2, 2016 at 3:49 PM, Timothy Spann <tim.sp...@airisdata.com> wrote:

> Which format is the best format for SparkSQL ad hoc queries and general
> data storage?
>
> There are lots of specialized cases, but generally accessing some but not
> all the available columns with a reasonable subset of the data.
>
> I am leaning towards Parquet as it has great support in Spark.
>
> I also have to consider any file on HDFS may be accessed from other tools
> like Hive, Impala, HAWQ.
>
> Suggestions?
> --
> airis.DATA
> Timothy Spann, Senior Solutions Architect
> C: 609-250-5894
> http://airisdata.com/
> http://meetup.com/nj-datascience

--
Donald Drake
Drake Consulting
http://www.drakeconsulting.com/
https://twitter.com/dondrake
800-733-2143