Nice articles about Parquet *with* Avro:

   - https://dzone.com/articles/understanding-how-parquet
   - http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/

Nice video from the good folks at Cloudera on the *differences* between
Avro and Parquet:

   - https://www.youtube.com/watch?v=AY1dEfyFeHc
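
To make the "subset of columns in a wide table" point from further down the
thread concrete, here is a rough toy sketch in plain Python. It is not Spark
and it does not read real Avro or Parquet files; the sample records and the
function names are made up for illustration. It only counts how many values
each layout has to touch to sum one column, assuming a row-oriented reader
must visit whole records while a column-oriented reader can read one column
on its own.

```python
# Toy model only: plain Python structures, not real Avro or Parquet files.
# The records and function names below are hypothetical.
rows = [
    {"id": 1, "name": "a", "price": 10.0, "qty": 3},
    {"id": 2, "name": "b", "price": 20.0, "qty": 1},
    {"id": 3, "name": "c", "price": 5.0, "qty": 4},
]


def to_columns(records):
    """Pivot row records into one list per column (a columnar layout)."""
    return {key: [record[key] for record in records] for key in records[0]}


def sum_row_layout(records, column):
    """Row layout: each full record is visited to read a single field."""
    total = 0
    values_touched = 0
    for record in records:
        values_touched += len(record)  # the whole record is read
        total += record[column]
    return total, values_touched


def sum_column_layout(columns, column):
    """Column layout: only the requested column is read."""
    values = columns[column]
    return sum(values), len(values)


columns = to_columns(rows)
row_total, row_touched = sum_row_layout(rows, "price")
col_total, col_touched = sum_column_layout(columns, "price")

print("row layout:   ", row_total, "values touched:", row_touched)
print("column layout:", col_total, "values touched:", col_touched)
```

Both layouts give the same sum, but with these 3 records of 4 fields each the
row layout touches 12 values and the column layout touches 3. The gap grows
with the number of columns, which is one way to read Don's result below. Real
formats add compression, encoding and predicate pushdown on top, so treat
this as intuition rather than a benchmark.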


2016-03-04 7:12 GMT+01:00 Koert Kuipers <ko...@tresata.com>:

> well can you use orc without bringing in the kitchen sink of dependencies
> also known as hive?
>
> On Thu, Mar 3, 2016 at 11:48 PM, Jong Wook Kim <ilike...@gmail.com> wrote:
>
>> How about ORC? I have experimented briefly with Parquet and ORC, and I
>> liked the fact that ORC has its schema within the file, which makes it
>> handy to work with any other tools.
>>
>> Jong Wook
>>
>> On 3 March 2016 at 23:29, Don Drake <dondr...@gmail.com> wrote:
>>
>>> My tests show Parquet has better performance than Avro in just about
>>> every test.  It really shines when you are querying a subset of columns in
>>> a wide table.
>>>
>>> -Don
>>>
>>> On Wed, Mar 2, 2016 at 3:49 PM, Timothy Spann <tim.sp...@airisdata.com>
>>> wrote:
>>>
>>>> Which format is the best format for SparkSQL adhoc queries and general
>>>> data storage?
>>>>
>>>> There are lots of specialized cases, but generally accessing some but
>>>> not all the available columns with a reasonable subset of the data.
>>>>
>>>> I am leaning towards Parquet as it has great support in Spark.
>>>>
>>>> I also have to consider any file on HDFS may be accessed from other
>>>> tools like Hive, Impala, HAWQ.
>>>>
>>>> Suggestions?
>>>> —
>>>> airis.DATA
>>>> Timothy Spann, Senior Solutions Architect
>>>> C: 609-250-5894
>>>> http://airisdata.com/
>>>> http://meetup.com/nj-datascience
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Donald Drake
>>> Drake Consulting
>>> http://www.drakeconsulting.com/
>>> https://twitter.com/dondrake <http://www.MailLaunder.com/>
>>> 800-733-2143
>>>
>>
>>
>


-- 

Paul Leclercq | Data engineer


 paul.lecle...@tabmo.io  |  http://www.tabmo.fr/
