Re: Format dillema

Furcy Pin Tue, 20 Jun 2017 01:01:07 -0700

Another option would be to try Facebook's Presto https://prestodb.io/

Like Impala, Presto is designed for fast interactive querying over Hive
tables, but it can also query data from many other sources, not all of them
SQL databases (MySQL, PostgreSQL, Kafka, Cassandra, ...; see
https://prestodb.io/docs/current/connector.html).
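
As a rough illustration of the multi-source point: Presto addresses tables
as catalog.schema.table, so a single query can join a Hive table with a
table from another connector. The Python sketch below only builds such a
query string. The catalog names (hive, mysql), schemas, tables and columns
are hypothetical and depend on how the connectors are configured.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a Presto-style fully qualified table name: catalog.schema.table."""
    return f"{catalog}.{schema}.{table}"


def federated_join_sql() -> str:
    """Build a query joining a Hive table with a MySQL table (hypothetical names)."""
    orders = qualified("hive", "sales", "orders")        # assumed fact table in Hive
    customers = qualified("mysql", "crm", "customers")   # assumed dimension table in MySQL
    return (
        "SELECT c.country, count(*) AS order_count\n"
        f"FROM {orders} o\n"
        f"JOIN {customers} c ON o.customer_id = c.id\n"
        "GROUP BY c.country"
    )


if __name__ == "__main__":
    print(federated_join_sql())
```

The string can then be submitted through whichever Presto client is
available (CLI, JDBC, ...).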

In terms of performance on small queries, it seems to be as fast as
Impala, a league above Spark SQL, and of course two leagues above Hive.

Unlike Impala, Presto can also read the ORC file format and make the
most of it (e.g. read the pre-aggregated column statistics that ORC stores
in its file footer).

It can also make use of Hive's bucketing feature, while Impala still cannot:
https://github.com/prestodb/presto/issues/6666
https://issues.apache.org/jira/browse/IMPALA-3118
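
To make the ORC and bucketing points concrete, here is a minimal sketch of
what a table definition could look like through Presto's Hive connector. It
relies on the connector's format, bucketed_by and bucket_count table
properties. The table name, columns and bucket count are hypothetical, and
the helper only builds the DDL string, it does not run it.

```python
def orc_bucketed_table_ddl(table: str, bucket_column: str, bucket_count: int) -> str:
    """Build a Presto CREATE TABLE statement for an ORC table bucketed on one column."""
    if bucket_count <= 0:
        raise ValueError("bucket_count must be positive")
    return (
        f"CREATE TABLE {table} (\n"
        "  order_id BIGINT,\n"      # hypothetical columns, for illustration only
        "  customer_id BIGINT,\n"
        "  amount DOUBLE\n"
        ")\n"
        "WITH (\n"
        "  format = 'ORC',\n"
        f"  bucketed_by = ARRAY['{bucket_column}'],\n"
        f"  bucket_count = {bucket_count}\n"
        ")"
    )


if __name__ == "__main__":
    print(orc_bucketed_table_ddl("hive.sales.orders_orc", "customer_id", 32))
```

Bucketing on the column most often used in joins or filters is the usual
choice, but the right column and bucket count depend on the data.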

Regards,

Furcy

On Tue, Jun 20, 2017 at 5:36 AM, Sruthi Kumar Annamneedu <
sruthikumar...@gmail.com> wrote:

> Try using Parquet with Snappy compression and Impala will work with this
> combination.
>
> On Sun, Jun 18, 2017 at 3:35 AM, rakesh sharma <rakeshsharm...@hotmail.com
> > wrote:
>
>> We are facing a format issue. We would like to run BI-style queries on
>> Hive data using Impala, which supports Parquet, but we would also like the
>> data compressed to the best ratio, as with ORC. However, Impala cannot
>> query ORC. What design considerations apply here? Any help is appreciated.
>>
>> Thanks
>> Rakesh
>>
>
