Re: Which [open-source] SQL engine atop Hadoop?

Alexander Pivovarov Mon, 02 Feb 2015 19:09:15 -0800

Apache Phoenix is super fast for queries that filter data by the table key:
- sub-second latency
- has a good JDBC driver
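Why a key filter is cheap: Phoenix stores its tables in HBase, which keeps rows sorted by row key, and the Phoenix primary key becomes that row key. A filter on the leading key columns can therefore be answered by seeking straight to the first matching row and reading a contiguous range, instead of scanning the whole table. The Python sketch below models this with an in-memory sorted list and `bisect`. The `SortedTable` class and its methods are hypothetical illustrations, not the Phoenix or HBase API.

```python
import bisect
from typing import List, Tuple


class SortedTable:
    """Toy model of an HBase-style table: rows kept sorted by row key."""

    def __init__(self, rows: List[Tuple[str, dict]]):
        # Sort once on load, mimicking HBase keeping rows ordered by key.
        self.rows = sorted(rows, key=lambda r: r[0])
        self.keys = [k for k, _ in self.rows]

    def range_scan(self, start: str, stop: str) -> List[Tuple[str, dict]]:
        """Return rows with start <= key < stop by seeking, not full-scanning."""
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, stop)
        return self.rows[lo:hi]

    def prefix_scan(self, prefix: str) -> List[Tuple[str, dict]]:
        """Filter on a leading key prefix, like a WHERE on the first PK column."""
        lo = bisect.bisect_left(self.keys, prefix)
        hi = lo
        # Walk forward only while keys still share the prefix.
        while hi < len(self.keys) and self.keys[hi].startswith(prefix):
            hi += 1
        return self.rows[lo:hi]


if __name__ == "__main__":
    table = SortedTable([
        ("user2|2015-01-03", {"clicks": 7}),
        ("user1|2015-01-01", {"clicks": 3}),
        ("user1|2015-01-02", {"clicks": 5}),
        ("user3|2015-01-01", {"clicks": 1}),
    ])
    for key, row in table.prefix_scan("user1|"):
        print(key, row["clicks"])
```

The cost of `prefix_scan` grows with the number of matching rows plus a logarithmic seek, not with table size, which is the intuition behind sub-second key lookups. A filter on a non-key column gets no such shortcut and falls back to scanning.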


But it has limitations:
- no full outer join support
- inner and left outer joins run in a single machine's memory, so it cannot
join a huge table to another huge table
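The join limitation can be illustrated with a minimal in-memory hash join, assuming (as the post describes) that one side of the join is loaded into a hash table on a single machine while the other side is streamed past it. The `hash_join` function, the `max_build_rows` budget standing in for available RAM, and `BuildSideTooLarge` are hypothetical names for this sketch, not Phoenix APIs. It covers only inner and left outer joins, the two join types the post lists as supported.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Row = Dict[str, object]


class BuildSideTooLarge(Exception):
    """Raised when the hash-table side exceeds the (hypothetical) memory budget."""


def hash_join(probe: Iterable[Row], build: Iterable[Row], key: str,
              max_build_rows: int, left_outer: bool = False
              ) -> List[Tuple[Row, Optional[Row]]]:
    # Phase 1: load the entire build side into one in-memory hash table.
    table: Dict[Hashable, List[Row]] = defaultdict(list)
    count = 0
    for row in build:
        count += 1
        if count > max_build_rows:
            # The limitation: the build side must fit in one machine's RAM.
            raise BuildSideTooLarge(f"build side exceeds {max_build_rows} rows")
        table[row[key]].append(row)

    # Phase 2: stream the probe side; it is never held in memory as a whole.
    out: List[Tuple[Row, Optional[Row]]] = []
    for row in probe:
        matches = table.get(row[key], [])
        if matches:
            out.extend((row, m) for m in matches)
        elif left_outer:
            out.append((row, None))  # unmatched probe row kept, as with NULLs
    return out


if __name__ == "__main__":
    orders = [{"cust": 1, "amt": 10}, {"cust": 2, "amt": 20}, {"cust": 3, "amt": 5}]
    customers = [{"cust": 1, "name": "a"}, {"cust": 2, "name": "b"}]
    for o, c in hash_join(orders, customers, "cust", max_build_rows=100, left_outer=True):
        print(o["amt"], c["name"] if c else None)
```

Only the build side is bounded by memory. Engines that join huge tables to huge tables, such as Hive's shuffle (common) join, instead partition both inputs by the join key across many machines, so no single node has to hold a whole side.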


On Mon, Feb 2, 2015 at 6:59 PM, Alexander Pivovarov <apivova...@gmail.com>
wrote:

> I like the Tez engine for Hive (aka the Stinger initiative):
>
> - faster than the MR engine, especially for complex queries with lots of
> nested sub-queries
> - stable
> - minimum latency is 5-7 sec (0 sec for select count(*) ...)
> - capable of processing huge datasets (not limited by RAM the way Spark is)
>
>
> On Mon, Feb 2, 2015 at 6:00 PM, Samuel Marks <samuelma...@gmail.com>
> wrote:
>
>> Maybe you're right, and what I should be doing is throwing in connectors
>> so that data from regular databases is pushed into HDFS at regular
>> intervals, where my "fancier" analytics can be run across larger
>> datasets.
>>
>> However, I don't want to decide straightaway; for example, Phoenix +
>> Spark may be just the combination I am looking for.
>>
>> Best,
>>
>>
>> Samuel Marks
>> http://linkedin.com/in/samuelmarks
>>
>> On Mon, Feb 2, 2015 at 5:14 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I think you first have to think about your functional and non-functional
>>> requirements. You can scale "normal" SQL databases as well (cf. CERN or
>>> Facebook). There are different types of databases for different purposes;
>>> there is no one-size-fits-all. At the moment, we are a few years away from a
>>> one-size-fits-all database that leverages AI, etc., to automatically scale
>>> and optimize processing, storage, and network. Until then, you will have to
>>> do the math based on your requirements.
>>> Once you make them more precise, we will be able to help you more.
>>>
>>> Cheers
>>> On 2 Feb 2015 06:08, "Samuel Marks" <samuelma...@gmail.com> wrote:
>>>
>>> Well, what I am seeking is a Big Data database that can also work with
>>> Small Data, i.e., one that is scalable from one node to vast clusters
>>> whilst maintaining relatively low latency throughout.
>>>
>>> Which ones fit into this category?
>>>
>>> Samuel Marks
>>> http://linkedin.com/in/samuelmarks
>>>
>>>
>>
>
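Samuel's idea of periodically pushing data from regular databases into HDFS is commonly done as an incremental append: remember the highest value seen so far of a monotonically increasing column (a "high-water mark") and, on each run, export only newer rows. Apache Sqoop offers this as `--incremental append` with `--check-column` and `--last-value`. The Python sketch below only illustrates the idea under stated assumptions: SQLite stands in for the source database, a local directory stands in for an HDFS directory, and `export_increment` is a hypothetical helper, not Sqoop's implementation.

```python
import csv
import os
import sqlite3
import tempfile


def export_increment(conn: sqlite3.Connection, table: str, check_column: str,
                     last_value: int, out_dir: str) -> int:
    """Write rows with check_column > last_value to a new CSV part file.

    Returns the new high-water mark to pass in on the next run.
    """
    # table/check_column are trusted identifiers here; the value is parameterized.
    cur = conn.execute(
        f"SELECT * FROM {table} WHERE {check_column} > ? ORDER BY {check_column}",
        (last_value,),
    )
    cols = [d[0] for d in cur.description]
    rows = cur.fetchall()
    if not rows:
        return last_value  # nothing new since the previous run
    new_last = rows[-1][cols.index(check_column)]
    # One immutable file per batch, like adding a part file to an HDFS directory.
    path = os.path.join(out_dir, f"part-{last_value + 1:08d}-{new_last:08d}.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(cols)
        writer.writerows(rows)
    return new_last


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO events (payload) VALUES (?)", [("a",), ("b",)])
    out = tempfile.mkdtemp()
    mark = export_increment(conn, "events", "id", 0, out)     # exports ids 1-2
    conn.execute("INSERT INTO events (payload) VALUES ('c')")
    mark = export_increment(conn, "events", "id", mark, out)  # exports id 3 only
    print(mark, sorted(os.listdir(out)))
```

Append mode only sees new rows; updates to existing rows are missed. Sqoop's `lastmodified` mode addresses that by using a timestamp column as the check column instead.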
