Siva,

If your use case requires joins over large Phoenix tables, you had better
redesign the system: keep real-time data in HBase/Phoenix and have a separate
analytical system (Hadoop/Hive) with periodic batch updates from HBase. It's
data duplication, but using ORCFile/Parquet with compression enabled will
significantly reduce the storage overhead of the duplicate copy.
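
To make the "periodic batch updates" idea concrete, here is a minimal sketch,
assuming each row carries a last-modified timestamp and the export job keeps a
watermark between runs. The in-memory list stands in for a time-range scan over
HBase/Phoenix, and the returned batch stands in for one ORC/Parquet file on the
analytical side. All names are hypothetical; none of this is a Phoenix or Hive
API.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def export_batch(rows: List[Row], watermark: int) -> Tuple[List[Row], int]:
    """Return the rows modified after `watermark`, plus the new watermark.

    Assumption: every row has an integer "ts" field (last-modified time).
    In a real system `rows` would be a time-range scan over the real-time
    store, and the batch would be written out as a compressed columnar file.
    """
    batch = [r for r in rows if r["ts"] > watermark]
    # If nothing changed, keep the old watermark so the next run is still correct.
    new_watermark = max((r["ts"] for r in batch), default=watermark)
    return batch, new_watermark


# Hypothetical data standing in for the real-time table.
store = [
    {"id": "a", "ts": 100},
    {"id": "b", "ts": 200},
    {"id": "c", "ts": 300},
]

first_batch, wm = export_batch(store, watermark=0)    # initial full copy
store.append({"id": "d", "ts": 400})                  # a new real-time write
second_batch, wm = export_batch(store, watermark=wm)  # only the delta is copied
```

Each run copies only what changed since the previous one, so the analytical
copy lags the real-time store by at most one batch interval.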

-Vlad

On Wed, Jun 3, 2015 at 10:45 AM, Siva <sbhavan...@gmail.com> wrote:

> I agree with you, Anil!
>
> On Tue, Jun 2, 2015 at 9:06 PM, anil gupta <anilgupt...@gmail.com> wrote:
>
>> Hi Siva/Jaime,
>>
>> In my opinion:
>> HBase is meant for quick key/value lookups or short range-based scans, and
>> Hive is meant for analytical/data-warehouse workloads. Full table scans
>> are not what HBase is known for, and joins are not really a sweet spot for
>> HBase if they require full table scans.
>> If you are doing full table scans in HBase, you can also try running a
>> MapReduce job over an HBase snapshot. Or you could just use Hive for
>> OLAP-type workloads.
>>
>> Thanks,
>> Anil Gupta
>>
>> On Tue, Jun 2, 2015 at 4:43 PM, Siva <sbhavan...@gmail.com> wrote:
>>
>>> Hi Jaime,
>>>
>>> When we ran queries with complex joins (involving ~10 tables) in Phoenix
>>> on tables holding large amounts of data, we initially saw a lot of
>>> issues: queries failed with errors. We started tuning both HBase and
>>> Phoenix, and now a few queries run fine, but queries over larger data sets
>>> still have the same issues. We are still working on tuning them. The
>>> failures could be due to our small cluster, which is limited by memory and
>>> I/O.
>>>
>>> On the other hand, the same queries with the same data size on Hive 14
>>> (with Tez + ORC format + SNAPPY compression) finished within 70~100
>>> seconds. It would be good if Phoenix could publish performance results
>>> for join queries.
>>>
>>> Thanks,
>>> Siva.
>>>
>>> On Tue, Jun 2, 2015 at 1:47 PM, Jaime Solano <jdjsol...@gmail.com>
>>> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> Are there benchmarks or numbers showing how Phoenix performs during the
>>>> join of two or more huge tables? I'm not familiar with the join
>>>> implementation, so I'm not sure if there's a limitation regarding the
>>>> number of regions, memory, disk, etc.
>>>>
>>>> Any thoughts?
>>>>
>>>> Thanks,
>>>> -Jaime
>>>>
>>>
>>>
>>
>>
>> --
>> Thanks & Regards,
>> Anil Gupta
>>
>
>
