Siva,

If your use case requires joins of large tables in Phoenix, you had better redesign the system: keep real-time data in HBase/Phoenix and have a separate analytical system (Hadoop/Hive) with periodic batch updates from HBase. It's data duplication, but using ORCFile/Parquet with compression on will significantly reduce the storage cost of that duplication.
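As a rough illustration of the compression point (a toy sketch, not a measurement on real ORC or Parquet files): the Python below stores the same made-up rows column by column and compresses them. The sample data, the column names, and the use of JSON plus `zlib` as stand-ins for a columnar file format and its codec are all assumptions made for the example.

```python
import json
import zlib

# Hypothetical sample rows; column names and values are made up for illustration.
rows = [
    {"id": i, "country": "US" if i % 3 else "IN", "status": "ACTIVE"}
    for i in range(10_000)
]

# Row-oriented and uncompressed: roughly what a naive second copy costs.
raw = json.dumps(rows).encode("utf-8")

# Column-oriented: values of one column are stored together, which is the
# basic idea behind columnar formats such as ORC and Parquet.
columns = {name: [row[name] for row in rows] for name in rows[0]}
columnar = json.dumps(columns).encode("utf-8")

# zlib is only a stand-in here for the codec a real columnar format would use.
compressed = zlib.compress(columnar)

ratio = len(compressed) / len(raw)
print(f"raw={len(raw)} bytes, columnar+compressed={len(compressed)} bytes, ratio={ratio:.3f}")
```

Real tables will compress differently depending on cardinality and value distribution, so treat the ratio printed here as illustrative only.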
-Vlad

On Wed, Jun 3, 2015 at 10:45 AM, Siva <sbhavan...@gmail.com> wrote:

> I agree with you Anil!.
>
> On Tue, Jun 2, 2015 at 9:06 PM, anil gupta <anilgupt...@gmail.com> wrote:
>
>> Hi Siva/Jaime,
>>
>> In my opinion:
>> HBase is meant for quick key/value lookup or short range based scans and
>> Hive is meant for Analytical/Datawarehouse kind of workload. Full table
>> scan in HBase is not what HBase is known/popular for. Doing joins is not
>> really a sweet spot for HBase if you are doing full table scans.
>> If you are doing full table scan in HBase then you can also try running a
>> MapReduce job over HBase snapshot. Or You could just use Hive OLAP type
>> workload.
>>
>> Thanks,
>> Anil Gupta
>>
>> On Tue, Jun 2, 2015 at 4:43 PM, Siva <sbhavan...@gmail.com> wrote:
>>
>>> Hi Jaime,
>>>
>>> When we ran queries with complex joins (which involves ~10 tables) on
>>> Phoenix on the tables which has large data, initially we have seen a lot of
>>> issues, queries failed with errors. We started to tune both hbase and
>>> phoenix, now few queries are running fine, but queries with larger data set
>>> still have same issues. Still working on tuning them. The reason for
>>> failures could be because of small cluster, limited by memory and IO.
>>>
>>> On the other hand, same quires with same data size on Hive 14 (with Tez
>>> + ORC format + SNAPPY compression) were finished with in 70~100 seconds. It
>>> would be good if Phoenix can publish the performance results on join
>>> queries.
>>>
>>> Thanks,
>>> Siva.
>>>
>>> On Tue, Jun 2, 2015 at 1:47 PM, Jaime Solano <jdjsol...@gmail.com>
>>> wrote:
>>>
>>>> Hi guys,
>>>>
>>>> Are there benchmarks or numbers showing how Phoenix performs during the
>>>> join of two or more huge tables? I'm not familiar with the join
>>>> implementation, so I'm not sure if there's a limitation regarding number of
>>>> regions, memory, disk, etc.
>>>>
>>>> Any thoughts?
>>>>
>>>> Thanks,
>>>> -Jaime
>>>>
>>>
>>>
>>
>>
>> --
>> Thanks & Regards,
>> Anil Gupta
>>
>
>