Re: Large tables in phoenix, issues with relational queries

2018-02-21 Thread James Taylor
Hi Aman,
Will all of your 210 relational tables only have a few million rows? If
so, have you tried just using something like MySQL? What led you toward a
distributed solution?

When going from a single-node RDBMS to Phoenix, you typically wouldn't
carry the schemas over directly; there'd be some amount of
denormalization. Have you seen our Tuning Guide [1]? You'll likely want to
determine the best row key design and minimum number of secondary indexes
to satisfy your most common questions.

More specifically with joins [2], you have to be careful, as the Phoenix
optimizer will not attempt to do join ordering or figure out the best join
strategy (note there's ongoing work to improve this with PHOENIX-1556).
Instead, you'll need to make sure to list your tables from largest to
smallest (the size after filtering). Also, Phoenix has two join
strategies: hash join and sort merge join. By default, Phoenix will
perform a hash join, but you can use the /*+ USE_SORT_MERGE_JOIN */ hint to
force a sort merge join. The sort merge join will be better if the tables
are already ordered by their join key. If your Report Framework use case is
doing many joins, you'd likely want to add secondary indexes that ensure
that one or both sides are ordered according to how you're joining the
tables.
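
A minimal sketch of the ordering advice above, written in Python since it
only assembles SQL text. The table names, the shared join column, and the
row estimates are hypothetical, and the helper functions are made up for
illustration. It assumes inner joins that all use one column, because
reordering outer joins can change the result. Only the
USE_SORT_MERGE_JOIN hint and the CREATE INDEX ... INCLUDE statement are
Phoenix syntax.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class JoinTable:
    name: str           # hypothetical table name
    filtered_rows: int  # estimated row count after filters are applied


def build_join(tables: Sequence[JoinTable], key: str,
               sort_merge: bool = False) -> str:
    """Build an inner-join SELECT with tables listed largest to smallest."""
    if not tables:
        raise ValueError("at least one table is required")
    # Largest table (after filtering) first, smallest last.
    ordered = sorted(tables, key=lambda t: t.filtered_rows, reverse=True)
    hint = "/*+ USE_SORT_MERGE_JOIN */ " if sort_merge else ""
    first = ordered[0]
    lines = [f"SELECT {hint}*", f"FROM {first.name}"]
    for t in ordered[1:]:
        # Assumption: every table joins to the first one on the same column.
        lines.append(f"JOIN {t.name} ON {first.name}.{key} = {t.name}.{key}")
    return "\n".join(lines)


def index_ddl(table: str, key: str, covered: Sequence[str] = ()) -> str:
    """Build a CREATE INDEX statement that orders `table` by the join key."""
    ddl = f"CREATE INDEX IDX_{table}_{key} ON {table} ({key})"
    if covered:
        # Extra columns stored in the index so queries can be served from it.
        ddl += f" INCLUDE ({', '.join(covered)})"
    return ddl


if __name__ == "__main__":
    tables = [
        JoinTable("REGION", 50),
        JoinTable("ORDERS", 2_000_000),
        JoinTable("CUSTOMER", 100_000),
    ]
    print(build_join(tables, "REGION_ID", sort_merge=True))
    print(index_ddl("ORDERS", "REGION_ID", ["AMOUNT"]))
```

Running EXPLAIN on the generated statement is a reasonable way to check
which join strategy and which index Phoenix actually picks.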

Sorry for only providing very general information, but without more
details, it's difficult to give specific guidance.

Thanks,
James

[1] http://phoenix.apache.org/tuning_guide.html
[2] http://phoenix.apache.org/joins.html

On Mon, Feb 19, 2018 at 12:11 AM, Aman Kumar Jha wrote:

> Phoenix Team,
>
>
>
> We are using Apache Phoenix in the Reporting Framework that we are
> building, and are facing a lot of challenges with it (mainly performance
> challenges). We have very limited Apache Phoenix knowledge and would love
> your help finding someone who can help us get off the ground here.
>
>
>
> Our use case: about 210 relational tables (many of them with a few million
> rows) are present in our DB, and our reporting framework sits on top of
> them. Because there are so many relational tables, the reports mostly
> result in large queries with multiple joins (mostly left outer). We think
> this is the root cause of most of our problems. Internet searches easily
> turn up the basics, but we are not finding anything deeper that would let
> us tune this further.
>
>
>
> At this point, we are seriously wondering whether Phoenix is the correct
> choice of technology for the above use case.
>
>
>
> As mentioned earlier, we need help with finding someone who can help us
> move ahead.
>
>
>
> Thanks a lot for your time.
>
>
>
> Regards,
>
> Aman Kumar Jha
>
>
> This email communication (including any attachments) contains confidential
> information and is intended only for the named recipients. If you are not
> the intended recipient, please delete this email communication (including
> any attachments) and hard copies immediately, Any unauthorized use or
> dissemination of this email communication (including any attachments) in
> any manner, is strictly prohibited. This email communication (including any
> attachments), may not be free of viruses, you should carry out your own
> virus checks before opening any attachment to this e-mail. The sender of
> this e-mail and the company shall not be liable for any damage that you may
> sustain as a result of viruses, incompleteness of this message,
> interception of this message, which may arise as a result of e-mail
> transmission.
>


Large tables in phoenix, issues with relational queries

2018-02-19 Thread Aman Kumar Jha
Phoenix Team,



We are using Apache Phoenix in the Reporting Framework that we are building,
and are facing a lot of challenges with it (mainly performance challenges).
We have very limited Apache Phoenix knowledge and would love your help
finding someone who can help us get off the ground here.



Our use case: about 210 relational tables (many of them with a few million
rows) are present in our DB, and our reporting framework sits on top of them.
Because there are so many relational tables, the reports mostly result in
large queries with multiple joins (mostly left outer). We think this is the
root cause of most of our problems. Internet searches easily turn up the
basics, but we are not finding anything deeper that would let us tune this
further.



At this point, we are seriously wondering whether Phoenix is the correct
choice of technology for the above use case.



As mentioned earlier, we need help with finding someone who can help us move 
ahead.



Thanks a lot for your time.



Regards,

Aman Kumar Jha
