On Thu, Apr 24, 2014 at 7:15 PM, Tim Stearn <[email protected]> wrote:

>  Hi All,
>
>
>
> VERY new to Phoenix, so please forgive a newbie question;).
>
>
>
> I just watched this introductory video:
> http://www.youtube.com/watch?v=YHsHdQ08trg.  From this, I understand that
> there is a 1:1 correspondence between Phoenix and HBase tables.  That being
> the case, I assume that the same modeling imperatives for HBase apply for
> HBase+Phoenix.  Namely that:
>
> Good key design is very important
>
Yes, definitely. Your key design in Phoenix maps to your primary key
constraint.


>  You should try to denormalize when reasonable to avoid scans of multiple
> tables
>
Sometimes, depending on the data.


>  For #2, this implies that rows will likely have repeating groups and
> time series in columns, using naming conventions to associate related
> columns.  For instance, our application will need to support storing
> networks of related entities.  We're planning on using the same adjacency
> list approach that TitanDB uses, so that each entity would include a
> column family for relationships with related fields using a naming
> convention like:
>
>      [column name, column value (relationship strength)]:  <relationship
> type>_<related entity id>:  100
>
>
>
> We will probably also have time series within a row.  So I might have a
> set of related metrics, keyed by the time period and metric name:
>
>     <Year+Month>_<metric name>
>
>
>
>      201401_average_balance, 201402_average_balance, 201401_max_balance,
> 201402_max_balance, etc
>
>
>
>
>
> My questions are these:
>
> 1.       Is this denormalization approach still the recommended approach
> when using HBase with Phoenix?
>
In Phoenix you'd model your time dimension in your row key through a
composite primary key constraint. Then your metrics would become columns.
So something like this:
CREATE TABLE metrics (host VARCHAR, date DATE, average_balance BIGINT,
max_balance BIGINT, CONSTRAINT pk PRIMARY KEY (host, date));
You may indicate the column family by prefixing a column that is not in the
primary key, like this:
CREATE TABLE metrics (host VARCHAR, date DATE, a.average_balance BIGINT,
a.max_balance BIGINT, CONSTRAINT pk PRIMARY KEY (host, date));
You may also choose to store the raw data and run aggregate queries to
calculate the average and max on the fly:
CREATE TABLE metrics (host VARCHAR, date DATE, balance BIGINT, CONSTRAINT
pk PRIMARY KEY (host, date));
SELECT avg(balance),max(balance) from metrics WHERE ... GROUP BY host;
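
For completeness, rows would land in that last table through Phoenix's UPSERT statement rather than INSERT, and the per-month rollup implied by your 201401/201402 naming falls out of a GROUP BY. This is only a sketch: the host names and values are made up, and it assumes Phoenix's TO_DATE and TRUNC built-in functions.

```sql
-- Hypothetical sample rows; Phoenix uses UPSERT, not INSERT.
UPSERT INTO metrics VALUES ('host1', TO_DATE('2014-01-15'), 1200);
UPSERT INTO metrics VALUES ('host1', TO_DATE('2014-02-15'), 1500);

-- Aggregates computed on the fly, one row per host per month,
-- replacing the 201401_average_balance-style column naming.
SELECT host, TRUNC(date, 'MONTH'), AVG(balance), MAX(balance)
FROM metrics
GROUP BY host, TRUNC(date, 'MONTH');
```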

>  2.       Is there any special support in Phoenix to extract the "data"
> contained in column names (like the <related entity id>) above, or is this
> up to the application processing the query result?
>
We don't have this, but we do have "dynamic" columns that you can specify
at read/query time: http://phoenix.incubator.apache.org/dynamic_columns.html
We also have the ability to create "views":
http://phoenix.incubator.apache.org/views.html
Folks have also asked for this capability, and there are some good ideas on
how it could be supported in a SQL-y way:
https://issues.apache.org/jira/browse/PHOENIX-150
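
To make the dynamic-columns option concrete, here is a rough sketch of the shape described on the dynamic_columns page linked above: columns that were never declared in CREATE TABLE are named, with their types, at upsert/query time. The column name here (q1_bonus) is hypothetical.

```sql
-- Write a column that does not exist in the table definition by
-- declaring it inline, with its type, in the UPSERT column list.
UPSERT INTO metrics (host, date, q1_bonus BIGINT)
VALUES ('host1', TO_DATE('2014-01-15'), 50);

-- Read it back by declaring it again at query time.
SELECT host, q1_bonus
FROM metrics (q1_bonus BIGINT)
WHERE q1_bonus IS NOT NULL;
```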

>  3.       Is there any special support for querying these repeating
> groups of fields (such that Phoenix could be aware of the relationships
> between these fields) or is it:
>
> a.       Up to the user to know which fields they're after and specify
> the proper names
>
The typical pattern is to take the "relationship" information and put that
in the row key and make your column names the static part. Note that we do
have the ability to add secondary indexes to improve performance for
various combinations of columns:
http://phoenix.incubator.apache.org/secondary_indexing.html and we also
support equi-joins through a broadcast hash join mechanism:
http://phoenix.incubator.apache.org/joins.html
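
As a sketch of those two features against the metrics table from earlier in the thread (the index name, the hosts table, and its region column are made up for illustration):

```sql
-- Secondary index on date, covering average_balance so queries that
-- filter on date and select average_balance can be served from the
-- index alone (syntax per the secondary_indexing link above).
CREATE INDEX metrics_by_date ON metrics (date) INCLUDE (average_balance);

-- An equi-join, executed via the broadcast hash join mechanism.
SELECT m.host, m.average_balance, h.region
FROM metrics m
JOIN hosts h ON m.host = h.host;
```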

>  b.      Up to the application to internally keep track of these fields
> and their relationships and generate the proper queries on the user's
> behalf?
>
>
>
> Thanks in advance,
>
>
>
> Tim S
>
Thanks,
James
