Re: Runtime DDL supported?

2018-02-21 Thread Miles Spielberg
I believe each query in a UNION needs to have the same result tuple format, 
which would work in this toy example, but in the general case each view would 
have a different schema. We could make the result tuples conform with each 
other by selecting NULL literals for every column except those in a view. It 
would get quite verbose though. Assuming f1,f2,f3 all have incompatible types, 
were you suggesting something like this?

Select f1, null, null from v1 where PK=?
Union all
Select null, f2, null from v2 where PK=?
Union all
Select null, null, f3 from v3 where PK=?

We might just run separate parallel queries against each view and merge the 
results client side. I would guess this should perform well since the block 
cache can be leveraged for queries after the first.

We could also use the HBase API to run a point row get. We'd have to 
reimplement decoding for Phoenix's column values, which is not ideal but quite 
doable.

Sent from my iPhone

> On Feb 21, 2018, at 9:09 PM, James Taylor  wrote:
> 
> Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems you’re on 
> a good track with multiple views over a single (or handful) of physical 
> table(s).
> 
>> On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg  wrote:
>> I've done some experimentation with views, with a schema resembling this:
>> 
 create table t1(
 pk bigint not null primary key
 );
 
 create view v1(
 f1 varchar
 ) AS SELECT * FROM t1;
 create INDEX v1_f1 ON v1(f1);
 
 create view v2(
 f2 varchar
 ) AS SELECT * FROM t1;
 create INDEX v2_f2 ON v2(f2);
 
 create view v3(
 f3 varchar
 ) AS SELECT * FROM t1;
 create INDEX v3_f3 ON v3(f3);
>> 
>> Most of the time we'll be accessing data via the indexed views, but we'd 
>> also like to be able to query all columns (f1, f2, f3) for a given pk. At 
>> the HBase level, this should be doable as a point get on t1. The SQL-y way 
>> to express this would probably be with JOINs, but the EXPLAIN plan is not 
>> encouraging.
>> 
>>> > explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join v2 on 
>>> > v2.pk=t1.pk left  join v3 on v3.pk=t1.pk where t1.pk=12345;
>>> | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP 
>>> ON 1 KEY OVER T1
>>> | PARALLEL LEFT-JOIN TABLE 0
>>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>> | PARALLEL LEFT-JOIN TABLE 1
>>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>> | PARALLEL LEFT-JOIN TABLE 2
>>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>> 
>> This is pushing me back towards a design of having a single table, except 
>> for the issue of proliferating physical HBase tables for the indexes. Would 
>> you advise having a single table + a single view on it containing all 
>> columns, to coerce Phoenix to consolidate the indexes into a single physical 
>> table? Are there other alternatives we should be considering?
>> 
>> Miles Spielberg
>> Staff Software Engineer
>> 
>> O. 650.485.1102
>> 900 Jefferson Ave
>> Redwood City, CA 94063
>> 
>> 
>>> On Fri, Feb 16, 2018 at 3:27 PM, James Taylor  
>>> wrote:
>>> All indexes on views are stored in a single physical table, so you'll be ok 
>>> in that regard.
>>> 
>>> If you could file bugs for any local index issues, we'd really appreciate 
>>> it. We've been steadily improving local indexes (see PHOENIX-3941 for some 
>>> recent perf improvements - applicable for multi-tenant tables in particular 
>>> - these will appear in our 4.14 release). Handling non covered columns is 
>>> pretty isolated, so we should be able to fix bugs you find. Plus, there's a 
>>> workaround - you can cover your indexes until any issues are fixed.
>>> 
>>> Global, mutable indexes have had many improvements over the last several 
>>> releases too, but there's more operational overhead if/when a data table 
>>> gets out of sync with it's index table (plus some amount of configurable 
>>> eventual consistency or index disablement). With local indexes (and HBase 
>>> 1.3), this isn't possible.
>>> 
>>> Thanks,
>>> James
>>> 
 On Fri, Feb 16, 2018 at 3:10 PM, Miles Spielberg  wrote:
 Hi James,
 
 Thanks for the tips around reducing the number of physical tables while 
 still maintaining the appearance of multiple tables via view definitions. 
 In our use case we don't anticipate having much if any immutable data, so 
 unfortunately I don't expect to be able to take advantage of Phoenix's 
 optimizations there.
 
 We're expecting many indexes, mostly likely with several per logical 
 per-tenant table. Given that global indexes are implemented as physical 
 HBase tables, will the view-oriented optimizations help very much? We've 
 done some experiments with local indexes on 4.13.2 and found bugs, 
 

Re: Large tables in phoenix, issues with relational queries

2018-02-21 Thread James Taylor
Hi Aman,
Will all of your 210 relational tables only have a few millions rows? If
so, have you tried just using something like MySQL? What led you toward a
distributed solution?

When going from a single node RDBMS system to Phoenix, you typically
wouldn't use the schemas directly, but there'd be some amount of
denormalization. Have you seen our Tuning Guide [1]? You'll likely want to
determine the best row key design and minimum number of secondary indexes
to satisfy your most common questions.

More specifically with joins [2], you have to be careful as the Phoenix
optimizer will attempt to do join ordering or figure out the best join
strategy (note there's on going work to improve this with PHOENIX-1556).
Instead, you'll need to you'll need to make sure to list your tables from
largest to smallest (the size after filtering). Also, Phoenix has two join
strategies - hash join and sort merge join. By default, Phoenix will
perform a hash join, but you can use the /*+ USE_SORT_MERGE_JOIN */ hint to
force a sort merge join. The sort merge join will be better if the tables
are already ordered by their join key. If your Report Framework use case is
doing many joins, you'd likely want to add secondary indexes that ensure
that one or both sides are ordered according how you're joining the tables.

Sorry for only providing very general information, but without more
specifics, it's difficult to provide more specific guidance.

Thanks,
James

[1] http://phoenix.apache.org/tuning_guide.html
[2] http://phoenix.apache.org/joins.html

On Mon, Feb 19, 2018 at 12:11 AM, Aman Kumar Jha  wrote:

> Phoenix Team,
>
>
>
> We are using Apache Phoenix on our Reporting Framework that we are
> building , and are facing a lot of challenges with it.  (majorly
> performance challenges). We are severely constrained on Apache Phoenix
> knowledge and would love your help to find someone who can help us get off
> the ground here.
>
>
>
> Our use case is, about 210 relational tables (a few million row in many of
> these tables) are present inside our DB and our reporting framework sits on
> top of the same. Due to many relational tables, the reports mostly result
> in large queries, with multiple joins (mostly left outer). This we think is
> the root cause of most of our problems. A lot of internet searches, get us
> the basics back easily, but we are not getting anything deeper, so that we
> can tune this further.
>
>
>
> At this point, we are really thinking, if Phoenix is the correct choice of
>  technology for the above use case.
>
>
>
> As mentioned earlier, we need help with finding someone who can help us
> move ahead.
>
>
>
> Thanks a lot for your time.
>
>
>
> Regards,
>
> Aman Kumar Jha
>
>
> This email communication (including any attachments) contains confidential
> information and is intended only for the named recipients. If you are not
> the intended recipient, please delete this email communication (including
> any attachments) and hard copies immediately, Any unauthorized use or
> dissemination of this email communication (including any attachments) in
> any manner, is strictly prohibited. This email communication (including any
> attachments), may not be free of viruses, you should carry out your own
> virus checks before opening any attachment to this e-mail. The sender of
> this e-mail and the company shall not be liable for any damage that you may
> sustain as a result of viruses, incompleteness of this message,
> interception of this message, which may arise as a result of e-mail
> transmission.
>


Re: Runtime DDL supported?

2018-02-21 Thread James Taylor
Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems you’re
on a good track with multiple views over a single (or handful) of physical
table(s).

On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg  wrote:

> I've done some experimentation with views, with a schema resembling this:
>
> create table t1(
>>
>> pk bigint not null primary key
>>
>> );
>>
>>
>>> create view v1(
>>
>> f1 varchar
>>
>> ) AS SELECT * FROM t1;
>>
>> create INDEX v1_f1 ON v1(f1);
>>
>>
>>> create view v2(
>>
>> f2 varchar
>>
>> ) AS SELECT * FROM t1;
>>
>> create INDEX v2_f2 ON v2(f2);
>>
>>
>>> create view v3(
>>
>> f3 varchar
>>
>> ) AS SELECT * FROM t1;
>>
>> create INDEX v3_f3 ON v3(f3);
>>
>>
> Most of the time we'll be accessing data via the indexed views, but we'd
> also like to be able to query all columns (f1, f2, f3) for a given pk. At
> the HBase level, this should be doable as a point get on t1. The SQL-y
> way to express this would probably be with JOINs, but the EXPLAIN plan is
> not encouraging.
>
> > explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join v2 on
>> v2.pk=t1.pk left  join v3 on v3.pk=t1.pk where t1.pk=12345;
>> | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP
>> ON 1 KEY OVER T1
>> | PARALLEL LEFT-JOIN TABLE 0
>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>> | PARALLEL LEFT-JOIN TABLE 1
>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>> | PARALLEL LEFT-JOIN TABLE 2
>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>
>
> This is pushing me back towards a design of having a single table, except
> for the issue of proliferating physical HBase tables for the indexes. Would
> you advise having a single table + a single view on it containing all
> columns, to coerce Phoenix to consolidate the indexes into a single
> physical table? Are there other alternatives we should be considering?
>
> Miles Spielberg
> Staff Software Engineer
>
>
> O. 650.485.1102
> 900 Jefferson Ave
> 
> Redwood City
> ,
> CA 94063
> 
>
> On Fri, Feb 16, 2018 at 3:27 PM, James Taylor 
> wrote:
>
>> All indexes on views are stored in a single physical table, so you'll be
>> ok in that regard.
>>
>> If you could file bugs for any local index issues, we'd really appreciate
>> it. We've been steadily improving local indexes (see PHOENIX-3941 for some
>> recent perf improvements - applicable for multi-tenant tables in particular
>> - these will appear in our 4.14 release). Handling non covered columns is
>> pretty isolated, so we should be able to fix bugs you find. Plus, there's a
>> workaround - you can cover your indexes until any issues are fixed.
>>
>> Global, mutable indexes have had many improvements over the last several
>> releases too, but there's more operational overhead if/when a data table
>> gets out of sync with it's index table (plus some amount of configurable
>> eventual consistency or index disablement). With local indexes (and HBase
>> 1.3), this isn't possible.
>>
>> Thanks,
>> James
>>
>> On Fri, Feb 16, 2018 at 3:10 PM, Miles Spielberg  wrote:
>>
>>> Hi James,
>>>
>>> Thanks for the tips around reducing the number of physical tables while
>>> still maintaining the appearance of multiple tables via view definitions.
>>> In our use case we don't anticipate having much if any immutable data, so
>>> unfortunately I don't expect to be able to take advantage of Phoenix's
>>> optimizations there.
>>>
>>> We're expecting many indexes, mostly likely with several per logical
>>> per-tenant table. Given that global indexes are implemented as physical
>>> HBase tables, will the view-oriented optimizations help very much? We've
>>> done some experiments with local indexes on 4.13.2 and found bugs,
>>> particularly with the rewrite optimization to read non-covered columns from
>>> the main table, so we're not confident in using local indexes to optimize
>>> queries. (I've looked through the 5.0-alpha release notes and couldn't find
>>> anything related to this issue, so if desired I'll collect info for a
>>> separate bug report.)
>>>
>>> Miles Spielberg
>>> Staff Software Engineer
>>>
>>>
>>> O. 650.485.1102 <(650)%20485-1102>
>>> 900 Jefferson Ave
>>> 
>>> Redwood City
>>> ,
>>> CA 94063
>>> 
>>>
>>> 

Re: Runtime DDL supported?

2018-02-21 Thread Miles Spielberg
I've done some experimentation with views, with a schema resembling this:

create table t1(
>
> pk bigint not null primary key
>
> );
>
>
>> create view v1(
>
> f1 varchar
>
> ) AS SELECT * FROM t1;
>
> create INDEX v1_f1 ON v1(f1);
>
>
>> create view v2(
>
> f2 varchar
>
> ) AS SELECT * FROM t1;
>
> create INDEX v2_f2 ON v2(f2);
>
>
>> create view v3(
>
> f3 varchar
>
> ) AS SELECT * FROM t1;
>
> create INDEX v3_f3 ON v3(f3);
>
>
Most of the time we'll be accessing data via the indexed views, but we'd
also like to be able to query all columns (f1, f2, f3) for a given pk. At
the HBase level, this should be doable as a point get on t1. The SQL-y way
to express this would probably be with JOINs, but the EXPLAIN plan is not
encouraging.

> explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join v2 on v2.pk
> =t1.pk left  join v3 on v3.pk=t1.pk where t1.pk=12345;
> | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP
> ON 1 KEY OVER T1
> | PARALLEL LEFT-JOIN TABLE 0
> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
> | PARALLEL LEFT-JOIN TABLE 1
> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
> | PARALLEL LEFT-JOIN TABLE 2
> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>

This is pushing me back towards a design of having a single table, except
for the issue of proliferating physical HBase tables for the indexes. Would
you advise having a single table + a single view on it containing all
columns, to coerce Phoenix to consolidate the indexes into a single
physical table? Are there other alternatives we should be considering?

Miles Spielberg
Staff Software Engineer


O. 650.485.1102
900 Jefferson Ave
Redwood City, CA 94063

On Fri, Feb 16, 2018 at 3:27 PM, James Taylor 
wrote:

> All indexes on views are stored in a single physical table, so you'll be
> ok in that regard.
>
> If you could file bugs for any local index issues, we'd really appreciate
> it. We've been steadily improving local indexes (see PHOENIX-3941 for some
> recent perf improvements - applicable for multi-tenant tables in particular
> - these will appear in our 4.14 release). Handling non covered columns is
> pretty isolated, so we should be able to fix bugs you find. Plus, there's a
> workaround - you can cover your indexes until any issues are fixed.
>
> Global, mutable indexes have had many improvements over the last several
> releases too, but there's more operational overhead if/when a data table
> gets out of sync with it's index table (plus some amount of configurable
> eventual consistency or index disablement). With local indexes (and HBase
> 1.3), this isn't possible.
>
> Thanks,
> James
>
> On Fri, Feb 16, 2018 at 3:10 PM, Miles Spielberg  wrote:
>
>> Hi James,
>>
>> Thanks for the tips around reducing the number of physical tables while
>> still maintaining the appearance of multiple tables via view definitions.
>> In our use case we don't anticipate having much if any immutable data, so
>> unfortunately I don't expect to be able to take advantage of Phoenix's
>> optimizations there.
>>
>> We're expecting many indexes, mostly likely with several per logical
>> per-tenant table. Given that global indexes are implemented as physical
>> HBase tables, will the view-oriented optimizations help very much? We've
>> done some experiments with local indexes on 4.13.2 and found bugs,
>> particularly with the rewrite optimization to read non-covered columns from
>> the main table, so we're not confident in using local indexes to optimize
>> queries. (I've looked through the 5.0-alpha release notes and couldn't find
>> anything related to this issue, so if desired I'll collect info for a
>> separate bug report.)
>>
>> Miles Spielberg
>> Staff Software Engineer
>>
>>
>> O. 650.485.1102 <(650)%20485-1102>
>> 900 Jefferson Ave
>> 
>> Redwood City, CA 94063
>> 
>>
>> On Fri, Feb 16, 2018 at 2:49 PM, James Taylor 
>> wrote:
>>
>>> Hi Miles,
>>> You'll be fine if you use views [1] and multi-tenancy [2] to limit the
>>> number of physical HBase tables. Make sure you read about the limitations
>>> of views too [3].
>>>
>>> Here's the way I've seen this modeled successfully:
>>> - create one schema per use case. This will let you leverage some nice
>>> features in HBase for quotas and throttling. If you'll have a single use
>>> case, you don't have to worry about it. Read about namespaces here [4] and
>>> make sure to enable them before you start creating tables.
>>> - define an immutable, multi-tenant base table that has TENANT_ID +
>>> TYPE_ID primary key. There are optimizations Phoenix does over immutable
>>> tables that you'll want to leverage (assuming you have use cases that fit
>>> into 

FINAL REMINDER: CFP for Apache EU Roadshow Closes 25th February

2018-02-21 Thread Sharan F

Hello Apache Supporters and Enthusiasts

This is your FINAL reminder that the Call for Papers (CFP) for the 
Apache EU Roadshow is closing soon. Our Apache EU Roadshow will focus on 
Cloud, IoT, Apache Tomcat, Apache Http and will run from 13-14 June 2018 
in Berlin.
Note that the CFP deadline has been extended to *25*^*th* *February *and 
it will be your final opportunity to submit a talk for thisevent.


Please make your submissions at http://apachecon.com/euroadshow18/

Also note that early bird ticket registrations to attend FOSS Backstage 
including the Apache EU Roadshow, have also been extended and will be 
available until 23^rd February. Please register at 
https://foss-backstage.de/tickets


We look forward to seeing you in Berlin!

Thanks
Sharan Foga, VP Apache Community Development

PLEASE NOTE: You are receiving this message because you are subscribed 
to a user@ or dev@ list of one or more Apache Software Foundation projects.