Re: Runtime DDL supported?

2018-03-08 Thread James Taylor
Thanks for digging that up, Miles. I've added a comment on the JIRA on how
to go about implementing it here:
https://issues.apache.org/jira/browse/PHOENIX-3547?focusedCommentId=16391739=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16391739

That would be a good first contribution if you're up for it.

Thanks,
James

On Wed, Mar 7, 2018 at 5:09 PM, Miles Spielberg  wrote:

> We found https://issues.apache.org/jira/browse/PHOENIX-3547, which seems
> to be precisely our problem. We would want at least the option to use a
> bigint rather than the int in the JIRA to accommodate massive growth. While
> we intend to have many tenants, we don't intend to use the Phoenix
> "tenant_id" to differentiate them, and instead manage them at our
> application layer, so separate counters per Phoenix tenant would not help
> in our situation.
>
> Miles Spielberg
> Staff Software Engineer
>
>
> O. 650.485.1102 <(650)%20485-1102>
> 900 Jefferson Ave
> 
> Redwood City, CA 94063
> 
>
> On Wed, Feb 28, 2018 at 10:27 PM, James Taylor 
> wrote:
>
>> Please file a JIRA as it’d be feasible to change this limitation. The
>> easiest way would be to have a separate counter for each tenant. Another
>> way to reduce the number of indexes on tenant specific views would be to
>> factor out common columns to global views and create indexes there.
>>
>> On Tue, Feb 27, 2018 at 3:40 PM Miles Spielberg  wrote:
>>
>>> As we discussed, indexes across views are stored in a single HBase table
>>> associated with the original table (_IDX_). That's grand for
>>> limiting the number of HBase tables created, but I just realized that the
>>> actual index data within is differentiated by the 16-bit "viewIndexId",
>>> which limits us to 64K indexes across all views for a given table. That's
>>> concerning for our use case, especially if its a cumulative autoincrement
>>> across all CREATE INDEX and DROP INDEX operations over the lifetime of the
>>> base table.
>>>
>>> Is there any workaround for this? A quick grep across the source
>>> indicates that the length of viewIndexId is currently hard-coded.
>>>
>>> At least, this limitation should probably be added to the list of
>>> caveats and warnings at https://phoenix.apache.org/views.html.
>>>
>>> Miles Spielberg
>>> Staff Software Engineer
>>>
>>>
>>> O. 650.485.1102 <(650)%20485-1102>
>>> 900 Jefferson Ave
>>> 
>>> Redwood City
>>> ,
>>> CA 94063
>>> 
>>>
>>> On Thu, Feb 22, 2018 at 7:42 AM, James Taylor 
>>> wrote:
>>>
 Another option would be to use dynamic columns[1] when querying across
 views. You’d have to disable column encoding [2] in this case.

 [1] http://phoenix.apache.org/dynamic_columns.html
 [2] http://phoenix.apache.org/columnencoding.html

 On Wed, Feb 21, 2018 at 11:57 PM Miles Spielberg  wrote:

> I believe each query in a UNION needs to have the same result tuple
> format, which would work in this toy example, but in the general case each
> view would have a different schema. We could make the result tuples 
> conform
> with each other by selecting NULL literals for every column except those 
> in
> a view. It would get quite verbose though. Assuming f1,f2,f3 all have
> incompatible types, were you suggesting something like this?
>
> Select f1, null, null from v1 where PK=?
> Union all
> Select null, f2, null from v2 where PK=?
> Union all
> Select null, null, f3 from v3 where PK=?
>
> We might just run separate parallel queries against each view and
> merge the results client side. I would guess this should perform well 
> since
> the block cache can be leveraged for queries after the first.
>
> We could also use the HBase API to run a point row get. We'd have to
> reimplement decoding for Phoenix's column values, which is not ideal but
> quite doable.
>
> Sent from my iPhone
>
> On Feb 21, 2018, at 9:09 PM, James Taylor 
> wrote:
>
> Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems
> you’re on a good track with multiple views over a single (or handful) of
> physical table(s).
>
> On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg  wrote:
>
>> I've done some experimentation with views, with a schema resembling
>> this:
>>
>> create table t1(
>>>
>>> pk bigint not null 

Re: Runtime DDL supported?

2018-03-07 Thread Miles Spielberg
We found https://issues.apache.org/jira/browse/PHOENIX-3547, which seems to
be precisely our problem. We would want at least the option to use a bigint
rather than the int in the JIRA to accommodate massive growth. While we
intend to have many tenants, we don't intend to use the Phoenix "tenant_id"
to differentiate them, and instead manage them at our application layer, so
separate counters per Phoenix tenant would not help in our situation.

Miles Spielberg
Staff Software Engineer


O. 650.485.1102
900 Jefferson Ave
Redwood City, CA 94063

On Wed, Feb 28, 2018 at 10:27 PM, James Taylor 
wrote:

> Please file a JIRA as it’d be feasible to change this limitation. The
> easiest way would be to have a separate counter for each tenant. Another
> way to reduce the number of indexes on tenant specific views would be to
> factor out common columns to global views and create indexes there.
>
> On Tue, Feb 27, 2018 at 3:40 PM Miles Spielberg  wrote:
>
>> As we discussed, indexes across views are stored in a single HBase table
>> associated with the original table (_IDX_). That's grand for
>> limiting the number of HBase tables created, but I just realized that the
>> actual index data within is differentiated by the 16-bit "viewIndexId",
>> which limits us to 64K indexes across all views for a given table. That's
>> concerning for our use case, especially if its a cumulative autoincrement
>> across all CREATE INDEX and DROP INDEX operations over the lifetime of the
>> base table.
>>
>> Is there any workaround for this? A quick grep across the source
>> indicates that the length of viewIndexId is currently hard-coded.
>>
>> At least, this limitation should probably be added to the list of caveats
>> and warnings at https://phoenix.apache.org/views.html.
>>
>> Miles Spielberg
>> Staff Software Engineer
>>
>>
>> O. 650.485.1102 <(650)%20485-1102>
>> 900 Jefferson Ave
>> 
>> Redwood City
>> ,
>> CA 94063
>> 
>>
>> On Thu, Feb 22, 2018 at 7:42 AM, James Taylor 
>> wrote:
>>
>>> Another option would be to use dynamic columns[1] when querying across
>>> views. You’d have to disable column encoding [2] in this case.
>>>
>>> [1] http://phoenix.apache.org/dynamic_columns.html
>>> [2] http://phoenix.apache.org/columnencoding.html
>>>
>>> On Wed, Feb 21, 2018 at 11:57 PM Miles Spielberg  wrote:
>>>
 I believe each query in a UNION needs to have the same result tuple
 format, which would work in this toy example, but in the general case each
 view would have a different schema. We could make the result tuples conform
 with each other by selecting NULL literals for every column except those in
 a view. It would get quite verbose though. Assuming f1,f2,f3 all have
 incompatible types, were you suggesting something like this?

 Select f1, null, null from v1 where PK=?
 Union all
 Select null, f2, null from v2 where PK=?
 Union all
 Select null, null, f3 from v3 where PK=?

 We might just run separate parallel queries against each view and merge
 the results client side. I would guess this should perform well since the
 block cache can be leveraged for queries after the first.

 We could also use the HBase API to run a point row get. We'd have to
 reimplement decoding for Phoenix's column values, which is not ideal but
 quite doable.

 Sent from my iPhone

 On Feb 21, 2018, at 9:09 PM, James Taylor 
 wrote:

 Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems
 you’re on a good track with multiple views over a single (or handful) of
 physical table(s).

 On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg  wrote:

> I've done some experimentation with views, with a schema resembling
> this:
>
> create table t1(
>>
>> pk bigint not null primary key
>>
>> );
>>
>>
>>> create view v1(
>>
>> f1 varchar
>>
>> ) AS SELECT * FROM t1;
>>
>> create INDEX v1_f1 ON v1(f1);
>>
>>
>>> create view v2(
>>
>> f2 varchar
>>
>> ) AS SELECT * FROM t1;
>>
>> create INDEX v2_f2 ON v2(f2);
>>
>>
>>> create view v3(
>>
>> f3 varchar
>>
>> ) AS SELECT * FROM t1;
>>
>> create INDEX v3_f3 ON v3(f3);
>>
>>
> Most of the time we'll be accessing data via the indexed views, but
> we'd also like to be able to query all columns (f1, f2, f3) for a given 
> pk. At
> the HBase level, this should be doable as a point get on t1. The
> SQL-y way to express this would 

Re: Runtime DDL supported?

2018-02-28 Thread James Taylor
Please file a JIRA as it’d be feasible to change this limitation. The
easiest way would be to have a separate counter for each tenant. Another
way to reduce the number of indexes on tenant specific views would be to
factor out common columns to global views and create indexes there.

On Tue, Feb 27, 2018 at 3:40 PM Miles Spielberg  wrote:

> As we discussed, indexes across views are stored in a single HBase table
> associated with the original table (_IDX_). That's grand for
> limiting the number of HBase tables created, but I just realized that the
> actual index data within is differentiated by the 16-bit "viewIndexId",
> which limits us to 64K indexes across all views for a given table. That's
> concerning for our use case, especially if its a cumulative autoincrement
> across all CREATE INDEX and DROP INDEX operations over the lifetime of the
> base table.
>
> Is there any workaround for this? A quick grep across the source indicates
> that the length of viewIndexId is currently hard-coded.
>
> At least, this limitation should probably be added to the list of caveats
> and warnings at https://phoenix.apache.org/views.html.
>
> Miles Spielberg
> Staff Software Engineer
>
>
> O. 650.485.1102
> 900 Jefferson Ave
> 
> Redwood City
> ,
> CA 94063
> 
>
> On Thu, Feb 22, 2018 at 7:42 AM, James Taylor 
> wrote:
>
>> Another option would be to use dynamic columns[1] when querying across
>> views. You’d have to disable column encoding [2] in this case.
>>
>> [1] http://phoenix.apache.org/dynamic_columns.html
>> [2] http://phoenix.apache.org/columnencoding.html
>>
>> On Wed, Feb 21, 2018 at 11:57 PM Miles Spielberg  wrote:
>>
>>> I believe each query in a UNION needs to have the same result tuple
>>> format, which would work in this toy example, but in the general case each
>>> view would have a different schema. We could make the result tuples conform
>>> with each other by selecting NULL literals for every column except those in
>>> a view. It would get quite verbose though. Assuming f1,f2,f3 all have
>>> incompatible types, were you suggesting something like this?
>>>
>>> Select f1, null, null from v1 where PK=?
>>> Union all
>>> Select null, f2, null from v2 where PK=?
>>> Union all
>>> Select null, null, f3 from v3 where PK=?
>>>
>>> We might just run separate parallel queries against each view and merge
>>> the results client side. I would guess this should perform well since the
>>> block cache can be leveraged for queries after the first.
>>>
>>> We could also use the HBase API to run a point row get. We'd have to
>>> reimplement decoding for Phoenix's column values, which is not ideal but
>>> quite doable.
>>>
>>> Sent from my iPhone
>>>
>>> On Feb 21, 2018, at 9:09 PM, James Taylor 
>>> wrote:
>>>
>>> Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems
>>> you’re on a good track with multiple views over a single (or handful) of
>>> physical table(s).
>>>
>>> On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg  wrote:
>>>
 I've done some experimentation with views, with a schema resembling
 this:

 create table t1(
>
> pk bigint not null primary key
>
> );
>
>
>> create view v1(
>
> f1 varchar
>
> ) AS SELECT * FROM t1;
>
> create INDEX v1_f1 ON v1(f1);
>
>
>> create view v2(
>
> f2 varchar
>
> ) AS SELECT * FROM t1;
>
> create INDEX v2_f2 ON v2(f2);
>
>
>> create view v3(
>
> f3 varchar
>
> ) AS SELECT * FROM t1;
>
> create INDEX v3_f3 ON v3(f3);
>
>
 Most of the time we'll be accessing data via the indexed views, but
 we'd also like to be able to query all columns (f1, f2, f3) for a given 
 pk. At
 the HBase level, this should be doable as a point get on t1. The SQL-y
 way to express this would probably be with JOINs, but the EXPLAIN plan is
 not encouraging.

 > explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join v2 on
> v2.pk=t1.pk left  join v3 on v3.pk=t1.pk where t1.pk=12345;
> | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN POINT
> LOOKUP ON 1 KEY OVER T1
> | PARALLEL LEFT-JOIN TABLE 0
> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
> | PARALLEL LEFT-JOIN TABLE 1
> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
> | PARALLEL LEFT-JOIN TABLE 2
> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>

 This is pushing me back towards a design of having a single table,
 except for the issue 

Re: Runtime DDL supported?

2018-02-27 Thread Miles Spielberg
As we discussed, indexes across views are stored in a single HBase table
associated with the original table (_IDX_). That's grand for
limiting the number of HBase tables created, but I just realized that the
actual index data within is differentiated by the 16-bit "viewIndexId",
which limits us to 64K indexes across all views for a given table. That's
concerning for our use case, especially if its a cumulative autoincrement
across all CREATE INDEX and DROP INDEX operations over the lifetime of the
base table.

Is there any workaround for this? A quick grep across the source indicates
that the length of viewIndexId is currently hard-coded.

At least, this limitation should probably be added to the list of caveats
and warnings at https://phoenix.apache.org/views.html.

Miles Spielberg
Staff Software Engineer


O. 650.485.1102
900 Jefferson Ave
Redwood City, CA 94063

On Thu, Feb 22, 2018 at 7:42 AM, James Taylor 
wrote:

> Another option would be to use dynamic columns[1] when querying across
> views. You’d have to disable column encoding [2] in this case.
>
> [1] http://phoenix.apache.org/dynamic_columns.html
> [2] http://phoenix.apache.org/columnencoding.html
>
> On Wed, Feb 21, 2018 at 11:57 PM Miles Spielberg  wrote:
>
>> I believe each query in a UNION needs to have the same result tuple
>> format, which would work in this toy example, but in the general case each
>> view would have a different schema. We could make the result tuples conform
>> with each other by selecting NULL literals for every column except those in
>> a view. It would get quite verbose though. Assuming f1,f2,f3 all have
>> incompatible types, were you suggesting something like this?
>>
>> Select f1, null, null from v1 where PK=?
>> Union all
>> Select null, f2, null from v2 where PK=?
>> Union all
>> Select null, null, f3 from v3 where PK=?
>>
>> We might just run separate parallel queries against each view and merge
>> the results client side. I would guess this should perform well since the
>> block cache can be leveraged for queries after the first.
>>
>> We could also use the HBase API to run a point row get. We'd have to
>> reimplement decoding for Phoenix's column values, which is not ideal but
>> quite doable.
>>
>> Sent from my iPhone
>>
>> On Feb 21, 2018, at 9:09 PM, James Taylor  wrote:
>>
>> Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems
>> you’re on a good track with multiple views over a single (or handful) of
>> physical table(s).
>>
>> On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg  wrote:
>>
>>> I've done some experimentation with views, with a schema resembling this:
>>>
>>> create table t1(

 pk bigint not null primary key

 );


> create view v1(

 f1 varchar

 ) AS SELECT * FROM t1;

 create INDEX v1_f1 ON v1(f1);


> create view v2(

 f2 varchar

 ) AS SELECT * FROM t1;

 create INDEX v2_f2 ON v2(f2);


> create view v3(

 f3 varchar

 ) AS SELECT * FROM t1;

 create INDEX v3_f3 ON v3(f3);


>>> Most of the time we'll be accessing data via the indexed views, but we'd
>>> also like to be able to query all columns (f1, f2, f3) for a given pk. At
>>> the HBase level, this should be doable as a point get on t1. The SQL-y
>>> way to express this would probably be with JOINs, but the EXPLAIN plan is
>>> not encouraging.
>>>
>>> > explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join v2 on
 v2.pk=t1.pk left  join v3 on v3.pk=t1.pk where t1.pk=12345;
 | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN POINT
 LOOKUP ON 1 KEY OVER T1
 | PARALLEL LEFT-JOIN TABLE 0
 | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
 | PARALLEL LEFT-JOIN TABLE 1
 | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
 | PARALLEL LEFT-JOIN TABLE 2
 | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1

>>>
>>> This is pushing me back towards a design of having a single table,
>>> except for the issue of proliferating physical HBase tables for the
>>> indexes. Would you advise having a single table + a single view on it
>>> containing all columns, to coerce Phoenix to consolidate the indexes into a
>>> single physical table? Are there other alternatives we should be
>>> considering?
>>>
>>> Miles Spielberg
>>> Staff Software Engineer
>>>
>>>
>>> O. 650.485.1102 <(650)%20485-1102>
>>> 900 Jefferson Ave
>>> 
>>> Redwood City
>>> ,
>>> CA 94063
>>> 
>>>
>>> On Fri, Feb 16, 2018 at 3:27 PM, James Taylor 
>>> 

Re: Runtime DDL supported?

2018-02-22 Thread James Taylor
Another option would be to use dynamic columns[1] when querying across
views. You’d have to disable column encoding [2] in this case.

[1] http://phoenix.apache.org/dynamic_columns.html
[2] http://phoenix.apache.org/columnencoding.html

On Wed, Feb 21, 2018 at 11:57 PM Miles Spielberg  wrote:

> I believe each query in a UNION needs to have the same result tuple
> format, which would work in this toy example, but in the general case each
> view would have a different schema. We could make the result tuples conform
> with each other by selecting NULL literals for every column except those in
> a view. It would get quite verbose though. Assuming f1,f2,f3 all have
> incompatible types, were you suggesting something like this?
>
> Select f1, null, null from v1 where PK=?
> Union all
> Select null, f2, null from v2 where PK=?
> Union all
> Select null, null, f3 from v3 where PK=?
>
> We might just run separate parallel queries against each view and merge
> the results client side. I would guess this should perform well since the
> block cache can be leveraged for queries after the first.
>
> We could also use the HBase API to run a point row get. We'd have to
> reimplement decoding for Phoenix's column values, which is not ideal but
> quite doable.
>
> Sent from my iPhone
>
> On Feb 21, 2018, at 9:09 PM, James Taylor  wrote:
>
> Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems you’re
> on a good track with multiple views over a single (or handful) of physical
> table(s).
>
> On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg  wrote:
>
>> I've done some experimentation with views, with a schema resembling this:
>>
>> create table t1(
>>>
>>> pk bigint not null primary key
>>>
>>> );
>>>
>>>
 create view v1(
>>>
>>> f1 varchar
>>>
>>> ) AS SELECT * FROM t1;
>>>
>>> create INDEX v1_f1 ON v1(f1);
>>>
>>>
 create view v2(
>>>
>>> f2 varchar
>>>
>>> ) AS SELECT * FROM t1;
>>>
>>> create INDEX v2_f2 ON v2(f2);
>>>
>>>
 create view v3(
>>>
>>> f3 varchar
>>>
>>> ) AS SELECT * FROM t1;
>>>
>>> create INDEX v3_f3 ON v3(f3);
>>>
>>>
>> Most of the time we'll be accessing data via the indexed views, but we'd
>> also like to be able to query all columns (f1, f2, f3) for a given pk. At
>> the HBase level, this should be doable as a point get on t1. The SQL-y
>> way to express this would probably be with JOINs, but the EXPLAIN plan is
>> not encouraging.
>>
>> > explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join v2 on
>>> v2.pk=t1.pk left  join v3 on v3.pk=t1.pk where t1.pk=12345;
>>> | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN POINT
>>> LOOKUP ON 1 KEY OVER T1
>>> | PARALLEL LEFT-JOIN TABLE 0
>>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>> | PARALLEL LEFT-JOIN TABLE 1
>>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>> | PARALLEL LEFT-JOIN TABLE 2
>>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>>
>>
>> This is pushing me back towards a design of having a single table, except
>> for the issue of proliferating physical HBase tables for the indexes. Would
>> you advise having a single table + a single view on it containing all
>> columns, to coerce Phoenix to consolidate the indexes into a single
>> physical table? Are there other alternatives we should be considering?
>>
>> Miles Spielberg
>> Staff Software Engineer
>>
>>
>> O. 650.485.1102
>> 900 Jefferson Ave
>> 
>> Redwood City
>> ,
>> CA 94063
>> 
>>
>> On Fri, Feb 16, 2018 at 3:27 PM, James Taylor 
>> wrote:
>>
>>> All indexes on views are stored in a single physical table, so you'll be
>>> ok in that regard.
>>>
>>> If you could file bugs for any local index issues, we'd really
>>> appreciate it. We've been steadily improving local indexes
>>> (see PHOENIX-3941 for some recent perf improvements - applicable for
>>> multi-tenant tables in particular - these will appear in our 4.14 release).
>>> Handling non covered columns is pretty isolated, so we should be able to
>>> fix bugs you find. Plus, there's a workaround - you can cover your indexes
>>> until any issues are fixed.
>>>
>>> Global, mutable indexes have had many improvements over the last several
>>> releases too, but there's more operational overhead if/when a data table
>>> gets out of sync with it's index table (plus some amount of configurable
>>> eventual consistency or index disablement). With local indexes (and HBase
>>> 1.3), this isn't possible.
>>>
>>> Thanks,
>>> James
>>>
>>> On Fri, Feb 16, 2018 at 3:10 PM, Miles Spielberg  wrote:
>>>
 Hi James,

 Thanks for the tips around reducing 

Re: Runtime DDL supported?

2018-02-21 Thread Miles Spielberg
I believe each query in a UNION needs to have the same result tuple format, 
which would work in this toy example, but in the general case each view would 
have a different schema. We could make the result tuples conform with each 
other by selecting NULL literals for every column except those in a view. It 
would get quite verbose though. Assuming f1,f2,f3 all have incompatible types, 
were you suggesting something like this?

Select f1, null, null from v1 where PK=?
Union all
Select null, f2, null from v2 where PK=?
Union all
Select null, null, f3 from v3 where PK=?

We might just run separate parallel queries against each view and merge the 
results client side. I would guess this should perform well since the block 
cache can be leveraged for queries after the first.

We could also use the HBase API to run a point row get. We'd have to 
reimplement decoding for Phoenix's column values, which is not ideal but quite 
doable.

Sent from my iPhone

> On Feb 21, 2018, at 9:09 PM, James Taylor  wrote:
> 
> Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems you’re on 
> a good track with multiple views over a single (or handful) of physical 
> table(s).
> 
>> On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg  wrote:
>> I've done some experimentation with views, with a schema resembling this:
>> 
 create table t1(
 pk bigint not null primary key
 );
 
 create view v1(
 f1 varchar
 ) AS SELECT * FROM t1;
 create INDEX v1_f1 ON v1(f1);
 
 create view v2(
 f2 varchar
 ) AS SELECT * FROM t1;
 create INDEX v2_f2 ON v2(f2);
 
 create view v3(
 f3 varchar
 ) AS SELECT * FROM t1;
 create INDEX v3_f3 ON v3(f3);
>> 
>> Most of the time we'll be accessing data via the indexed views, but we'd 
>> also like to be able to query all columns (f1, f2, f3) for a given pk. At 
>> the HBase level, this should be doable as a point get on t1. The SQL-y way 
>> to express this would probably be with JOINs, but the EXPLAIN plan is not 
>> encouraging.
>> 
>>> > explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join v2 on 
>>> > v2.pk=t1.pk left  join v3 on v3.pk=t1.pk where t1.pk=12345;
>>> | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP 
>>> ON 1 KEY OVER T1
>>> | PARALLEL LEFT-JOIN TABLE 0
>>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>> | PARALLEL LEFT-JOIN TABLE 1
>>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>> | PARALLEL LEFT-JOIN TABLE 2
>>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>> 
>> This is pushing me back towards a design of having a single table, except 
>> for the issue of proliferating physical HBase tables for the indexes. Would 
>> you advise having a single table + a single view on it containing all 
>> columns, to coerce Phoenix to consolidate the indexes into a single physical 
>> table? Are there other alternatives we should be considering?
>> 
>> Miles Spielberg
>> Staff Software Engineer
>> 
>> O. 650.485.1102
>> 900 Jefferson Ave
>> Redwood City, CA 94063
>> 
>> 
>>> On Fri, Feb 16, 2018 at 3:27 PM, James Taylor  
>>> wrote:
>>> All indexes on views are stored in a single physical table, so you'll be ok 
>>> in that regard.
>>> 
>>> If you could file bugs for any local index issues, we'd really appreciate 
>>> it. We've been steadily improving local indexes (see PHOENIX-3941 for some 
>>> recent perf improvements - applicable for multi-tenant tables in particular 
>>> - these will appear in our 4.14 release). Handling non covered columns is 
>>> pretty isolated, so we should be able to fix bugs you find. Plus, there's a 
>>> workaround - you can cover your indexes until any issues are fixed.
>>> 
>>> Global, mutable indexes have had many improvements over the last several 
>>> releases too, but there's more operational overhead if/when a data table 
>>> gets out of sync with it's index table (plus some amount of configurable 
>>> eventual consistency or index disablement). With local indexes (and HBase 
>>> 1.3), this isn't possible.
>>> 
>>> Thanks,
>>> James
>>> 
 On Fri, Feb 16, 2018 at 3:10 PM, Miles Spielberg  wrote:
 Hi James,
 
 Thanks for the tips around reducing the number of physical tables while 
 still maintaining the appearance of multiple tables via view definitions. 
 In our use case we don't anticipate having much if any immutable data, so 
 unfortunately I don't expect to be able to take advantage of Phoenix's 
 optimizations there.
 
 We're expecting many indexes, mostly likely with several per logical 
 per-tenant table. Given that global indexes are implemented as physical 
 HBase tables, will the view-oriented optimizations help very much? We've 
 done some experiments with local indexes on 4.13.2 and found bugs, 
 

Re: Runtime DDL supported?

2018-02-21 Thread James Taylor
Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems you’re
on a good track with multiple views over a single (or handful) of physical
table(s).

On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg  wrote:

> I've done some experimentation with views, with a schema resembling this:
>
> create table t1(
>>
>> pk bigint not null primary key
>>
>> );
>>
>>
>>> create view v1(
>>
>> f1 varchar
>>
>> ) AS SELECT * FROM t1;
>>
>> create INDEX v1_f1 ON v1(f1);
>>
>>
>>> create view v2(
>>
>> f2 varchar
>>
>> ) AS SELECT * FROM t1;
>>
>> create INDEX v2_f2 ON v2(f2);
>>
>>
>>> create view v3(
>>
>> f3 varchar
>>
>> ) AS SELECT * FROM t1;
>>
>> create INDEX v3_f3 ON v3(f3);
>>
>>
> Most of the time we'll be accessing data via the indexed views, but we'd
> also like to be able to query all columns (f1, f2, f3) for a given pk. At
> the HBase level, this should be doable as a point get on t1. The SQL-y
> way to express this would probably be with JOINs, but the EXPLAIN plan is
> not encouraging.
>
> > explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join v2 on
>> v2.pk=t1.pk left  join v3 on v3.pk=t1.pk where t1.pk=12345;
>> | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP
>> ON 1 KEY OVER T1
>> | PARALLEL LEFT-JOIN TABLE 0
>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>> | PARALLEL LEFT-JOIN TABLE 1
>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>> | PARALLEL LEFT-JOIN TABLE 2
>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>
>
> This is pushing me back towards a design of having a single table, except
> for the issue of proliferating physical HBase tables for the indexes. Would
> you advise having a single table + a single view on it containing all
> columns, to coerce Phoenix to consolidate the indexes into a single
> physical table? Are there other alternatives we should be considering?
>
> Miles Spielberg
> Staff Software Engineer
>
>
> O. 650.485.1102
> 900 Jefferson Ave
> 
> Redwood City
> ,
> CA 94063
> 
>
> On Fri, Feb 16, 2018 at 3:27 PM, James Taylor 
> wrote:
>
>> All indexes on views are stored in a single physical table, so you'll be
>> ok in that regard.
>>
>> If you could file bugs for any local index issues, we'd really appreciate
>> it. We've been steadily improving local indexes (see PHOENIX-3941 for some
>> recent perf improvements - applicable for multi-tenant tables in particular
>> - these will appear in our 4.14 release). Handling non covered columns is
>> pretty isolated, so we should be able to fix bugs you find. Plus, there's a
>> workaround - you can cover your indexes until any issues are fixed.
>>
>> Global, mutable indexes have had many improvements over the last several
>> releases too, but there's more operational overhead if/when a data table
>> gets out of sync with it's index table (plus some amount of configurable
>> eventual consistency or index disablement). With local indexes (and HBase
>> 1.3), this isn't possible.
>>
>> Thanks,
>> James
>>
>> On Fri, Feb 16, 2018 at 3:10 PM, Miles Spielberg  wrote:
>>
>>> Hi James,
>>>
>>> Thanks for the tips around reducing the number of physical tables while
>>> still maintaining the appearance of multiple tables via view definitions.
>>> In our use case we don't anticipate having much if any immutable data, so
>>> unfortunately I don't expect to be able to take advantage of Phoenix's
>>> optimizations there.
>>>
>>> We're expecting many indexes, mostly likely with several per logical
>>> per-tenant table. Given that global indexes are implemented as physical
>>> HBase tables, will the view-oriented optimizations help very much? We've
>>> done some experiments with local indexes on 4.13.2 and found bugs,
>>> particularly with the rewrite optimization to read non-covered columns from
>>> the main table, so we're not confident in using local indexes to optimize
>>> queries. (I've looked through the 5.0-alpha release notes and couldn't find
>>> anything related to this issue, so if desired I'll collect info for a
>>> separate bug report.)
>>>
>>> Miles Spielberg
>>> Staff Software Engineer
>>>
>>>
>>> O. 650.485.1102 <(650)%20485-1102>
>>> 900 Jefferson Ave
>>> 
>>> Redwood City
>>> ,
>>> CA 94063
>>> 
>>>
>>> 

Re: Runtime DDL supported?

2018-02-21 Thread Miles Spielberg
I've done some experimentation with views, with a schema resembling this:

create table t1(
>
> pk bigint not null primary key
>
> );
>
>
>> create view v1(
>
> f1 varchar
>
> ) AS SELECT * FROM t1;
>
> create INDEX v1_f1 ON v1(f1);
>
>
>> create view v2(
>
> f2 varchar
>
> ) AS SELECT * FROM t1;
>
> create INDEX v2_f2 ON v2(f2);
>
>
>> create view v3(
>
> f3 varchar
>
> ) AS SELECT * FROM t1;
>
> create INDEX v3_f3 ON v3(f3);
>
>
Most of the time we'll be accessing data via the indexed views, but we'd
also like to be able to query all columns (f1, f2, f3) for a given pk. At
the HBase level, this should be doable as a point get on t1. The SQL-y way
to express this would probably be with JOINs, but the EXPLAIN plan is not
encouraging.

> explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join v2 on v2.pk
> =t1.pk left  join v3 on v3.pk=t1.pk where t1.pk=12345;
> | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP
> ON 1 KEY OVER T1
> | PARALLEL LEFT-JOIN TABLE 0
> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
> | PARALLEL LEFT-JOIN TABLE 1
> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
> | PARALLEL LEFT-JOIN TABLE 2
> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>

This is pushing me back towards a design of having a single table, except
for the issue of proliferating physical HBase tables for the indexes. Would
you advise having a single table + a single view on it containing all
columns, to coerce Phoenix to consolidate the indexes into a single
physical table? Are there other alternatives we should be considering?

Miles Spielberg
Staff Software Engineer


O. 650.485.1102
900 Jefferson Ave
Redwood City, CA 94063

On Fri, Feb 16, 2018 at 3:27 PM, James Taylor 
wrote:

> All indexes on views are stored in a single physical table, so you'll be
> ok in that regard.
>
> If you could file bugs for any local index issues, we'd really appreciate
> it. We've been steadily improving local indexes (see PHOENIX-3941 for some
> recent perf improvements - applicable for multi-tenant tables in particular
> - these will appear in our 4.14 release). Handling non covered columns is
> pretty isolated, so we should be able to fix bugs you find. Plus, there's a
> workaround - you can cover your indexes until any issues are fixed.
>
> Global, mutable indexes have had many improvements over the last several
> releases too, but there's more operational overhead if/when a data table
> gets out of sync with it's index table (plus some amount of configurable
> eventual consistency or index disablement). With local indexes (and HBase
> 1.3), this isn't possible.
>
> Thanks,
> James
>
> On Fri, Feb 16, 2018 at 3:10 PM, Miles Spielberg  wrote:
>
>> Hi James,
>>
>> Thanks for the tips around reducing the number of physical tables while
>> still maintaining the appearance of multiple tables via view definitions.
>> In our use case we don't anticipate having much if any immutable data, so
>> unfortunately I don't expect to be able to take advantage of Phoenix's
>> optimizations there.
>>
>> We're expecting many indexes, mostly likely with several per logical
>> per-tenant table. Given that global indexes are implemented as physical
>> HBase tables, will the view-oriented optimizations help very much? We've
>> done some experiments with local indexes on 4.13.2 and found bugs,
>> particularly with the rewrite optimization to read non-covered columns from
>> the main table, so we're not confident in using local indexes to optimize
>> queries. (I've looked through the 5.0-alpha release notes and couldn't find
>> anything related to this issue, so if desired I'll collect info for a
>> separate bug report.)
>>
>> Miles Spielberg
>> Staff Software Engineer
>>
>>
>> O. 650.485.1102 <(650)%20485-1102>
>> 900 Jefferson Ave
>> 
>> Redwood City, CA 94063
>> 
>>
>> On Fri, Feb 16, 2018 at 2:49 PM, James Taylor 
>> wrote:
>>
>>> Hi Miles,
>>> You'll be fine if you use views [1] and multi-tenancy [2] to limit the
>>> number of physical HBase tables. Make sure you read about the limitations
>>> of views too [3].
>>>
>>> Here's the way I've seen this modeled successfully:
>>> - create one schema per use case. This will let you leverage some nice
>>> features in HBase for quotas and throttling. If you'll have a single use
>>> case, you don't have to worry about it. Read about namespaces here [4] and
>>> make sure to enable them before you start creating tables.
>>> - define an immutable, multi-tenant base table that has TENANT_ID +
>>> TYPE_ID primary key. There are optimizations Phoenix does over immutable
>>> tables that you'll want to leverage (assuming you have use cases that fit
>>> into 

Re: Runtime DDL supported?

2018-02-16 Thread James Taylor
All indexes on views are stored in a single physical table, so you'll be ok
in that regard.

If you could file bugs for any local index issues, we'd really appreciate
it. We've been steadily improving local indexes (see PHOENIX-3941 for some
recent perf improvements - applicable for multi-tenant tables in particular
- these will appear in our 4.14 release). Handling non covered columns is
pretty isolated, so we should be able to fix bugs you find. Plus, there's a
workaround - you can cover your indexes until any issues are fixed.

Global, mutable indexes have had many improvements over the last several
releases too, but there's more operational overhead if/when a data table
gets out of sync with it's index table (plus some amount of configurable
eventual consistency or index disablement). With local indexes (and HBase
1.3), this isn't possible.

Thanks,
James

On Fri, Feb 16, 2018 at 3:10 PM, Miles Spielberg  wrote:

> Hi James,
>
> Thanks for the tips around reducing the number of physical tables while
> still maintaining the appearance of multiple tables via view definitions.
> In our use case we don't anticipate having much if any immutable data, so
> unfortunately I don't expect to be able to take advantage of Phoenix's
> optimizations there.
>
> We're expecting many indexes, mostly likely with several per logical
> per-tenant table. Given that global indexes are implemented as physical
> HBase tables, will the view-oriented optimizations help very much? We've
> done some experiments with local indexes on 4.13.2 and found bugs,
> particularly with the rewrite optimization to read non-covered columns from
> the main table, so we're not confident in using local indexes to optimize
> queries. (I've looked through the 5.0-alpha release notes and couldn't find
> anything related to this issue, so if desired I'll collect info for a
> separate bug report.)
>
> Miles Spielberg
> Staff Software Engineer
>
>
> O. 650.485.1102 <(650)%20485-1102>
> 900 Jefferson Ave
> 
> Redwood City, CA 94063
> 
>
> On Fri, Feb 16, 2018 at 2:49 PM, James Taylor 
> wrote:
>
>> Hi Miles,
>> You'll be fine if you use views [1] and multi-tenancy [2] to limit the
>> number of physical HBase tables. Make sure you read about the limitations
>> of views too [3].
>>
>> Here's the way I've seen this modeled successfully:
>> - create one schema per use case. This will let you leverage some nice
>> features in HBase for quotas and throttling. If you'll have a single use
>> case, you don't have to worry about it. Read about namespaces here [4] and
>> make sure to enable them before you start creating tables.
>> - define an immutable, multi-tenant base table that has TENANT_ID +
>> TYPE_ID primary key. There are optimizations Phoenix does over immutable
>> tables that you'll want to leverage (assuming you have use cases that fit
>> into this category). This Phoenix table will be backed by a physical
>> HBase table, but you won't execute Phoenix DML against it. Think of it as a
>> kind of "abstract" type. Instead, you'll create updatable views over it.
>> - define a regular/mutable, multi-tenant base table that has TENANT_ID +
>> TYPE_ID primary key. Same deal as above, but this would be the base table
>> for any tables in which the rows change in place.
>> - define global views per "logical" table (against either your immutable
>> base table or mutable base table depending on the functionality needed)
>> with each view having a WHERE TYPE_ID='your type identifier' clause which
>> adds specific columns to the primary key. This view will be updatable (i.e.
>> you can execute DML against it). The columns you add to your PK will depend
>> on your most common query patterns.
>> - optionally define indexes on these global views.
>> - each tenant can further extend or just use the global views.
>>
>> FYI, lots of good performance/tuning tips can be found here[5].
>>
>> Thanks,
>> James
>>
>>
>> [1] https://phoenix.apache.org/views.html
>> [2] https://phoenix.apache.org/multi-tenancy.html
>> [3] https://phoenix.apache.org/views.html#Limitations
>> [4] https://phoenix.apache.org/namspace_mapping.html
>> [5] https://phoenix.apache.org/tuning_guide.html
>>
>> On Fri, Feb 16, 2018 at 11:47 AM, Miles Spielberg  wrote:
>>
>>> We're looking at employing Phoenix in a multi-tenant use case where
>>> tenants can create their own tables and indexes, running into totals of
>>> tens-of-thousands of each. Is this a supported scenario, or are we headed
>>> for trouble?
>>>
>>
>>
>


Re: Runtime DDL supported?

2018-02-16 Thread Miles Spielberg
Hi James,

Thanks for the tips around reducing the number of physical tables while
still maintaining the appearance of multiple tables via view definitions.
In our use case we don't anticipate having much if any immutable data, so
unfortunately I don't expect to be able to take advantage of Phoenix's
optimizations there.

We're expecting many indexes, mostly likely with several per logical
per-tenant table. Given that global indexes are implemented as physical
HBase tables, will the view-oriented optimizations help very much? We've
done some experiments with local indexes on 4.13.2 and found bugs,
particularly with the rewrite optimization to read non-covered columns from
the main table, so we're not confident in using local indexes to optimize
queries. (I've looked through the 5.0-alpha release notes and couldn't find
anything related to this issue, so if desired I'll collect info for a
separate bug report.)

Miles Spielberg
Staff Software Engineer


O. 650.485.1102
900 Jefferson Ave
Redwood City, CA 94063

On Fri, Feb 16, 2018 at 2:49 PM, James Taylor 
wrote:

> Hi Miles,
> You'll be fine if you use views [1] and multi-tenancy [2] to limit the
> number of physical HBase tables. Make sure you read about the limitations
> of views too [3].
>
> Here's the way I've seen this modeled successfully:
> - create one schema per use case. This will let you leverage some nice
> features in HBase for quotas and throttling. If you'll have a single use
> case, you don't have to worry about it. Read about namespaces here [4] and
> make sure to enable them before you start creating tables.
> - define an immutable, multi-tenant base table that has TENANT_ID +
> TYPE_ID primary key. There are optimizations Phoenix does over immutable
> tables that you'll want to leverage (assuming you have use cases that fit
> into this category). This Phoenix table will be backed by a physical
> HBase table, but you won't execute Phoenix DML against it. Think of it as a
> kind of "abstract" type. Instead, you'll create updatable views over it.
> - define a regular/mutable, multi-tenant base table that has TENANT_ID +
> TYPE_ID primary key. Same deal as above, but this would be the base table
> for any tables in which the rows change in place.
> - define global views per "logical" table (against either your immutable
> base table or mutable base table depending on the functionality needed)
> with each view having a WHERE TYPE_ID='your type identifier' clause which
> adds specific columns to the primary key. This view will be updatable (i.e.
> you can execute DML against it). The columns you add to your PK will depend
> on your most common query patterns.
> - optionally define indexes on these global views.
> - each tenant can further extend or just use the global views.
>
> FYI, lots of good performance/tuning tips can be found here[5].
>
> Thanks,
> James
>
>
> [1] https://phoenix.apache.org/views.html
> [2] https://phoenix.apache.org/multi-tenancy.html
> [3] https://phoenix.apache.org/views.html#Limitations
> [4] https://phoenix.apache.org/namspace_mapping.html
> [5] https://phoenix.apache.org/tuning_guide.html
>
> On Fri, Feb 16, 2018 at 11:47 AM, Miles Spielberg  wrote:
>
>> We're looking at employing Phoenix in a multi-tenant use case where
>> tenants can create their own tables and indexes, running into totals of
>> tens-of-thousands of each. Is this a supported scenario, or are we headed
>> for trouble?
>>
>
>


Re: Runtime DDL supported?

2018-02-16 Thread James Taylor
Hi Miles,
You'll be fine if you use views [1] and multi-tenancy [2] to limit the
number of physical HBase tables. Make sure you read about the limitations
of views too [3].

Here's the way I've seen this modeled successfully:
- create one schema per use case. This will let you leverage some nice
features in HBase for quotas and throttling. If you'll have a single use
case, you don't have to worry about it. Read about namespaces here [4] and
make sure to enable them before you start creating tables.
- define an immutable, multi-tenant base table that has TENANT_ID + TYPE_ID
primary key. There are optimizations Phoenix does over immutable tables
that you'll want to leverage (assuming you have use cases that fit into
this category). This Phoenix table will be backed by a physical HBase
table, but you won't execute Phoenix DML against it. Think of it as a kind
of "abstract" type. Instead, you'll create updatable views over it.
- define a regular/mutable, multi-tenant base table that has TENANT_ID +
TYPE_ID primary key. Same deal as above, but this would be the base table
for any tables in which the rows change in place.
- define global views per "logical" table (against either your immutable
base table or mutable base table depending on the functionality needed)
with each view having a WHERE TYPE_ID='your type identifier' clause which
adds specific columns to the primary key. This view will be updatable (i.e.
you can execute DML against it). The columns you add to your PK will depend
on your most common query patterns.
- optionally define indexes on these global views.
- each tenant can further extend or just use the global views.

FYI, lots of good performance/tuning tips can be found here[5].

Thanks,
James


[1] https://phoenix.apache.org/views.html
[2] https://phoenix.apache.org/multi-tenancy.html
[3] https://phoenix.apache.org/views.html#Limitations
[4] https://phoenix.apache.org/namspace_mapping.html
[5] https://phoenix.apache.org/tuning_guide.html

On Fri, Feb 16, 2018 at 11:47 AM, Miles Spielberg  wrote:

> We're looking at employing Phoenix in a multi-tenant use case where
> tenants can create their own tables and indexes, running into totals of
> tens-of-thousands of each. Is this a supported scenario, or are we headed
> for trouble?
>


Runtime DDL supported?

2018-02-16 Thread Miles Spielberg
We're looking at employing Phoenix in a multi-tenant use case where tenants
can create their own tables and indexes, running into totals of
tens-of-thousands of each. Is this a supported scenario, or are we headed
for trouble?