Re: Runtime DDL supported?

James Taylor Wed, 21 Feb 2018 21:09:37 -0800

Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems you’re
on a good track with multiple views over a single (or handful) of physical
table(s).


On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg <mi...@box.com> wrote:

> I've done some experimentation with views, with a schema resembling this:
>
> create table t1(
>>
>>     pk bigint not null primary key
>>
>> );
>>
>>
>>> create view v1(
>>
>>     f1 varchar
>>
>> ) AS SELECT * FROM t1;
>>
>> create INDEX v1_f1 ON v1(f1);
>>
>>
>>> create view v2(
>>
>>     f2 varchar
>>
>> ) AS SELECT * FROM t1;
>>
>> create INDEX v2_f2 ON v2(f2);
>>
>>
>>> create view v3(
>>
>>     f3 varchar
>>
>> ) AS SELECT * FROM t1;
>>
>> create INDEX v3_f3 ON v3(f3);
>>
>>
> Most of the time we'll be accessing data via the indexed views, but we'd
> also like to be able to query all columns (f1, f2, f3) for a given pk. At
> the HBase level, this should be doable as a point get on t1. The SQL-y
> way to express this would probably be with JOINs, but the EXPLAIN plan is
> not encouraging.
>
> > explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join v2 on
>> v2.pk=t1.pk left  join v3 on v3.pk=t1.pk where t1.pk=12345;
>> | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP
>> ON 1 KEY OVER T1
>> |     PARALLEL LEFT-JOIN TABLE 0
>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>> |     PARALLEL LEFT-JOIN TABLE 1
>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>> |     PARALLEL LEFT-JOIN TABLE 2
>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>
>
> This is pushing me back towards a design of having a single table, except
> for the issue of proliferating physical HBase tables for the indexes. Would
> you advise having a single table + a single view on it containing all
> columns, to coerce Phoenix to consolidate the indexes into a single
> physical table? Are there other alternatives we should be considering?
>
> Miles Spielberg
> Staff Software Engineer
>
>
> O. 650.485.1102
> 900 Jefferson Ave
> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g>
> Redwood City
> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g>,
> CA 94063
> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g>
>
> On Fri, Feb 16, 2018 at 3:27 PM, James Taylor <jamestay...@apache.org>
> wrote:
>
>> All indexes on views are stored in a single physical table, so you'll be
>> ok in that regard.
>>
>> If you could file bugs for any local index issues, we'd really appreciate
>> it. We've been steadily improving local indexes (see PHOENIX-3941 for some
>> recent perf improvements - applicable for multi-tenant tables in particular
>> - these will appear in our 4.14 release). Handling non covered columns is
>> pretty isolated, so we should be able to fix bugs you find. Plus, there's a
>> workaround - you can cover your indexes until any issues are fixed.
>>
>> Global, mutable indexes have had many improvements over the last several
>> releases too, but there's more operational overhead if/when a data table
>> gets out of sync with it's index table (plus some amount of configurable
>> eventual consistency or index disablement). With local indexes (and HBase
>> 1.3), this isn't possible.
>>
>> Thanks,
>> James
>>
>> On Fri, Feb 16, 2018 at 3:10 PM, Miles Spielberg <mi...@box.com> wrote:
>>
>>> Hi James,
>>>
>>> Thanks for the tips around reducing the number of physical tables while
>>> still maintaining the appearance of multiple tables via view definitions.
>>> In our use case we don't anticipate having much if any immutable data, so
>>> unfortunately I don't expect to be able to take advantage of Phoenix's
>>> optimizations there.
>>>
>>> We're expecting many indexes, mostly likely with several per logical
>>> per-tenant table. Given that global indexes are implemented as physical
>>> HBase tables, will the view-oriented optimizations help very much? We've
>>> done some experiments with local indexes on 4.13.2 and found bugs,
>>> particularly with the rewrite optimization to read non-covered columns from
>>> the main table, so we're not confident in using local indexes to optimize
>>> queries. (I've looked through the 5.0-alpha release notes and couldn't find
>>> anything related to this issue, so if desired I'll collect info for a
>>> separate bug report.)
>>>
>>> Miles Spielberg
>>> Staff Software Engineer
>>>
>>>
>>> O. 650.485.1102 <(650)%20485-1102>
>>> 900 Jefferson Ave
>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g>
>>> Redwood City
>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063+%3Chttps://maps.google.com/?q%3D900%2BJefferson%2BAve%2B%250D%2B%250D%2BRedwood%2BCity,%2BCA%2B94063%26entry%3Dgmail%26source%3Dg%3E&entry=gmail&source=g>,
>>> CA 94063
>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g>
>>>
>>> On Fri, Feb 16, 2018 at 2:49 PM, James Taylor <jamestay...@apache.org>
>>> wrote:
>>>
>>>> Hi Miles,
>>>> You'll be fine if you use views [1] and multi-tenancy [2] to limit the
>>>> number of physical HBase tables. Make sure you read about the limitations
>>>> of views too [3].
>>>>
>>>> Here's the way I've seen this modeled successfully:
>>>> - create one schema per use case. This will let you leverage some nice
>>>> features in HBase for quotas and throttling. If you'll have a single use
>>>> case, you don't have to worry about it. Read about namespaces here [4] and
>>>> make sure to enable them before you start creating tables.
>>>> - define an immutable, multi-tenant base table that has TENANT_ID +
>>>> TYPE_ID primary key. There are optimizations Phoenix does over immutable
>>>> tables that you'll want to leverage (assuming you have use cases that fit
>>>> into this category). This Phoenix table will be backed by a physical
>>>> HBase table, but you won't execute Phoenix DML against it. Think of it as a
>>>> kind of "abstract" type. Instead, you'll create updatable views over it.
>>>> - define a regular/mutable, multi-tenant base table that has TENANT_ID
>>>> + TYPE_ID primary key. Same deal as above, but this would be the base table
>>>> for any tables in which the rows change in place.
>>>> - define global views per "logical" table (against either your
>>>> immutable base table or mutable base table depending on the functionality
>>>> needed) with each view having a WHERE TYPE_ID='your type identifier' clause
>>>> which adds specific columns to the primary key. This view will be updatable
>>>> (i.e. you can execute DML against it). The columns you add to your PK will
>>>> depend on your most common query patterns.
>>>> - optionally define indexes on these global views.
>>>> - each tenant can further extend or just use the global views.
>>>>
>>>> FYI, lots of good performance/tuning tips can be found here[5].
>>>>
>>>> Thanks,
>>>> James
>>>>
>>>>
>>>> [1] https://phoenix.apache.org/views.html
>>>> [2] https://phoenix.apache.org/multi-tenancy.html
>>>> [3] https://phoenix.apache.org/views.html#Limitations
>>>> [4] https://phoenix.apache.org/namspace_mapping.html
>>>> [5] https://phoenix.apache.org/tuning_guide.html
>>>>
>>>> On Fri, Feb 16, 2018 at 11:47 AM, Miles Spielberg <mi...@box.com>
>>>> wrote:
>>>>
>>>>> We're looking at employing Phoenix in a multi-tenant use case where
>>>>> tenants can create their own tables and indexes, running into totals of
>>>>> tens-of-thousands of each. Is this a supported scenario, or are we headed
>>>>> for trouble?
>>>>>
>>>>
>>>>
>>>
>>
>

Re: Runtime DDL supported?

Reply via email to