Re: Runtime DDL supported?

James Taylor Thu, 22 Feb 2018 07:43:16 -0800

Another option would be to use dynamic columns[1] when querying across
views. You’d have to disable column encoding [2] in this case.


[1] http://phoenix.apache.org/dynamic_columns.html
[2] http://phoenix.apache.org/columnencoding.html

On Wed, Feb 21, 2018 at 11:57 PM Miles Spielberg <[email protected]> wrote:

> I believe each query in a UNION needs to have the same result tuple
> format, which would work in this toy example, but in the general case each
> view would have a different schema. We could make the result tuples conform
> with each other by selecting NULL literals for every column except those in
> a view. It would get quite verbose though. Assuming f1,f2,f3 all have
> incompatible types, were you suggesting something like this?
>
> Select f1, null, null from v1 where PK=?
> Union all
> Select null, f2, null from v2 where PK=?
> Union all
> Select null, null, f3 from v3 where PK=?
>
> We might just run separate parallel queries against each view and merge
> the results client side. I would guess this should perform well since the
> block cache can be leveraged for queries after the first.
>
> We could also use the HBase API to run a point row get. We'd have to
> reimplement decoding for Phoenix's column values, which is not ideal but
> quite doable.
>
> Sent from my iPhone
>
> On Feb 21, 2018, at 9:09 PM, James Taylor <[email protected]> wrote:
>
> Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems you’re
> on a good track with multiple views over a single (or handful) of physical
> table(s).
>
> On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg <[email protected]> wrote:
>
>> I've done some experimentation with views, with a schema resembling this:
>>
>> create table t1(
>>>
>>>     pk bigint not null primary key
>>>
>>> );
>>>
>>>
>>>> create view v1(
>>>
>>>     f1 varchar
>>>
>>> ) AS SELECT * FROM t1;
>>>
>>> create INDEX v1_f1 ON v1(f1);
>>>
>>>
>>>> create view v2(
>>>
>>>     f2 varchar
>>>
>>> ) AS SELECT * FROM t1;
>>>
>>> create INDEX v2_f2 ON v2(f2);
>>>
>>>
>>>> create view v3(
>>>
>>>     f3 varchar
>>>
>>> ) AS SELECT * FROM t1;
>>>
>>> create INDEX v3_f3 ON v3(f3);
>>>
>>>
>> Most of the time we'll be accessing data via the indexed views, but we'd
>> also like to be able to query all columns (f1, f2, f3) for a given pk. At
>> the HBase level, this should be doable as a point get on t1. The SQL-y
>> way to express this would probably be with JOINs, but the EXPLAIN plan is
>> not encouraging.
>>
>> > explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join v2 on
>>> v2.pk=t1.pk left  join v3 on v3.pk=t1.pk where t1.pk=12345;
>>> | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN POINT
>>> LOOKUP ON 1 KEY OVER T1
>>> |     PARALLEL LEFT-JOIN TABLE 0
>>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>> |     PARALLEL LEFT-JOIN TABLE 1
>>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>> |     PARALLEL LEFT-JOIN TABLE 2
>>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>>
>>
>> This is pushing me back towards a design of having a single table, except
>> for the issue of proliferating physical HBase tables for the indexes. Would
>> you advise having a single table + a single view on it containing all
>> columns, to coerce Phoenix to consolidate the indexes into a single
>> physical table? Are there other alternatives we should be considering?
>>
>> Miles Spielberg
>> Staff Software Engineer
>>
>>
>> O. 650.485.1102
>> 900 Jefferson Ave
>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g>
>> Redwood City
>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g>,
>> CA 94063
>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g>
>>
>> On Fri, Feb 16, 2018 at 3:27 PM, James Taylor <[email protected]>
>> wrote:
>>
>>> All indexes on views are stored in a single physical table, so you'll be
>>> ok in that regard.
>>>
>>> If you could file bugs for any local index issues, we'd really
>>> appreciate it. We've been steadily improving local indexes
>>> (see PHOENIX-3941 for some recent perf improvements - applicable for
>>> multi-tenant tables in particular - these will appear in our 4.14 release).
>>> Handling non covered columns is pretty isolated, so we should be able to
>>> fix bugs you find. Plus, there's a workaround - you can cover your indexes
>>> until any issues are fixed.
>>>
>>> Global, mutable indexes have had many improvements over the last several
>>> releases too, but there's more operational overhead if/when a data table
>>> gets out of sync with it's index table (plus some amount of configurable
>>> eventual consistency or index disablement). With local indexes (and HBase
>>> 1.3), this isn't possible.
>>>
>>> Thanks,
>>> James
>>>
>>> On Fri, Feb 16, 2018 at 3:10 PM, Miles Spielberg <[email protected]> wrote:
>>>
>>>> Hi James,
>>>>
>>>> Thanks for the tips around reducing the number of physical tables while
>>>> still maintaining the appearance of multiple tables via view definitions.
>>>> In our use case we don't anticipate having much if any immutable data, so
>>>> unfortunately I don't expect to be able to take advantage of Phoenix's
>>>> optimizations there.
>>>>
>>>> We're expecting many indexes, mostly likely with several per logical
>>>> per-tenant table. Given that global indexes are implemented as physical
>>>> HBase tables, will the view-oriented optimizations help very much? We've
>>>> done some experiments with local indexes on 4.13.2 and found bugs,
>>>> particularly with the rewrite optimization to read non-covered columns from
>>>> the main table, so we're not confident in using local indexes to optimize
>>>> queries. (I've looked through the 5.0-alpha release notes and couldn't find
>>>> anything related to this issue, so if desired I'll collect info for a
>>>> separate bug report.)
>>>>
>>>> Miles Spielberg
>>>> Staff Software Engineer
>>>>
>>>>
>>>> O. 650.485.1102 <(650)%20485-1102>
>>>> 900 Jefferson Ave
>>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g>
>>>> Redwood City
>>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063+%3Chttps://maps.google.com/?q%3D900%2BJefferson%2BAve%2B%250D%2B%250D%2BRedwood%2BCity,%2BCA%2B94063%26entry%3Dgmail%26source%3Dg%3E&entry=gmail&source=g>,
>>>> CA 94063
>>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g>
>>>>
>>>> On Fri, Feb 16, 2018 at 2:49 PM, James Taylor <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Miles,
>>>>> You'll be fine if you use views [1] and multi-tenancy [2] to limit the
>>>>> number of physical HBase tables. Make sure you read about the limitations
>>>>> of views too [3].
>>>>>
>>>>> Here's the way I've seen this modeled successfully:
>>>>> - create one schema per use case. This will let you leverage some nice
>>>>> features in HBase for quotas and throttling. If you'll have a single use
>>>>> case, you don't have to worry about it. Read about namespaces here [4] and
>>>>> make sure to enable them before you start creating tables.
>>>>> - define an immutable, multi-tenant base table that has TENANT_ID +
>>>>> TYPE_ID primary key. There are optimizations Phoenix does over immutable
>>>>> tables that you'll want to leverage (assuming you have use cases that fit
>>>>> into this category). This Phoenix table will be backed by a physical
>>>>> HBase table, but you won't execute Phoenix DML against it. Think of it as 
>>>>> a
>>>>> kind of "abstract" type. Instead, you'll create updatable views over it.
>>>>> - define a regular/mutable, multi-tenant base table that has
>>>>> TENANT_ID + TYPE_ID primary key. Same deal as above, but this would be the
>>>>> base table for any tables in which the rows change in place.
>>>>> - define global views per "logical" table (against either your
>>>>> immutable base table or mutable base table depending on the functionality
>>>>> needed) with each view having a WHERE TYPE_ID='your type identifier' 
>>>>> clause
>>>>> which adds specific columns to the primary key. This view will be 
>>>>> updatable
>>>>> (i.e. you can execute DML against it). The columns you add to your PK will
>>>>> depend on your most common query patterns.
>>>>> - optionally define indexes on these global views.
>>>>> - each tenant can further extend or just use the global views.
>>>>>
>>>>> FYI, lots of good performance/tuning tips can be found here[5].
>>>>>
>>>>> Thanks,
>>>>> James
>>>>>
>>>>>
>>>>> [1] https://phoenix.apache.org/views.html
>>>>> [2] https://phoenix.apache.org/multi-tenancy.html
>>>>> [3] https://phoenix.apache.org/views.html#Limitations
>>>>> [4] https://phoenix.apache.org/namspace_mapping.html
>>>>> [5] https://phoenix.apache.org/tuning_guide.html
>>>>>
>>>>> On Fri, Feb 16, 2018 at 11:47 AM, Miles Spielberg <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> We're looking at employing Phoenix in a multi-tenant use case where
>>>>>> tenants can create their own tables and indexes, running into totals of
>>>>>> tens-of-thousands of each. Is this a supported scenario, or are we headed
>>>>>> for trouble?
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>

Re: Runtime DDL supported?

Reply via email to