I believe each query in a UNION needs to have the same result tuple format, which would work in this toy example, but in the general case each view would have a different schema. We could make the result tuples conform with each other by selecting NULL literals for every column except those in a view. It would get quite verbose though. Assuming f1,f2,f3 all have incompatible types, were you suggesting something like this?
Select f1, null, null from v1 where PK=? Union all Select null, f2, null from v2 where PK=? Union all Select null, null, f3 from v3 where PK=? We might just run separate parallel queries against each view and merge the results client side. I would guess this should perform well since the block cache can be leveraged for queries after the first. We could also use the HBase API to run a point row get. We'd have to reimplement decoding for Phoenix's column values, which is not ideal but quite doable. Sent from my iPhone > On Feb 21, 2018, at 9:09 PM, James Taylor <jamestay...@apache.org> wrote: > > Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems you’re on > a good track with multiple views over a single (or handful) of physical > table(s). > >> On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg <mi...@box.com> wrote: >> I've done some experimentation with views, with a schema resembling this: >> >>>> create table t1( >>>> pk bigint not null primary key >>>> ); >>>> >>>> create view v1( >>>> f1 varchar >>>> ) AS SELECT * FROM t1; >>>> create INDEX v1_f1 ON v1(f1); >>>> >>>> create view v2( >>>> f2 varchar >>>> ) AS SELECT * FROM t1; >>>> create INDEX v2_f2 ON v2(f2); >>>> >>>> create view v3( >>>> f3 varchar >>>> ) AS SELECT * FROM t1; >>>> create INDEX v3_f3 ON v3(f3); >> >> Most of the time we'll be accessing data via the indexed views, but we'd >> also like to be able to query all columns (f1, f2, f3) for a given pk. At >> the HBase level, this should be doable as a point get on t1. The SQL-y way >> to express this would probably be with JOINs, but the EXPLAIN plan is not >> encouraging. >> >>> > explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join v2 on >>> > v2.pk=t1.pk left join v3 on v3.pk=t1.pk where t1.pk=12345; >>> | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP >>> ON 1 KEY OVER T1 >>> | PARALLEL LEFT-JOIN TABLE 0 >>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1 >>> | PARALLEL LEFT-JOIN TABLE 1 >>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1 >>> | PARALLEL LEFT-JOIN TABLE 2 >>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1 >> >> This is pushing me back towards a design of having a single table, except >> for the issue of proliferating physical HBase tables for the indexes. Would >> you advise having a single table + a single view on it containing all >> columns, to coerce Phoenix to consolidate the indexes into a single physical >> table? Are there other alternatives we should be considering? >> >> Miles Spielberg >> Staff Software Engineer >> >> O. 650.485.1102 >> 900 Jefferson Ave >> Redwood City, CA 94063 >> >> >>> On Fri, Feb 16, 2018 at 3:27 PM, James Taylor <jamestay...@apache.org> >>> wrote: >>> All indexes on views are stored in a single physical table, so you'll be ok >>> in that regard. >>> >>> If you could file bugs for any local index issues, we'd really appreciate >>> it. We've been steadily improving local indexes (see PHOENIX-3941 for some >>> recent perf improvements - applicable for multi-tenant tables in particular >>> - these will appear in our 4.14 release). Handling non covered columns is >>> pretty isolated, so we should be able to fix bugs you find. Plus, there's a >>> workaround - you can cover your indexes until any issues are fixed. >>> >>> Global, mutable indexes have had many improvements over the last several >>> releases too, but there's more operational overhead if/when a data table >>> gets out of sync with it's index table (plus some amount of configurable >>> eventual consistency or index disablement). With local indexes (and HBase >>> 1.3), this isn't possible. >>> >>> Thanks, >>> James >>> >>>> On Fri, Feb 16, 2018 at 3:10 PM, Miles Spielberg <mi...@box.com> wrote: >>>> Hi James, >>>> >>>> Thanks for the tips around reducing the number of physical tables while >>>> still maintaining the appearance of multiple tables via view definitions. >>>> In our use case we don't anticipate having much if any immutable data, so >>>> unfortunately I don't expect to be able to take advantage of Phoenix's >>>> optimizations there. >>>> >>>> We're expecting many indexes, mostly likely with several per logical >>>> per-tenant table. Given that global indexes are implemented as physical >>>> HBase tables, will the view-oriented optimizations help very much? We've >>>> done some experiments with local indexes on 4.13.2 and found bugs, >>>> particularly with the rewrite optimization to read non-covered columns >>>> from the main table, so we're not confident in using local indexes to >>>> optimize queries. (I've looked through the 5.0-alpha release notes and >>>> couldn't find anything related to this issue, so if desired I'll collect >>>> info for a separate bug report.) >>>> >>>> Miles Spielberg >>>> Staff Software Engineer >>>> >>>> O. 650.485.1102 >>>> 900 Jefferson Ave >>>> Redwood City, CA 94063 >>>> >>>> >>>>> On Fri, Feb 16, 2018 at 2:49 PM, James Taylor <jamestay...@apache.org> >>>>> wrote: >>>>> Hi Miles, >>>>> You'll be fine if you use views [1] and multi-tenancy [2] to limit the >>>>> number of physical HBase tables. Make sure you read about the limitations >>>>> of views too [3]. >>>>> >>>>> Here's the way I've seen this modeled successfully: >>>>> - create one schema per use case. This will let you leverage some nice >>>>> features in HBase for quotas and throttling. If you'll have a single use >>>>> case, you don't have to worry about it. Read about namespaces here [4] >>>>> and make sure to enable them before you start creating tables. >>>>> - define an immutable, multi-tenant base table that has TENANT_ID + >>>>> TYPE_ID primary key. There are optimizations Phoenix does over immutable >>>>> tables that you'll want to leverage (assuming you have use cases that fit >>>>> into this category). This Phoenix table will be backed by a physical >>>>> HBase table, but you won't execute Phoenix DML against it. Think of it as >>>>> a kind of "abstract" type. Instead, you'll create updatable views over it. >>>>> - define a regular/mutable, multi-tenant base table that has TENANT_ID + >>>>> TYPE_ID primary key. Same deal as above, but this would be the base table >>>>> for any tables in which the rows change in place. >>>>> - define global views per "logical" table (against either your immutable >>>>> base table or mutable base table depending on the functionality needed) >>>>> with each view having a WHERE TYPE_ID='your type identifier' clause which >>>>> adds specific columns to the primary key. This view will be updatable >>>>> (i.e. you can execute DML against it). The columns you add to your PK >>>>> will depend on your most common query patterns. >>>>> - optionally define indexes on these global views. >>>>> - each tenant can further extend or just use the global views. >>>>> >>>>> FYI, lots of good performance/tuning tips can be found here[5]. >>>>> >>>>> Thanks, >>>>> James >>>>> >>>>> >>>>> [1] https://phoenix.apache.org/views.html >>>>> [2] https://phoenix.apache.org/multi-tenancy.html >>>>> [3] https://phoenix.apache.org/views.html#Limitations >>>>> [4] https://phoenix.apache.org/namspace_mapping.html >>>>> [5] https://phoenix.apache.org/tuning_guide.html >>>>> >>>>>> On Fri, Feb 16, 2018 at 11:47 AM, Miles Spielberg <mi...@box.com> wrote: >>>>>> We're looking at employing Phoenix in a multi-tenant use case where >>>>>> tenants can create their own tables and indexes, running into totals of >>>>>> tens-of-thousands of each. Is this a supported scenario, or are we >>>>>> headed for trouble?