Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems you’re on a good track with multiple views over a single (or handful) of physical table(s).
On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg <mi...@box.com> wrote: > I've done some experimentation with views, with a schema resembling this: > > create table t1( >> >> pk bigint not null primary key >> >> ); >> >> >>> create view v1( >> >> f1 varchar >> >> ) AS SELECT * FROM t1; >> >> create INDEX v1_f1 ON v1(f1); >> >> >>> create view v2( >> >> f2 varchar >> >> ) AS SELECT * FROM t1; >> >> create INDEX v2_f2 ON v2(f2); >> >> >>> create view v3( >> >> f3 varchar >> >> ) AS SELECT * FROM t1; >> >> create INDEX v3_f3 ON v3(f3); >> >> > Most of the time we'll be accessing data via the indexed views, but we'd > also like to be able to query all columns (f1, f2, f3) for a given pk. At > the HBase level, this should be doable as a point get on t1. The SQL-y > way to express this would probably be with JOINs, but the EXPLAIN plan is > not encouraging. > > > explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join v2 on >> v2.pk=t1.pk left join v3 on v3.pk=t1.pk where t1.pk=12345; >> | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP >> ON 1 KEY OVER T1 >> | PARALLEL LEFT-JOIN TABLE 0 >> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1 >> | PARALLEL LEFT-JOIN TABLE 1 >> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1 >> | PARALLEL LEFT-JOIN TABLE 2 >> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1 >> > > This is pushing me back towards a design of having a single table, except > for the issue of proliferating physical HBase tables for the indexes. Would > you advise having a single table + a single view on it containing all > columns, to coerce Phoenix to consolidate the indexes into a single > physical table? Are there other alternatives we should be considering? > > Miles Spielberg > Staff Software Engineer > > > O. 650.485.1102 > 900 Jefferson Ave > <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g> > Redwood City > <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g>, > CA 94063 > <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g> > > On Fri, Feb 16, 2018 at 3:27 PM, James Taylor <jamestay...@apache.org> > wrote: > >> All indexes on views are stored in a single physical table, so you'll be >> ok in that regard. >> >> If you could file bugs for any local index issues, we'd really appreciate >> it. We've been steadily improving local indexes (see PHOENIX-3941 for some >> recent perf improvements - applicable for multi-tenant tables in particular >> - these will appear in our 4.14 release). Handling non covered columns is >> pretty isolated, so we should be able to fix bugs you find. Plus, there's a >> workaround - you can cover your indexes until any issues are fixed. >> >> Global, mutable indexes have had many improvements over the last several >> releases too, but there's more operational overhead if/when a data table >> gets out of sync with it's index table (plus some amount of configurable >> eventual consistency or index disablement). With local indexes (and HBase >> 1.3), this isn't possible. >> >> Thanks, >> James >> >> On Fri, Feb 16, 2018 at 3:10 PM, Miles Spielberg <mi...@box.com> wrote: >> >>> Hi James, >>> >>> Thanks for the tips around reducing the number of physical tables while >>> still maintaining the appearance of multiple tables via view definitions. >>> In our use case we don't anticipate having much if any immutable data, so >>> unfortunately I don't expect to be able to take advantage of Phoenix's >>> optimizations there. >>> >>> We're expecting many indexes, mostly likely with several per logical >>> per-tenant table. Given that global indexes are implemented as physical >>> HBase tables, will the view-oriented optimizations help very much? We've >>> done some experiments with local indexes on 4.13.2 and found bugs, >>> particularly with the rewrite optimization to read non-covered columns from >>> the main table, so we're not confident in using local indexes to optimize >>> queries. (I've looked through the 5.0-alpha release notes and couldn't find >>> anything related to this issue, so if desired I'll collect info for a >>> separate bug report.) >>> >>> Miles Spielberg >>> Staff Software Engineer >>> >>> >>> O. 650.485.1102 <(650)%20485-1102> >>> 900 Jefferson Ave >>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g> >>> Redwood City >>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063+%3Chttps://maps.google.com/?q%3D900%2BJefferson%2BAve%2B%250D%2B%250D%2BRedwood%2BCity,%2BCA%2B94063%26entry%3Dgmail%26source%3Dg%3E&entry=gmail&source=g>, >>> CA 94063 >>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g> >>> >>> On Fri, Feb 16, 2018 at 2:49 PM, James Taylor <jamestay...@apache.org> >>> wrote: >>> >>>> Hi Miles, >>>> You'll be fine if you use views [1] and multi-tenancy [2] to limit the >>>> number of physical HBase tables. Make sure you read about the limitations >>>> of views too [3]. >>>> >>>> Here's the way I've seen this modeled successfully: >>>> - create one schema per use case. This will let you leverage some nice >>>> features in HBase for quotas and throttling. If you'll have a single use >>>> case, you don't have to worry about it. Read about namespaces here [4] and >>>> make sure to enable them before you start creating tables. >>>> - define an immutable, multi-tenant base table that has TENANT_ID + >>>> TYPE_ID primary key. There are optimizations Phoenix does over immutable >>>> tables that you'll want to leverage (assuming you have use cases that fit >>>> into this category). This Phoenix table will be backed by a physical >>>> HBase table, but you won't execute Phoenix DML against it. Think of it as a >>>> kind of "abstract" type. Instead, you'll create updatable views over it. >>>> - define a regular/mutable, multi-tenant base table that has TENANT_ID >>>> + TYPE_ID primary key. Same deal as above, but this would be the base table >>>> for any tables in which the rows change in place. >>>> - define global views per "logical" table (against either your >>>> immutable base table or mutable base table depending on the functionality >>>> needed) with each view having a WHERE TYPE_ID='your type identifier' clause >>>> which adds specific columns to the primary key. This view will be updatable >>>> (i.e. you can execute DML against it). The columns you add to your PK will >>>> depend on your most common query patterns. >>>> - optionally define indexes on these global views. >>>> - each tenant can further extend or just use the global views. >>>> >>>> FYI, lots of good performance/tuning tips can be found here[5]. >>>> >>>> Thanks, >>>> James >>>> >>>> >>>> [1] https://phoenix.apache.org/views.html >>>> [2] https://phoenix.apache.org/multi-tenancy.html >>>> [3] https://phoenix.apache.org/views.html#Limitations >>>> [4] https://phoenix.apache.org/namspace_mapping.html >>>> [5] https://phoenix.apache.org/tuning_guide.html >>>> >>>> On Fri, Feb 16, 2018 at 11:47 AM, Miles Spielberg <mi...@box.com> >>>> wrote: >>>> >>>>> We're looking at employing Phoenix in a multi-tenant use case where >>>>> tenants can create their own tables and indexes, running into totals of >>>>> tens-of-thousands of each. Is this a supported scenario, or are we headed >>>>> for trouble? >>>>> >>>> >>>> >>> >> >