As we discussed, indexes across views are stored in a single HBase table associated with the original table (_IDX_<tablename>). That's grand for limiting the number of HBase tables created, but I just realized that the actual index data within is differentiated by the 16-bit "viewIndexId", which limits us to 64K indexes across all views for a given table. That's concerning for our use case, especially if its a cumulative autoincrement across all CREATE INDEX and DROP INDEX operations over the lifetime of the base table.
Is there any workaround for this? A quick grep across the source indicates that the length of viewIndexId is currently hard-coded. At least, this limitation should probably be added to the list of caveats and warnings at https://phoenix.apache.org/views.html. Miles Spielberg Staff Software Engineer O. 650.485.1102 900 Jefferson Ave Redwood City, CA 94063 On Thu, Feb 22, 2018 at 7:42 AM, James Taylor <jamestay...@apache.org> wrote: > Another option would be to use dynamic columns[1] when querying across > views. You’d have to disable column encoding [2] in this case. > > [1] http://phoenix.apache.org/dynamic_columns.html > [2] http://phoenix.apache.org/columnencoding.html > > On Wed, Feb 21, 2018 at 11:57 PM Miles Spielberg <mi...@box.com> wrote: > >> I believe each query in a UNION needs to have the same result tuple >> format, which would work in this toy example, but in the general case each >> view would have a different schema. We could make the result tuples conform >> with each other by selecting NULL literals for every column except those in >> a view. It would get quite verbose though. Assuming f1,f2,f3 all have >> incompatible types, were you suggesting something like this? >> >> Select f1, null, null from v1 where PK=? >> Union all >> Select null, f2, null from v2 where PK=? >> Union all >> Select null, null, f3 from v3 where PK=? >> >> We might just run separate parallel queries against each view and merge >> the results client side. I would guess this should perform well since the >> block cache can be leveraged for queries after the first. >> >> We could also use the HBase API to run a point row get. We'd have to >> reimplement decoding for Phoenix's column values, which is not ideal but >> quite doable. >> >> Sent from my iPhone >> >> On Feb 21, 2018, at 9:09 PM, James Taylor <jamestay...@apache.org> wrote: >> >> Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems >> you’re on a good track with multiple views over a single (or handful) of >> physical table(s). >> >> On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg <mi...@box.com> wrote: >> >>> I've done some experimentation with views, with a schema resembling this: >>> >>> create table t1( >>>> >>>> pk bigint not null primary key >>>> >>>> ); >>>> >>>> >>>>> create view v1( >>>> >>>> f1 varchar >>>> >>>> ) AS SELECT * FROM t1; >>>> >>>> create INDEX v1_f1 ON v1(f1); >>>> >>>> >>>>> create view v2( >>>> >>>> f2 varchar >>>> >>>> ) AS SELECT * FROM t1; >>>> >>>> create INDEX v2_f2 ON v2(f2); >>>> >>>> >>>>> create view v3( >>>> >>>> f3 varchar >>>> >>>> ) AS SELECT * FROM t1; >>>> >>>> create INDEX v3_f3 ON v3(f3); >>>> >>>> >>> Most of the time we'll be accessing data via the indexed views, but we'd >>> also like to be able to query all columns (f1, f2, f3) for a given pk. At >>> the HBase level, this should be doable as a point get on t1. The SQL-y >>> way to express this would probably be with JOINs, but the EXPLAIN plan is >>> not encouraging. >>> >>> > explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join v2 on >>>> v2.pk=t1.pk left join v3 on v3.pk=t1.pk where t1.pk=12345; >>>> | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN POINT >>>> LOOKUP ON 1 KEY OVER T1 >>>> | PARALLEL LEFT-JOIN TABLE 0 >>>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1 >>>> | PARALLEL LEFT-JOIN TABLE 1 >>>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1 >>>> | PARALLEL LEFT-JOIN TABLE 2 >>>> | CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1 >>>> >>> >>> This is pushing me back towards a design of having a single table, >>> except for the issue of proliferating physical HBase tables for the >>> indexes. Would you advise having a single table + a single view on it >>> containing all columns, to coerce Phoenix to consolidate the indexes into a >>> single physical table? Are there other alternatives we should be >>> considering? >>> >>> Miles Spielberg >>> Staff Software Engineer >>> >>> >>> O. 650.485.1102 <(650)%20485-1102> >>> 900 Jefferson Ave >>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g> >>> Redwood City >>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g>, >>> CA 94063 >>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g> >>> >>> On Fri, Feb 16, 2018 at 3:27 PM, James Taylor <jamestay...@apache.org> >>> wrote: >>> >>>> All indexes on views are stored in a single physical table, so you'll >>>> be ok in that regard. >>>> >>>> If you could file bugs for any local index issues, we'd really >>>> appreciate it. We've been steadily improving local indexes >>>> (see PHOENIX-3941 for some recent perf improvements - applicable for >>>> multi-tenant tables in particular - these will appear in our 4.14 release). >>>> Handling non covered columns is pretty isolated, so we should be able to >>>> fix bugs you find. Plus, there's a workaround - you can cover your indexes >>>> until any issues are fixed. >>>> >>>> Global, mutable indexes have had many improvements over the last >>>> several releases too, but there's more operational overhead if/when a data >>>> table gets out of sync with it's index table (plus some amount of >>>> configurable eventual consistency or index disablement). With local indexes >>>> (and HBase 1.3), this isn't possible. >>>> >>>> Thanks, >>>> James >>>> >>>> On Fri, Feb 16, 2018 at 3:10 PM, Miles Spielberg <mi...@box.com> wrote: >>>> >>>>> Hi James, >>>>> >>>>> Thanks for the tips around reducing the number of physical tables >>>>> while still maintaining the appearance of multiple tables via view >>>>> definitions. In our use case we don't anticipate having much if any >>>>> immutable data, so unfortunately I don't expect to be able to take >>>>> advantage of Phoenix's optimizations there. >>>>> >>>>> We're expecting many indexes, mostly likely with several per logical >>>>> per-tenant table. Given that global indexes are implemented as physical >>>>> HBase tables, will the view-oriented optimizations help very much? We've >>>>> done some experiments with local indexes on 4.13.2 and found bugs, >>>>> particularly with the rewrite optimization to read non-covered columns >>>>> from >>>>> the main table, so we're not confident in using local indexes to optimize >>>>> queries. (I've looked through the 5.0-alpha release notes and couldn't >>>>> find >>>>> anything related to this issue, so if desired I'll collect info for a >>>>> separate bug report.) >>>>> >>>>> Miles Spielberg >>>>> Staff Software Engineer >>>>> >>>>> >>>>> O. 650.485.1102 <(650)%20485-1102> >>>>> 900 Jefferson Ave >>>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g> >>>>> Redwood City >>>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063+%3Chttps://maps.google.com/?q%3D900%2BJefferson%2BAve%2B%250D%2B%250D%2BRedwood%2BCity,%2BCA%2B94063%26entry%3Dgmail%26source%3Dg%3E&entry=gmail&source=g>, >>>>> CA 94063 >>>>> <https://maps.google.com/?q=900+Jefferson+Ave+%0D+%0D+Redwood+City,+CA+94063&entry=gmail&source=g> >>>>> >>>>> On Fri, Feb 16, 2018 at 2:49 PM, James Taylor <jamestay...@apache.org> >>>>> wrote: >>>>> >>>>>> Hi Miles, >>>>>> You'll be fine if you use views [1] and multi-tenancy [2] to limit >>>>>> the number of physical HBase tables. Make sure you read about the >>>>>> limitations of views too [3]. >>>>>> >>>>>> Here's the way I've seen this modeled successfully: >>>>>> - create one schema per use case. This will let you leverage some >>>>>> nice features in HBase for quotas and throttling. If you'll have a single >>>>>> use case, you don't have to worry about it. Read about namespaces here >>>>>> [4] >>>>>> and make sure to enable them before you start creating tables. >>>>>> - define an immutable, multi-tenant base table that has TENANT_ID + >>>>>> TYPE_ID primary key. There are optimizations Phoenix does over immutable >>>>>> tables that you'll want to leverage (assuming you have use cases that fit >>>>>> into this category). This Phoenix table will be backed by a physical >>>>>> HBase table, but you won't execute Phoenix DML against it. Think of it >>>>>> as a >>>>>> kind of "abstract" type. Instead, you'll create updatable views over it. >>>>>> - define a regular/mutable, multi-tenant base table that has >>>>>> TENANT_ID + TYPE_ID primary key. Same deal as above, but this would be >>>>>> the >>>>>> base table for any tables in which the rows change in place. >>>>>> - define global views per "logical" table (against either your >>>>>> immutable base table or mutable base table depending on the functionality >>>>>> needed) with each view having a WHERE TYPE_ID='your type identifier' >>>>>> clause >>>>>> which adds specific columns to the primary key. This view will be >>>>>> updatable >>>>>> (i.e. you can execute DML against it). The columns you add to your PK >>>>>> will >>>>>> depend on your most common query patterns. >>>>>> - optionally define indexes on these global views. >>>>>> - each tenant can further extend or just use the global views. >>>>>> >>>>>> FYI, lots of good performance/tuning tips can be found here[5]. >>>>>> >>>>>> Thanks, >>>>>> James >>>>>> >>>>>> >>>>>> [1] https://phoenix.apache.org/views.html >>>>>> [2] https://phoenix.apache.org/multi-tenancy.html >>>>>> [3] https://phoenix.apache.org/views.html#Limitations >>>>>> [4] https://phoenix.apache.org/namspace_mapping.html >>>>>> [5] https://phoenix.apache.org/tuning_guide.html >>>>>> >>>>>> On Fri, Feb 16, 2018 at 11:47 AM, Miles Spielberg <mi...@box.com> >>>>>> wrote: >>>>>> >>>>>>> We're looking at employing Phoenix in a multi-tenant use case where >>>>>>> tenants can create their own tables and indexes, running into totals of >>>>>>> tens-of-thousands of each. Is this a supported scenario, or are we >>>>>>> headed >>>>>>> for trouble? >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>