Thanks for digging that up, Miles. I've added a comment on the JIRA on how to go about implementing it here: https://issues.apache.org/jira/browse/PHOENIX-3547?focusedCommentId=16391739&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16391739
That would be a good first contribution if you're up for it.

Thanks,
James

On Wed, Mar 7, 2018 at 5:09 PM, Miles Spielberg <mi...@box.com> wrote:

> We found https://issues.apache.org/jira/browse/PHOENIX-3547, which seems
> to be precisely our problem. We would want at least the option to use a
> bigint rather than the int in the JIRA to accommodate massive growth.
> While we intend to have many tenants, we don't intend to use the Phoenix
> "tenant_id" to differentiate them, and instead manage them at our
> application layer, so separate counters per Phoenix tenant would not help
> in our situation.
>
> Miles Spielberg
> Staff Software Engineer
>
> O. 650.485.1102
> 900 Jefferson Ave
> Redwood City, CA 94063
>
> On Wed, Feb 28, 2018 at 10:27 PM, James Taylor <jamestay...@apache.org> wrote:
>
>> Please file a JIRA as it'd be feasible to change this limitation. The
>> easiest way would be to have a separate counter for each tenant. Another
>> way to reduce the number of indexes on tenant-specific views would be to
>> factor out common columns to global views and create indexes there.
>>
>> On Tue, Feb 27, 2018 at 3:40 PM Miles Spielberg <mi...@box.com> wrote:
>>
>>> As we discussed, indexes across views are stored in a single HBase
>>> table associated with the original table (_IDX_<tablename>). That's
>>> grand for limiting the number of HBase tables created, but I just
>>> realized that the actual index data within is differentiated by the
>>> 16-bit "viewIndexId", which limits us to 64K indexes across all views
>>> for a given table. That's concerning for our use case, especially if
>>> it's a cumulative autoincrement across all CREATE INDEX and DROP INDEX
>>> operations over the lifetime of the base table.
>>>
>>> Is there any workaround for this? A quick grep across the source
>>> indicates that the length of viewIndexId is currently hard-coded.
>>>
>>> At least, this limitation should probably be added to the list of
>>> caveats and warnings at https://phoenix.apache.org/views.html.
>>>
>>> On Thu, Feb 22, 2018 at 7:42 AM, James Taylor <jamestay...@apache.org> wrote:
>>>
>>>> Another option would be to use dynamic columns [1] when querying
>>>> across views. You'd have to disable column encoding [2] in this case.
>>>>
>>>> [1] http://phoenix.apache.org/dynamic_columns.html
>>>> [2] http://phoenix.apache.org/columnencoding.html
>>>>
>>>> On Wed, Feb 21, 2018 at 11:57 PM Miles Spielberg <mi...@box.com> wrote:
>>>>
>>>>> I believe each query in a UNION needs to have the same result tuple
>>>>> format, which would work in this toy example, but in the general case
>>>>> each view would have a different schema. We could make the result
>>>>> tuples conform with each other by selecting NULL literals for every
>>>>> column except those in a view. It would get quite verbose, though.
>>>>> Assuming f1, f2, f3 all have incompatible types, were you suggesting
>>>>> something like this?
>>>>>
>>>>> Select f1, null, null from v1 where PK=?
>>>>> Union all
>>>>> Select null, f2, null from v2 where PK=?
>>>>> Union all
>>>>> Select null, null, f3 from v3 where PK=?
>>>>>
>>>>> We might just run separate parallel queries against each view and
>>>>> merge the results client side.
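[Editor's note: the null-padded UNION ALL being discussed could look roughly like the sketch below. The column types (VARCHAR, DATE, BIGINT) are invented here to illustrate the "incompatible types" case; the CASTs are one way to keep every branch's tuple format identical, not something prescribed in the thread.]

```sql
-- Sketch only: f1/f2/f3 types are assumed for illustration.
-- Casting the NULL literals gives all three branches the same shape.
SELECT pk, f1, CAST(NULL AS DATE) AS f2, CAST(NULL AS BIGINT) AS f3
  FROM v1 WHERE pk = ?
UNION ALL
SELECT pk, CAST(NULL AS VARCHAR), f2, CAST(NULL AS BIGINT)
  FROM v2 WHERE pk = ?
UNION ALL
SELECT pk, CAST(NULL AS VARCHAR), CAST(NULL AS DATE), f3
  FROM v3 WHERE pk = ?
```

As the thread notes, this gets verbose as the number of views and columns grows, which is what motivates the client-side merge alternative below.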
>>>>> I would guess this should perform well
>>>>> since the block cache can be leveraged for queries after the first.
>>>>>
>>>>> We could also use the HBase API to run a point row get. We'd have to
>>>>> reimplement decoding for Phoenix's column values, which is not ideal
>>>>> but quite doable.
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Feb 21, 2018, at 9:09 PM, James Taylor <jamestay...@apache.org> wrote:
>>>>>
>>>>> Have you tried a UNION ALL query on (f1, f2, f3) instead? It seems
>>>>> you're on a good track with multiple views over a single physical
>>>>> table (or a handful of them).
>>>>>
>>>>> On Wed, Feb 21, 2018 at 6:45 PM Miles Spielberg <mi...@box.com> wrote:
>>>>>
>>>>>> I've done some experimentation with views, with a schema resembling
>>>>>> this:
>>>>>>
>>>>>>> create table t1(
>>>>>>>   pk bigint not null primary key
>>>>>>> );
>>>>>>>
>>>>>>> create view v1(
>>>>>>>   f1 varchar
>>>>>>> ) AS SELECT * FROM t1;
>>>>>>> create INDEX v1_f1 ON v1(f1);
>>>>>>>
>>>>>>> create view v2(
>>>>>>>   f2 varchar
>>>>>>> ) AS SELECT * FROM t1;
>>>>>>> create INDEX v2_f2 ON v2(f2);
>>>>>>>
>>>>>>> create view v3(
>>>>>>>   f3 varchar
>>>>>>> ) AS SELECT * FROM t1;
>>>>>>> create INDEX v3_f3 ON v3(f3);
>>>>>>
>>>>>> Most of the time we'll be accessing data via the indexed views, but
>>>>>> we'd also like to be able to query all columns (f1, f2, f3) for a
>>>>>> given pk. At the HBase level, this should be doable as a point get
>>>>>> on t1. The SQL-y way to express this would probably be with JOINs,
>>>>>> but the EXPLAIN plan is not encouraging.
>>>>>>
>>>>>>> explain SELECT * from t1 left join v1 on v1.pk=t1.pk left join v2
>>>>>>> on v2.pk=t1.pk left join v3 on v3.pk=t1.pk where t1.pk=12345;
>>>>>>>
>>>>>>> | CLIENT 1-CHUNK 1 ROWS 281 BYTES PARALLEL 1-WAY ROUND ROBIN POINT LOOKUP ON 1 KEY OVER T1
>>>>>>> |     PARALLEL LEFT-JOIN TABLE 0
>>>>>>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>>>>>> |     PARALLEL LEFT-JOIN TABLE 1
>>>>>>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>>>>>> |     PARALLEL LEFT-JOIN TABLE 2
>>>>>>> |         CLIENT 1-CHUNK PARALLEL 1-WAY ROUND ROBIN FULL SCAN OVER T1
>>>>>>
>>>>>> This is pushing me back towards a design of having a single table,
>>>>>> except for the issue of proliferating physical HBase tables for the
>>>>>> indexes. Would you advise having a single table + a single view on
>>>>>> it containing all columns, to coerce Phoenix to consolidate the
>>>>>> indexes into a single physical table? Are there other alternatives
>>>>>> we should be considering?
>>>>>>
>>>>>> On Fri, Feb 16, 2018 at 3:27 PM, James Taylor <jamestay...@apache.org> wrote:
>>>>>>
>>>>>>> All indexes on views are stored in a single physical table, so
>>>>>>> you'll be ok in that regard.
>>>>>>>
>>>>>>> If you could file bugs for any local index issues, we'd really
>>>>>>> appreciate it.
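[Editor's note: the consolidation Miles is asking about, one physical table with a single wide view so that every index lands in the shared index table, could be sketched as below. All names are illustrative, not from the thread.]

```sql
-- Sketch only: one base table, one wide view, all indexes on the view.
CREATE TABLE t1 (
    pk BIGINT NOT NULL PRIMARY KEY
);

CREATE VIEW v_all (
    f1 VARCHAR,
    f2 VARCHAR,
    f3 VARCHAR
) AS SELECT * FROM t1;

-- Because these are view indexes, they share one physical index table
-- (_IDX_T1) instead of creating three separate global index tables.
CREATE INDEX v_all_f1 ON v_all (f1);
CREATE INDEX v_all_f2 ON v_all (f2);
CREATE INDEX v_all_f3 ON v_all (f3);
```

The trade-off raised earlier in the thread still applies: the shared index table differentiates entries by the 16-bit viewIndexId, so this consolidation is bounded by that limit.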
>>>>>>> We've been steadily improving local indexes
>>>>>>> (see PHOENIX-3941 for some recent perf improvements, applicable to
>>>>>>> multi-tenant tables in particular; these will appear in our 4.14
>>>>>>> release). Handling non-covered columns is pretty isolated, so we
>>>>>>> should be able to fix bugs you find. Plus, there's a workaround:
>>>>>>> you can cover your indexes until any issues are fixed.
>>>>>>>
>>>>>>> Global, mutable indexes have had many improvements over the last
>>>>>>> several releases too, but there's more operational overhead if/when
>>>>>>> a data table gets out of sync with its index table (plus some
>>>>>>> amount of configurable eventual consistency or index disablement).
>>>>>>> With local indexes (and HBase 1.3), this isn't possible.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> James
>>>>>>>
>>>>>>> On Fri, Feb 16, 2018 at 3:10 PM, Miles Spielberg <mi...@box.com> wrote:
>>>>>>>
>>>>>>>> Hi James,
>>>>>>>>
>>>>>>>> Thanks for the tips around reducing the number of physical tables
>>>>>>>> while still maintaining the appearance of multiple tables via view
>>>>>>>> definitions. In our use case we don't anticipate having much, if
>>>>>>>> any, immutable data, so unfortunately I don't expect to be able to
>>>>>>>> take advantage of Phoenix's optimizations there.
>>>>>>>>
>>>>>>>> We're expecting many indexes, most likely with several per logical
>>>>>>>> per-tenant table. Given that global indexes are implemented as
>>>>>>>> physical HBase tables, will the view-oriented optimizations help
>>>>>>>> very much? We've done some experiments with local indexes on
>>>>>>>> 4.13.2 and found bugs, particularly with the rewrite optimization
>>>>>>>> to read non-covered columns from the main table, so we're not
>>>>>>>> confident in using local indexes to optimize queries.
>>>>>>>> (I've looked through the 5.0-alpha release notes and couldn't
>>>>>>>> find anything related to this issue, so if desired I'll collect
>>>>>>>> info for a separate bug report.)
>>>>>>>>
>>>>>>>> On Fri, Feb 16, 2018 at 2:49 PM, James Taylor <jamestay...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Hi Miles,
>>>>>>>>> You'll be fine if you use views [1] and multi-tenancy [2] to
>>>>>>>>> limit the number of physical HBase tables. Make sure you read
>>>>>>>>> about the limitations of views too [3].
>>>>>>>>>
>>>>>>>>> Here's the way I've seen this modeled successfully:
>>>>>>>>>
>>>>>>>>> - Create one schema per use case. This will let you leverage
>>>>>>>>> some nice features in HBase for quotas and throttling. If you'll
>>>>>>>>> have a single use case, you don't have to worry about it. Read
>>>>>>>>> about namespaces here [4] and make sure to enable them before
>>>>>>>>> you start creating tables.
>>>>>>>>> - Define an immutable, multi-tenant base table that has a
>>>>>>>>> TENANT_ID + TYPE_ID primary key. There are optimizations Phoenix
>>>>>>>>> does over immutable tables that you'll want to leverage
>>>>>>>>> (assuming you have use cases that fit into this category). This
>>>>>>>>> Phoenix table will be backed by a physical HBase table, but you
>>>>>>>>> won't execute Phoenix DML against it.
>>>>>>>>> Think >>>>>>>>> of it as a kind of "abstract" type. Instead, you'll create updatable >>>>>>>>> views >>>>>>>>> over it. >>>>>>>>> - define a regular/mutable, multi-tenant base table that has >>>>>>>>> TENANT_ID + TYPE_ID primary key. Same deal as above, but this would >>>>>>>>> be the >>>>>>>>> base table for any tables in which the rows change in place. >>>>>>>>> - define global views per "logical" table (against either your >>>>>>>>> immutable base table or mutable base table depending on the >>>>>>>>> functionality >>>>>>>>> needed) with each view having a WHERE TYPE_ID='your type identifier' >>>>>>>>> clause >>>>>>>>> which adds specific columns to the primary key. This view will be >>>>>>>>> updatable >>>>>>>>> (i.e. you can execute DML against it). The columns you add to your PK >>>>>>>>> will >>>>>>>>> depend on your most common query patterns. >>>>>>>>> - optionally define indexes on these global views. >>>>>>>>> - each tenant can further extend or just use the global views. >>>>>>>>> >>>>>>>>> FYI, lots of good performance/tuning tips can be found here[5]. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> James >>>>>>>>> >>>>>>>>> >>>>>>>>> [1] https://phoenix.apache.org/views.html >>>>>>>>> [2] https://phoenix.apache.org/multi-tenancy.html >>>>>>>>> [3] https://phoenix.apache.org/views.html#Limitations >>>>>>>>> [4] https://phoenix.apache.org/namspace_mapping.html >>>>>>>>> [5] https://phoenix.apache.org/tuning_guide.html >>>>>>>>> >>>>>>>>> On Fri, Feb 16, 2018 at 11:47 AM, Miles Spielberg <mi...@box.com> >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> We're looking at employing Phoenix in a multi-tenant use case >>>>>>>>>> where tenants can create their own tables and indexes, running into >>>>>>>>>> totals >>>>>>>>>> of tens-of-thousands of each. Is this a supported scenario, or are we >>>>>>>>>> headed for trouble? >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>> >