Maybe you could split some of the columns out into separate column
families, so you get some physical partitioning on disk?
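For example (just a sketch; the table name "metrics", the key columns
and the f1/f2 families are made up, not your actual schema), the
Phoenix DDL could look roughly like:

  CREATE TABLE metrics (
      host       VARCHAR NOT NULL,
      metric_day DATE    NOT NULL,
      series_id  VARCHAR NOT NULL,
      -- columns prefixed with f1./f2. land in different HBase column
      -- families, i.e. separate stores (sets of HFiles) on disk
      f1.s0000 FLOAT,
      f1.s0001 FLOAT,
      -- ... through f1.s1799 (first half of the hour)
      f2.s1800 FLOAT,
      -- ... through f2.s3599 (second half of the hour)
      CONSTRAINT pk PRIMARY KEY (host, metric_day, series_id)
  ) COMPRESSION='SNAPPY', DATA_BLOCK_ENCODING='FAST_DIFF';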

Whether you select one column or many, everything stored in the same
column family presently has to be read through on disk.

AFAIK, there shouldn't really be an upper limit here (in terms of what
will execute). The price you pay is proportional to the amount of data
that has to be inspected to answer your query.
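A query that only touches columns from one family then only has to
scan that family's store files, e.g. (again against the hypothetical
table above):

  SELECT host, metric_day, f1.s0000, f1.s0001
  FROM metrics
  LIMIT 10;

A "select *" still opens every family, so the split only helps if your
queries actually restrict themselves to a subset of the families.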

Arvind S wrote:
Setup ..
HBase (1.1.2.2.4) cluster on Azure with 1 region server (8-core, 28 GB
RAM, ~16 GB RS heap)
Phoenix .. 4.4

Observation ..
Created a table with a 3-column composite PK and 3600 FLOAT columns
(one per second).
Loaded <5000 rows of data (<100 MB, Snappy compressed with FAST_DIFF
encoding).

Performing a "select *", or a select naming each of these 3600 columns
individually, takes around 2+ minutes just to return a few rows
(LIMIT 2, 10, etc.).

Selecting a smaller number of columns, however, seems to improve
performance.

Is it an anti-pattern to have a large number of columns in Phoenix
tables?

Cheers !!
Arvind
