I'll use parallel arrays for now, but a STRUCT type would be ideal.
It's a case for nested tables: each attribute can have multiple values, and
several attributes are grouped into "sub-rows" by group id. For example, in
the table below the group *<id: name, price>* corresponds to
*<id1: "apple1", id1: 1>*, which forms one sub-row under rowkey1, and
*<id2: "apple2", id2: 2>* forms another sub-row with the same set of
attributes. Another group is *<id: order, supplier>*, which has only one
sub-row with two columns, *<id0: 1001, id0: "company1">*.
Each row contains a few hundred groups with 1 to 10 attributes each (up to
3000 columns per row in total); the tables have 1M-100M rows, and there are
~200 tables in total.
I would like to avoid normalizing into additional tables, since the joins
would slow things down.
| rowkey  | id:order  | id:name       | id:price | id:supplier     |
| rowkey1 | id0: 1001 | id1: "apple1" | id1: 1   | id0: "company1" |
|         |           | id2: "apple2" | id2: 2   |                 |
| rowkey2 | id3: 1002 | id4: "orange" | id4: 5   | id3: "company2" |
| rowkey3 | id5: 1003 | id6: "pear1"  | id6: 1   | id5: "company1" |
|         |           | id7: "pear2"  | id7: 1   |                 |
|         |           | id8: "pear3"  | id8: 3   |                 |
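As a rough sketch of the parallel-arrays approach mentioned above, each group could be stored as one sequence per attribute plus a sequence of sub-row ids, so that position i across all of them describes one sub-row (mirroring one ARRAY column per attribute). The names `NamePriceGroup`, `encode` and `decode` are hypothetical, and this is an in-memory illustration using the rowkey3 values from the table, not Phoenix API code.

```scala
// Hypothetical parallel-array layout for the <name, price> group:
// position i in ids, names and prices together describes one sub-row.
final case class NamePriceGroup(ids: Vector[Int], names: Vector[String], prices: Vector[Int]) {
  require(ids.length == names.length && names.length == prices.length,
    "parallel arrays must have the same length")
}

object ParallelArrays {
  // Sub-rows as (id, name, price) tuples -> one sequence per attribute.
  def encode(subRows: Seq[(Int, String, Int)]): NamePriceGroup = {
    val (ids, names, prices) = subRows.unzip3
    NamePriceGroup(ids.toVector, names.toVector, prices.toVector)
  }

  // One sequence per attribute -> sub-rows, by reading the same position from each.
  def decode(g: NamePriceGroup): Vector[(Int, String, Int)] =
    g.ids.indices.map(i => (g.ids(i), g.names(i), g.prices(i))).toVector

  def main(args: Array[String]): Unit = {
    val rowkey3 = encode(Seq((6, "pear1", 1), (7, "pear2", 1), (8, "pear3", 3)))
    println(rowkey3.names) // Vector(pear1, pear2, pear3)
    println(decode(rowkey3) == Vector((6, "pear1", 1), (7, "pear2", 1), (8, "pear3", 3))) // true
  }
}
```

The `require` check matters because nothing in the storage layer keeps the arrays aligned; a write that updates one array but not the others silently corrupts every later sub-row.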
In Scala I'd use this structure to represent the table:
    case class ColumnLine(
      id: Int,
      value: Option[Any])

    case class Column(
      colname: String,
      coltype: String,
      lines: Option[List[ColumnLine]])

    case class Row(
      rowkey: String,
      columns: Map[String, Column]) // colname -> Column

    case class Table(
      name: String,
      rows: Map[String, Row]) // rowkey -> Row
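To make that structure concrete, here is rowkey1 from the table expressed with those case classes, plus a hypothetical helper `subRows` that regroups the columns of one group into sub-rows by line id. The helper, the table name `fruits` and the `"VARCHAR"`/`"INTEGER"` type strings are assumptions for illustration only.

```scala
case class ColumnLine(id: Int, value: Option[Any])
case class Column(colname: String, coltype: String, lines: Option[List[ColumnLine]])
case class Row(rowkey: String, columns: Map[String, Column]) // colname -> Column
case class Table(name: String, rows: Map[String, Row])       // rowkey -> Row

object NestedExample {
  // Hypothetical helper: collect the non-empty lines of the given columns and
  // group them by id, so each id becomes one sub-row (colname -> value).
  def subRows(row: Row, group: Seq[String]): Map[Int, Map[String, Any]] =
    group
      .flatMap { name =>
        row.columns.get(name).flatMap(_.lines).getOrElse(Nil)
          .collect { case ColumnLine(id, Some(v)) => (id, name, v) }
      }
      .groupBy(_._1)
      .map { case (id, cells) => id -> cells.map(c => c._2 -> c._3).toMap }

  val rowkey1 = Row("rowkey1", Map(
    "order"    -> Column("order", "INTEGER", Some(List(ColumnLine(0, Some(1001))))),
    "name"     -> Column("name", "VARCHAR", Some(List(ColumnLine(1, Some("apple1")), ColumnLine(2, Some("apple2"))))),
    "price"    -> Column("price", "INTEGER", Some(List(ColumnLine(1, Some(1)), ColumnLine(2, Some(2))))),
    "supplier" -> Column("supplier", "VARCHAR", Some(List(ColumnLine(0, Some("company1")))))
  ))

  val fruits = Table("fruits", Map(rowkey1.rowkey -> rowkey1))

  def main(args: Array[String]): Unit = {
    // Two sub-rows for <name, price> (ids 1 and 2), one for <order, supplier> (id 0).
    println(subRows(fruits.rows("rowkey1"), Seq("name", "price")))
    println(subRows(fruits.rows("rowkey1"), Seq("order", "supplier")))
  }
}
```

Because the id is stored on every `ColumnLine`, sub-rows can be rebuilt without relying on list positions, which is the main thing the parallel-array layout lacks.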
On Tue, Dec 23, 2014 at 9:06 PM, James Taylor wrote:
> No, that's currently not possible. You may be able to leverage one
> of the following to help you, though:
> - parallel arrays as you've mentioned
> - different tables with an FK (and likely an index) between them
> - dynamic columns (http://phoenix.apache.org/dynamic_columns.html)
> - on-the-fly updatable VIEW creation, where the VIEW represents the
> set of tuples (http://phoenix.apache.org/views.html)
>
> The implementation of a STRUCT data type (PHOENIX-477) or support for
> JSON (PHOENIX-628) may help you as well.
>
> Would it be possible to share more details about your use case?
>
> On Tue, Dec 23, 2014 at 2:07 PM, Alex Kamil wrote:
> > Is there a way to represent Map.Entry<K,V>[] as a column value in
> > Phoenix, i.e. store an array of <K,V> tuples instead of creating two
> > arrays, K VARCHAR ARRAY[] and V VARCHAR ARRAY[]?
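For the original question, converting between an array of key/value entries and the two parallel arrays (one for K, one for V) is mechanical on the client side. A minimal Scala sketch follows; the object and method names are hypothetical, and the two arrays stand in for the values that would be bound to the K and V ARRAY columns, not for actual Phoenix JDBC calls.

```scala
import java.util.{AbstractMap, Map => JMap}

object EntryArrays {
  // Array of entries -> (keys, values), e.g. for two VARCHAR ARRAY columns.
  def split(entries: Array[JMap.Entry[String, String]]): (Array[String], Array[String]) =
    (entries.map(_.getKey), entries.map(_.getValue))

  // (keys, values) -> array of entries; fails if the arrays have drifted out of sync.
  def join(keys: Array[String], values: Array[String]): Array[JMap.Entry[String, String]] = {
    require(keys.length == values.length, s"length mismatch: ${keys.length} vs ${values.length}")
    keys.zip(values).map { case (k, v) =>
      new AbstractMap.SimpleImmutableEntry[String, String](k, v): JMap.Entry[String, String]
    }
  }

  def main(args: Array[String]): Unit = {
    val (ks, vs) = split(join(Array("id1", "id2"), Array("apple1", "apple2")))
    println(ks.mkString(",") + " / " + vs.mkString(",")) // id1,id2 / apple1,apple2
  }
}
```

The type ascription on each entry is needed because Scala arrays are invariant: without it the result would be an `Array[SimpleImmutableEntry[String, String]]` rather than an `Array[JMap.Entry[String, String]]`.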
>