I have a similar situation. I have records with varying fields that I wanted to access individually and also as a group.
My actual records are JSON objects, so they look like like this: {"field1": value1, "field2": value2, …} To make matter harder, the fields are also varying types: ints, strings, boolean, floats. What I ended up doing that has worked really well is make each field in the JSON record a dynamic field in a Phoenix table. And, to preserve the integrity of the original record, I have one static string field in which the entire original JSON goes into. Because of the dynamic fields, I have to execute an UPSERT for each row instead of some kind of batch loading. But, I routinely load millions of rows like this, sometimes with 20+ fields, and it has never failed on an upsert and runs fast enough. A few of the fields are actually always in each record, so I use those for my primary keys, The end result is great! I can query by any field, quickly pull entire records from the key fields, and any JSON object can be loaded. In my code that translates the JSON into UPSERT statements I also keep a running tab of field names, their type, and number of occurrences. Then a the end of my millions of UPSERTS I populate a simple table that just lists all the fields/types/counts so that there is no mystery about what fields are available when I want to query the big table. I did this because I don't always know what field names are in the JSON, and I don't know how to get a list of dynamic field names from Phoenix SQL. Hope this helps, Steve On Thu, Jun 2, 2016 at 9:28 PM, Stephen Wilcoxon <wilco...@gmail.com> wrote: > The simple solution is to transform the multi-valued field into a separate > linked table with multiple rows per "id". If this is what you mean by > "exploding the data", why do you not want to do so? > > On Thu, Jun 2, 2016 at 7:11 PM, Rahul Jain <rja...@gmail.com> wrote: > >> >> Folks, >> >> I used this question earlier on the wrong list; posting on apache phoenix >> user group on the advise of James Taylor: >> >> We have a use case of migration from legacy database to hbase/phoenix. >> I'd want to hear your thoughts on how to manage the given multi-valued >> field: >> >> Field name : namesList >> >> Contains: comma separated list of strings. >> >> e.g. "Jack, Richard, Fred" >> >> The field does not have an inherent limit on the number of strings, >> however for successful migration we need to support unto 100 strings in the >> field >> >> This field is currently indexed in our legacy DB technology where the >> user is able to query by any of the contained strings. >> >> e.g. query: >> >> select * from myTable where namesList LIKE '%Fred%'; >> >> Is there any way Phoenix supports generating multiple index values from >> a single column in a row ? We are looking for a method where we don't >> explicitly specify individual offsets for indexing, (nameList[0], >> nameList[1] etc) , rather say a custom functional index equivalent >> generates the indexes as a list and each of the index in the list is >> persisted individually. >> >> Effectively our goal is to avoid full table scans when doing queries that >> match values from multi valued fields such as example above. We do want >> to avoid exploding the data however possible. >> >> Any comments / suggestions are welcome. >> >> Thank you ! >> >> Rahul >> > >