i'll take a stab at this. - probably no reason.
- if you can. is there a derby client s/t you can issue the command: "alter table COLUMNS_V2 modify TYPE_NAME varchar(32672)". otherwise maybe use the mysql or postgres metastores (instead of derby) and run that alter command after the install. - the schema only exists in one place and that's the metastore (which is probably on your namenode for derby.) for mysql or postgres it can be anywhere you want but again examples will probably show localhost (the namenode) that's a mighty big schema! you don't just want to use string type and use get_json_object to pull data out of it dynamically? not as elegant as using static syntax like nested structs but its better than nothing. something to think about anyway. i'm guessing given a nested struct that large you'll get over one hump only to be faced with another one. hive needs to do some crazy mapping there for every record. hopefully that's optimized. :) Good luck! I'd be curious how it goes. On Mon, Aug 11, 2014 at 5:52 PM, David Beveridge <dbeveri...@cylance.com> wrote: > We are creating an Hive schema for reading massive JSON files. Our JSON > schema is rather large, and we have found that the default metastore schema > for Hive cannot work for us as-is. > > To be specific, one field in our schema has about 17KB of nested structs > within it. Unfortunately, it appears that Hive has a limit of varchar(4000) > for the field that stores the resulting definition: > > > > CREATE TABLE "COLUMNS_V2" ( > > "CD_ID" bigint NOT NULL, > > "COMMENT" varchar(4000), > > "COLUMN_NAME" varchar(128) NOT NULL, > > "TYPE_NAME" varchar(4000), > > "INTEGER_IDX" INTEGER NOT NULL, > > PRIMARY KEY ("CD_ID", "COLUMN_NAME") > > ); > > > > We are running this on Amazon MapReduce (v0.11 with default Derby > metastore) > > > > So, our initial questions are: > > · Is there a reason that the TYPE_NAME is being limited to 4000 > (IIUC, varchar on derby can grow to 32672, which would be sufficient for > a long time) > > · Can we alter the metastore schema without hacking/reinstalling > Hive? (if so, how?) > > · If so, is there a proper way to update the schema on all nodes? > > > > > > Thanks in advance! > > --DB >