We are creating a Hive schema for reading massive JSON files. Our JSON schema 
is rather large, and we have found that the default metastore schema for Hive 
cannot work for us as-is.

To be specific, the type definition for one field in our schema comes to about 
17 KB of nested structs. Unfortunately, it appears that Hive limits the 
metastore column that stores the resulting definition to varchar(4000):

    CREATE TABLE "COLUMNS_V2" (
      "CD_ID" bigint NOT NULL,
      "COMMENT" varchar(4000),
      "COLUMN_NAME" varchar(128) NOT NULL,
      "TYPE_NAME" varchar(4000),
      "INTEGER_IDX" INTEGER NOT NULL,
      PRIMARY KEY ("CD_ID", "COLUMN_NAME")
    );
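
For context, the metastore serializes a column's entire type into that single 
TYPE_NAME string. A hypothetical illustration (not our real schema): the whole 
struct<...> text below is stored as one TYPE_NAME row in COLUMNS_V2, so deeply 
nested JSON blows past 4000 characters quickly.

    -- Hypothetical table for illustration only.
    CREATE EXTERNAL TABLE events (
      id bigint,
      payload struct<
        usr:struct<id:bigint,name:string,email:string>,
        device:struct<os:string,model:string,
                      screen:struct<width:int,height:int>>,
        tags:array<struct<key:string,value:string>>
      >
    )
    LOCATION 's3://our-bucket/events/';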

We are running this on Amazon Elastic MapReduce (Hive 0.11 with the default 
Derby metastore).

So, our initial questions are:

* Is there a reason that TYPE_NAME is limited to 4000? (IIUC, a varchar on 
Derby can grow to 32672, which would be sufficient for a long time.)

* Can we alter the metastore schema without hacking/reinstalling Hive? (If so, 
how? A sketch of the statement we have in mind follows this list.)

* If we can, is there a proper way to update the schema on all nodes?
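
For the second question, the Derby statement we would expect to need is 
something like the following (untested; it assumes we can connect to the 
metastore database directly, e.g. with Derby's ij tool, while Hive is not 
running). Derby does allow widening a VARCHAR column in place:

    -- Sketch only: widen TYPE_NAME toward Derby's varchar ceiling of 32672.
    ALTER TABLE "COLUMNS_V2"
      ALTER COLUMN "TYPE_NAME" SET DATA TYPE VARCHAR(32672);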


Thanks in advance!
--DB
