Hi, we have a use case where our MapReduce job has to read from both legacy JSON data (each line is a JSON object with a fixed schema) and ORC files. The job's output will be ORC files.
I have tried a few approaches:

1) HCatalog: it does not currently support reading from multiple tables, and the JSON data has no Hive tables anyway.
2) The Hive ORC library and SerDe: I create the OrcStruct in the mapper, but I cannot pass it through the shuffle phase because it does not implement Writable.
3) The org.apache.orc.mapreduce APIs, which I am evaluating now: everything works so far, but I still have to convert each existing JSON record to an OrcStruct.

This looks like a common use case, and writing a converter myself feels like reinventing the wheel. Is anyone in the community aware of a utility that can convert JSON to OrcStruct? Any other suggestions are welcome.

Thanks
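
P.S. To make the question concrete, here is a rough sketch of the kind of hand-written converter I would like to avoid maintaining. It is only an illustration under several assumptions: the class name JsonToOrcStruct is made up, Jackson is assumed as the JSON parser, and only a flat schema with a few primitive types is handled (no nested structs, lists, maps, decimals or timestamps). It uses org.apache.orc.mapred.OrcStruct, which does implement Writable.

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.hadoop.io.BooleanWritable;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.orc.TypeDescription;
import org.apache.orc.mapred.OrcStruct;

import java.io.IOException;
import java.util.List;

// Hypothetical helper, not part of ORC: converts one JSON line to an OrcStruct.
public class JsonToOrcStruct {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static OrcStruct convert(String jsonLine, TypeDescription schema)
            throws IOException {
        JsonNode root = MAPPER.readTree(jsonLine);
        OrcStruct struct = new OrcStruct(schema);
        List<String> names = schema.getFieldNames();
        List<TypeDescription> types = schema.getChildren();
        for (int i = 0; i < names.size(); i++) {
            // A key that is absent from the JSON record becomes a null field.
            JsonNode value = root.get(names.get(i));
            struct.setFieldValue(i, toWritable(value, types.get(i)));
        }
        return struct;
    }

    // Maps a JSON value to the Writable type that ORC's mapred layer uses
    // for the given column type. Only a few primitives are covered here.
    private static WritableComparable toWritable(JsonNode value, TypeDescription type) {
        if (value == null || value.isNull()) {
            return null;
        }
        switch (type.getCategory()) {
            case BOOLEAN: return new BooleanWritable(value.asBoolean());
            case INT:     return new IntWritable(value.asInt());
            case LONG:    return new LongWritable(value.asLong());
            case DOUBLE:  return new DoubleWritable(value.asDouble());
            case STRING:  return new Text(value.asText());
            default:
                throw new IllegalArgumentException(
                    "Type not handled in this sketch: " + type);
        }
    }
}
```

In the mapper I would build the schema once, for example with TypeDescription.fromString("struct<id:int,name:string>") (an example schema, not our real one), and call JsonToOrcStruct.convert(line, schema) for each input line. Covering nested and less common types is where the real effort would go, which is why I am hoping an existing utility already does this.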
