A few of us have written hacky JSON-to-ORC converters, but we should have an official one that is more robust. Mine is in this pull request https://github.com/apache/orc/pull/43/commits/48a9f3443062bfaee4b684e49b137106bbfe9947#diff-efa8880e64e22de68f1e34c2f1d5b538 where I was converting the GitHub archive data to ORC for benchmarking.
I've created a jira https://issues.apache.org/jira/browse/ORC-150 for adding one.

.. Owen

On Sun, Feb 19, 2017 at 11:14 PM, Piyush Mukati (Data Platform) <[email protected]> wrote:

> Hi,
> we have a use case where our MR job has to read from old JSON data (where
> each line is a JSON record with a fixed schema) and from ORC files. The
> output of the job will be written as ORC files.
>
> I tried some approaches.
>
> 1) HCatalog, but it does not support reading from multiple
> tables as of now. The JSON data doesn't have Hive tables either.
>
> 2) With the help of the Hive ORC lib and serde.
> But I am unable to pass OrcStruct through the shuffle phase, as it doesn't
> implement Writable. (I am creating the OrcStruct in the mapper.)
>
> 3) Currently I am checking the org.apache.orc.mapreduce APIs. Everything is
> good here. I have to convert the existing JSON records to OrcStruct.
> This looks like a common use case. Writing a converter myself looks like
> reinventing the wheel.
>
> I am hoping someone in the community is aware of any utils that can help me
> convert JSON to OrcStruct. Any other suggestion is welcome.
>
> Thanks
