Gang and Xiening,
This is exciting stuff and I'm looking forward to working with you. If
you can separate out the bug fixes from the refactoring, that would make
things much easier. (In particular, we should figure out which of them we
should back port to previous versions.)
Thanks,
Owen
On Wed, Apr 26, 2017 at 8:08 AM, Deepak Majeti wrote:
> Hi Gang and Xiening,
>
> We at Vertica have been actively contributing to and using the ORC C++
> project as well.
> A C++ writer will be a great addition to this project, and we look
> forward to working with you on merging your contributions.
> Thanks.
>
>
> On Wed, Apr 26, 2017 at 2:13 AM, Gang Wu wrote:
>
> > Hi,
> > This is Gang from Alibaba working on Alibaba's big data platform -
> > MaxCompute. We have developed our own columnar storage format within
> > MaxCompute to support MapReduce and other batch processing workloads.
> > But as Apache ORC is getting popular in the industry, we are actively
> > looking at integrating the ORC format into MaxCompute.
> > In the past few months, Xiening (cc'ed) and I have been working on
> > enhancing ORC C++ to provide a full-featured C++ reader and writer. Our
> > work mainly involves adding a C++ writer that supports all data types
> > and stats, and adding index support for both the reader and the writer.
> > As of today, we have finished development and testing and plan to
> > contribute this work back to the Apache ORC project. We have
> > communicated with Owen via email and have created an umbrella JIRA,
> > ORC-179, for the plan. In brief, we plan to do the following:
> > 1. Refactor common classes for writer and reader
> > -- extract common classes and functions for writer and reader to share
> > 2. OutputStream interface for writer
> > -- implement several output streams for writing to memory, file, etc.
> > -- implement ByteRleEncoder, RleEncoder, BooleanRleEncoder, etc.
> > -- support zlib compression
> > 3. ORC Writer
> > -- write orc file header, file footer, postscript, etc.
> > -- write columns of all types
> > -- write column statistics
> > -- write the index stream in the writer, and have the reader seek to
> > a row based on index information
> > 4. other
> > -- some minor bug fixes for the current code base.
> >
> > Should you have any questions, please feel free to contact us. Any
> > feedback and suggestions are welcome. Thanks!
> > Gang Wu
> > Senior Engineer
> > Alibaba Group
> >
>
>
>
> --
> regards,
> Deepak Majeti,
> Software Engineer at Vertica
>