Re: Support for ORC files

rahul challapalli Thu, 13 Apr 2017 15:16:49 -0700

What you need is a format plugin. You can take a look at the Text Format
plugin while reading Paul's documentation, which Abhishek already shared.
Don't start with Parquet, as it is more complicated. A short summary of what
you need (maybe too short to be useful :) ):

1. A group of classes which make Drill recognize your format plugin.
2. An ORC reader. This will be the heart of this project. Essentially, you
provide a way to read data (columns) from ORC files and then populate
Drill's value vectors. You can later enhance this by parallelizing the
reads of individual columns.
3. Once you have the format plugin working, you might want to start playing
with planner rules if you want features like "filter pushdown into the
scan".
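
To make item 2 a bit more concrete, here is a rough sketch in Java of the
shape a column-wise reader loop could take. Everything in it is a
hypothetical stand-in: ColumnarFile, LongVector and ColumnarReaderSketch are
not Drill or ORC classes, and the real record reader interface and value
vector API in Drill look different. It only shows the idea of projecting the
requested columns and filling one vector per column, one batch at a time.

```java
// Hypothetical sketch only: none of these types are Drill or ORC APIs.

/** Stand-in for a columnar file: each column's values are stored together. */
final class ColumnarFile {
    final String[] columnNames;
    final long[][] columns; // columns[c][row]

    ColumnarFile(String[] columnNames, long[][] columns) {
        this.columnNames = columnNames;
        this.columns = columns;
    }

    int rowCount() {
        return columns.length == 0 ? 0 : columns[0].length;
    }

    int indexOf(String name) {
        for (int i = 0; i < columnNames.length; i++) {
            if (columnNames[i].equals(name)) return i;
        }
        throw new IllegalArgumentException("No such column: " + name);
    }
}

/** Stand-in for a value vector: one column's values for the current batch. */
final class LongVector {
    final String name;
    private final long[] values;
    private int valueCount;

    LongVector(String name, int capacity) {
        this.name = name;
        this.values = new long[capacity];
    }

    void set(int index, long value) { values[index] = value; }
    void setValueCount(int count) { valueCount = count; }
    int getValueCount() { return valueCount; }

    long get(int index) {
        if (index >= valueCount) throw new IndexOutOfBoundsException("index " + index);
        return values[index];
    }
}

/** Reads only the requested columns, batchSize rows at a time. */
final class ColumnarReaderSketch {
    private final ColumnarFile file;
    private final int[] projected;      // file column index for each vector
    private final LongVector[] vectors; // one vector per requested column
    private final int batchSize;
    private int nextRow = 0;

    ColumnarReaderSketch(ColumnarFile file, String[] wanted, int batchSize) {
        this.file = file;
        this.batchSize = batchSize;
        this.projected = new int[wanted.length];
        this.vectors = new LongVector[wanted.length];
        for (int i = 0; i < wanted.length; i++) {
            projected[i] = file.indexOf(wanted[i]);
            vectors[i] = new LongVector(wanted[i], batchSize);
        }
    }

    LongVector[] vectors() { return vectors; }

    /** Fills the vectors with the next batch; returns 0 when the file is exhausted. */
    int next() {
        int count = Math.min(batchSize, file.rowCount() - nextRow);
        // Each column is copied independently of the others, which is why
        // per-column reads are a natural candidate for parallelizing later.
        for (int v = 0; v < vectors.length; v++) {
            long[] source = file.columns[projected[v]];
            for (int i = 0; i < count; i++) {
                vectors[v].set(i, source[nextRow + i]);
            }
            vectors[v].setValueCount(count);
        }
        nextRow += count;
        return count;
    }
}
```

A caller would construct the reader with the list of columns the query
needs, then call next() in a loop until it returns 0, consuming the vectors
after each call. A real reader would also have to handle ORC's other column
types and nulls, which this sketch leaves out.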

- Rahul

On Apr 13, 2017 2:57 PM, "Manoj Murumkar" <[email protected]> wrote:

Thanks. I knew about the Hive table format support. I'll look into reading
directly from ORC files on HDFS (a la Parquet). Is there some documentation
around how to develop a new storage plugin?

> On Apr 13, 2017, at 2:51 PM, Abhishek Girish <[email protected]> wrote:
>
> Drill does not support ORC as a DFS file format. You are welcome to
> contribute. As a workaround, Drill supports reading ORC files via the Hive
> plugin, so you should be able to use that.
>
> On Thu, Apr 13, 2017 at 2:19 PM, Manoj Murumkar <[email protected]>
> wrote:
>
>> Hi!
>>
>> I am wondering if someone is actively working on ORC support already.
>> Appreciate any pointers.
>>
>> Thanks,
>>
>> Manoj
>>
