What you need is a format plugin. You can take a look at the Text Format plugin while reading Paul's documentation, which Abhishek already shared. Don't look at Parquet, as it is more complicated. A short summary of what you need (maybe too short to be of any use :) ):
1. A group of classes which make Drill recognize your format plugin.
2. An ORC reader. This will be the heart of this project. Essentially you provide a way to read data (columns) from ORC files and then populate Drill's value vectors. You can later enhance this by parallelizing the reads of individual columns.
3. Once you have the format plugin working, you might want to start playing with planner rules if you want features like "filter pushdown into the scan" etc.

- Rahul

On Apr 13, 2017 2:57 PM, "Manoj Murumkar" <[email protected]> wrote:

Thanks. I knew about the Hive table format support. I'll look into reading directly from ORC files on HDFS (a la Parquet). Is there some documentation around how to develop a new storage plugin?

> On Apr 13, 2017, at 2:51 PM, Abhishek Girish <[email protected]> wrote:
>
> Drill does not support ORC as a DFS file format. You are welcome to
> contribute. As a workaround, Drill supports reading ORC files via the Hive
> plugin, so you should be able to use that.
>
> On Thu, Apr 13, 2017 at 2:19 PM, Manoj Murumkar <[email protected]>
> wrote:
>
>> Hi!
>>
>> I am wondering if someone is actively working on ORC support already.
>> Appreciate any pointers.
>>
>> Thanks,
>>
>> Manoj
>>
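
As a rough illustration of point 2 in the summary at the top of this thread (read columns, then populate value vectors), here is a minimal, self-contained sketch of the batch-filling loop. Everything in it is hypothetical: ColumnSource stands in for whatever decodes an ORC column, LongVectorSketch stands in for a Drill value vector, and readBatch is an invented helper. None of these are actual Drill or ORC classes, and a real reader would have to handle multiple types, nulls and schema discovery.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch only: none of these types are Drill or ORC classes.
final class OrcReaderSketch {

    /** Stand-in for one decoded ORC column: yields values row by row. */
    static final class ColumnSource {
        private final long[] values;
        private int position = 0;

        ColumnSource(long... values) { this.values = values.clone(); }

        boolean hasNext() { return position < values.length; }
        long next() { return values[position++]; }
    }

    /** Stand-in for a fixed-capacity value vector holding one column of a batch. */
    static final class LongVectorSketch {
        private final long[] data;
        private int valueCount = 0;

        LongVectorSketch(int capacity) { data = new long[capacity]; }

        void set(int index, long value) { data[index] = value; }
        void setValueCount(int count) { valueCount = count; }
        int getValueCount() { return valueCount; }

        long get(int index) {
            if (index < 0 || index >= valueCount) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            return data[index];
        }
    }

    /**
     * Fills one vector per column with up to batchSize rows and returns the
     * number of rows read. Assumes all columns hold the same number of rows.
     * Each column is filled independently of the others, which is what would
     * make per-column parallel reads possible later.
     */
    static int readBatch(List<ColumnSource> columns,
                         List<LongVectorSketch> vectors,
                         int batchSize) {
        int rows = 0;
        for (int c = 0; c < columns.size(); c++) {
            ColumnSource column = columns.get(c);
            LongVectorSketch vector = vectors.get(c);
            int n = 0;
            while (n < batchSize && column.hasNext()) {
                vector.set(n++, column.next());
            }
            vector.setValueCount(n);
            rows = Math.max(rows, n);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<ColumnSource> columns = Arrays.asList(
                new ColumnSource(1, 2, 3), new ColumnSource(10, 20, 30));
        List<LongVectorSketch> vectors = Arrays.asList(
                new LongVectorSketch(2), new LongVectorSketch(2));

        // Keep asking for batches until the columns are exhausted.
        int rows;
        while ((rows = readBatch(columns, vectors, 2)) > 0) {
            System.out.println("batch of " + rows + " rows");
        }
    }
}
```

The caller reuses the same vectors for every batch and stops when readBatch returns 0, which is roughly the contract a scan operator would expect from a record reader.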
