[Question] Hive InputFormat relationship to FetchOperator

Anthony Virtuoso Mon, 27 Jul 2020 10:43:06 -0700

Hello Hive Community,

I have what will hopefully be a simple question. I'm working on a new
InputFormat, or perhaps an enhancement to an existing one. As part of this
research I'm trying to understand where in the Hive codebase the InputFormat is
actually used. From my initial tracing, it seems that InputFormat is mostly
used from within FetchOperator.java to determine which splits exist and
could be read in a given path... it also seems as though the InputFormat is
used to create instances of the actual reader for said splits. So my
questions are:


1. Is FetchOperator the main (only?) place where InputFormat is used to
create the list of candidate splits to be read?
2. Is FetchOperator the main (only?) place where InputFormat is used to
create the reader for a given split?
3. Are there any other places I should look in the codebase if I wanted to
introduce a new manifest-based InputFormat like SymlinkTextInputFormat?

Here is a bit more background on why I'm asking this question. The
interface of InputFormat would seem self-explanatory but I've found that
other engines like Spark and Presto don't actually respect the methods on
InputFormat in all cases. They have some situations where they only call
getSplits(...) and not getRecordReader(...) or vice-versa.
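
To make the concern concrete, here is a minimal sketch of the two-method
contract. It deliberately uses no Hadoop types: SimpleInputFormat,
ManifestFormat and readAll are hypothetical stand-ins invented for this
illustration, with strings in place of InputSplit and RecordReader. The real
interface, org.apache.hadoop.mapred.InputFormat, declares
getSplits(JobConf, int) and getRecordReader(InputSplit, JobConf, Reporter).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class InputFormatContractSketch {

    // Hypothetical, simplified stand-in for the InputFormat contract.
    // Splits are plain strings and a "reader" is an iterator of rows.
    interface SimpleInputFormat {
        List<String> getSplits(String path);
        Iterator<String> getRecordReader(String split);
    }

    // Hypothetical manifest-style format: the input "path" is treated as a
    // comma-separated manifest that lists the files actually to be read.
    static class ManifestFormat implements SimpleInputFormat {
        // Records which methods the caller invoked, in order.
        final List<String> calls = new ArrayList<>();

        @Override
        public List<String> getSplits(String manifest) {
            calls.add("getSplits");
            List<String> splits = new ArrayList<>();
            for (String target : manifest.split(",")) {
                splits.add(target.trim());
            }
            return splits;
        }

        @Override
        public Iterator<String> getRecordReader(String split) {
            calls.add("getRecordReader");
            // Two fake rows per split, just to have something to read.
            return Arrays.asList(split + "#row0", split + "#row1").iterator();
        }
    }

    // A caller that honors both methods: it asks the format for the splits,
    // then asks the same format for a reader over each split.
    static List<String> readAll(SimpleInputFormat format, String path) {
        List<String> rows = new ArrayList<>();
        for (String split : format.getSplits(path)) {
            Iterator<String> reader = format.getRecordReader(split);
            while (reader.hasNext()) {
                rows.add(reader.next());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        ManifestFormat format = new ManifestFormat();
        System.out.println(readAll(format, "a.txt, b.txt"));
        System.out.println(format.calls);
    }
}
```

An engine that computes splits itself, or substitutes its own reader, would
skip one of the two recorded calls, and any manifest logic living in the
skipped method would silently not run.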

I appreciate any guidance you might be able to offer. Thanks.
