Hard to say. You probably need the SparkSQL people to look at the example data. My guess is that it's related to Drill writing full metadata about the different data types into the file, and Spark may not handle all of the extended types. Last I checked, Hive hadn't yet implemented complete support for generating Parquet's extended metadata, so it likely wouldn't hit this issue (at the cost of the data being less self-describing).
--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Aug 24, 2015 at 3:50 PM, Sungwook Yoon <[email protected]> wrote:

> Hi,
>
> I generated Parquet files from Drill with CTAS.
> SparkSQL is throwing error nullpointerexception?
>
> SparkSQL however can read Hive generated Parquet files.
>
> Can someone clarify what's making SparkSQL failing on reading Drill
> generated Parquet files?
>
> Maybe some different Parquet version?
>
> Thanks,
>
> Sungwook
>
