Hard to say.  You probably need the SparkSQL people to look at the example
data.  My guess is that it comes down to type metadata: Drill writes
Parquet's full extended type metadata into the file, and Spark may not
handle all of the extended types. Last I checked, Hive hasn't yet
implemented complete support for generating Parquet's extended metadata,
so the files it writes would likely not hit this issue (at the cost of
less self-description in the data).
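
If it helps narrow things down, here is a rough sketch (plain Python,
standard library only) for comparing what the two writers put in the file
footer. It relies only on the documented Parquet layout: a file starts and
ends with the 4-byte magic PAR1, and the 4 bytes before the trailing magic
hold the footer length as a little-endian integer. The function names are
made up for illustration, and scanning the footer for printable strings is
a crude stand-in for a real schema dump; it only surfaces things like the
writer string and column names, not the type annotations themselves.

```python
import re
import struct

MAGIC = b"PAR1"  # 4-byte marker at the start and end of a Parquet file


def read_footer(path):
    """Return the raw (Thrift-encoded) footer bytes of a Parquet file."""
    with open(path, "rb") as f:
        f.seek(0, 2)              # jump to the end to learn the file size
        size = f.tell()
        if size < 12:             # leading magic + footer length + trailing magic
            raise ValueError("too small to be a Parquet file")
        f.seek(0)
        head = f.read(4)
        f.seek(size - 8)
        tail = f.read(8)          # 4-byte footer length, then trailing magic
        if head != MAGIC or tail[4:] != MAGIC:
            raise ValueError("missing PAR1 magic bytes")
        (footer_len,) = struct.unpack("<I", tail[:4])
        if footer_len > size - 12:
            raise ValueError("footer length exceeds file size")
        f.seek(size - 8 - footer_len)
        return f.read(footer_len)


def printable_strings(footer):
    """List runs of 4 or more printable ASCII bytes found in the footer."""
    return [m.decode("ascii") for m in re.findall(rb"[\x20-\x7e]{4,}", footer)]
```

Running printable_strings(read_footer(...)) on one Drill-written file and
one Hive-written file and diffing the two lists should at least show
whether the writer versions differ.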

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Aug 24, 2015 at 3:50 PM, Sungwook Yoon <[email protected]> wrote:

> Hi,
>
> I generated Parquet files from Drill with CTAS.
> SparkSQL is throwing a NullPointerException when it reads them.
>
> SparkSQL, however, can read Hive-generated Parquet files.
>
> Can someone clarify what's making SparkSQL fail when reading
> Drill-generated Parquet files?
>
> Maybe a different Parquet version?
>
> Thanks,
>
> Sungwook
>
