Hi, I'm trying to understand how to stream Parquet files into Spark using StreamingContext.fileStream[Key, Value, Format]().
I am struggling to understand:

a) what should be passed as Key and Value (assuming ParquetInputFormat is the correct format to use), and
b) how, if at all, to configure the ParquetInputFormat with a ReadSupport class, RecordMaterializer, etc.

I have tried setting the read-support class to GroupReadSupport (from the Parquet examples), but I am stuck on the fact that I must also pass a Hadoop MapReduce Job, which I understood to mean a job that is running and attached to a job tracker. Any help or reading suggestions are appreciated; I have almost no knowledge of Hadoop, so this low-level use of its APIs is very confusing to me.
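Here is roughly what I am attempting, with my best guesses filled in. The key type Void, the Group value produced by GroupReadSupport, the four-argument fileStream overload taking a Configuration, and the directory path are all assumptions on my part, not something I know to be correct:

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job
import org.apache.parquet.example.data.Group
import org.apache.parquet.hadoop.ParquetInputFormat
import org.apache.parquet.hadoop.example.GroupReadSupport
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ParquetStreamSketch {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("parquet-stream").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    // My guess: Job.getInstance() only builds a Configuration holder for the
    // input format; nothing is submitted to a cluster or job tracker here.
    val job = Job.getInstance()
    ParquetInputFormat.setReadSupportClass(job, classOf[GroupReadSupport])

    // Assumed types: ParquetInputFormat produces no meaningful key (Void),
    // and GroupReadSupport materializes each record as a Group.
    val stream = ssc.fileStream[Void, Group, ParquetInputFormat[Group]](
      "hdfs:///path/to/parquet-dir",   // hypothetical directory to watch
      (path: Path) => path.getName.endsWith(".parquet"),
      newFilesOnly = true,
      job.getConfiguration)

    stream.map(_._2.toString).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Is this at all the right shape, or is the Job meant to be used differently?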