Hi, I'm trying to understand how to stream Parquet files into Spark using StreamingContext.fileStream[Key, Value, Format]().
I am struggling to understand:

a) what should be passed as Key and Value (assuming ParquetInputFormat is the correct format to use), and
b) how, if at all, to configure the ParquetInputFormat with a ReadSupport class, RecordMaterializer, etc.

I have tried setting the read-support class to GroupReadSupport (from the Parquet examples), but I am stuck on the fact that I must also pass a Hadoop MapReduce Job, which I understood to mean a job that is running and attached to a job tracker. Any help or reading suggestions are appreciated; I have almost no knowledge of Hadoop, so this low-level use of its APIs is very confusing to me.
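Here is roughly what I am attempting, with my best guesses filled in. The key type Void, the Group value produced by GroupReadSupport, the four-argument fileStream overload taking a Configuration, and the directory path are all assumptions on my part, not something I know to be correct:

```scala
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job
import org.apache.parquet.example.data.Group
import org.apache.parquet.hadoop.ParquetInputFormat
import org.apache.parquet.hadoop.example.GroupReadSupport
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ParquetStreamSketch {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("parquet-stream").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    // My guess: Job.getInstance() only builds a Configuration holder for the
    // input format; nothing is submitted to a cluster or job tracker here.
    val job = Job.getInstance()
    ParquetInputFormat.setReadSupportClass(job, classOf[GroupReadSupport])

    // Assumed types: ParquetInputFormat produces no meaningful key (Void),
    // and GroupReadSupport materializes each record as a Group.
    val stream = ssc.fileStream[Void, Group, ParquetInputFormat[Group]](
      "hdfs:///path/to/parquet-dir",   // hypothetical directory to watch
      (path: Path) => path.getName.endsWith(".parquet"),
      newFilesOnly = true,
      job.getConfiguration)

    stream.map(_._2.toString).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Is this at all the right shape, or is the Job meant to be used differently?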