Re: Structured Streaming partition logic with respect to storage and fileformat

2016-06-21 Thread Sachin Aggarwal
What will be the scenario in the case of S3 and the local file system?

On Tue, Jun 21, 2016 at 4:36 PM, Jörn Franke wrote:
> Based on the underlying Hadoop FileFormat. This one does it mostly based on block size. You can change this though.
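For illustration only, a minimal sketch (the SparkSession setup, schema, and the s3a bucket name are assumptions, not from the thread) showing that the same readStream call can point at a local path or an S3 path; only the URI scheme changes, while the file-splitting behaviour is still governed by the underlying file source settings:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.types._

    // Hypothetical setup; file streaming sources normally need an explicit schema.
    val spark = SparkSession.builder().master("local[*]").getOrCreate()
    val schema = new StructType().add("id", LongType).add("value", StringType)

    // Local file system
    val localStream = spark.readStream.schema(schema)
      .json("file:///Users/sachin/testSpark/inputJson")

    // S3 via the s3a connector (needs hadoop-aws on the classpath); bucket name is made up.
    val s3Stream = spark.readStream.schema(schema)
      .json("s3a://example-bucket/inputJson")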

Re: Structured Streaming partition logic with respect to storage and fileformat

2016-06-21 Thread Jörn Franke
It is based on the underlying Hadoop FileFormat, which splits mostly by block size. You can change this, though.

> On 21 Jun 2016, at 12:19, Sachin Aggarwal wrote:
> when we use readStream to read data as a stream, how does Spark decide the number of RDDs and
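As one way to change this, a minimal sketch (assuming Spark 2.0+, where file-based sources honour spark.sql.files.maxPartitionBytes when computing splits; the SparkSession setup is not from the thread):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().master("local[*]").getOrCreate()

    // Smaller value -> more, smaller partitions per file (the default is 128 MB).
    spark.conf.set("spark.sql.files.maxPartitionBytes", 32L * 1024 * 1024)

    // The Hadoop-level split size can also be tuned, e.g.:
    // spark.sparkContext.hadoopConfiguration.setLong(
    //   "mapreduce.input.fileinputformat.split.maxsize", 32L * 1024 * 1024)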

Structured Streaming partition logic with respect to storage and fileformat

2016-06-21 Thread Sachin Aggarwal
When we use readStream to read data as a stream, how does Spark decide the number of RDDs, and the partitions within each RDD, with respect to the storage and file format?

val dsJson = sqlContext.readStream.json("/Users/sachin/testSpark/inputJson")
val dsCsv = sqlContext.readStream.option("header", "true").csv(
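For context, a minimal non-streaming sketch (the SparkSession setup is an assumption; the path is the one from the mail) of one way to observe how many partitions Spark creates for a file source. A streaming DataFrame cannot be converted to an RDD directly, but the batch reader goes through broadly the same file-splitting logic, so it gives a rough picture:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("partition-count-check")
      .master("local[*]")
      .getOrCreate()

    // Batch read of the same JSON directory; file sources split the input files
    // into partitions, so getNumPartitions shows how many splits were created.
    val dfJson = spark.read.json("/Users/sachin/testSpark/inputJson")
    println(s"JSON partitions: ${dfJson.rdd.getNumPartitions}")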