    val ssc = new StreamingContext(sc, Minutes(10))

    // 500 textFile streams watching S3 directories
    val streams = streamPaths.par.map { path =>
      ssc.textFileStream(path)
    }

    streams.par.foreach { stream =>
      stream.foreachRDD { rdd =>
        // do something
      }
    }

    ssc.start()

Would something like this scale? What would be the limiting factor to performance? What is the best way to parallelize this? Any other ideas on design?