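import org.apache.spark.streaming.{Minutes, StreamingContext}

// sc is an existing SparkContext; streamPaths holds the S3 directory paths
val ssc = new StreamingContext(sc, Minutes(10))

// 500 textFile streams watching S3 directories
// (.par only parallelizes these driver-side setup calls)
val streams = streamPaths.par.map { path =>
  ssc.textFileStream(path)
}

// foreachRDD registers one output operation per stream
streams.par.foreach { stream =>
  stream.foreachRDD { rdd =>
    // do something
  }
}

ssc.start()
ssc.awaitTermination() // keep the driver alive while batches run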

Would something like this scale to 500 streams? What would be the limiting
factor for performance? What is the best way to parallelize this? Any other
design ideas?
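
For comparison, here is a minimal sketch of one alternative, not a recommendation: merge the per-directory streams with ssc.union so that each batch has a single foreachRDD output operation instead of one per directory. The names UnionSketch and unionTextStreams are hypothetical, the paths are assumed to be available as a Seq[String], and whether this actually helps at 500 directories is an assumption that would need to be measured.

```scala
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.dstream.DStream

object UnionSketch {
  // Hypothetical helper: create one textFileStream per directory,
  // then merge them into a single DStream with StreamingContext.union.
  def unionTextStreams(ssc: StreamingContext, paths: Seq[String]): DStream[String] = {
    val streams: Seq[DStream[String]] = paths.map(path => ssc.textFileStream(path))
    ssc.union(streams)
  }
}
```

It would be used as UnionSketch.unionTextStreams(ssc, streamPaths).foreachRDD { rdd => ... } before ssc.start(). Each directory is still monitored separately; only the downstream processing is merged.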
