Thank you for your reply. I will consider HDFS for the checkpoint storage.
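As a rough sketch of what pointing the checkpoint directory at HDFS might look like (PySpark here, untested against our cluster): the namenode address, path, app name, batch interval, and socket source below are all placeholders, and `is_hdfs_path` is just a hypothetical helper, not part of Spark. It only checks the URI scheme, so it cannot tell whether a plain path is actually an NFS mount shared by every node.

```python
from urllib.parse import urlparse

# Hypothetical location; replace the host, port, and path with your own.
CHECKPOINT_DIR = "hdfs://namenode:8020/user/spark/checkpoints/my-app"


def is_hdfs_path(path):
    """Return True if the path carries an explicit hdfs:// scheme."""
    return urlparse(path).scheme == "hdfs"


def create_context(checkpoint_dir=CHECKPOINT_DIR, batch_seconds=10):
    """Build a new StreamingContext that checkpoints to checkpoint_dir."""
    # Imported lazily so the helper above works without PySpark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="CheckpointSketch")
    ssc = StreamingContext(sc, batch_seconds)
    ssc.checkpoint(checkpoint_dir)
    # Placeholder input; a real job would define its DStreams here.
    lines = ssc.socketTextStream("localhost", 9999)
    lines.count().pprint()
    return ssc


def main():
    """Recover from an existing checkpoint, or start with a fresh context."""
    if not is_hdfs_path(CHECKPOINT_DIR):
        raise ValueError("expected an hdfs:// checkpoint directory")
    from pyspark.streaming import StreamingContext

    ssc = StreamingContext.getOrCreate(CHECKPOINT_DIR, create_context)
    ssc.start()
    ssc.awaitTermination()
```

Calling `main()` on a machine with PySpark installed would start the job; `StreamingContext.getOrCreate` only calls `create_context` when no checkpoint data is found in the directory.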

On Tue, Jul 21, 2015 at 17:51, Dean Wampler <deanwamp...@gmail.com> wrote:

> TD's Spark Summit talk offers suggestions (
> https://spark-summit.org/2015/events/recipes-for-running-spark-streaming-applications-in-production/).
> He recommends using HDFS, because you get the triplicate resiliency it
> offers, albeit with extra overhead. I believe the driver doesn't need
> visibility to the checkpointing directory, e.g., if you're running in
> client mode, but all the cluster nodes would need to see it for recovering
> a lost stage, where it might get started on a different node. Hence, I
> would think NFS could work, if all nodes have the same mount, although
> there would be a lot of network overhead. In some situations, a high
> performance file system appliance, e.g., NAS, could suffice.
>
> My $0.02,
> dean
>
> Dean Wampler, Ph.D.
> Author: Programming Scala, 2nd Edition
> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
> Typesafe <http://typesafe.com>
> @deanwampler <http://twitter.com/deanwampler>
> http://polyglotprogramming.com
>
> On Tue, Jul 21, 2015 at 10:43 AM, Emmanuel <fortin.emman...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I'm working on a Spark Streaming application and I would like to know what
>> is the best storage to use for checkpointing.
>>
>> For testing purposes we're using NFS between the worker, the master, and
>> the driver program (in client mode), but we have some issues with the
>> CheckpointWriter (1 thread dedicated). *My understanding is that NFS is
>> not a good candidate for this usage.*
>>
>> 1. What is the best solution for checkpointing and what are the
>> alternatives?
>>
>> 2. Do checkpoint directories need to be shared by the driver
>> application and the workers too?
>>
>> Thanks for your replies
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Checkpointing-solutions-tp23932.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.