Thank you for your reply. I will consider HDFS for the checkpoint storage.
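
For reference, here is a minimal sketch of what that could look like in PySpark. The function names, the app name, and the batch interval are placeholders of mine and not from this thread; `needs_common_mount` is a hypothetical helper that only flags locations (bare paths or `file:` URIs) which rely on every node having the same mount, as discussed below for NFS.

```python
from urllib.parse import urlparse


def needs_common_mount(checkpoint_dir: str) -> bool:
    """True for bare paths and file: URIs, which resolve against each node's
    own filesystem and so rely on an identical mount (e.g. NFS) everywhere."""
    return urlparse(checkpoint_dir).scheme in ("", "file")


def create_context(checkpoint_dir: str, batch_seconds: int = 10):
    # Imported lazily so the helper above works without PySpark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="CheckpointSketch")  # hypothetical app name
    ssc = StreamingContext(sc, batch_seconds)
    ssc.checkpoint(checkpoint_dir)  # enable checkpointing to this directory
    # ... define the DStream graph here, before returning ...
    return ssc


def get_or_create(checkpoint_dir: str):
    from pyspark.streaming import StreamingContext

    if needs_common_mount(checkpoint_dir):
        print("warning: %s must be mounted identically on every node" % checkpoint_dir)
    # Rebuild from checkpoint data if present, otherwise call create_context.
    return StreamingContext.getOrCreate(
        checkpoint_dir, lambda: create_context(checkpoint_dir)
    )
```

With an HDFS location (the URI here is made up), this would be used as `ssc = get_or_create("hdfs://namenode:8020/spark/checkpoints")`, followed by `ssc.start()` and `ssc.awaitTermination()`.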


On Tue, Jul 21, 2015 at 17:51, Dean Wampler <deanwamp...@gmail.com> wrote:

> TD's Spark Summit talk offers suggestions (
> https://spark-summit.org/2015/events/recipes-for-running-spark-streaming-applications-in-production/).
> He recommends using HDFS, because you get the resiliency of its three-way
> replication, albeit with extra overhead. I believe the driver doesn't need
> visibility into the checkpoint directory, e.g., if you're running in
> client mode, but all the cluster nodes would need to see it to recover
> a lost stage, which might be restarted on a different node. Hence, I
> would think NFS could work if all nodes have the same mount, although
> there would be a lot of network overhead. In some situations, a
> high-performance file system appliance, e.g., NAS, could suffice.
>
> My $0.02,
> dean
>
> Dean Wampler, Ph.D.
> Author: Programming Scala, 2nd Edition
> <http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
> Typesafe <http://typesafe.com>
> @deanwampler <http://twitter.com/deanwampler>
> http://polyglotprogramming.com
>
> On Tue, Jul 21, 2015 at 10:43 AM, Emmanuel <fortin.emman...@gmail.com>
> wrote:
>
>> Hi,
>>
>> I'm working on a Spark Streaming application and I would like to know what
>> the best storage for checkpointing is.
>>
>> For testing purposes we are using NFS shared between the worker, the master,
>> and the driver program (in client mode), but we have some issues with the
>> CheckpointWriter (1 dedicated thread). *My understanding is that NFS is not
>> a good candidate for this usage.*
>>
>> 1. What is the best solution for checkpointing, and what are the
>> alternatives?
>>
>> 2. Do checkpoint directories need to be shared by the driver
>> application and the workers too?
>>
>> Thanks for your replies
