Thank you very much! It is much clearer to me now…

From: haosdent [mailto:[email protected]]
Sent: Tuesday, June 23, 2015 16:10
To: [email protected]
Subject: Re: mesos be aware of hdfs location using spark
> - Is there a way to ensure that there is always a Spark and an HDFS
> instance on the same Mesos worker?

I don't think Mesos can guarantee this right now.

> If they are on the same Mesos worker, the two services would probably not
> know they could talk to each other locally?

Spark uses the HDFS client library to connect to HDFS. The HDFS client will detect the fastest way to read the data, provided you configure your HDFS cluster correctly.

> Is it the way most people are using HDFS in Mesos, or are there any
> best practices to attach the same storage when the instance moves?

I think this is not really related to Mesos. For an HDFS cluster with a replication factor of 3, after you decommission a datanode some blocks will temporarily drop to 2 replicas. The cluster still works as normal, because the client will retry other datanodes to get the data. In the background, though, HDFS will do some internal network copying to bring the replica count back up to 3. There may be some glitches during this time, but I don't think you need to worry too much.

On Tue, Jun 23, 2015 at 9:21 PM, Sebastien Brennion <[email protected]> wrote:

Thank you for your answers… I'm new to both… Sorry, sent too quickly… I'm not sure I understand your answers, so let me try to reformulate my question. If HDFS is in Mesos, and Spark too:

1. Is there a way to ensure that there is always a Spark and an HDFS instance on the same Mesos worker? And if they are on the same Mesos worker, would the two services even know they can talk to each other locally?
2. What I'm also not sure about is how to handle the storage if HDFS is running in Mesos. What happens when hdfs_1 is moved from mesos_worker_1 to mesos_worker_2: does all the data have to be copied? How are people handling this?
   -> Unless you already have a data replica on the new HDFS datanode, HDFS will copy the blocks from other existing datanodes.
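The re-replication behaviour described above can be sketched in a few lines. This is a hypothetical illustration, not HDFS internals code: the block IDs, node names, and helper functions are all made up, but the bookkeeping mirrors what the namenode does when a datanode with replicas of every block is decommissioned.

```python
# Hypothetical sketch of HDFS re-replication after decommissioning a datanode.
# Block IDs and datanode names are invented for illustration only.

REPLICATION_FACTOR = 3

# block id -> set of datanodes currently holding a replica
blocks = {
    "blk_001": {"dn1", "dn2", "dn3"},
    "blk_002": {"dn2", "dn3", "dn4"},
    "blk_003": {"dn1", "dn3", "dn4"},
}
live_datanodes = {"dn1", "dn2", "dn3", "dn4"}

def decommission(node):
    """Take a datanode out of service; its replicas no longer count."""
    live_datanodes.discard(node)
    for holders in blocks.values():
        holders.discard(node)

def under_replicated():
    """Blocks that currently have fewer replicas than the target."""
    return [b for b, h in blocks.items() if len(h) < REPLICATION_FACTOR]

def re_replicate():
    """Copy each under-replicated block onto other live datanodes.
    In real HDFS this is a datanode-to-datanode network copy."""
    for holders in blocks.values():
        candidates = sorted(live_datanodes - holders)
        while len(holders) < REPLICATION_FACTOR and candidates:
            holders.add(candidates.pop(0))

decommission("dn3")
print(under_replicated())  # every block had a replica on dn3, so all are listed
re_replicate()
print(under_replicated())  # empty: back to the target replica count
```

The point of the sketch is the one haosdent makes: reads keep working while blocks are under-replicated (clients just pick another holder), and the copying is internal housekeeping to restore the replication factor.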
Do you mean that if the DataNode instance moves, it will copy all the data? Is that the way most people are using HDFS in Mesos, or are there any best practices to attach the same storage when the instance moves?

From: haosdent [mailto:[email protected]]
Sent: Tuesday, June 23, 2015 15:03
To: [email protected]
Subject: Re: mesos be aware of hdfs location using spark

By the way, I think your problems are related to HDFS, not to Mesos. You could send them to the HDFS user mailing list.

On Tue, Jun 23, 2015 at 9:01 PM, haosdent <[email protected]> wrote:

And for this question of yours:

> on instances that also contain the hdfs service, to prevent all data going
> over the network?

If you enable HDFS Short-Circuit Local Reads, HDFS will automatically read from the local machine instead of over the network whenever the data exists locally.

On Tue, Jun 23, 2015 at 8:58 PM, haosdent <[email protected]> wrote:

For your second question: unless you already have a data replica on the new HDFS datanode, HDFS will copy the blocks from other existing datanodes.

On Tue, Jun 23, 2015 at 7:51 PM, Sebastien Brennion <[email protected]> wrote:

Hi,

- I would like to know if there is a way to make Mesos dispatch Spark jobs preferentially on instances that also contain the HDFS service, to prevent all the data from going over the network?
- What I'm also not sure about is how to handle the storage if HDFS is running in Mesos. What happens when hdfs_1 is moved from mesos_worker_1 to mesos_worker_2: does all the data have to be copied? How are people handling this?

Regards
Sébastien

--
Best Regards,
Haosdent Huang
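For reference, the Short-Circuit Local Reads feature mentioned above is enabled in hdfs-site.xml. A minimal sketch, using the property names from the HDFS short-circuit reads documentation; the domain socket path is only an example and must point to a directory the datanode user can write to:

```xml
<!-- hdfs-site.xml fragment: enable short-circuit local reads -->
<configuration>
  <property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
  </property>
  <property>
    <!-- Unix domain socket shared between the DataNode and local clients -->
    <name>dfs.domain.socket.path</name>
    <value>/var/lib/hadoop-hdfs/dn_socket</value>
  </property>
</configuration>
```

With this in place, a client (such as a Spark executor) running on the same machine as a datanode bypasses the TCP path and reads block files via the local socket, which is what makes colocating Spark and HDFS worthwhile.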

