Re: Tachyon in Spark

2014-12-15 Thread Jun Feng Liu

Thanks for the response. I got the point: it sounds like today's Spark
lineage is not pushed into Tachyon's lineage. It would be good to see how
it works.

Jun Feng Liu.



   
Haoyuan Li haoyuan.li@gmail.com
2014-12-13 00:17

To
Jun Feng Liu/China/IBM@IBMCN,
cc
Reynold Xin r...@databricks.com, Andrew Ash and...@andrewash.com,
dev@spark.apache.org
Subject
Re: Tachyon in Spark




Junfeng, by "off-heap solution", did you mean rdd.persist(OFF_HEAP)?
That feature is different from the lineage feature. You can use
rdd.persist(OFF_HEAP) with Tachyon now, on any Spark version later than
1.0.0, without a problem.

Regarding Reynold's last email, those are good points. Tachyon provided
this a while ago. We are working on enhancing the feature and its
integration with Spark.

Thanks,

Haoyuan


--
Haoyuan Li
AMPLab, EECS, UC Berkeley
http://www.cs.berkeley.edu/~haoyuan/


Re: Tachyon in Spark

2014-12-12 Thread Jun Feng Liu
I think lineage is the key feature Tachyon uses to reproduce an RDD when
an error happens. Otherwise, there would have to be data replicas among
the Tachyon nodes to provide redundancy for fault tolerance, and I think
Tachyon is trying to avoid going down that path. Does this mean the
off-heap solution is not ready yet if Tachyon lineage does not work right
now?
 
Best Regards
 
Jun Feng Liu
IBM China Systems & Technology Laboratory in Beijing



Phone: 86-10-82452683 
E-mail: liuj...@cn.ibm.com


BLD 28,ZGC Software Park 
No.8 Rd.Dong Bei Wang West, Dist.Haidian Beijing 100193 
China 
 

 



Reynold Xin r...@databricks.com 
2014/12/12 10:22

To
Andrew Ash and...@andrewash.com, 
cc
Jun Feng Liu/China/IBM@IBMCN, dev@spark.apache.org 
dev@spark.apache.org
Subject
Re: Tachyon in Spark






Actually HY emailed me offline about this and this is supported in the
latest version of Tachyon. It is a hard problem to push this into storage;
need to think about how to handle isolation, resource allocation, etc.

https://github.com/amplab/tachyon/blob/master/core/src/main/java/tachyon/master/Dependency.java




Re: Tachyon in Spark

2014-12-11 Thread Andrew Ash
I'm interested in understanding this as well.  One of the main ways Tachyon
is supposed to realize performance gains without sacrificing durability is
by storing the lineage of data rather than full copies of it (similar to
Spark).  But if Spark isn't sending lineage information into Tachyon, then
I'm not sure how this isn't a durability concern.
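
To illustrate the trade-off described above, here is a minimal Python sketch
of lineage-based recovery. It is a toy model, not Tachyon or Spark code: the
class LineageStore and its methods (derive, put, evict, get) are hypothetical
names invented for this example, and the "durable" dictionary merely stands
in for a persistent underlying file system. A block registered with its
lineage can be rebuilt after it is lost from memory; a block stored without
lineage (and without a replica) cannot, which is the durability concern
raised here.

```python
from typing import Callable, Dict, List, Tuple

Block = List[int]


class LineageStore:
    """Toy in-memory block store that can rebuild lost blocks from lineage."""

    def __init__(self, durable_inputs: Dict[str, Block]) -> None:
        # Stand-in for data that already lives in durable storage.
        self._durable: Dict[str, Block] = dict(durable_inputs)
        # Volatile, single-copy in-memory blocks (no replication).
        self._memory: Dict[str, Block] = {}
        # Block name -> (function, parent block names) used to derive it.
        self._lineage: Dict[str, Tuple[Callable[..., Block], Tuple[str, ...]]] = {}
        self.recomputations = 0

    def derive(self, name: str, fn: Callable[..., Block], *parents: str) -> None:
        """Compute a block from its parents and record how it was derived."""
        self._lineage[name] = (fn, parents)
        self._memory[name] = fn(*(self.get(p) for p in parents))

    def put(self, name: str, value: Block) -> None:
        """Store a block WITHOUT lineage: nothing records how to rebuild it."""
        self._memory[name] = value

    def evict(self, name: str) -> None:
        """Simulate losing the in-memory copy (node failure, eviction)."""
        self._memory.pop(name, None)

    def get(self, name: str) -> Block:
        if name in self._memory:
            return self._memory[name]
        if name in self._durable:
            return self._durable[name]
        if name not in self._lineage:
            raise KeyError(f"block {name!r} is lost: no copy and no lineage")
        # Rebuild the block by re-running its derivation on its parents.
        fn, parents = self._lineage[name]
        self.recomputations += 1
        value = fn(*(self.get(p) for p in parents))
        self._memory[name] = value
        return value


store = LineageStore({"raw": [1, 2, 3, 4]})
store.derive("squared", lambda xs: [x * x for x in xs], "raw")
store.derive("total", lambda xs: [sum(xs)], "squared")

# Lose both derived blocks, then read the last one back.
store.evict("squared")
store.evict("total")
print(store.get("total"))       # [30], rebuilt from "raw" via lineage
print(store.recomputations)     # 2

# A block written without lineage cannot be recovered once its copy is gone.
store.put("cached_only", [7, 8, 9])
store.evict("cached_only")
try:
    store.get("cached_only")
except KeyError as err:
    print("lost:", err)
```

In this model the only alternatives for surviving the loss of an in-memory
copy are keeping a second copy somewhere or keeping the lineage, which is
why data cached without either one raises the question asked above.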



Re: Tachyon in Spark

2014-12-11 Thread Reynold Xin
I don't think the lineage thing is even turned on in Tachyon - it was
mostly a research prototype, so I don't think it'd make sense for us to use
that.




Tachyon in Spark

2014-12-10 Thread Jun Feng Liu
Does Spark today really leverage Tachyon lineage to process data? It seems
the application should call the createDependency function in TachyonFS to
create a new lineage node, but I did not find any place in the Spark code
that calls it. Did I miss anything?
Best Regards
 
Jun Feng Liu