Re: off heap to alluxio/tachyon in Spark 2
Hi,

If you are looking for how to run Spark on Alluxio (formerly Tachyon), here is the guide from the Alluxio documentation site:
http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html

It still works for Spark 2.x.

The Alluxio team has also written about when and why running Spark 2.x with Alluxio may improve performance:
http://www.alluxio.com/2016/08/effective-spark-rdds-with-alluxio/

- Bin

On Mon, Sep 19, 2016 at 7:56 AM, aka.fe2s wrote:
> Hi folks,
>
> What has happened with Tachyon / Alluxio in Spark 2? The docs no longer
> mention it.
>
> --
> Oleksiy Dyagilev
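For anyone following the guide linked above, here is a minimal PySpark sketch of what reading from Alluxio can look like. It assumes the Alluxio client jar is already on Spark's classpath, as the guide describes. The helper names `alluxio_uri` and `count_lines`, the host `localhost`, and the example path are made up for illustration; 19998 is Alluxio's default master port.

```python
def alluxio_uri(path, host="localhost", port=19998):
    """Build an alluxio:// URI for an absolute path in the Alluxio namespace.

    host and port are illustrative defaults; point them at your Alluxio master.
    """
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /data/input.txt")
    return "alluxio://{}:{}{}".format(host, port, path)


def count_lines(spark, path, host="localhost", port=19998):
    """Count the lines of a text file stored in Alluxio.

    `spark` is an existing SparkSession whose classpath includes the
    Alluxio client jar (an assumption of this sketch).
    """
    return spark.read.text(alluxio_uri(path, host, port)).count()
```

With a running cluster, `count_lines(spark, "/data/input.txt")` would read the file through Alluxio instead of going to the under-filesystem directly.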
Re: off heap to alluxio/tachyon in Spark 2
It backed the "OFF_HEAP" storage level for RDDs. That's not quite the same thing as what off-heap Tungsten allocation refers to. It's also worth pointing out that systems like HDFS can already put data into memory.

On Mon, Sep 19, 2016 at 7:48 PM, Richard Catlin wrote:
> Here is my understanding.
>
> Spark used Tachyon as an off-heap solution for RDDs. In certain situations,
> it would alleviate garbage collection of the RDDs.
>
> Tungsten, Spark 2’s off-heap memory management (with a columnar format), is
> much more efficient and is used by default. Alluxio no longer makes sense
> for this use.
>
> You can still use Tachyon/Alluxio to bring your files into memory, which is
> quicker for Spark to access than your DFS (HDFS or S3).
>
> Alluxio actually supports a “Tiered Filesystem”, and automatically brings
> the “hotter” files into the fastest storage (memory, SSD). You can
> configure it with memory, SSD, and/or HDDs, with the DFS as the persistent
> store, called the under-filesystem.
>
> Hope this helps.
>
> Richard Catlin
>
> On Sep 19, 2016, at 7:56 AM, aka.fe2s wrote:
>
> Hi folks,
>
> What has happened with Tachyon / Alluxio in Spark 2? The docs no longer
> mention it.
>
> --
> Oleksiy Dyagilev
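To make the distinction above a bit more concrete, here is a rough PySpark sketch, assuming Spark 2.x. `spark.memory.offHeap.enabled` and `spark.memory.offHeap.size` are Spark's documented settings for its own off-heap memory (disabled by default), and `StorageLevel.OFF_HEAP` is the RDD storage level being discussed. The function names and the 1 GiB size are only illustrative. As far as I understand, in Spark 2.x the OFF_HEAP level uses Spark's own off-heap memory rather than an external store, so please check the docs for your version.

```python
# Spark's own off-heap settings (disabled by default). The size is in bytes;
# 1 GiB here is an arbitrary illustrative value, not a recommendation.
OFF_HEAP_CONF = {
    "spark.memory.offHeap.enabled": "true",
    "spark.memory.offHeap.size": str(1024 ** 3),
}


def build_session(app_name="offheap-sketch", conf=OFF_HEAP_CONF):
    """Create (or reuse) a SparkSession with the given settings applied."""
    from pyspark.sql import SparkSession  # lazy import: needs pyspark installed

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


def persist_off_heap(rdd):
    """Persist an RDD with the OFF_HEAP storage level and return it."""
    from pyspark import StorageLevel

    return rdd.persist(StorageLevel.OFF_HEAP)
```

The two pieces are separate knobs: the settings control how much off-heap memory Spark may allocate, while the storage level is a per-RDD choice of where cached blocks go.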
Re: off heap to alluxio/tachyon in Spark 2
Here is my understanding.

Spark used Tachyon as an off-heap solution for RDDs. In certain situations, it would alleviate garbage collection of the RDDs.

Tungsten, Spark 2’s off-heap memory management (with a columnar format), is much more efficient and is used by default. Alluxio no longer makes sense for this use.

You can still use Tachyon/Alluxio to bring your files into memory, which is quicker for Spark to access than your DFS (HDFS or S3).

Alluxio actually supports a “Tiered Filesystem”, and automatically brings the “hotter” files into the fastest storage (memory, SSD). You can configure it with memory, SSD, and/or HDDs, with the DFS as the persistent store, called the under-filesystem.

Hope this helps.

Richard Catlin

> On Sep 19, 2016, at 7:56 AM, aka.fe2s wrote:
>
> Hi folks,
>
> What has happened with Tachyon / Alluxio in Spark 2? The docs no longer
> mention it.
>
> --
> Oleksiy Dyagilev