Re: off heap to alluxio/tachyon in Spark 2

2016-09-19 Thread Bin Fan
Hi,

If you are looking for how to run Spark on Alluxio (formerly Tachyon),
here is the documentation from the Alluxio doc site:
http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html
It still works for Spark 2.x.
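
As a rough sketch of what that setup looks like from the Spark side (my own
illustration, not copied from the doc page): Spark reads and writes Alluxio
paths through ordinary alluxio:// URIs. The helper names alluxio_uri and
double_lines are hypothetical, and the sketch assumes an Alluxio master on
localhost at its default port 19998, with the Alluxio client jar already on
the Spark driver and executor classpath.

```python
def alluxio_uri(path, host="localhost", port=19998):
    """Build an alluxio:// URI (host and port here are assumed defaults)."""
    if not path.startswith("/"):
        path = "/" + path
    return "alluxio://{}:{}{}".format(host, port, path)


def double_lines(sc, src_path, dst_path):
    """Read a text file from Alluxio, duplicate each line, write it back.

    `sc` is an existing SparkContext (for example the one provided by the
    pyspark shell). Only standard RDD calls are used: textFile, map and
    saveAsTextFile.
    """
    lines = sc.textFile(alluxio_uri(src_path))
    doubled = lines.map(lambda line: line + line)
    doubled.saveAsTextFile(alluxio_uri(dst_path))
    return doubled
```

In a pyspark shell this would be called as
double_lines(sc, "/LICENSE", "/LICENSE2"), assuming /LICENSE already exists
in Alluxio and /LICENSE2 does not.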

The Alluxio team also published an article on when and why running Spark
(2.x) with Alluxio may improve performance:
http://www.alluxio.com/2016/08/effective-spark-rdds-with-alluxio/

- Bin


On Mon, Sep 19, 2016 at 7:56 AM, aka.fe2s  wrote:

> Hi folks,
>
> What has happened with Tachyon / Alluxio in Spark 2? The docs no longer
> mention it.
>
> --
> Oleksiy Dyagilev
>


Re: off heap to alluxio/tachyon in Spark 2

2016-09-19 Thread Sean Owen
Tachyon backed the "OFF_HEAP" storage level for RDDs. That's not quite the
same thing as what off-heap Tungsten allocation refers to.
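
To make the Tungsten side of that distinction concrete, here is a minimal
sketch. Off-heap Tungsten allocation is controlled by two documented Spark
settings, spark.memory.offHeap.enabled and spark.memory.offHeap.size, and is
separate from persisting an RDD with the OFF_HEAP storage level. The helper
names tungsten_offheap_conf and to_submit_args are hypothetical, and the 2 GiB
size is only an example value.

```python
def tungsten_offheap_conf(size_bytes):
    """Return Spark settings that turn on Tungsten off-heap allocation."""
    # Spark expects a positive size whenever off-heap memory is enabled.
    if size_bytes <= 0:
        raise ValueError("spark.memory.offHeap.size must be positive")
    return {
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": str(size_bytes),
    }


def to_submit_args(conf):
    """Flatten a settings dict into spark-submit `--conf key=value` arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", "{}={}".format(key, conf[key])]
    return args


# Example: arguments for 2 GiB of off-heap memory.
print(" ".join(to_submit_args(tungsten_offheap_conf(2 * 1024 ** 3))))
```

The printed arguments can be appended to a spark-submit command line; the same
two keys could equally go into spark-defaults.conf.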

It's also worth pointing out that systems like HDFS can already put data
into memory.

On Mon, Sep 19, 2016 at 7:48 PM, Richard Catlin
 wrote:
> Here is my understanding.
>
> Spark used Tachyon as an off-heap solution for RDDs.  In certain situations,
> it would alleviate garbage-collection overhead for the RDDs.
>
> Tungsten, Spark 2’s off-heap memory management (columnar format), is much
> more efficient and used as the default.  Alluxio no longer makes sense for
> this use.
>
> You can still use Tachyon/Alluxio to bring your files into memory, which is
> quicker for Spark to access than your DFS (HDFS or S3).
>
> Alluxio actually supports a “tiered filesystem” and automatically brings
> the “hotter” files into the fastest storage (memory, SSD).  You can
> configure it with memory, SSD, and/or HDDs, with the DFS as the persistent
> store, called the under-filesystem.
>
> Hope this helps.
>
> Richard Catlin
>
> On Sep 19, 2016, at 7:56 AM, aka.fe2s  wrote:
>
> Hi folks,
>
> What has happened with Tachyon / Alluxio in Spark 2? The docs no longer
> mention it.
>
> --
> Oleksiy Dyagilev
>
>




Re: off heap to alluxio/tachyon in Spark 2

2016-09-19 Thread Richard Catlin
Here is my understanding.

Spark used Tachyon as an off-heap solution for RDDs.  In certain situations,
it would alleviate garbage-collection overhead for the RDDs.

Tungsten, Spark 2’s off-heap memory management (columnar format), is much
more efficient and used as the default.  Alluxio no longer makes sense for
this use.

You can still use Tachyon/Alluxio to bring your files into memory, which is
quicker for Spark to access than your DFS (HDFS or S3).

Alluxio actually supports a “tiered filesystem” and automatically brings the
“hotter” files into the fastest storage (memory, SSD).  You can configure it
with memory, SSD, and/or HDDs, with the DFS as the persistent store, called
the under-filesystem.
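
To illustrate the idea of hot files moving to the fastest tier, here is a toy
model. It is not Alluxio's actual implementation or policy: the class name
TieredStore, the capacities counted in files, and the least-recently-used
demotion rule are all my assumptions, chosen only to show the shape of the
behaviour.

```python
from collections import OrderedDict


class TieredStore:
    """Toy tiered cache; tiers are listed fastest first, e.g. MEM then SSD."""

    def __init__(self, tiers):
        # tiers: list of (name, capacity_in_files), fastest tier first.
        self.names = [name for name, _ in tiers]
        self.capacity = dict(tiers)
        # Each tier keeps its files in access order, oldest first.
        self.tiers = {name: OrderedDict() for name in self.names}

    def _insert(self, level, path):
        name = self.names[level]
        tier = self.tiers[name]
        tier[path] = True
        tier.move_to_end(path)
        if len(tier) > self.capacity[name]:
            # Demote the least recently used file to the next slower tier.
            victim, _ = tier.popitem(last=False)
            if level + 1 < len(self.names):
                self._insert(level + 1, victim)
            # Otherwise it leaves the cache and lives only in the
            # under-filesystem, which is the persistent store.

    def read(self, path):
        """Return the tier that served the read, then promote the file."""
        served_from = "UFS"  # under-filesystem, used when no tier holds it
        for name in self.names:
            if path in self.tiers[name]:
                served_from = name
                del self.tiers[name][path]
                break
        self._insert(0, path)
        return served_from


store = TieredStore([("MEM", 1), ("SSD", 1)])
for path in ["a", "b", "a", "c", "b"]:
    print(path, "served from", store.read(path))
```

Each read reports where the file was found before it is promoted, so a file
that is read repeatedly ends up being served from the faster tiers.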

Hope this helps.

Richard Catlin

> On Sep 19, 2016, at 7:56 AM, aka.fe2s  wrote:
> 
> Hi folks,
> 
> What has happened with Tachyon / Alluxio in Spark 2? The docs no longer
> mention it.
> 
> --
> Oleksiy Dyagilev