Re: Spark 2.x OFF_HEAP persistence

2017-01-09 Thread Gene Pang
Yes, as far as I can tell, your description is accurate.

Thanks,
Gene


Re: Spark 2.x OFF_HEAP persistence

2017-01-04 Thread Vin J
Thanks for the reply, Gene. It sounds like this means that, with Spark 2.x,
one has to change from rdd.persist(StorageLevel.OFF_HEAP) to
rdd.saveAsTextFile(alluxioPath) / rdd.saveAsObjectFile(alluxioPath) to get
guarantees such as a persisted RDD surviving a Spark JVM crash, as well as
the other benefits you mention.
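
In other words, something like this (a sketch; the alluxio:// URI, host,
and port are placeholders for an actual Alluxio master):

    import org.apache.spark.storage.StorageLevel

    val rdd = sc.parallelize(1 to 100)   // any RDD; assumes an existing SparkContext `sc`

    // Spark 1.x style: OFF_HEAP delegated block storage to Tachyon
    rdd.persist(StorageLevel.OFF_HEAP)

    // Spark 2.x alternative: write through the Alluxio filesystem API, so
    // the data lives outside the Spark JVM and survives an executor crash
    val alluxioPath = "alluxio://alluxio-master:19998/spark/my-rdd"
    rdd.saveAsObjectFile(alluxioPath)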

Vin.


Re: Spark 2.x OFF_HEAP persistence

2017-01-04 Thread Gene Pang
Hi Vin,

As of Spark 2.x, OFF_HEAP was changed to no longer directly interface with
an external block store. The previous tight dependency was restrictive and
reduced flexibility. It looks like the new version uses the executor's off
heap memory to allocate direct byte buffers, and does not interface with
any external system for the data storage. I am not aware of a way to
connect the new version of OFF_HEAP to Alluxio.
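
Concretely, a minimal sketch of the new behavior (assuming the standard
spark.memory.offHeap.* properties; note the data still disappears with the
executor JVM):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    // Spark 2.x: OFF_HEAP allocates direct byte buffers inside the executor,
    // sized by spark.memory.offHeap.size; no external block store is involved.
    val conf = new SparkConf()
      .setAppName("OffHeap2x")
      .set("spark.memory.offHeap.enabled", "true")
      .set("spark.memory.offHeap.size", "1g")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(1 to 1000)
    rdd.persist(StorageLevel.OFF_HEAP)   // lost if the executor JVM dies
    rdd.count()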

You can experience similar benefits to the old OFF_HEAP <-> Tachyon mode,
as well as additional benefits like a unified namespace or sharing
in-memory data across applications, by using the Alluxio filesystem API.
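
For example, a sketch of the save-and-reload pattern across applications
(the alluxio:// URI is illustrative, and the Alluxio client jar must be on
Spark's classpath):

    // Application A: persist the RDD into Alluxio's memory tier
    rdd.saveAsObjectFile("alluxio://alluxio-master:19998/shared/my-rdd")

    // Application B: a different Spark job can read it back, even after
    // application A's JVM has exited
    val shared = sc.objectFile[Int]("alluxio://alluxio-master:19998/shared/my-rdd")
    shared.count()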

I hope this helps!

Thanks,
Gene


Spark 2.x OFF_HEAP persistence

2017-01-04 Thread Vin J
Up to Spark 1.6, I see there were specific properties to configure, such as
the external block store master URL (spark.externalBlockStore.url), in
order to use the OFF_HEAP storage level; this made it clear that an
external, Tachyon-style block store was required/used for OFF_HEAP storage.
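
For context, the 1.6-era wiring looked roughly like this (a sketch; the
tachyon:// URL is a placeholder for an actual Tachyon master):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    // Spark 1.6: OFF_HEAP stored blocks in an external block store
    // (Tachyon), located via spark.externalBlockStore.url
    val conf = new SparkConf()
      .setAppName("OffHeap16")
      .set("spark.externalBlockStore.url", "tachyon://tachyon-master:19998")
    val sc = new SparkContext(conf)
    sc.parallelize(1 to 1000).persist(StorageLevel.OFF_HEAP)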

Can someone clarify how this has changed in Spark 2.x? I no longer see
config settings that point Spark to an external block store like Tachyon
(now Alluxio), or am I missing them?

I understand there are ways to use Alluxio with Spark, but what about
OFF_HEAP storage: can Spark 2.x OFF_HEAP RDD persistence still exploit
Alluxio or an external block store? Any pointers to design decisions or
Spark JIRAs related to this would also help.

Thanks,
Vin.