Re: Silly Question on my part...

2016-05-17 Thread Gene Pang
Hi Michael,

Yes, you can use Alluxio to share Spark RDDs. Here is a blog post about
getting started with Spark and Alluxio
(http://www.alluxio.com/2016/04/getting-started-with-alluxio-and-spark/),
and some documentation
(http://alluxio.org/documentation/master/en/Running-Spark-on-Alluxio.html).

I hope that helps,
Gene

On Tue, May 17, 2016 at 8:36 AM, Michael Segel wrote:

> Thanks for the response.
>
> That’s what I thought, but I didn’t want to assume anything.
> (You know what happens when you ass u me … :-)
>
>
> Not sure about Tachyon though. It’s a thought, but I’m very conservative
> when it comes to design choices.


Re: Silly Question on my part...

2016-05-17 Thread Dood

On 5/16/2016 12:12 PM, Michael Segel wrote:

> For one use case, we were considering using the Thrift server as a way to
> allow multiple clients to access shared RDDs.
>
> Within the Thrift context, we create an RDD and expose it as a Hive table.
>
> The question is: where does the RDD exist? On the Thrift service node itself,
> or is that just a reference to the RDD which is contained within contexts on
> the cluster?



You can share RDDs using Apache Ignite, a distributed in-memory data
grid/cache with a lot of additional functionality. The advantages are extra
resilience (you can replicate caches or just partition them) and the ability
to query the contents of the caches in standard SQL. Since the caches persist
beyond the lifetime of the Spark app, you can share them across apps. You
also get read-through/write-through to SQL or NoSQL databases on the back end,
for persistence and for loading/dumping caches to secondary storage. It is
written in Java, so it is very easy to use from Scala/Spark apps.


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Silly Question on my part...

2016-05-17 Thread Michael Segel
Thanks for the response. 

That’s what I thought, but I didn’t want to assume anything. 
(You know what happens when you ass u me … :-) 


Not sure about Tachyon though. It’s a thought, but I’m very conservative when 
it comes to design choices. 


> On May 16, 2016, at 5:21 PM, John Trengrove wrote:
> 
> If you want to share RDDs, it might be a good idea to check out Tachyon
> / Alluxio.
> 
> For the Thrift server, I believe the datasets are located in your Spark
> cluster as RDDs, and you just communicate with it via JDBC through the
> Thrift JDBC/ODBC server (Spark SQL's distributed query engine).



Re: Silly Question on my part...

2016-05-16 Thread John Trengrove
If you want to share RDDs, it might be a good idea to check out
Tachyon / Alluxio.

For the Thrift server, I believe the datasets are located in your Spark
cluster as RDDs, and you just communicate with it via JDBC through the
Thrift JDBC/ODBC server (Spark SQL's distributed query engine).
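To make the client side of this concrete, here is a minimal sketch (not from the original thread) of a plain JDBC client talking to the Spark Thrift server. It assumes the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver, from the hive-jdbc artifact) is on the classpath at run time; the host name, the database, and the table name shared_table are placeholders. The client only ships SQL text and receives result rows back through the server; the RDD data itself stays in the Spark application behind the Thrift server.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical client for a Spark Thrift JDBC/ODBC server.
// Only java.sql is needed to compile; running it assumes the Hive JDBC
// driver is on the classpath and a Thrift server is reachable.
final class ThriftJdbcClient {

    private ThriftJdbcClient() {}

    // The Spark Thrift server speaks the HiveServer2 protocol, so clients
    // use the jdbc:hive2:// URL scheme (10000 is the usual default port).
    public static String jdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    // Sends one query to the server and prints the first column of each row.
    // The query is executed by the Spark application behind the server;
    // this client only receives the result rows.
    public static void printFirstColumn(String url, String sql) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

For example, ThriftJdbcClient.printFirstColumn(ThriftJdbcClient.jdbcUrl("thrift-host", 10000, "default"), "SELECT * FROM shared_table LIMIT 10"). Older driver setups may need an explicit Class.forName("org.apache.hive.jdbc.HiveDriver") before connecting.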

2016-05-17 5:12 GMT+10:00 Michael Segel:

> For one use case, we were considering using the Thrift server as a way to
> allow multiple clients to access shared RDDs.
>
> Within the Thrift context, we create an RDD and expose it as a Hive table.
>
> The question is: where does the RDD exist? On the Thrift service node
> itself, or is that just a reference to the RDD which is contained within
> contexts on the cluster?
>
>
> Thx
>
> -Mike


Silly Question on my part...

2016-05-16 Thread Michael Segel
For one use case, we were considering using the Thrift server as a way to
allow multiple clients to access shared RDDs.

Within the Thrift context, we create an RDD and expose it as a Hive table.

The question is: where does the RDD exist? On the Thrift service node itself,
or is that just a reference to the RDD which is contained within contexts on the
cluster?


Thx

-Mike

