Re: Is RDD thread safe?

2019-11-24 Thread Chang Chen
I need to cache the DataFrame to accelerate queries. In that case, the
two queries may simultaneously run the DAG before the cached data actually materializes.
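For what it's worth, the "compute once even when two threads race to materialize it" behavior that Spark's BlockManager provides for cached blocks can be sketched in plain Python. This is an illustrative simulation of the pattern, not Spark code; the `LazyCache` class and names are hypothetical:

```python
import threading

class LazyCache:
    """Compute a value once under a lock, even when several threads
    race to materialize it -- a simplified, non-Spark sketch of what
    the BlockManager does for a cached partition."""
    def __init__(self, compute):
        self._compute = compute
        self._lock = threading.Lock()
        self._value = None
        self._ready = False

    def get(self):
        # Double work is avoided: only the first thread inside the
        # lock actually runs the computation.
        with self._lock:
            if not self._ready:
                self._value = self._compute()
                self._ready = True
        return self._value

calls = []
cache = LazyCache(lambda: calls.append(1) or 42)

threads = [threading.Thread(target=cache.get) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(calls), cache.get())  # compute ran once; value is 42
```

Without such locking, two queries triggering actions on an uncached DataFrame can both recompute the lineage, which is the scenario described above.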

Sonal Goyal  于2019年11月19日周二 下午9:46写道:

> The RDD or the DataFrame is distributed and partitioned by Spark so as to
> leverage all your workers (CPUs) effectively. So all the DataFrame
> operations are already happening simultaneously on sections of the data.
> Why do you want to use threading here?
>
> Thanks,
> Sonal
> Nube Technologies 
>
> On Tue, Nov 12, 2019 at 7:18 AM Chang Chen  wrote:
>
>>
>> Hi all
>>
>> I have a case where I need to cache a source RDD and then create different
>> DataFrames from it in different threads to accelerate queries.
>>
>> I know that SparkSession is thread safe (
>> https://issues.apache.org/jira/browse/SPARK-15135), but I am not sure
>> whether RDD is thread safe or not.
>>
>> Thanks
>> Chang
>>
>


Re: SparkR integration with Hive 3 spark-r

2019-11-24 Thread Felix Cheung
I think you will get more answers if you ask without the SparkR angle.

Your question is independent of SparkR.

Spark support for Hive 3.x (3.1.2) was added here:

https://github.com/apache/spark/commit/1b404b9b9928144e9f527ac7b1caa15f932c2649

You should be able to connect Spark to the Hive metastore.
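As a rough sketch, pointing Spark at an external Hive 3.1.2 metastore involves the `spark.sql.hive.metastore.version` and `spark.sql.hive.metastore.jars` properties; the exact values and the application path below are assumptions, so check them against your deployment:

```shell
# Hypothetical spark-submit invocation targeting a Hive 3.1.2 metastore.
# Requires hive-site.xml (with the metastore URI) on Spark's classpath,
# e.g. in $SPARK_HOME/conf.
spark-submit \
  --conf spark.sql.hive.metastore.version=3.1.2 \
  --conf spark.sql.hive.metastore.jars=maven \
  your_app.py   # placeholder application
```

With `jars=maven`, Spark downloads the matching Hive client jars at runtime; in production you would usually point `spark.sql.hive.metastore.jars` at a local path instead.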




From: Alfredo Marquez 
Sent: Friday, November 22, 2019 4:26:49 PM
To: user@spark.apache.org 
Subject: Re: SparkR integration with Hive 3 spark-r

Does anyone else have some insight to this question?

Thanks,

Alfredo

On Mon, Nov 18, 2019, 3:00 PM Alfredo Marquez 
mailto:alfredo.g.marq...@gmail.com>> wrote:
Hello Nicolas,

Well, the issue is that with Hive 3, Spark gets its own metastore, separate
from the Hive 3 metastore. So how do you reconcile this separation of
metastores?

Can you continue to use "enableHiveSupport" and still connect to Hive 3?
Does this connection take advantage of Hive's LLAP?

Our team doesn't believe that it's possible to make the connection as you would
in the past. But if it is that simple, I would be ecstatic.

Thanks,

Alfredo

On Mon, Nov 18, 2019, 12:53 PM Nicolas Paris 
mailto:nicolas.pa...@riseup.net>> wrote:
Hi Alfredo

my 2 cents:
To my knowledge, reading the Spark 3 pre-release notes, it will handle
Hive metastore 2.3.5; there is no mention of a Hive 3 metastore. I made several
tests on this in the past [1], and Spark seems to handle any Hive metastore
version.

However, Spark cannot read Hive managed tables, a.k.a. transactional tables.
So I would say you should be able to read any Hive 3 regular table with
any of Spark, PySpark, or SparkR.


[1] https://parisni.frama.io/posts/playing-with-hive-spark-metastore-versions/

On Mon, Nov 18, 2019 at 11:23:50AM -0600, Alfredo Marquez wrote:
> Hello,
>
> Our company is moving to Hive 3, and they are saying that there is no SparkR
> implementation in Spark 2.3.x+ that will connect to Hive 3. Is this true?
>
> If it is true, will this be addressed in the Spark 3 release?
>
> I don't use Python, so losing SparkR to get work done on Hadoop is a huge
> loss.
>
> P.S. This is my first email to this community; if there is something I should
> do differently, please let me know.
>
> Thank you
>
> Alfredo

--
nicolas

-
To unsubscribe e-mail: 
user-unsubscr...@spark.apache.org