No, you cannot refer to a variable in Scala when it is defined in Python. You need to register the table in Python, then get it from the shared SparkSession/SparkContext. The following is what you can do:
Python:

    df_in_pyspark = spark.read.json("examples/src/main/resources/people.json")
    df_in_pyspark.registerTempTable("mytable")

Scala:

    val df_in_pyspark = spark.table("mytable")
    val dfInScala: DataFrame = df_in_pyspark.where("age > 35")

David Hu <hood...@gmail.com> wrote on Mon, Dec 11, 2017 at 12:09 PM:

> Hi Jeff,
>
> That's great to know! I've heard of Zeppelin and sort of know what it
> does, but I haven't had a chance to use it myself. To confirm that what
> you are saying matches my understanding, I'd like to walk through a
> scenario.
>
> I first send a POST request to /sessions/1/statements with kind
> 'pyspark' and code as follows:
>
> df_in_pyspark = spark.read.json("examples/src/main/resources/people.json")
>
> The above code defines a dataframe variable `df_in_pyspark` in Python,
> and it will be used in a second POST request to /sessions/1/statements,
> whose kind is 'spark' (Scala), with the following code:
>
> val dfInScala: DataFrame = df_in_pyspark.where("age > 35")
>
> So basically you were saying that the above code would run without any
> issues, is that correct? If so, I assume it also applies to other types
> of variables like Estimator/Model/Pipeline? Then how about methods? Is
> it OK if I define a method in Scala and later use it in Python/R code,
> and vice versa?
>
> Sorry for so many questions, but if I knew the answer I would be much
> more assured about upgrading to the latest HDP and enabling this awesome
> feature. Thanks!
>
> Regards, David
>
> 2017-12-11 11:07 GMT+08:00 Jeff Zhang <zjf...@gmail.com>:
>
>> You can use a dataframe in Scala if that dataframe is registered in
>> Python, because they share the same SparkContext.
>>
>> I believe Livy can meet your requirement. If you know Zeppelin, the
>> behavior of Livy now is very similar to Zeppelin, where you can run one
>> paragraph in Scala and another paragraph in Python or R. They run in
>> the same Spark application and are able to share data via the
>> SparkContext.
>> David Hu <hood...@gmail.com> wrote on Mon, Dec 11, 2017 at 10:44 AM:
>>
>>> Hi Jeff & Saisai,
>>>
>>> Thank you so much for the explanations; they are very helpful. Also,
>>> sorry for not replying in time.
>>>
>>> I read all the links you provided, and the impression I got is that,
>>> correct me if I am wrong, this feature does not allow different
>>> session kinds to interact with each other? What I mean is, if I ran
>>> one Scala kind and one Python kind in the same context, then without
>>> some kind of persistence it would not be possible to refer to a
>>> dataframe variable in Python code that was defined in Scala, right?
>>>
>>> The goal I want to achieve is to mix different languages together and
>>> run them as one integrated Spark job, within which variables/methods
>>> defined in one language can be referred to/used in another, because
>>> our users might have different programming backgrounds. It might
>>> sound silly, but I am keen to know if that's possible under the
>>> current Livy infrastructure. I would appreciate it if anyone could
>>> answer. Thanks in advance!
>>>
>>> Regards, Dawei
>>>
>>> 2017-12-04 8:30 GMT+08:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>
>>>> This feature is targeted for the Livy 0.5.0 community version, but
>>>> we already back-ported it in HDP 2.6.3, so you can try this feature
>>>> in HDP 2.6.3.
>>>>
>>>> You can check this doc
>>>> (https://github.com/apache/incubator-livy/blob/master/docs/rest-api.md)
>>>> to see the API differences for this feature.
>>>>
>>>> 2017-12-03 9:55 GMT+08:00 Jeff Zhang <zjf...@gmail.com>:
>>>>
>>>>> It is implemented in https://issues.apache.org/jira/browse/LIVY-194,
>>>>> but not released in an Apache version; HDP back-ported it into
>>>>> their distribution.
>>>>>
>>>>> 胡大为(David) <hood...@gmail.com> wrote on Sat, Dec 2, 2017 at 10:58 AM:
>>>>>
>>>>>> I forgot to add the link reference, and here it is.
>>>>>>
>>>>>> https://hortonworks.com/blog/hdp-2-6-3-dataplane-service/
>>>>>>
>>>>>> Regards, Dawei
>>>>>>
>>>>>> On 2 Dec 2017, at 8:24 AM, 胡大为(David) <hood...@gmail.com> wrote:
>>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> I was reading the HDP 2.6.3 release notes, and they mention that
>>>>>> the Livy service is able to support multiple programming languages
>>>>>> in the same Spark context. I went through all the Livy
>>>>>> documentation and examples I could find, but so far I haven't
>>>>>> found out how to get it to work. Currently I am using the latest
>>>>>> Livy 0.4 to submit Scala code only, and it would be awesome to mix
>>>>>> it with Python or R code in the same session. I would much
>>>>>> appreciate it if anyone could give me some clues about this.
>>>>>>
>>>>>> Thanks in advance and have a good day :)
>>>>>>
>>>>>> Regards, Dawei
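[Editor's note] For readers wiring this up against Livy's REST API, the two statement bodies discussed in this thread can be sketched as plain JSON payloads. This is a hypothetical sketch, not official Livy documentation: the session id, the table name "mytable", and the file path are illustrative, and it assumes a Livy 0.5+ session created so that statements of different kinds share one SparkContext, as described in the replies above.

```python
import json

# Sketch of the two POST bodies for /sessions/{sessionId}/statements.
# First statement: register a temp table from Python (kind "pyspark").
pyspark_stmt = {
    "kind": "pyspark",
    "code": (
        'df = spark.read.json("examples/src/main/resources/people.json")\n'
        'df.registerTempTable("mytable")'
    ),
}

# Second statement: look the table up from Scala (kind "spark") via the
# shared SparkSession, as suggested in the answer at the top of the thread.
scala_stmt = {
    "kind": "spark",
    "code": (
        'val df = spark.table("mytable")\n'
        'val dfInScala = df.where("age > 35")'
    ),
}

# These dicts would be serialized and POSTed to the Livy server,
# e.g. with requests.post(statements_url, json=pyspark_stmt).
payloads = [json.dumps(s) for s in (pyspark_stmt, scala_stmt)]
for p in payloads:
    print(json.loads(p)["kind"])
```

The key point the sketch encodes is that nothing crosses the language boundary directly: the Python statement publishes the dataframe under a table name, and the Scala statement re-resolves it by that name through the shared context.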