As a first guess, where do you think this view is created in a distributed
environment?

The whole purpose is fast access to this temporary view (shared among the
executors in this job), and the underlying data is only materialised after an
action is performed.

scala> val sales = spark.read.format("jdbc").options(
     |        Map("url" -> _ORACLEserver,
     |        "dbtable" -> "(SELECT * FROM sh.sales)",
     |        "user" -> _username,
     |        "password" -> _password)).load
sales: org.apache.spark.sql.DataFrame = [PROD_ID: decimal(38,10), CUST_ID:
decimal(38,10) ... 5 more fields]

scala> sales.createOrReplaceTempView("sales")

scala> spark.sql("select count(1) from sales").show
+--------+
|count(1)|
+--------+
|  918843|
+--------+
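
To answer the question directly: the view itself lives in the SparkSession's
catalog as a name bound to the DataFrame's logical plan; registering it pulls
no data to the driver, and the scan only runs (on the executors) once an
action is performed. A minimal sketch to check this against the same sales
view registered above, using only standard catalog/explain calls (no job is
triggered by any of these):

scala> // the temp view is just an entry in this session's catalog
scala> spark.catalog.tableExists("sales")

scala> // isTemporary should be true for sales; nothing has been materialised yet
scala> spark.catalog.listTables().show

scala> // the extended plan shows the JDBC scan that will run when an action fires
scala> spark.sql("select count(1) from sales").explain(true)

Only when an action such as count or show is called does Spark actually read
from the source; the view itself is never "stored" on the driver or the
executors.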

HTH



   view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 26 Mar 2021 at 06:55, Kushagra Deep <kushagra.d...@mobileum.com>
wrote:

> Hi all,
>
> I just wanted to know, when we call 'createOrReplaceTempView' on a
> Spark dataset, where does the view reside? Does all the data come to the
> driver and the view is created there? Or do individual executors hold
> parts of the view (based on the data each executor has), so that when we
> query the view, the query runs on the part of the data that sits on each
> executor?
>
>
>
> Get Outlook for Android <https://aka.ms/ghei36>
>
