Yep. The view is created in driver memory, so watch for OOM if it grows too large. If needed, raise the driver memory:

spark-submit --driver-memory 8G ...
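
For anyone hitting the same "table not found" error: the view only resolves when it is qualified with the global temp database, as Tim notes below. A minimal sketch of a helper that builds the qualified name, assuming the default database name global_temp (it can be changed on the server via spark.sql.globalTempDatabase). The class GlobalTempViews and its methods are hypothetical, not part of Spark:

```java
// Sketch only: GlobalTempViews is a hypothetical helper, not part of Spark.
final class GlobalTempViews {
    // Default database for global temp views. Assumption: the server has not
    // overridden spark.sql.globalTempDatabase; adjust this if yours differs.
    static final String DEFAULT_DB = "global_temp";

    private GlobalTempViews() {}

    // Returns the database-qualified name, suitable for cacheTable(...).
    static String qualify(String viewName) {
        if (viewName == null || viewName.isEmpty() || viewName.contains(".")) {
            throw new IllegalArgumentException(
                "expected an unqualified view name: " + viewName);
        }
        return DEFAULT_DB + "." + viewName;
    }

    // Builds the query text that another session could pass to spark.sql(...).
    static String selectAll(String viewName) {
        return "SELECT * FROM " + qualify(viewName);
    }
}
```

A second session would then run spark.sql(GlobalTempViews.selectAll("occurrence_svampe")) instead of querying the bare view name.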

HTH

Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

View my LinkedIn profile:
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

On Sun, 16 Feb 2025 at 09:16, Tim Robertson <timrobertson...@gmail.com>
wrote:

> Answering my own question. Global temp views get created in the
> global_temp database, so can be accessed thusly.
>
> Thanks
>
> Dataset<Row> s = spark.read().parquet("/tmp/svampeatlas/*");
> s.createOrReplaceGlobalTempView("occurrence_svampe");
> spark.catalog().cacheTable("global_temp.occurrence_svampe");
>
>
> On Sun, Feb 16, 2025 at 10:05 AM Tim Robertson <timrobertson...@gmail.com>
> wrote:
>
>> Hi folks
>>
>> Is it possible to cache a table for shared use across sessions with spark
>> connect?
>> I'd like to load a read only table once that many sessions will then
>> query to improve performance.
>>
>> This is an example of the kind of thing that I have been trying, but have
>> not succeeded with.
>>
>>   SparkSession spark =
>>       SparkSession.builder().remote("sc://localhost").getOrCreate();
>>   Dataset<Row> s = spark.read().parquet("/tmp/svampeatlas/*");
>>
>>   // this works if it is not "global"
>>   s.createOrReplaceGlobalTempView("occurrence_svampe");
>>   spark.catalog().cacheTable("occurrence_svampe");
>>
>>   // this fails with a table not found when a global view is used
>>   spark
>>       .sql("SELECT * FROM occurrence_svampe")
>>       .write()
>>       .parquet("/tmp/export");
>>
>> Thank you
>> Tim
>>
>
