Re: running pyspark on kubernetes - no space left on device

2022-09-01 Thread Qian SUN
Hi,

Spark provides the spark.local.dir configuration to specify the work
(scratch) directory on the pods. You can set spark.local.dir to your
persistent volume's mount path.
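
For example, a minimal spark-submit sketch (the mount path
/data/spark-scratch is hypothetical; substitute wherever your volume is
mounted on the pods):

  spark-submit \
    --conf spark.local.dir=/data/spark-scratch \
    ...

Spark then writes its shuffle and spill files under that path instead of
the default /tmp.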

Best regards

Manoj GEORGE wrote on Thu, Sep 1, 2022 at 21:16:

> CONFIDENTIAL & RESTRICTED
>
> Hi Team,
>
> I am new to Spark, so please excuse my ignorance.
>
> Currently we are trying to run PySpark on a Kubernetes cluster. The setup
> works fine for some jobs, but when we process a large file (36 GB) we run
> into out-of-space issues.
>
> Based on what we found on the internet, we have mapped the local dir to a
> persistent volume, but this still doesn't solve the issue.
>
> I am not sure if Spark is still writing to the /tmp folder on the pod. Is
> there some other setting which needs to be changed for this to work?
>
> Thanks in advance.
>
> Thanks,
>
> Manoj George
> *Manager Database Architecture*
> M: +1 3522786801
> manoj.geo...@amadeus.com
> www.amadeus.com


-- 
Best!
Qian SUN


Re: running pyspark on kubernetes - no space left on device

2022-09-01 Thread Matt Proetsch
Hi George,

You can try mounting a larger PersistentVolume at the work directory, as
described here, instead of using the default local dir, which might have
site-specific size constraints:

https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-kubernetes-volumes
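
For instance, the docs show conf keys along these lines; a sketch assuming
a pre-created claim named my-spark-pvc mounted at /data (both names are
hypothetical). Naming the volume with the spark-local-dir- prefix tells
Spark to use it for scratch space:

  spark-submit \
    --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.options.claimName=my-spark-pvc \
    --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.path=/data \
    --conf spark.kubernetes.executor.volumes.persistentVolumeClaim.spark-local-dir-1.mount.readOnly=false \
    ...

The same keys exist with "driver" in place of "executor" if the driver
also needs the extra space.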

-Matt

> On Sep 1, 2022, at 09:16, Manoj GEORGE wrote: