Re: running pyspark on kubernetes - no space left on device

2022-09-01 Thread Qian SUN
Hi,
Spark provides the spark.local.dir configuration to specify the scratch
directory on the pod. You can set spark.local.dir to your volume's mount path.
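Here is a minimal sketch of Qian's suggestion, assuming jobs are launched with spark-submit. The hypothetical helper `local_dir_conf_args` renders the `--conf` flag that points `spark.local.dir` at a volume's mount path. The path `/mnt/spark-scratch` is an illustrative assumption; use wherever your volume is actually mounted in the driver and executor pods.

```python
from typing import Dict, List, Optional


def local_dir_conf_args(mount_path: str,
                        extra: Optional[Dict[str, str]] = None) -> List[str]:
    """Render spark-submit flags that point spark.local.dir at mount_path.

    mount_path is assumed to be where your volume is mounted inside the
    driver/executor pods (hypothetical example path used below).
    """
    if not mount_path.startswith("/"):
        # Require an absolute path inside the pod.
        raise ValueError(f"mount_path must be absolute, got {mount_path!r}")
    confs = {"spark.local.dir": mount_path}
    confs.update(extra or {})
    args: List[str] = []
    for key, value in confs.items():
        # spark-submit takes each setting as a separate --conf key=value pair.
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # "/mnt/spark-scratch" is a hypothetical mount path, not from the thread.
    print(" ".join(local_dir_conf_args("/mnt/spark-scratch")))
```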

Best regards

On Thu, Sep 1, 2022 at 21:16, Manoj GEORGE wrote:

> Hi Team,
>
> I am new to Spark, so please excuse my ignorance.
>
> We are currently trying to run PySpark on a Kubernetes cluster. The setup
> works fine for some jobs, but when we process a large file (36 GB), we run
> into out-of-space issues.
>
> Based on what we found on the internet, we have mapped the local dir to a
> persistent volume. This still doesn't solve the issue.
>
> I am not sure whether it is still writing to the /tmp folder on the pod. Is
> there some other setting that needs to be changed for this to work?
>
> Thanks in advance.
>
> Thanks,
> Manoj George


-- 
Best!
Qian SUN


Re: running pyspark on kubernetes - no space left on device

2022-09-01 Thread Matt Proetsch
Hi George,

Instead of relying on the local dir, which may have site-specific size
constraints, you can try mounting a larger PersistentVolume to the work
directory as described here:

https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-kubernetes-volumes
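The linked page configures volumes through `spark.kubernetes.[driver|executor].volumes.[VolumeType].[VolumeName].*` properties, and it notes that a volume whose name starts with `spark-local-dir-` is used as Spark's local storage. The sketch below builds those properties for a PersistentVolumeClaim; the helper names, mount path, and size are hypothetical. The `OnDemand` claim name with the `sizeLimit`/`storageClass` options is only available in newer Spark releases, so check the docs for your version.

```python
from typing import Dict, List, Optional

LOCAL_DIR_PREFIX = "spark-local-dir-"


def pvc_local_dir_confs(role: str,
                        volume_name: str,
                        mount_path: str,
                        claim_name: str,
                        size_limit: Optional[str] = None,
                        storage_class: Optional[str] = None) -> Dict[str, str]:
    """Build spark.kubernetes.<role>.volumes.persistentVolumeClaim.* settings."""
    if role not in ("driver", "executor"):
        raise ValueError(f"role must be 'driver' or 'executor', got {role!r}")
    if not volume_name.startswith(LOCAL_DIR_PREFIX):
        # Per the Spark-on-Kubernetes docs, volumes named spark-local-dir-*
        # are the ones used as Spark's local (scratch) storage.
        raise ValueError(f"volume_name should start with {LOCAL_DIR_PREFIX!r}")
    base = f"spark.kubernetes.{role}.volumes.persistentVolumeClaim.{volume_name}"
    confs = {
        f"{base}.mount.path": mount_path,
        f"{base}.mount.readOnly": "false",
        f"{base}.options.claimName": claim_name,
    }
    # sizeLimit/storageClass mainly matter for on-demand claims
    # (claimName=OnDemand), which need a Spark version that supports them.
    if size_limit is not None:
        confs[f"{base}.options.sizeLimit"] = size_limit
    if storage_class is not None:
        confs[f"{base}.options.storageClass"] = storage_class
    return confs


def to_submit_args(confs: Dict[str, str]) -> List[str]:
    """Flatten a conf dict into spark-submit --conf arguments."""
    args: List[str] = []
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Hypothetical values: one on-demand 100Gi claim per executor.
    confs = pvc_local_dir_confs("executor", "spark-local-dir-1", "/data",
                                "OnDemand", size_limit="100Gi")
    for key, value in confs.items():
        print(f"--conf {key}={value}")
```

The volume name is the design choice that matters here: per the same docs, if no volume is set as local storage, Spark falls back to its own temporary scratch space, so a PVC mounted under a name like `data` could still leave shuffle spills going elsewhere.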

-Matt

> On Sep 1, 2022, at 09:16, Manoj GEORGE wrote:
> 
> Hi Team,
> 
> I am new to Spark, so please excuse my ignorance.
> 
> We are currently trying to run PySpark on a Kubernetes cluster. The setup
> works fine for some jobs, but when we process a large file (36 GB), we run
> into out-of-space issues.
> 
> Based on what we found on the internet, we have mapped the local dir to a
> persistent volume. This still doesn't solve the issue.
> 
> I am not sure whether it is still writing to the /tmp folder on the pod. Is
> there some other setting that needs to be changed for this to work?
> 
> Thanks in advance.
> 
> Thanks,
> Manoj George


running pyspark on kubernetes - no space left on device

2022-09-01 Thread Manoj GEORGE
CONFIDENTIAL & RESTRICTED

Hi Team,

I am new to Spark, so please excuse my ignorance.

We are currently trying to run PySpark on a Kubernetes cluster. The setup
works fine for some jobs, but when we process a large file (36 GB), we run
into out-of-space issues.

Based on what we found on the internet, we have mapped the local dir to a
persistent volume. This still doesn't solve the issue.

I am not sure whether it is still writing to the /tmp folder on the pod. Is
there some other setting that needs to be changed for this to work?
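One way to answer the /tmp question is to check, from inside a driver or executor pod (for example with `kubectl exec`, assuming the image has Python), which scratch directories are configured and how much free space each has. This is a rough diagnostic sketch with hypothetical function names: the precedence it uses (the `SPARK_LOCAL_DIRS` environment variable, then `spark.local.dir`, then `/tmp`) is a simplification of Spark's own logic, not a reimplementation of it.

```python
import os
import shutil
from typing import List, Mapping, Optional, Tuple


def candidate_scratch_dirs(env: Optional[Mapping[str, str]] = None,
                           spark_local_dir: Optional[str] = None) -> List[str]:
    """Approximate which directories Spark uses for scratch space.

    Simplified precedence (assumption, not the full Spark logic):
    SPARK_LOCAL_DIRS env var, then spark.local.dir, then /tmp.
    """
    env = os.environ if env is None else env
    raw = env.get("SPARK_LOCAL_DIRS") or spark_local_dir or "/tmp"
    # Both settings accept a comma-separated list of directories.
    return [d.strip() for d in raw.split(",") if d.strip()]


def free_space_gib(dirs: List[str]) -> List[Tuple[str, float]]:
    """Return (directory, free GiB) for each directory that exists."""
    report = []
    for d in dirs:
        if os.path.isdir(d):
            usage = shutil.disk_usage(d)
            report.append((d, usage.free / 2**30))
    return report


if __name__ == "__main__":
    for path, free in free_space_gib(candidate_scratch_dirs()):
        print(f"{path}: {free:.1f} GiB free")
```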

Thanks in advance.



Thanks,
Manoj George
Manager Database Architecture
M: +1 3522786801
manoj.geo...@amadeus.com
www.amadeus.com

Disclaimer: This email message and information contained in or attached to this 
message may be privileged, confidential, and protected from disclosure and is 
intended only for the person or entity to which it is addressed. Any review, 
retransmission, dissemination, printing or other use of, or taking of any 
action in reliance upon, this information by persons or entities other than the 
intended recipient is prohibited. If you receive this message in error, please 
immediately inform the sender by reply email and delete the message and any 
attachments. Thank you.