Raise exception whilst casting instead of defaulting to null

2023-04-05 Thread Yeachan Park
Hi all,

The default behaviour of Spark is to return a null value for casts that
fail, unless ANSI SQL mode is enabled (SPARK-30292).

Whilst I understand that this is a subset of ANSI-compliant behaviour, I
don't understand why this feature is so tightly coupled to the ANSI flag.
Enabling ANSI also comes with other consequences that fall outside casting
behaviour, and not all Spark operations are done via the SQL interface
(i.e. spark.sql("")).

I can imagine it would be pretty useful to have something like an extra
argument that raises an exception if casting fails (e.g. *df.age.cast("int",
raise=True)*) without enabling ANSI as an option.
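
For illustration, a minimal PySpark sketch of the behaviour in question
(the raise=True argument above is hypothetical; only the ANSI flag exists
today):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("42",), ("not a number",)], ["age"])

# Default behaviour: the failed cast silently becomes null.
df.select(df.age.cast("int")).show()   # rows: 42, null

# With ANSI mode on, the same cast raises a runtime exception -- but the
# flag also changes arithmetic overflow, invalid array access, etc.
spark.conf.set("spark.sql.ansi.enabled", "true")
df.select(df.age.cast("int")).show()   # raises an exception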

Does anyone know why this approach was chosen, or have I missed something?
Would others find something like this useful?

Thanks,
Yeachan


Re: Portability of Docker images built on different cloud platforms

2023-04-05 Thread Mich Talebzadeh
The whole idea of creating a Docker container is to have a deployable,
self-contained utility. A Docker container image is a lightweight,
standalone, executable package of software that includes everything needed
to run an application: code, runtime, system tools, system libraries and
settings. The concepts are explained in the
http://sparkcommunitytalk.slack.com/ Slack, under
https://sparkcommunitytalk.slack.com/archives/C051KFWK9TJ

Back to the AWS/GCP use case: we are currently creating an Istio mesh for
GCP-to-AWS k8s fail-over, using the same Docker image in both GCR and ECR
(container registries).
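
For what it's worth, a minimal sketch of that workflow (the project,
account ID, region and image names below are placeholders): build the
image once, then tag and push the identical image to both registries:

# Build the image once; the same image is then tagged for both registries.
docker build -t spark-app:3.3.1 .

# Google Container Registry (assumes prior gcloud auth configure-docker)
docker tag spark-app:3.3.1 gcr.io/my-project/spark-app:3.3.1
docker push gcr.io/my-project/spark-app:3.3.1

# AWS Elastic Container Registry (assumes prior ECR login)
docker tag spark-app:3.3.1 123456789012.dkr.ecr.eu-west-2.amazonaws.com/spark-app:3.3.1
docker push 123456789012.dkr.ecr.eu-west-2.amazonaws.com/spark-app:3.3.1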

 HTH

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies
London
United Kingdom


view my Linkedin profile

https://en.everybodywiki.com/Mich_Talebzadeh


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 5 Apr 2023 at 10:59, Ken Peng  wrote:

>
>
> ashok34...@yahoo.com.INVALID wrote:
> > Is it possible to use Spark docker built on GCP on AWS without
> > rebuilding from new on AWS?
>
> I am using the spark image from bitnami for running on k8s.
> And yes, it's deployed by helm.
>
>
> --
> https://kenpeng.pages.dev/
>
>
>


Re: Troubleshooting ArrayIndexOutOfBoundsException in long running Spark application

2023-04-05 Thread Mich Talebzadeh
OK, Spark Structured Streaming.

How are you getting messages into Spark? Is it Kafka?

This to me indicates that the message is incomplete or contains an
unexpected value in the JSON.
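
If it is Kafka, here is a minimal sketch (the broker, topic and schema
below are assumptions) of parsing the payload against an explicit schema,
so that malformed or incomplete messages become null rows you can route
aside for inspection rather than failing downstream:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import IntegerType, StringType, StructField, StructType

spark = SparkSession.builder.getOrCreate()

# Hypothetical schema -- adjust to your actual payload.
schema = StructType([
    StructField("id", StringType()),
    StructField("value", IntegerType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")
       .option("subscribe", "events")
       .load())

parsed = raw.select(from_json(col("value").cast("string"), schema).alias("j"))

# from_json yields null for records that do not match the schema;
# collect these separately instead of letting them surface as errors.
bad = parsed.filter(col("j").isNull())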

HTH

Mich Talebzadeh,
Lead Solutions Architect/Engineering Lead
Palantir Technologies
London
United Kingdom


view my Linkedin profile

https://en.everybodywiki.com/Mich_Talebzadeh


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Wed, 5 Apr 2023 at 12:58, me  wrote:

> Dear Apache Spark users,
> I have a long running Spark application that is encountering an
> ArrayIndexOutOfBoundsException once every two weeks. The exception does not
> disrupt the operation of my app, but I'm still concerned about it and would
> like to find a solution.
>
> Here's some additional information about my setup:
>
> Spark is running in standalone mode
> Spark version is 3.3.1
> Scala version is 2.12.15
> I'm using Spark in Structured Streaming
>
> Here's the relevant error message:
> java.lang.ArrayIndexOutOfBoundsException Index 59 out of bounds for length
> 16
> I've reviewed the code and searched online, but I'm still unable to find a
> solution. The full stacktrace can be found at this link:
> https://gist.github.com/rsi2m/ae54eccac93ae602d04d383e56c1a737
> I would appreciate any insights or suggestions on how to resolve this
> issue. Thank you in advance for your help.
>
> Best regards,
> rsi2m
>
>
>


Troubleshooting ArrayIndexOutOfBoundsException in long running Spark application

2023-04-05 Thread me
Dear Apache Spark users,

I have a long running Spark application that is encountering an
ArrayIndexOutOfBoundsException once every two weeks. The exception does not
disrupt the operation of my app, but I'm still concerned about it and would
like to find a solution.

Here's some additional information about my setup:

Spark is running in standalone mode
Spark version is 3.3.1
Scala version is 2.12.15
I'm using Spark in Structured Streaming

Here's the relevant error message:
java.lang.ArrayIndexOutOfBoundsException: Index 59 out of bounds for length 16

I've reviewed the code and searched online, but I'm still unable to find a
solution. The full stacktrace can be found at this link:
https://gist.github.com/rsi2m/ae54eccac93ae602d04d383e56c1a737

I would appreciate any insights or suggestions on how to resolve this
issue. Thank you in advance for your help.

Best regards,
rsi2m



Re: Portability of Docker images built on different cloud platforms

2023-04-05 Thread Ken Peng

ashok34...@yahoo.com.INVALID wrote:
> Is it possible to use Spark docker built on GCP on AWS without
> rebuilding from new on AWS?


I am using the spark image from bitnami for running on k8s.
And yes, it's deployed by helm.
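
For reference, a minimal sketch of that setup (the release name and
namespace are placeholders; the Bitnami chart repository is the standard
public one):

# Add the Bitnami chart repository and install the Spark chart.
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm install my-spark bitnami/spark --namespace spark --create-namespace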


--
https://kenpeng.pages.dev/




Portability of Docker images built on different cloud platforms

2023-04-05 Thread ashok34...@yahoo.com.INVALID
Hello team,

Is it possible to use a Spark Docker image built on GCP on AWS without
rebuilding it from scratch on AWS? Will that work please?

AK