Re: How to deal Schema Evolution with Dataset API

2020-05-09 Thread Edgardo Szrajber
If you want to keep the Dataset, maybe you can try to add a constructor to the case class (through the companion object) that receives only the age. Bentzi. On Sat, May 9, 2020 at 17:50, Jorge Machado wrote: Ok, I found a way to solve it. Just pass the
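The suggestion above can be sketched as follows. This is a minimal, hypothetical illustration: the case class `Person`, its fields, and the idea that `age` is the only field present in the old schema are all assumptions for the example, not taken from the original thread.

```scala
// A case class that gained a new optional field after a schema change.
case class Person(age: Int, name: Option[String])

object Person {
  // Companion-object constructor that accepts only the old field,
  // so records written under the old schema can still be built.
  def apply(age: Int): Person = Person(age, None)
}

// Old-schema record: only the age is available.
val migrated = Person(30)
```

Making the new field an `Option` keeps the evolved class encodable by Spark while signalling that the value may be absent in older data.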

Re: How to populate all possible combination values in columns using Spark SQL

2020-05-09 Thread Edgardo Szrajber
On Fri, 8 May 2020, 1:31 PM, Edgardo Szrajber wrote: Have you checked the pivot function? Bentzi. On Thu, May 7, 2020 at 22:46, Aakash Basu wrote: Hi, I've updated the SO question with masked data, added the year column and other requirements. Please take a look

Re: How to populate all possible combination values in columns using Spark SQL

2020-05-08 Thread Edgardo Szrajber
Have you checked the pivot function? Bentzi. On Thu, May 7, 2020 at 22:46, Aakash Basu wrote: Hi, I've updated the SO question with masked data, added the year column and other requirements. Please take a look. Hope this helps in solving the problem. Thanks and
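A minimal sketch of the suggested `pivot`, assuming illustrative column names (`year`, `month`, `amount`) since the thread's actual data is only on the linked SO question:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(
  (2019, "Jan", 100), (2019, "Feb", 200), (2020, "Jan", 150)
).toDF("year", "month", "amount")

// One row per year, one column per distinct month value.
df.groupBy("year").pivot("month").agg(sum("amount")).show()
```

Passing the expected values explicitly, e.g. `pivot("month", Seq("Jan", "Feb"))`, avoids an extra pass over the data to discover the distinct values.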

Re: No. of active states?

2020-05-08 Thread Edgardo Szrajber
This should open a new world of real-time metrics for you: "How to get Spark Metrics as JSON using Spark REST API in YARN Cluster mode" by Anbu Cheeralan. Spark provides the metrics in UI.
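The same executor metrics shown in the UI are served as JSON by Spark's monitoring REST API. A minimal sketch of fetching them from the driver's UI port; the host, port, and application id are placeholders:

```scala
import scala.io.Source

// Executor metrics for one application, as a JSON string.
// Replace host, port, and "app-id" with your own values.
val url  = "http://localhost:4040/api/v1/applications/app-id/executors"
val json = Source.fromURL(url).mkString
println(json)
```

In YARN cluster mode the UI is proxied, so the URL typically goes through the YARN proxy or the history server rather than port 4040 directly.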

Re: Filtering on multiple columns in spark

2020-04-29 Thread Edgardo Szrajber
Maybe create a column with the "lit" function for the variables you are comparing against. Bentzi. On Wed, Apr 29, 2020 at 18:40, Mich Talebzadeh wrote: The below line works: val c =
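A small sketch of the `lit` suggestion, with an assumed DataFrame and column name (`amount`) for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("a", 50), ("b", 150)).toDF("key", "amount")

// lit wraps a Scala value in a Column so it can appear in a
// column expression alongside col("amount").
val threshold = 100
df.filter(col("amount") > lit(threshold)).show()
```

For simple comparisons Spark inserts the `lit` implicitly, but being explicit helps when combining several literal-valued conditions.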

Re: Converting a date to milliseconds with time zone in Scala

2020-04-28 Thread Edgardo Szrajber
Hi, please check combining unix_timestamp and from_unixtime. Something like: from_unixtime(unix_timestamp("06-04-2020 12:03:43"), "yyyy-MM-dd'T'HH:mm:ss Z"). Please note that I just wrote this without any validation. In any case, you might want to check the documentation of both functions to check all
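A sketch of that combination. One assumption on my part beyond the original, untested snippet: since the input looks like `dd-MM-yyyy`, `unix_timestamp` is given an explicit pattern rather than relying on its default format:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{from_unixtime, unix_timestamp, lit}

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Parse the string to epoch seconds with an explicit input pattern,
// then render it back with an ISO-like pattern including the offset.
val formatted = from_unixtime(
  unix_timestamp(lit("06-04-2020 12:03:43"), "dd-MM-yyyy HH:mm:ss"),
  "yyyy-MM-dd'T'HH:mm:ss Z")

spark.range(1).select(formatted.as("ts")).show(false)
```

Both functions interpret the value in the session time zone (`spark.sql.session.timeZone`), which is what determines the offset shown by the `Z` pattern letter.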

Re: [Structured Streaming] NullPointerException in long running query

2020-04-28 Thread Edgardo Szrajber
The exception occurred while aborting the stage. It might be interesting to try to understand the reason for the abort. Maybe a timeout? How long did the query run? Bentzi. On Tue, Apr 28, 2020 at 9:25, Jungtaek Lim wrote: The root cause of the exception occurred

Re: [pyspark] Load a master data file to spark ecosystem

2020-04-26 Thread Edgardo Szrajber
In the below code you are impeding Spark from doing what it is meant to do. As mentioned below, the best (and easiest to implement) approach would be to load each file into a DataFrame and join between them. Even doing a key join with RDDs would be better, but in your case you are forcing a one by
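The recommended approach can be sketched as follows. The file paths, the `header` option, and the join key `id` are assumptions for illustration; the thread does not show the actual files:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()

// Load the master data and the main data each into a DataFrame...
val master = spark.read.option("header", "true").csv("/path/to/master.csv")
val events = spark.read.option("header", "true").csv("/path/to/events.csv")

// ...and let Spark perform a distributed join instead of looking up
// master records one at a time on the driver.
val joined = events.join(master, Seq("id"), "left")
joined.show()
```

If the master file is small, Spark will typically broadcast it (or it can be forced with `broadcast(master)`), turning the join into a cheap map-side lookup.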