Re: reading each JSON file from dataframe...

2022-07-11 Thread Enrico Minack
All you need to do is implement a method readJson that reads a single file given its path. Then, you map the values of column file_path to the respective JSON content as a string. This can be done via a UDF or simply Dataset.map: case class RowWithJsonUri(entity_id: String, file_path:
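The message is cut off before its code, so the following is only a minimal sketch of the idea it describes, not the author's actual implementation. The names FileRef, FileContent, JsonPerRow and withJson are hypothetical stand-ins (the original case class definition is truncated), and readJson assumes the paths point at the local filesystem; files on HDFS or S3 would need a different reader.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

// Hypothetical stand-ins for the case classes that are truncated in the message.
case class FileRef(entity_id: String, file_path: String)
case class FileContent(entity_id: String, json: String)

object JsonPerRow {
  // Reads a single file and returns its content as one string.
  // Assumption: the path is on the local filesystem and the file is UTF-8.
  def readJson(path: String): String =
    new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8)

  // Turns a row that carries a file path into a row that carries the file content.
  def withJson(row: FileRef): FileContent =
    FileContent(row.entity_id, readJson(row.file_path))
}
```

With Spark on the classpath and spark.implicits._ imported, a Dataset[FileRef] could presumably be transformed with ds.map(JsonPerRow.withJson), which is the Dataset.map route the message mentions. Keeping withJson a plain function also makes it testable without a Spark session.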

Re: about cpu cores

2022-07-11 Thread Gourav Sengupta
Hi, please see Sean's answer and please read about parallelism in Spark. Regards, Gourav Sengupta On Mon, Jul 11, 2022 at 10:12 AM Tufan Rakshit wrote: > So on average, for every 4 cores you get back 3.6 cores in YARN, but you > can use only 3. > In Kubernetes you get back 3.6 and also can use

Re: about cpu cores

2022-07-11 Thread Tufan Rakshit
So on average, for every 4 cores you get back 3.6 cores in YARN, but you can use only 3. In Kubernetes you get back 3.6 and can also use 3.6. Best, Tufan On Mon, 11 Jul 2022 at 11:02, Yong Walt wrote: > We were using YARN. Thanks. > > On Sun, Jul 10, 2022 at 9:02 PM Tufan Rakshit wrote: > >>

Re: about cpu cores

2022-07-11 Thread Yong Walt
We were using YARN. Thanks. On Sun, Jul 10, 2022 at 9:02 PM Tufan Rakshit wrote: > Mainly depends on what your cluster manager is: YARN or Kubernetes? > Best > Tufan > > On Sun, 10 Jul 2022 at 14:38, Sean Owen wrote: > >> Jobs consist of tasks, each of which consumes a core (can be set to >1 >> too,
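As a rough illustration of the quoted point that each task consumes a core (or more than one, if so configured), the sketch below computes how many tasks could run at the same time. It is plain arithmetic under stated assumptions, not Spark's scheduler: executorCores stands for the value of spark.executor.cores, taskCpus for spark.task.cpus, and the TaskSlots object and its method names are hypothetical.

```scala
object TaskSlots {
  // Tasks one executor can run concurrently: its cores divided by the cores each task claims.
  // Integer division: leftover cores that cannot fit a whole task stay idle.
  def slotsPerExecutor(executorCores: Int, taskCpus: Int = 1): Int = {
    require(executorCores > 0 && taskCpus > 0, "core counts must be positive")
    executorCores / taskCpus
  }

  // Concurrent tasks across all executors of an application.
  def totalSlots(numExecutors: Int, executorCores: Int, taskCpus: Int = 1): Int =
    numExecutors * slotsPerExecutor(executorCores, taskCpus)
}
```

For example, TaskSlots.totalSlots(5, 4) describes 5 executors with 4 cores each and one core per task, while passing taskCpus = 2 halves the number of concurrent tasks.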