Re: Spark job stuck after read and not starting next stage

2021-01-20 Thread German Schiavon
Hi, I'm not sure whether this applies to your case, but if the source data is large and deeply nested, I'd recommend explicitly providing the schema when reading the JSON:

    df = spark.read.schema(schema).json(updated_dataset)

On Thu, 21 Jan 2021 at 04:15, srinivasarao daruna wrote:
> Hi,
> I am running a spark

Spark job stuck after read and not starting next stage

2021-01-20 Thread srinivasarao daruna
Hi, I am running a Spark job on a huge dataset. I have allocated 10 R5.16xlarge machines (each with 64 cores and 512 GB of memory). The source data is JSON and I need to apply some JSON transformations, so I read it as text and then convert it to a DataFrame:

    ds = spark.read.textFile()
    updated_dataset =