
Re: Performance tuning on the Databricks pyspark 2.4.4

2020-01-21, ayan guha

For case 1, you can create 3 notebooks and 3 jobs in Databricks, then run them in parallel.

On Wed, 22 Jan 2020 at 3:50 am, anbutech wrote:
> Hi sir,
>
> Could you please help me on the below two cases in the databricks pyspark
> data processing terabytes of json data read from aws s3 buc
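The reply gets parallelism by splitting the work into three separate Databricks jobs. A different option, not the one the reply proposes, is to stay in a single notebook and trigger the per-table counts from several Python threads: Spark can run jobs submitted from separate threads of one application concurrently. The following is a minimal sketch under that assumption. The function name `count_tables_in_parallel` is hypothetical, and the caller supplies `count_fn`, which does the actual per-table work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def count_tables_in_parallel(
    tables: Iterable[str],
    count_fn: Callable[[str], int],
    max_workers: int = 3,
) -> Dict[str, int]:
    """Call count_fn once per table name on a thread pool; return {name: count}.

    Hypothetical helper: in PySpark, each thread that runs an action such as
    .count() submits its own Spark job, so the jobs can be scheduled
    concurrently instead of one after another.
    """
    table_list = list(tables)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs each
        # table name with its own count.
        counts = pool.map(count_fn, table_list)
        return dict(zip(table_list, counts))
```

In a notebook, `count_fn` might be something like `lambda name: spark.table(name).count()`, assuming `spark` is the notebook's SparkSession and the tables are registered in the metastore. `max_workers=3` mirrors the three jobs in the reply. The threads only dispatch work to the cluster, so Python's GIL is not the bottleneck. Setting Spark's `spark.scheduler.mode` to `FAIR` may help concurrent jobs share executors more evenly.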

Performance tuning on the Databricks pyspark 2.4.4

2020-01-21, anbutech

Hi sir,

Could you please help me with the two cases below? I'm using Databricks PySpark to process terabytes of JSON data read from an AWS S3 bucket.

Case 1: Currently I'm reading multiple tables sequentially to get the day count from each table. For example, table_list.csv has one column with multip