Hi Team, I have asked this question in our Stack Overflow group:
pyspark - Apache Spark partition by output path - Stack Overflow <https://stackoverflow.com/questions/74089582/apache-spark-partition-by-output-path>

*Requirement*

1. I have huge data coming from a source system, loaded into Azure Data Lake in CSV format.
2. One of the columns in the CSV file is the tenant ID.
3. I need to partition this CSV on tenantId and store it in ADLS under the directory {*tenantId*}\{tenantId}.csv --> the bold part is the storage container.
4. There is one more challenge: my source CSV file can have more than 50,000 unique tenant IDs. In one storage account I want to keep at most 25,000 tenants' data in the {*tenantId*}\{tenantId}.csv format; the remaining 25,000 should go to another storage account.

I want to know how I can customize or write custom code around partitionBy, so that I have more control over this method and can write my own logic to map each tenant's data to its respective storage account. A rough sketch of what I have in mind is below.
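Here is a minimal sketch of the kind of logic I'm imagining, assuming the column is named tenantId -- the abfss:// URLs, container names, and account names are placeholders, not our real values:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("tenant-split").getOrCreate()

# Placeholder source path -- replace with the real ADLS location.
df = spark.read.option("header", True).csv(
    "abfss://source@sourceaccount.dfs.core.windows.net/input/")

# Give each distinct tenant a stable rank, then bucket the first
# 25,000 into account A and everything else into account B.
tenants = (df.select("tenantId").distinct()
             .withColumn("rank", F.row_number().over(Window.orderBy("tenantId"))))
tenant_map = tenants.withColumn(
    "account",
    F.when(F.col("rank") <= 25000, F.lit("accountA")).otherwise(F.lit("accountB")))

routed = df.join(tenant_map.select("tenantId", "account"), on="tenantId")

# One write per target account; partitionBy("tenantId") creates the
# per-tenant directory layout inside each container.
for account, url in [("accountA", "abfss://data@accounta.dfs.core.windows.net/"),
                     ("accountB", "abfss://data@accountb.dfs.core.windows.net/")]:
    (routed.filter(F.col("account") == account)
           .drop("account")
           .write.partitionBy("tenantId")
           .mode("overwrite")
           .option("header", True)
           .csv(url))

One caveat I'm aware of: partitionBy writes directories like tenantId=123/part-0000-....csv rather than {tenantId}.csv, so getting a single file named {tenantId}.csv per folder would still need a rename/post-processing step.

Need your help in this regard. Thanks in advance.

Regards,
Venkatesh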