Hello experts, I have a huge JSON file (> 40 GB) that I am trying to convert to Parquet. Each entry has a unique identifier, but other than that there is no column with well-balanced values to partition on. Right now the conversion just throws an OOM error, and I can't figure out what to do about it.
Ideally I could provide a partitioner based on the unique identifier, for example one that computes its hash value. One option would be to compute a hash value and add it as a separate column, but that doesn't sound right to me. Are there any other approaches I can try?

Regards,
-- Kohki Nishio
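
A minimal sketch of the hashing idea, for illustration only. It assumes the input is newline-delimited JSON (one record per line) and that the identifier lives in a field called "id"; both are assumptions, as are the function names and the output file naming. Each record is streamed to one of N smaller files chosen by a stable hash of its identifier, so the hash decides where the record goes without being stored as an extra column.

```python
import json
import os
import zlib


def bucket_for(identifier: str, num_buckets: int) -> int:
    """Map an identifier to a bucket in [0, num_buckets) using a stable hash."""
    # zlib.crc32 gives the same value on every run, unlike the built-in
    # hash() for strings, which is randomized per process by default.
    return zlib.crc32(identifier.encode("utf-8")) % num_buckets


def split_by_id_hash(src_path, out_dir, num_buckets, id_field="id"):
    """Stream a newline-delimited JSON file into num_buckets smaller files.

    id_field is a hypothetical field name; replace it with the real one.
    Returns the list of bucket file paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = [
        os.path.join(out_dir, f"bucket-{i:04d}.jsonl") for i in range(num_buckets)
    ]
    outs = [open(p, "w", encoding="utf-8") for p in paths]
    try:
        with open(src_path, "r", encoding="utf-8") as src:
            # Read one record at a time so the whole file is never in memory.
            for line in src:
                if not line.strip():
                    continue
                record = json.loads(line)
                bucket = bucket_for(str(record[id_field]), num_buckets)
                outs[bucket].write(line if line.endswith("\n") else line + "\n")
    finally:
        for f in outs:
            f.close()
    return paths
```

Each bucket file should then be small enough to convert to Parquet on its own. If the file is a single large JSON array rather than one record per line, it would need a streaming JSON parser instead of the line-by-line loop shown here.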