Hello experts,

I have a huge JSON file (> 40 GB) and am trying to store it in Parquet
format. Each entry has a unique identifier, but other than that there is no
column with well-balanced values to partition on. Right now it just fails
with an OOM error, and I couldn't figure out what to do about it.

It would be ideal if I could provide a partitioner based on the unique
identifier, for example by computing its hash value. One option would be to
compute a hash value and add it as a separate column, but that doesn't
sound right to me. Are there any other ways I can try?
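
A minimal sketch of the hash-bucket idea, in plain Python and not tied to any
particular framework. It assumes the input is newline-delimited JSON (one
record per line) and that the identifier lives in a field called "id"; both
are assumptions for illustration, as are the function names and the bucket
file naming. The bucket index is derived from the identifier on the fly, so
no extra column is added to the records.

```python
import hashlib
import json
import os


def bucket_for(identifier, num_buckets):
    """Map an identifier to a stable bucket index in [0, num_buckets)."""
    # md5 is used only to spread values evenly, not for security.
    digest = hashlib.md5(str(identifier).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def split_into_buckets(lines, out_dir, num_buckets, key="id"):
    """Stream JSON records one line at a time into per-bucket files.

    `lines` can be an open file object, so only one record is parsed
    at a time (the output files keep small write buffers).
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = [
        os.path.join(out_dir, "bucket-%04d.jsonl" % i)
        for i in range(num_buckets)
    ]
    files = [open(p, "w", encoding="utf-8") for p in paths]
    try:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            index = bucket_for(record[key], num_buckets)
            files[index].write(line + "\n")
    finally:
        for f in files:
            f.close()
    return paths
```

Each resulting bucket file is roughly 1/num_buckets of the input and could
then be converted to Parquet separately. If the input is a single large JSON
array rather than one record per line, it would need a streaming parser
first, which this sketch does not cover.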

Regards,
-- 
Kohki Nishio
