Hello everyone,
I'm trying to read data from a Phoenix Table using apache Spark. I
actually use the suggested method: sc.phoenixTableAsRDD without issuing
any query (e.g. reading the whole table) and I noticed that the number
of partitions that spark creates is equal to the number of
regionServers. Is there a way to use a custom number of regions?
The problem we actually face is that if a region is bigger than the
available memory of the spark executor, it goes in OOM. Being able to
tune the number of regions, we might use a higher number of partitions
reducing the memory footprint of the processing (and also slowing it
down, i know :( ).
Thank you in advance
#A.M.