I have a time-series table in which each record has about 350 columns.
The primary key is ((date, bucket), objectid, timestamp).
The objective is to read one day's worth of data, which comes to around 12k
Cassandra partitions, each holding around 25 MB of data.
During the read I see only one active task on a 5-node cluster (8
cores each). Does this mean not enough Spark partitions are getting created?
I have also tried setting input.split.size_in_mb to a lower value, e.g. 10.
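For reference, here is my back-of-envelope estimate of how many input splits I would expect with that setting, assuming the split size is honored (the numbers are the ones from this post, not measured connector output):

```python
# Rough estimate of Spark input splits for one day of data,
# using the figures described above.
cassandra_partitions = 12_000   # partitions read per day
mb_per_partition = 25           # approximate data per partition
split_size_mb = 10              # value set for input.split.size_in_mb

total_mb = cassandra_partitions * mb_per_partition        # 300,000 MB (~293 GB)
expected_splits = total_mb // split_size_mb               # expected Spark partitions
print(total_mb, expected_splits)                          # 300000 30000
```

So I would expect on the order of tens of thousands of splits, and with 40 cores available (5 nodes x 8 cores) far more than one task should be running concurrently.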
Any pointers in this regard would be helpful.