Folks, I have a time-series table in Cassandra where each record has around 350 columns.
The primary key is ((date, bucket), objectid, timestamp). The objective is to read one day's worth of data, which comes to around 12k Cassandra partitions of roughly 25 MB each. During the read I see only 1 active task on a 5-node cluster (8 cores each). Does this mean not enough Spark partitions are being created? I have also set spark.cassandra.input.split.size_in_mb to a lower value, like 10. Any pointers in this regard would be helpful. Thanks!
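
For reference, here's a minimal sketch of how I'm doing the read (keyspace/table names and the date value are placeholders, and I'm assuming the DataFrame API of the Spark Cassandra Connector; newer connector versions spell the split-size property spark.cassandra.input.split.sizeInMB):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Lower the target split size so the connector creates more Spark partitions.
// (Spelled spark.cassandra.input.split.sizeInMB in connector 2.x and later.)
val spark = SparkSession.builder()
  .appName("daily-read")
  .config("spark.cassandra.input.split.size_in_mb", "10")
  .getOrCreate()

// Read one day's worth of data; "ts_ks" / "ts_table" / the date are placeholders.
val df = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ts_ks", "table" -> "ts_table"))
  .load()
  .filter(col("date") === "2024-01-01")

// Sanity check: how many Spark partitions did the connector actually create?
println(df.rdd.getNumPartitions)

With ~12k Cassandra partitions at ~25 MB each (~300 GB for the day) and a 10 MB split size, I'd expect on the order of tens of thousands of Spark partitions, so seeing a single active task looks wrong to me.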