Hello,

When doing analytics in Spark, a common pattern is to load the whole table into memory, or to filter on some columns. This works well for column-oriented files (e.g. Parquet), but it seems to be a huge anti-pattern in C*: most common Spark operations end up as either (a) a query without a partition key (a full table scan), or (b) a filter on a non-clustering column. A naive implementation of either will read all SSTables from disk multiple times, in random order (for different keys), resulting in horrible cache performance.
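For concreteness, here is a sketch of the two shapes using the spark-cassandra-connector (the keyspace, table, and column names are hypothetical, and this assumes a reachable cluster, so it is illustrative rather than something I have benchmarked):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._ // spark-cassandra-connector

object ScanShapes {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("scan-shapes")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed local node
    val sc = new SparkContext(conf)

    // (a) No partition key restriction: the connector splits the full token
    // range across executors, so every SSTable on every node gets read,
    // and the predicate is evaluated Spark-side after the fact.
    val fullScan = sc.cassandraTable("ks", "events")
      .filter(row => row.getInt("value") > 10)

    // (b) Pushing the predicate down with .where(...) on a non-clustering,
    // non-indexed column: C* still has to scan everything server-side
    // (ALLOW FILTERING semantics), so the I/O pattern is much the same.
    val pushedDown = sc.cassandraTable("ks", "events")
      .where("value > 10")

    println(fullScan.count() + pushedDown.count())
  }
}
```

In both cases the work is proportional to the whole table rather than to the matching rows, which is the cache behaviour described above.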
Does the DataStax driver do any smart tricks to optimize for this?

Cheers,
Eugene