To get a node-local read from Spark to Cassandra, you have to use a read consistency level of LOCAL_ONE. For some use cases, this is not an option. For example, if you need a read consistency level of LOCAL_QUORUM, as many use cases demand, a quorum of replicas in the local datacenter has to respond, so with a replication factor above one the read cannot be served by the local node alone.
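As a minimal sketch of how the read consistency level might be set, assuming the DataStax Spark Cassandra Connector is on the classpath: the property name spark.cassandra.input.consistency.level comes from the connector's configuration reference, while the object name, app name, and host address are placeholders made up for illustration.

```scala
import org.apache.spark.SparkConf

object ReadConsistencySketch {
  // Hypothetical app name and contact point; substitute your own.
  def conf: SparkConf = new SparkConf()
    .setAppName("node-local-read-sketch")
    .set("spark.cassandra.connection.host", "127.0.0.1")
    // LOCAL_ONE lets a single local replica answer the read.
    // Switching this to LOCAL_QUORUM involves other replicas.
    .set("spark.cassandra.input.consistency.level", "LOCAL_ONE")
}
```

The same property can also be passed on the command line with --conf when submitting the job.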
Also, a node-local read depends on where Spark schedules the task. Whether a partition is computed locally or streamed to another node depends on the spark.locality.wait parameters, which set how long Spark waits for a free slot at one locality level before falling back to a less local one. Setting spark.locality.wait to 0 removes that wait, so tasks launch on whichever executor is free; to keep partitions on local nodes, raise it above the default of 3s instead. If you do some testing, please post your performance numbers.
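A rough sketch of how the locality settings could be applied follows. The property names spark.locality.wait and spark.locality.wait.node are from the Spark configuration documentation; the object name and the 10s value are assumptions chosen only for illustration, not a tested recommendation.

```scala
import org.apache.spark.SparkConf

object LocalityWaitSketch {
  // Illustrative value only. Per the Spark docs, this is how long the
  // scheduler waits for a data-local slot before launching the task on
  // a less local node; 0 means it does not wait at all.
  val localityWait = "10s"

  def conf: SparkConf = new SparkConf()
    // Base wait applied to every locality level.
    .set("spark.locality.wait", localityWait)
    // Optional per-level override for node-local scheduling.
    .set("spark.locality.wait.node", localityWait)
}
```

Comparing job times and the locality level shown per task in the Spark UI for a few different values is one way to produce the performance numbers asked for above.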