Hello Phoenix user group,

I have a query against a table of about 170 million rows that selects roughly 700k of them. The query retrieves row-key fields, fields covered by a secondary index, and one field that only exists in the data table itself, and we use an index hint to force that index. This runs very quickly in SQLLine when dumping the results to a file (about 46 s). However, when I issue the same query from Spark and materialize the result in driver memory, it takes much longer (about 10 min). I suspect the problem is the index hint, but I cannot figure out how to get Spark to use the correct index. Does anyone know how to do that?
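For concreteness, here is a simplified sketch of both sides. All table, index, and column names are placeholders, and the Spark side is condensed to something along the lines of a phoenix-spark DataFrame read:

// Sketch only -- table, index, and column names are placeholders.
//
// The statement we run in SQLLine (finishes in ~46 s):
//
//   SELECT /*+ INDEX(MY_TABLE MY_IDX) */ PK_COL, IDX_COL, EXTRA_COL
//   FROM MY_TABLE
//   WHERE IDX_COL = 'some-value';
//
// Roughly the equivalent read in Spark via the phoenix-spark connector
// (this is where I suspect the hint gets lost):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("phoenix-read").getOrCreate()
import spark.implicits._

val df = spark.read
  .format("org.apache.phoenix.spark")
  .option("table", "MY_TABLE")          // placeholder table name
  .option("zkUrl", "zk-quorum:2181")    // placeholder ZooKeeper quorum
  .load()
  .select("PK_COL", "IDX_COL", "EXTRA_COL")
  .filter($"IDX_COL" === "some-value")

// Materializing the ~700k result rows in the driver is what takes ~10 min:
val rows = df.collect()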
Looking at the IO usage and the HBase overview, I suspect the Spark approach ends up doing a complete table scan; at least the HBase read rate and the disk IO rate point in that direction.

Best regards
Dominic Egger
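P.S.: In case it helps with the diagnosis, this is roughly how I would compare the two plans (again with placeholder names): prefix the statement with EXPLAIN in SQLLine, and call explain() on the DataFrame in Spark.

// Plan comparison sketch (placeholder names):
//
// In SQLLine, EXPLAIN shows whether the hinted index is actually picked up:
//
//   EXPLAIN SELECT /*+ INDEX(MY_TABLE MY_IDX) */ PK_COL, IDX_COL, EXTRA_COL
//   FROM MY_TABLE
//   WHERE IDX_COL = 'some-value';
//
// In Spark, the extended plan at least shows how the scan over the Phoenix
// relation is planned and which filters/columns Spark tries to push down:
df.explain(true)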