Hello Phoenix user group,

I have a query against a table of about 170 million rows that selects roughly 700k of them. The query retrieves row-key fields, fields covered by a secondary index, and one field that only exists in the data table itself, and we use an index hint to force that index. This runs very quickly in SQLLine when dumping the results to a file (about 46 s). However, when I issue the same query from Spark and materialize the result in driver memory, it takes much longer (about 10 min). I suspect the problem is the index hint, but I cannot figure out how to get Spark to use the correct index. Does anyone know how to do that?
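For concreteness, here is a simplified sketch of both sides. All table, index, and column names are placeholders, and the Spark side is condensed to something along the lines of a phoenix-spark DataFrame read:

// Sketch only -- table, index, and column names are placeholders.
//
// The statement we run in SQLLine (finishes in ~46 s):
//
//   SELECT /*+ INDEX(MY_TABLE MY_IDX) */ PK_COL, IDX_COL, EXTRA_COL
//   FROM MY_TABLE
//   WHERE IDX_COL = 'some-value';
//
// Roughly the equivalent read in Spark via the phoenix-spark connector
// (this is where I suspect the hint gets lost):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("phoenix-read").getOrCreate()
import spark.implicits._

val df = spark.read
  .format("org.apache.phoenix.spark")
  .option("table", "MY_TABLE")          // placeholder table name
  .option("zkUrl", "zk-quorum:2181")    // placeholder ZooKeeper quorum
  .load()
  .select("PK_COL", "IDX_COL", "EXTRA_COL")
  .filter($"IDX_COL" === "some-value")

// Materializing the ~700k result rows in the driver is what takes ~10 min:
val rows = df.collect()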
Looking at the IO usage and the HBase overview, I suspect the Spark approach ends up doing a complete table scan; at least the HBase read rate and the disk IO rate point in that direction.

Best regards
Dominic Egger
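P.S.: In case it helps with the diagnosis, this is roughly how I would compare the two plans (again with placeholder names): prefix the statement with EXPLAIN in SQLLine, and call explain() on the DataFrame in Spark.

// Plan comparison sketch (placeholder names):
//
// In SQLLine, EXPLAIN shows whether the hinted index is actually picked up:
//
//   EXPLAIN SELECT /*+ INDEX(MY_TABLE MY_IDX) */ PK_COL, IDX_COL, EXTRA_COL
//   FROM MY_TABLE
//   WHERE IDX_COL = 'some-value';
//
// In Spark, the extended plan at least shows how the scan over the Phoenix
// relation is planned and which filters/columns Spark tries to push down:
df.explain(true)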