Hi, I want to join a Spark RDD with an HBase table. I'm familiar with the different connectors available, but couldn't find this functionality in any of them.
The idea I have is to first sort the RDD by a byte[] key [1], then use rdd.mapPartitions so that each partition contains a unique, sequentially sorted range of keys that lines up with the key order in HBase. I should mention that the RDD will always contain almost all the keys stored in HBase, so full table scans are fine.

Unfortunately, Spark cannot sort native Java byte[] out of the box, since byte[] has no natural ordering. I'm also not sure whether mapPartitions really maintains the total sort order of the original RDD.

Any suggestions?

Cheers,
-Kristoffer

[1] Guava UnsignedBytes.lexicographicalComparator
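P.S. In case it helps frame the question, here's roughly the Ordering I'd need — a plain-Scala sketch equivalent to Guava's UnsignedBytes.lexicographicalComparator, assuming Spark's sortByKey would pick it up as an implicit for an RDD[(Array[Byte], V)]:

```scala
// Unsigned lexicographic ordering for byte arrays, matching HBase's
// row-key order (and Guava's UnsignedBytes.lexicographicalComparator).
implicit val byteArrayOrdering: Ordering[Array[Byte]] =
  new Ordering[Array[Byte]] {
    def compare(a: Array[Byte], b: Array[Byte]): Int = {
      val len = math.min(a.length, b.length)
      var i = 0
      while (i < len) {
        // Compare bytes as unsigned values (0..255), not signed (-128..127).
        val cmp = (a(i) & 0xff) - (b(i) & 0xff)
        if (cmp != 0) return cmp
        i += 1
      }
      // Shorter array sorts first when one is a prefix of the other.
      a.length - b.length
    }
  }

// Hypothetical usage (not verified): with this in implicit scope,
// rdd.sortByKey() on an RDD[(Array[Byte], V)] should use it, since
// sortByKey takes the key Ordering implicitly.
```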
