If you operate directly on a Result you only get the latest version of each cell. To get older versions of cells you have a few options:
1) Result::getFamilyMap, if you only want cells from a single family. Note that, per the API docs, this returns a qualifier -> value map holding only the latest value of each column; for every version within one family, take that family's entry from Result::getMap or use Result::getColumnCells - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#getFamilyMap-byte:A-

2) Result::getMap, if you need versioned cells from all families - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#getMap--

3) Get a cell scanner from Result::cellScanner - http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Result.html#cellScanner--

So once you have your rows, add another mapping function using one of the methods above to get multi-version rows.
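As a rough illustration of that extra mapping step, here is a minimal sketch that emits one tuple per stored version of cf:ProductFeature, using Result::getColumnCells. The object name VersionedRows and the helpers requestVersions and expand are hypothetical names made up for this example, and Price is left out because its encoding is not clear from the question. It also assumes the scan has to be told to return more than one version (TableInputFormat.SCAN_MAXVERSIONS), since otherwise only the newest cell would normally come back.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.CellUtil
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import scala.collection.JavaConverters._

// Hypothetical helper object; the names are illustrative only.
object VersionedRows {
  private val cf       = Bytes.toBytes("cf")
  private val feature  = Bytes.toBytes("ProductFeature")
  private val location = Bytes.toBytes("Location")

  // Ask TableInputFormat to return up to n versions per column.
  // Call this on the conf before sc.newAPIHadoopRDD(...).
  def requestVersions(conf: Configuration, n: Int): Unit =
    conf.set(TableInputFormat.SCAN_MAXVERSIONS, n.toString)

  // One (row, ProductFeature, Location) tuple per version of
  // cf:ProductFeature present in the Result, oldest version first.
  def expand(result: Result): Seq[(String, String, String)] = {
    val rowKey = Bytes.toString(result.getRow)
    // getValue returns the latest version only (or null if absent).
    val loc = Option(result.getValue(cf, location))
      .map(b => Bytes.toString(b))
      .getOrElse("")
    result.getColumnCells(cf, feature).asScala.toSeq
      .sortBy(_.getTimestamp)
      .map(cell => (rowKey, Bytes.toString(CellUtil.cloneValue(cell)), loc))
  }
}
```

With the RDD from the question, this could be applied as resultRDD.flatMap(r => VersionedRows.expand(r)) in place of the map over Row.parseRow, after calling VersionedRows.requestVersions(conf, 3) on the configuration. flatMap is used rather than map because each Result may now produce several output rows.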
https://richardstartin.com/

________________________________
From: Abir Chokraborty <abir.chokrabort...@gmail.com>
Sent: 25 February 2017 07:38
To: user@hbase.apache.org
Subject: Reading data for a particular column-cell with 2 or more values of a same row-key

HBase table contains the following:

ROW          COLUMN+CELL
Product01    column=cf:ProductFeature, timestamp=1487917201238, value= Feature01
Product01    column=cf:ProductFeature, timestamp=1487917201239, value= Feature02
Product01    column=cf:ProductFeature, timestamp=1487917201240, value= Feature03
Product01    column=cf:Price, timestamp=1487917201242, value=\x012A\xF8
Product01    column=cf:Location, timestamp=1487917201244, value= Texas

Here VERSIONS is 3. So it is keeping 3 different values for ProductFeature column. I wrote the following to create an RDD

val hbaseRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
  classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
  classOf[org.apache.hadoop.hbase.client.Result])
val resultRDD = hbaseRDD.map(tuple => tuple._2)
val testRDD = resultRDD.map(Row.parseRow)
val testDF = testRDD.toDF()

Here, parseRow is a method that returns tuple of (ROW,ProductFeature,Price,Location).
I am only getting

+----------------+----------------+---------+---------+
|             Row|  ProductFeature|    Price| Location|
+----------------+----------------+---------+---------+
|       Product01|       Feature03|       65|    Texas|
+----------------+----------------+---------+---------+

What do I have to change in the code so that I can create a DataFrame with one row per value of ProductFeature, like the following:

+----------------+----------------+---------+---------+
|             Row|  ProductFeature|    Price| Location|
+----------------+----------------+---------+---------+
|       Product01|       Feature01|       65|    Texas|
+----------------+----------------+---------+---------+
|       Product01|       Feature02|       65|    Texas|
+----------------+----------------+---------+---------+
|       Product01|       Feature03|       65|    Texas|
+----------------+----------------+---------+---------+

--
View this message in context: http://apache-hbase.679495.n3.nabble.com/Reading-data-for-a-particular-column-cell-with-2-or-more-values-of-a-same-row-key-tp4086420.html
Sent from the HBase User mailing list archive at Nabble.com.