I guess what you are trying to do is get a columnar projection on your data; Spark SQL may be a good option for you (especially if your data is sparse and well suited to columnar projection). If you are looking to work with simple key-value pairs, then you are better off using the HBase input reader through Hadoop IO and getting a pair RDD (see the sketch below).
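A minimal sketch of both options in Scala, under some assumptions that are not in the thread: a hypothetical comma-delimited HDFS path and Record schema, a hypothetical HBase table name "my_table", and the Spark 1.0-era Spark SQL alpha API (SchemaRDD / registerAsTable), which was renamed in later releases.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object SchemaViewSketch {
  // Hypothetical schema for the delimited HDFS lines.
  case class Record(user: String, item: String, rating: Double)

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SchemaViewSketch"))

    // --- Option 1: Spark SQL for a columnar projection over a schema'd RDD ---
    // (API names follow the Spark 1.0-era SQL alpha; they changed in later releases.)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.createSchemaRDD          // implicit RDD[Product] -> SchemaRDD

    val records = sc.textFile("hdfs:///path/to/data")   // hypothetical path
      .map(_.split(","))                                // assuming comma-delimited lines
      .map(t => Record(t(0), t(1), t(2).toDouble))

    records.registerAsTable("records")
    val projected = sqlContext.sql("SELECT user, rating FROM records WHERE rating > 3.0")
    projected.collect().foreach(println)

    // --- Option 2: HBase input format through Hadoop IO, yielding a pair RDD ---
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")  // hypothetical table name

    // RDD[(ImmutableBytesWritable, Result)] -- a plain key/value view of the rows
    val hbaseRDD = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    println(s"rows read from HBase: ${hbaseRDD.count()}")

    sc.stop()
  }
}

The second half only shows that newAPIHadoopRDD with TableInputFormat already hands you the (ImmutableBytesWritable, Result) pair RDD; decoding the Result cells into your own schema is still up to you.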
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>

On Thu, May 8, 2014 at 10:51 AM, Debasish Das <debasish.da...@gmail.com> wrote:

> Hi,
>
> For each line that we read as textLine from HDFS, we have a schema... if
> there is an API that takes the schema as List[Symbol] and maps each token
> to the Symbol, it will be helpful...
>
> One solution is to keep data on HDFS as Avro/protobuf serialized objects,
> but not sure if that works on HBase input... we are testing HDFS right now,
> but finally we will read from a persistent store like HBase... so basically
> the ImmutableBytes need to be converted to a schema view as well in case we
> don't want to write the whole row as a protobuf...
>
> Do RDDs provide a schema view of the dataset on HDFS / HBase?
>
> Thanks.
> Deb
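For the token-to-Symbol mapping asked about in the quoted mail, one hand-rolled approach that needs no special API is to zip each tokenized line with the symbol list; a minimal sketch, assuming comma-delimited lines and a hypothetical schema and path:

// A hand-rolled "schema view": zip each tokenized line with the symbol list.
val schema = List('user, 'item, 'rating)            // hypothetical schema

val rows = sc.textFile("hdfs:///path/to/data")      // hypothetical path
  .map(_.split(","))                                // assuming comma-delimited lines
  .map(tokens => schema.zip(tokens).toMap)          // RDD[Map[Symbol, String]]

// e.g. project a single column out of the per-row map
val ratings = rows.map(_('rating).toDouble)

This gives a schema view over plain text, but every row carries a per-row map, which is part of why Spark SQL's columnar handling is usually the better fit.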