Hello,
Today I used the SparkSession.read.format("HBASETABLE").option("zk", "zkaddress").load() API to create a Dataset from an HBase data source. To provide the logical plan for this HBase source, I wrote code that extends BaseRelation and PrunedFilteredScan. I use an InputFormat to create my NewHadoopRDD, in which each partition maps to a region of the HBase table, and for each partition I transform each HBase Result into a Row. So I think the lineage of this RDD is quite clear.

However, when I try to take a peek at this Dataset with dataset.first or dataset.count, I get an error telling me that the RDD does not have a SparkContext, with the following message:

private def sc: SparkContext = {
  if (_sc == null) {
    throw new SparkException(
      "RDD transformations and actions can only be invoked by the driver, not inside of other " +
      "transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because " +
      "the values transformation and count action cannot be performed inside of the rdd1.map " +
      "transformation. For more information, see SPARK-5063.")
  }
  _sc
}

I submit my Spark app to a standalone cluster, and as far as I can tell I did not put a transformation/action inside another transformation as the error message describes.

Does anyone have any suggestions? Thanks a lot.

Chao
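P.S. In case it helps, here is a stripped-down sketch of roughly how my relation is put together. The class name, schema, and column names below are simplified placeholders, not my exact code; only the overall shape (BaseRelation + PrunedFilteredScan, NewHadoopRDD over the table's regions, Result mapped to Row) matches what I described above.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan}
import org.apache.spark.sql.types._

class HBaseRelation(zkAddress: String, tableName: String)
                   (@transient val sqlContext: SQLContext)
  extends BaseRelation with PrunedFilteredScan {

  // Placeholder schema: a row key plus a single value column.
  override def schema: StructType = StructType(Seq(
    StructField("rowkey", StringType),
    StructField("value", StringType)))

  override def buildScan(requiredColumns: Array[String],
                         filters: Array[Filter]): RDD[Row] = {
    val conf: Configuration = HBaseConfiguration.create()
    conf.set("hbase.zookeeper.quorum", zkAddress)
    conf.set(TableInputFormat.INPUT_TABLE, tableName)

    // NewHadoopRDD over the table: one partition per HBase region.
    val hbaseRDD = sqlContext.sparkContext.newAPIHadoopRDD(
      conf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    // Convert each HBase Result into a Row; this closure does not touch
    // the SparkContext or any other RDD.
    hbaseRDD.map { case (key, result) =>
      Row(Bytes.toString(key.get()), Bytes.toString(result.value()))
    }
  }
}

The map over the NewHadoopRDD only converts each Result to a Row, so I don't see where a transformation or action would be nested inside another one.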