Hello,

 

Today I used the SparkSession.read.format("HBASETABLE").option("zk",
"zkaddress").load() API to create a Dataset from an HBase data source, and of
course I wrote code that extends BaseRelation and PrunedFilteredScan to
provide the relation (logical plan) for this HBase data source, roughly as
sketched below.
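
My relation looks something like the following sketch (class and parameter
names such as HBaseRelation and the extra "table" option are simplified
placeholders, not my exact code):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Row, SQLContext}
import org.apache.spark.sql.sources.{BaseRelation, Filter, PrunedFilteredScan, RelationProvider}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Provider resolved by .format(...); reads the "zk" option I pass in.
class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation =
    new HBaseRelation(parameters("zk"), parameters("table"))(sqlContext)
}

class HBaseRelation(zkAddress: String, tableName: String)
                   (@transient val sqlContext: SQLContext)
  extends BaseRelation with PrunedFilteredScan {

  // Simplified schema: just the row key as a single string column.
  override def schema: StructType =
    StructType(Seq(StructField("rowkey", StringType)))

  // The actual scan is sketched further down.
  override def buildScan(requiredColumns: Array[String],
                         filters: Array[Filter]): RDD[Row] = ???
}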

 

I use an InputFormat to create my NewHadoopRDD, where each partition maps to
a region of the HBase table. Within each partition, I transform each HBase
Result into a Row (sketched below).
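
Concretely, the buildScan body is roughly this (TableInputFormat assumed; the
single-column Result-to-Row mapping is simplified):

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Row
import org.apache.spark.sql.sources.Filter

// Inside HBaseRelation:
override def buildScan(requiredColumns: Array[String],
                       filters: Array[Filter]): RDD[Row] = {
  val conf = HBaseConfiguration.create()
  conf.set("hbase.zookeeper.quorum", zkAddress)
  conf.set(TableInputFormat.INPUT_TABLE, tableName)

  // newAPIHadoopRDD gives one partition per region of the table.
  val hbaseRDD = sqlContext.sparkContext.newAPIHadoopRDD(
    conf,
    classOf[TableInputFormat],
    classOf[ImmutableBytesWritable],
    classOf[Result])

  // Turn each HBase Result into a Row (only the row key here).
  hbaseRDD.map { case (_, result) => Row(Bytes.toString(result.getRow)) }
}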

 

So I think the lineage of this RDD is quite clear, but when I try to take a
peek at this Dataset using dataset.first or dataset.count, I get an error.
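
The failing sequence on the driver is just:

val ds = spark.read
  .format("HBASETABLE")
  .option("zk", zkAddress)
  .load()

ds.count()   // or ds.first -- both fail with the exception below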

 

This error tells me that this RDD does not have a SparkContext, with the following message:

private def sc: SparkContext = {
  if (_sc == null) {
    throw new SparkException(
      "RDD transformations and actions can only be invoked by the driver, not inside of other " +
      "transformations; for example, rdd1.map(x => rdd2.values.count() * x) is invalid because " +
      "the values transformation and count action cannot be performed inside of the rdd1.map " +
      "transformation. For more information, see SPARK-5063.")
  }
  _sc
}

 

In addition, I submit my Spark app to a standalone cluster, and it seems that
I didn't put a transformation/action inside another transformation, as the
above error message describes.
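
For reference, my understanding of the pattern SPARK-5063 forbids is
something like this, which I don't believe my code does:

val rdd1 = sc.parallelize(1 to 10)
val rdd2 = sc.parallelize(Seq(1, 2, 3))

// Invalid: rdd2 is captured in rdd1's map closure and serialized to
// executors, where its SparkContext field is null, so count() throws
// the SparkException quoted above.
rdd1.map(x => rdd2.count() * x).collect()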

 

Does anyone have any suggestions? Many thanks.

 

Chao,
