Hi, In my spark job, I need to scan HBase table. I set up a scan with custom filters. Then I use
newAPIHadoopRDD function to get a JavaPairRDD variable X. The problem is when no records inside HBase matches my filters, the call X.isEmpty() or X.count() will cause a java.lang.NullPointerException. Part of trace is here: Caused by: java.lang.RuntimeException: java.lang.NullPointerException at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:208) at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320) at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295) at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160) at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155) at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:821) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:193) at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:89) at org.apache.hadoop.hbase.client.MetaScanner.allTableRegions(MetaScanner.java:324) at org.apache.hadoop.hbase.client.HRegionLocator.getAllRegionLocations(HRegionLocator.java:88) at org.apache.hadoop.hbase.util.RegionSizeCalculator.init(RegionSizeCalculator.java:94) at org.apache.hadoop.hbase.util.RegionSizeCalculator.<init>(RegionSizeCalculator.java:81) at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:256) at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:125) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252) at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.rdd.RDD.partitions(RDD.scala:250) at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply$mcZ$sp(RDD.scala:1461) at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1461) at org.apache.spark.rdd.RDD$$anonfun$isEmpty$1.apply(RDD.scala:1461) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:362) at org.apache.spark.rdd.RDD.isEmpty(RDD.scala:1460) at org.apache.spark.api.java.JavaRDDLike$class.isEmpty(JavaRDDLike.scala:544) at org.apache.spark.api.java.AbstractJavaRDDLike.isEmpty(JavaRDDLike.scala:45) ... I just override setConf() function to setSacn(). Please see if this is a bug of spark or issue of my code. Thanks, Huiliang