[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-05-02 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461775#comment-16461775
 ] 

Michael Jin commented on HBASE-20295:
-

A new issue has been created: HBASE-20521

[~mdrob], please correct the description there if it's not clear.

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Assignee: Michael Jin
>Priority: Major
> Fix For: 2.0.0
>
> Attachments: HBASE-20295.branch-1.4.001.patch, 
> HBASE-20295.master.001.patch, HBASE-20295.master.002.patch, 
> HBASE-20295.master.003.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using Spark to write data to HBase via the RDD.saveAsNewAPIHadoopDataset
> function. It works fine with HBase 1.3.1, but after updating my HBase dependency
> to 1.4.0 in pom.xml it throws java.lang.NullPointerException. This is caused by
> a logic error in the TableOutputFormat.checkOutputSpecs function; please see the
> details below.
> First, let's take a look at the SparkHadoopMapReduceWriter.write function in
> SparkHadoopMapReduceWriter.scala:
> {code:java}
> // SparkHadoopMapReduceWriter.write
> // (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
>     rdd: RDD[(K, V)],
>     hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
>     // FileOutputFormat ignores the filesystem parameter
>     val jobFormat = format.newInstance
>     jobFormat.checkOutputSpecs(jobContext)
>   }
>
>   val committer = FileCommitProtocol.instantiate(
>     className = classOf[HadoopMapReduceCommitProtocol].getName,
>     jobId = stageId.toString,
>     outputPath = conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
>     isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs
> // (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java), HBase 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
>     InterruptedException {
>   try (Admin admin = ConnectionFactory.createConnection(getConf()).getAdmin()) {
>     TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
>     if (!admin.tableExists(tableName)) {
>       throw new TableNotFoundException("Can't write, table does not exist:" +
>           tableName.getNameAsString());
>     }
>     if (!admin.isTableEnabled(tableName)) {
>       throw new TableNotEnabledException("Can't write, table is not enabled: " +
>           tableName.getNameAsString());
>     }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix this issue:
>  
> {code:java}
> // code placeholder
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
>     InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if (hConf == null)
>     hConf = this.conf;
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
>     TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
>     if (!admin.tableExists(tableName)) {
>       throw new TableNotFoundException("Can't write, table does not exist:" +
>           tableName.getNameAsString());
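The root cause is visible outside of Spark as well. Below is a minimal stand-alone sketch (not part of this issue; the table name is hypothetical) contrasting the bare newInstance path that Spark uses, which never calls Configurable.setConf() and so leaves getConf() null, with the ReflectionUtils path that a normal MapReduce job goes through, which does call setConf():

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.util.ReflectionUtils;

public class ConfInitSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.set(TableOutputFormat.OUTPUT_TABLE, "demo_table"); // hypothetical table name

    // What Spark's write path effectively does: a bare reflective constructor call.
    // Nothing ever invokes setConf(), so the Configurable conf stays null.
    TableOutputFormat<?> sparkStyle = TableOutputFormat.class.newInstance();
    System.out.println("conf via newInstance: " + sparkStyle.getConf()); // null -> NPE later

    // What a plain MapReduce job does: ReflectionUtils detects Configurable
    // and calls setConf(conf), so getConf() is populated.
    TableOutputFormat<?> mrStyle =
        ReflectionUtils.newInstance(TableOutputFormat.class, conf);
    System.out.println("conf via ReflectionUtils: " + (mrStyle.getConf() != null)); // true
  }
}
{code}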

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-05-02 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461747#comment-16461747
 ] 

Ted Yu commented on HBASE-20295:


Please open a new issue.


[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-05-02 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461740#comment-16461740
 ] 

Michael Jin commented on HBASE-20295:
-

[~mdrob], just verified locally; it works on my side.

[~yuzhih...@gmail.com], [~elserj], [~stack]: should we fork a new issue or just add 
a new patch in this thread? What do you think?


[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-05-02 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461670#comment-16461670
 ] 

Michael Jin commented on HBASE-20295:
-

[~mdrob], sorry for the failures this caused. The change to the checking order is 
OK with me. I will verify my original case locally; please wait a while and I will 
post the verification result.


[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-05-02 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461605#comment-16461605
 ] 

Mike Drob commented on HBASE-20295:
---

My proposed patch (against branch-2.0) would look like this:

{noformat}
diff --git a/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java b/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
index 0a1928b21f..cdf6ccd9b3 100644
--- a/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
+++ b/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
@@ -174,9 +174,9 @@ implements Configurable {
   @Override
   public void checkOutputSpecs(JobContext context) throws IOException,
   InterruptedException {
-    Configuration hConf = context.getConfiguration();
+    Configuration hConf = getConf();
     if(hConf == null) {
-      hConf = this.conf;
+      hConf = context.getConfiguration();
     }

     try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
{noformat}
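Spelled out, the method with that hunk applied would read roughly as below. This is a sketch reconstructed from the diff and the 1.4.0 body quoted in the description, not a copy of the committed code:

{code:java}
@Override
public void checkOutputSpecs(JobContext context) throws IOException,
    InterruptedException {
  // Prefer the conf installed via Configurable.setConf(); fall back to the
  // JobContext's configuration only when setConf() was never called.
  Configuration hConf = getConf();
  if (hConf == null) {
    hConf = context.getConfiguration();
  }

  try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
    TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
    if (!admin.tableExists(tableName)) {
      throw new TableNotFoundException("Can't write, table does not exist:" +
          tableName.getNameAsString());
    }
    if (!admin.isTableEnabled(tableName)) {
      throw new TableNotEnabledException("Can't write, table is not enabled: " +
          tableName.getNameAsString());
    }
  }
}
{code}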

But I don't know how to verify whether this fix still works for the original Spark 
use case.
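One way to approximate the Spark path without a full Spark job might be a tiny driver that mimics what SparkHadoopMapReduceWriter.write does: instantiate the format with a bare newInstance (so setConf() is never called) and invoke checkOutputSpecs against a JobContext built from the Hadoop conf. A sketch only, assuming a reachable test cluster and an existing, enabled table (the table name here is hypothetical):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.task.JobContextImpl;

public class SparkPathCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.set(TableOutputFormat.OUTPUT_TABLE, "demo_table"); // hypothetical table

    // Mimic Spark: bare newInstance (no setConf), then checkOutputSpecs(jobContext).
    JobContext jobContext = new JobContextImpl(conf, new JobID("sketch", 0));
    TableOutputFormat<?> format = TableOutputFormat.class.newInstance();
    format.checkOutputSpecs(jobContext); // NPEs on 1.4.0; should pass with the fix applied
  }
}
{code}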

We probably need to fork this off into a new issue since this has already been released.


[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-05-02 Thread Mike Drob (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461602#comment-16461602
 ] 

Mike Drob commented on HBASE-20295:
---

[~tedyu], [~menjin], [~elserj], [~stack] - Question about this implementation, 
sorry for showing up a month late to the issue...

The patch adds:
{noformat}
Configuration hConf = context.getConfiguration();
if(hConf == null) {
  hConf = this.conf;
}
{noformat}

Are we doing this backwards? If the concern is that getConf() sometimes returns 
null, shouldn't we use it first and check its result, instead of pulling the conf 
from the context first?

I have a Pig script that fails against hbase-2.0 with this patch included; if I 
revert the patch, my Pig job works again.
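For context, this is the driver-side wiring a Pig/MapReduce-style job effectively goes through (a hypothetical sketch; the job and table names are made up). TableMapReduceUtil copies the HBase output settings, including OUTPUT_TABLE, into the job configuration, and at runtime Hadoop instantiates the format through ReflectionUtils, which calls setConf(), so on that path getConf() is non-null and, per the question above, arguably should win:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

public class MrWiringSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = Job.getInstance(conf, "write-to-hbase-sketch"); // hypothetical job name

    // Sets TableOutputFormat as the output format and puts the table name into
    // the job configuration; the format later receives it via setConf().
    TableMapReduceUtil.initTableReducerJob("demo_table", null, job); // hypothetical table
  }
}
{code}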


[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-04-08 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430056#comment-16430056
 ] 

Hudson commented on HBASE-20295:


Results for branch HBASE-19064
[build #90 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/90/]: 
(x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/90//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/90//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/90//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.



[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-04-07 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429569#comment-16429569
 ] 

Michael Jin commented on HBASE-20295:
-

Correct, the official 2.0 release plan.


[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-04-07 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429564#comment-16429564
 ] 

Michael Jin commented on HBASE-20295:
-

Thanks [~stack], [~yuzhih...@gmail.com], [~elserj]. I am glad and excited to 
contribute to the HBase community; it has really been a wonderful experience for 
me. If there is anything else I can help with, please feel free to let me know. 
BTW, could you please tell me the official release date of 2.0? I can't wait to 
try the hbase-spark module.


[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-04-07 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429457#comment-16429457
 ] 

stack commented on HBASE-20295:
---

Thanks for the kick.

Pushed to branch-2.0 and branch-2. Thanks for the patch, [~menjin].


[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-04-06 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429043#comment-16429043
 ] 

Josh Elser commented on HBASE-20295:


{quote}
stack, might be interested in this one.

Best as I understand, this might affect our OutputFormat via Spark directly (as 
opposed to the hbase-spark integration). Seems pretty low risk to me.
{quote}

[~stack], gentle ping -- guessing this fell off your radar.


[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-31 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421304#comment-16421304
 ] 

Hudson commented on HBASE-20295:


Results for branch master
[build #279 on 
builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/279/]: (x) 
*{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/279//General_Nightly_Build_Report/]




(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/279//JDK8_Nightly_Build_Report_(Hadoop2)/]


(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) 
report|https://builds.apache.org/job/HBase%20Nightly/job/master/279//JDK8_Nightly_Build_Report_(Hadoop3)/]


(/) {color:green}+1 source release artifact{color}
-- See build output for details.



[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-30 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420540#comment-16420540
 ] 

Michael Jin commented on HBASE-20295:
-

[~yuzhih...@gmail.com], COOL, Thanks!

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Assignee: Michael Jin
>Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch, 
> HBASE-20295.master.001.patch, HBASE-20295.master.002.patch, 
> HBASE-20295.master.003.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix this issue:
>  
> {code:java}
> // code placeholder
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if(hConf == null)
> hConf = this.conf;
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
> TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
>  

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-30 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420522#comment-16420522
 ] 

Ted Yu commented on HBASE-20295:


Pushed to master branch.

Waiting for Stack's decision on branch-2.0

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Assignee: Michael Jin
>Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch, 
> HBASE-20295.master.001.patch, HBASE-20295.master.002.patch, 
> HBASE-20295.master.003.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix this issue:
>  
> {code:java}
> // code placeholder
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if(hConf == null)
> hConf = this.conf;
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
> TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-30 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420257#comment-16420257
 ] 

Michael Jin commented on HBASE-20295:
-

:D QA build is OK now.

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Assignee: Michael Jin
>Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch, 
> HBASE-20295.master.001.patch, HBASE-20295.master.002.patch, 
> HBASE-20295.master.003.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix this issue:
>  
> {code:java}
> // code placeholder
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if(hConf == null)
> hConf = this.conf;
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
> TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if 

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-30 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420255#comment-16420255
 ] 

Hadoop QA commented on HBASE-20295:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  3m 
37s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  8m 
53s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
21s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
44s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
34s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
25s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
23m  9s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 16m  
2s{color} | {color:green} hbase-mapreduce in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 66m 59s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f |
| JIRA Issue | HBASE-20295 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12916965/HBASE-20295.master.003.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 889508d9af06 4.4.0-98-generic #121-Ubuntu SMP Tue Oct 10 
14:24:03 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / 522b8075f3 |
| maven | version: Apache Maven 3.5.3 
(3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC3 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12232/testReport/ |
| Max. process+thread count | 4586 (vs. ulimit of 1) |
| modules | C: hbase-mapreduce U: hbase-mapreduce |
| Console output | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12232/console |
| 

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-29 Thread Josh Elser (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419771#comment-16419771
 ] 

Josh Elser commented on HBASE-20295:


[~stack], might be interested in this one.

Best as I understand, this might affect our OutputFormat via Spark directly (as 
opposed to the hbase-spark integration). Seems pretty low risk to me.

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Assignee: Michael Jin
>Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch, 
> HBASE-20295.master.001.patch, HBASE-20295.master.002.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix this issue:
>  
> {code:java}
> // code placeholder
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if(hConf == null)
> hConf = this.conf;
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
> TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw 

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-29 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419218#comment-16419218
 ] 

Michael Jin commented on HBASE-20295:
-

[~yuzhih...@gmail.com], I am watching BASE-20314 now, waiting for Sean Busbey's 
reply. Thanks!

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Assignee: Michael Jin
>Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch, 
> HBASE-20295.master.001.patch, HBASE-20295.master.002.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix this issue:
>  
> {code:java}
> // code placeholder
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if(hConf == null)
> hConf = this.conf;
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
> TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419150#comment-16419150
 ] 

Ted Yu commented on HBASE-20295:


Michael:
Please watch HBASE-20314

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Assignee: Michael Jin
>Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch, 
> HBASE-20295.master.001.patch, HBASE-20295.master.002.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix this issue:
>  
> {code:java}
> // code placeholder
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if(hConf == null)
> hConf = this.conf;
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
> TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-29 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418734#comment-16418734
 ] 

Michael Jin commented on HBASE-20295:
-

[~yuzhih...@gmail.com], I've checked both jekins job console and UT build log, 
the only suspicious error that I think it might affect build is:  "*09:20:49* 
Can not write to /root/.m2/copy_reference_file.log. Wrong volume permissions? 
Carrying on ..." in 
[https://builds.apache.org/job/PreCommit-HBASE-Build/12202/console] but I don't 
think it is the root cause,  for exception of "Caused by: 
org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM 
terminated without properly saying goodbye. VM crash or System.exit called?" in 
UT log, according FAQ of surefire: 
[http://maven.apache.org/surefire/maven-surefire-plugin/faq.html] , either 
System.exit is called or critical crash is happened will cause this exception, 
I've searched all System.exit call under hbase-mapreduce module, no calls in 
all test cases, and also no "TableOutputFormat.checkOutputSpecs" is called in 
any test case.

No idea why this exception happens, anyone else can help?

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Assignee: Michael Jin
>Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch, 
> HBASE-20295.master.001.patch, HBASE-20295.master.002.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create 

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-29 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418683#comment-16418683
 ] 

Hadoop QA commented on HBASE-20295:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
8s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
36s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
 5s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
17s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
54s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
20m 18s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
47s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
16s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  1m  2s{color} 
| {color:red} hbase-mapreduce in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
 8s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 39m 48s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f |
| JIRA Issue | HBASE-20295 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12916778/HBASE-20295.master.002.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux c302a64ffe29 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / d8b550fabc |
| maven | version: Apache Maven 3.5.3 
(3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC3 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12202/artifact/patchprocess/patch-unit-hbase-mapreduce.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12202/testReport/ |
| modules | C: hbase-mapreduce U: hbase-mapreduce |
| Console output | 

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-29 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418649#comment-16418649
 ] 

Michael Jin commented on HBASE-20295:
-

[~yuzhih...@gmail.com], thanks for your suggestion, I attached patch again, is 
qa run automatically triggered when a new patch is attached?

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Assignee: Michael Jin
>Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch, 
> HBASE-20295.master.001.patch, HBASE-20295.master.002.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix this issue:
>  
> {code:java}
> // code placeholder
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if(hConf == null)
> hConf = this.conf;
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
> TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does 

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-29 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418636#comment-16418636
 ] 

Ted Yu commented on HBASE-20295:


You can attach master patch again to trigger qa run.

Thanks

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Assignee: Michael Jin
>Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch, 
> HBASE-20295.master.001.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix this issue:
>  
> {code:java}
> // code placeholder
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if(hConf == null)
> hConf = this.conf;
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
> TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>  

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-28 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418465#comment-16418465
 ] 

Michael Jin commented on HBASE-20295:
-

[~uagashe], [~yuzhih...@gmail.com], would you please help to launch a build 
again? I think UT error may caused by Dock.

Thanks

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Assignee: Michael Jin
>Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch, 
> HBASE-20295.master.001.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix this issue:
>  
> {code:java}
> // code placeholder
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if(hConf == null)
> hConf = this.conf;
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
> TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-28 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418286#comment-16418286
 ] 

Michael Jin commented on HBASE-20295:
-

{code:java}
[INFO] Running org.apache.hadoop.hbase.snapshot.TestExportSnapshotNoCluster
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.171 s 
- in org.apache.hadoop.hbase.snapshot.TestExportSnapshotNoCluster
[INFO] Running 
org.apache.hadoop.hbase.mapred.TestTableOutputFormatConnectionExhaust
[INFO] Running org.apache.hadoop.hbase.mapreduce.TestWALRecordReader
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.145 s 
- in org.apache.hadoop.hbase.mapred.TestTableOutputFormatConnectionExhaust
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.747 s 
- in org.apache.hadoop.hbase.mapreduce.TestWALRecordReader
[INFO] Running org.apache.hadoop.hbase.mapreduce.TestHRegionPartitioner
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.96 s - 
in org.apache.hadoop.hbase.mapreduce.TestHRegionPartitioner
[INFO] Running org.apache.hadoop.hbase.mapreduce.TestImportExport
[INFO] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 63.956 
s - in org.apache.hadoop.hbase.mapreduce.TestImportExport
[INFO] 
[INFO] Results:
[INFO] 
[INFO] Tests run: 19, Failures: 0, Errors: 0, Skipped: 0

{code}
All unit tests passed locally.
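
Since the existing tests only exercise TableOutputFormat indirectly, a targeted regression test for this code path could look roughly like the sketch below. It is illustrative only: it assumes the usual HBaseTestingUtility mini-cluster setup available to the hbase-mapreduce test module, and the class and table names are made up.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseTestingUtility;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.junit.AfterClass;
import org.junit.BeforeClass;
import org.junit.Test;

public class TestCheckOutputSpecsWithoutSetConf {
  private static final HBaseTestingUtility UTIL = new HBaseTestingUtility();
  private static final TableName TABLE = TableName.valueOf("checkOutputSpecsTable");

  @BeforeClass
  public static void setUp() throws Exception {
    UTIL.startMiniCluster();
    UTIL.createTable(TABLE, "f");
  }

  @AfterClass
  public static void tearDown() throws Exception {
    UTIL.shutdownMiniCluster();
  }

  @Test
  public void testCheckOutputSpecsReadsJobContextConfiguration() throws Exception {
    // Mirror the Spark call path: the JobContext carries the configuration,
    // but the output format is created reflectively and setConf() is never called.
    Configuration conf = new Configuration(UTIL.getConfiguration());
    conf.set(TableOutputFormat.OUTPUT_TABLE, TABLE.getNameAsString());
    Job job = Job.getInstance(conf);

    // Like Spark's "format.newInstance": no setConf() is invoked on the instance.
    TableOutputFormat<Object> format = new TableOutputFormat<>();

    // Before the fix this threw NullPointerException because getConf() returned null;
    // with the fix the table check must succeed using job.getConfiguration().
    format.checkOutputSpecs(job);
  }
}
{code}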

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Assignee: Michael Jin
>Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch, 
> HBASE-20295.master.001.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> 

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-28 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418264#comment-16418264
 ] 

Michael Jin commented on HBASE-20295:
-

{code:java}
// code placeholder
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:496)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:443)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:295)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1124)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:954)
[ERROR] at 
org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:832)
[ERROR] at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
[ERROR] at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
[ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
[ERROR] at 
org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
[ERROR] at 
org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
[ERROR] at 
org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305)
[ERROR] at 
org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:956)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:290)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:194)
[ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.lang.reflect.Method.invoke(Method.java:498)
[ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at 
org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] Caused by: 
org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM 
terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /testptch/hbase/hbase-mapreduce && 
/usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -enableassertions 
-Dhbase.build.id=2018-03-28T23:51:24Z -Xmx2800m 
-Djava.security.egd=file:/dev/./urandom -Djava.net.preferIPv4Stack=true 
-Djava.awt.headless=true -jar 
/testptch/hbase/hbase-mapreduce/target/surefire/surefirebooter8410814171354767862.jar
 /testptch/hbase/hbase-mapreduce/target/surefire 
2018-03-28T23-51-40_553-jvmRun1 surefire7620982142840600885tmp 
surefire_521935780032859030481tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:686)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:535)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$700(ForkStarter.java:116)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:431)
[ERROR] at 
org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:408)
[ERROR] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[ERROR] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[ERROR] at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[ERROR] at java.lang.Thread.run(Thread.java:748)
[ERROR] 
{code}
This error 

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418258#comment-16418258
 ] 

Hadoop QA commented on HBASE-20295:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m  
1s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
19s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
18s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  5m 
 8s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
35s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
18s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  4m 
40s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  4m 
49s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 
19m 55s{color} | {color:green} Patch does not cause any errors with Hadoop 
2.6.5 2.7.4 or 3.0.0. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  0m 
42s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 59s{color} 
| {color:red} hbase-mapreduce in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
 9s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 41m 37s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f |
| JIRA Issue | HBASE-20295 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12916702/HBASE-20295.master.001.patch
 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  shadedjars  
hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux 9a690f291054 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 
14:43:09 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh
 |
| git revision | master / d8b550fabc |
| maven | version: Apache Maven 3.5.3 
(3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC3 |
| unit | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12195/artifact/patchprocess/patch-unit-hbase-mapreduce.txt
 |
|  Test Results | 
https://builds.apache.org/job/PreCommit-HBASE-Build/12195/testReport/ |
| modules | C: hbase-mapreduce U: hbase-mapreduce |
| Console output | 

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-28 Thread Umesh Agashe (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418236#comment-16418236
 ] 

Umesh Agashe commented on HBASE-20295:
--

+1, lgtm

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Assignee: Michael Jin
>Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch, 
> HBASE-20295.master.001.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix this issue:
>  
> {code:java}
> // code placeholder
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if(hConf == null)
> hConf = this.conf;
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
> TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new 

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-28 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418227#comment-16418227
 ] 

Michael Jin commented on HBASE-20295:
-

[~yuzhih...@gmail.com], patch for master branch is attached, please check.

Thanks
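
For reference, the core of the change is the configuration fallback sketched below. This is an illustrative reconstruction of the idea described in the issue, not the literal diff; it assumes the same imports and fields as the existing TableOutputFormat.

{code:java}
// Sketch: prefer the Configuration carried by the JobContext and fall back to
// the one injected via setConf(), so a reflectively created instance still works.
@Override
public void checkOutputSpecs(JobContext context) throws IOException, InterruptedException {
  Configuration hConf = context.getConfiguration();
  if (hConf == null) {
    hConf = this.conf;
  }
  try (Connection connection = ConnectionFactory.createConnection(hConf);
       Admin admin = connection.getAdmin()) {
    TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
    if (!admin.tableExists(tableName)) {
      throw new TableNotFoundException("Can't write, table does not exist: "
          + tableName.getNameAsString());
    }
    if (!admin.isTableEnabled(tableName)) {
      throw new TableNotEnabledException("Can't write, table is not enabled: "
          + tableName.getNameAsString());
    }
  }
}
{code}

The Connection is scoped in the try-with-resources here so it gets closed together with the Admin; the snippet in the description only closes the Admin.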

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Assignee: Michael Jin
>Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch, 
> HBASE-20295.master.001.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix this issue:
>  
> {code:java}
> // code placeholder
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if(hConf == null)
> hConf = this.conf;
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
> TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if 

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418016#comment-16418016
 ] 

Ted Yu commented on HBASE-20295:


It seems the compilation error was not related to the patch:
{code}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) 
on project hbase-server: Execution default-compile of goal 
org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile failed: Plugin 
org.apache.maven.plugins:maven-compiler-plugin:3.6.1 or one of its dependencies 
could not be resolved: Failure to find 
org.apache.hbase:hbase-error-prone:jar:1.4.3 in 
http://repository.apache.org/snapshots was cached in the local repository, 
resolution will not be reattempted until the update interval of Nexus has 
elapsed or updates are forced -> [Help 1]
{code}
Michael:
Please attach a patch for the master branch.

Thanks

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Assignee: Michael Jin
>Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix 

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418011#comment-16418011
 ] 

Hadoop QA commented on HBASE-20295:
---

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
21s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green}  0m  
0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} branch-1.4 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  7m 
59s{color} | {color:green} branch-1.4 passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
20s{color} | {color:red} hbase-server in branch-1.4 failed with JDK v1.8.0_163. 
{color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
16s{color} | {color:red} hbase-server in branch-1.4 failed with JDK v1.7.0_171. 
{color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
20s{color} | {color:green} branch-1.4 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  3m 
58s{color} | {color:green} branch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
32s{color} | {color:green} branch-1.4 passed with JDK v1.8.0_163 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
34s{color} | {color:green} branch-1.4 passed with JDK v1.7.0_171 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
33s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
13s{color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_163. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 13s{color} 
| {color:red} hbase-server in the patch failed with JDK v1.8.0_163. {color} |
| {color:red}-1{color} | {color:red} compile {color} | {color:red}  0m 
16s{color} | {color:red} hbase-server in the patch failed with JDK v1.7.0_171. 
{color} |
| {color:red}-1{color} | {color:red} javac {color} | {color:red}  0m 16s{color} 
| {color:red} hbase-server in the patch failed with JDK v1.7.0_171. {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
17s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green}  2m 
35s{color} | {color:green} patch has no errors when building our shaded 
downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green}  
8m 42s{color} | {color:green} Patch does not cause any errors with Hadoop 2.4.1 
2.5.2 2.6.5 2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
25s{color} | {color:green} the patch passed with JDK v1.8.0_163 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
35s{color} | {color:green} the patch passed with JDK v1.7.0_171 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green}103m  
6s{color} | {color:green} hbase-server in the patch passed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
22s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black}130m 46s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:74e3133 |
| JIRA Issue | HBASE-20295 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12916549/HBASE-20295.branch-1.4.001.patch
 |
| Optional Tests |  asflicense  javac  

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417811#comment-16417811
 ] 

Ted Yu commented on HBASE-20295:


Ran the following tests with the patch ported to the master branch:
{code}
Running org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat2
Tests run: 18, Failures: 0, Errors: 0, Skipped: 14, Time elapsed: 97.535 sec - 
in org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat2
Running org.apache.hadoop.hbase.mapred.TestTableOutputFormatConnectionExhaust
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 34.129 sec - in 
org.apache.hadoop.hbase.mapred.TestTableOutputFormatConnectionExhaust
{code}
+1 from my side.

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Assignee: Michael Jin
>Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix this issue:
>  
> {code:java}
> // code placeholder
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   Configuration hConf = 

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417759#comment-16417759
 ] 

Ted Yu commented on HBASE-20295:


Michael:
When you come up with a patch for the master branch, note the location of the class:
./hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java

Thanks

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: mapreduce
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix this issue:
>  
> {code:java}
> // code placeholder
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if(hConf == null)
> hConf = this.conf;
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
> TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
>   

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-28 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416923#comment-16416923
 ] 

Michael Jin commented on HBASE-20295:
-

[~yuzhih...@gmail.com], thanks for your suggestion. I created a Review Board 
account, ran submit-patch.py again, and a review request was created: 
[https://reviews.apache.org/r/66328/]

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix this issue:
>  
> {code:java}
> // code placeholder
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if(hConf == null)
> hConf = this.conf;
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
> TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416914#comment-16416914
 ] 

Ted Yu commented on HBASE-20295:


[~apurtell]:
Can you take a look at the patch?

Thanks

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Priority: Major
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix this issue:
>  
> {code:java}
> // code placeholder
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if(hConf == null)
> hConf = this.conf;
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
> TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-28 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416913#comment-16416913
 ] 

Ted Yu commented on HBASE-20295:


There is a description of Review Board in the refguide:
http://hbase.apache.org/book.html#reviewboard

Since the patch is small, a Review Board request is good to have but not required.

Normally we start with the master branch first.

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-27 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416848#comment-16416848
 ] 

Michael Jin commented on HBASE-20295:
-

Hi [~yuzhih...@gmail.com],

I attached the patch file to this post using submit-patch.py, but I had no idea about the 
Review Board user name (I googled Review Board but did not find a solution), so I used 
"-srb" to skip creating/updating the Review Board request. If the option is mandatory, 
please kindly let me know how I can register/add a Review Board user.

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-27 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416583#comment-16416583
 ] 

Michael Jin commented on HBASE-20295:
-

[~yuzhih...@gmail.com], I am working on it; sorry for my mistake.

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-27 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416539#comment-16416539
 ] 

Ted Yu commented on HBASE-20295:


Have you read the comment I mentioned?

Why not use submit-patch.sh?

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-27 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416521#comment-16416521
 ] 

Michael Jin commented on HBASE-20295:
-

[~yuzhih...@gmail.com],

I've made a patch according to your post, but the patch is very large (78.5 MB). Is there 
anything wrong with my command?

 

michael@menjin:~/MJ/Projects/hbase/hbase$ git checkout branch-1.4
Branch branch-1.4 set up to track remote branch branch-1.4 from origin.
Switched to a new branch 'branch-1.4'
michael@menjin:~/MJ/Projects/hbase/hbase$ git checkout -B HBASE-20295
Switched to a new branch 'HBASE-20295'
michael@menjin:~/MJ/Projects/hbase/hbase$ git add .
michael@menjin:~/MJ/Projects/hbase/hbase$ git status
On branch HBASE-20295
Changes to be committed:
 (use "git reset HEAD <file>..." to unstage)

 modified:   hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java

michael@menjin:~/MJ/Projects/hbase/hbase$ git commit -a -m "HBASE-20295 fix NullPointException in TableOutputFormat.checkOutputSpecs"
[HBASE-20295 d49fb97b1e] HBASE-20295 fix NullPointException in TableOutputFormat.checkOutputSpecs
 1 file changed, 5 insertions(+), 2 deletions(-)
michael@menjin:~/MJ/Projects/hbase/hbase$ dev-support/make_patch.sh
git_dirty is 0
Patch directory not specified. Falling back to ~/patches/.
/home/michael/patches does not exist. Creating it.
3503 commits exist only in your local branch. Interactive rebase?
Creating patch /home/michael/patches/HBASE-20295.patch using git format-patch

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-27 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416501#comment-16416501
 ] 

Michael Jin commented on HBASE-20295:
-

Got it, thanks!

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-27 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416499#comment-16416499
 ] 

Ted Yu commented on HBASE-20295:


See this comment:

https://issues.apache.org/jira/browse/HBASE-19985?focusedCommentId=16384696&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16384696

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-27 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416493#comment-16416493
 ] 

Michael Jin commented on HBASE-20295:
-

[~yuzhih...@gmail.com], I am glad to fix this issue, but I have never fixed a bug or 
committed code to HBase before. Would you please give me some guidelines or a brief 
introduction on how to attach a patch?

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-27 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416483#comment-16416483
 ] 

Ted Yu commented on HBASE-20295:


Can you attach the patch to this issue?

Thanks

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-27 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416477#comment-16416477
 ] 

Michael Jin commented on HBASE-20295:
-

[~yuzhih...@gmail.com], I've verified it locally; it works.

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-27 Thread Michael Jin (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416471#comment-16416471
 ] 

Michael Jin commented on HBASE-20295:
-

Hi Ted,

Sorry for the late reply. I've verified it locally, and it works.

Thanks

[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception

2018-03-27 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415759#comment-16415759
 ] 

Ted Yu commented on HBASE-20295:


Have you verified that the Spark job can continue with the proposed fix?

thanks

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
>  Issue Type: Bug
>  Components: Client
>Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
>Reporter: Michael Jin
>Priority: Major
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using spark write data to HBase by using RDD.
> saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when 
> update my hbase dependency to 1.4.0 in pom.xml, it throw 
> java.lang.NullPointerException, it is caused by a logic error in 
> TableOutputFormat.checkOutputSpecs function, please check below details:
> first let's take a look at SparkHadoopMapReduceWriter.write function in 
> SparkHadoopMapReduceWriter.scala
> {code:java}
> // SparkHadoopMapReduceWriter.write 
> (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
> rdd: RDD[(K, V)],
> hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 
> 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
> // FileOutputFormat ignores the filesystem parameter
> val jobFormat = format.newInstance
> jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
> className = classOf[HadoopMapReduceCommitProtocol].getName,
> jobId = stageId.toString,
> outputPath = 
> conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
> isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
> ...{code}
> in "write" function if output spec validation is enabled, it will call 
> checkOutputSpec function in TableOutputFormat class, but the job format is 
> simply created by "vall jobFormat = format.newInstance", this will NOT 
> initialize "conf" member variable in TableOutputFormat class, let's continue 
> check checkOutputSpecs function in TableOutputFormat class
>  
> {code:java}
> // TableOutputFormat.checkOutputSpecs 
> (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   try (Admin admin = 
> ConnectionFactory.createConnection(getConf()).getAdmin()) {
> TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>   tableName.getNameAsString());
> }
>   }
> }
> {code}
>  
> "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" 
> class member is not initialized, so getConf() will return null, so in the 
> next UserProvider create instance process, it throw the 
> NullPointException(Please part of stack trace at the end), it is a little 
> confused that, context passed by function parameter is actually been properly 
> constructed, and it contains Configuration object, why context is never used? 
> So I suggest to use below code to partly fix this issue:
>  
> {code:java}
> // code placeholder
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
> InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if(hConf == null)
> hConf = this.conf;
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
> TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
> if (!admin.tableExists(tableName)) {
>   throw new TableNotFoundException("Can't write, table does not exist:" +
>   tableName.getNameAsString());
> }
> if (!admin.isTableEnabled(tableName)) {
>   throw new TableNotEnabledException("Can't write, table is not enabled: 
> " +
>