[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461775#comment-16461775 ]

Michael Jin commented on HBASE-20295:
-------------------------------------

New issue was created: HBASE-20521. [~mdrob], please correct the description if it's not clear.

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
>                 Key: HBASE-20295
>                 URL: https://issues.apache.org/jira/browse/HBASE-20295
>             Project: HBase
>          Issue Type: Bug
>          Components: mapreduce
>    Affects Versions: 1.4.0
>         Environment: Spark 2.2.1, HBase 1.4.0
>            Reporter: Michael Jin
>            Assignee: Michael Jin
>            Priority: Major
>             Fix For: 2.0.0
>
>         Attachments: HBASE-20295.branch-1.4.001.patch, HBASE-20295.master.001.patch, HBASE-20295.master.002.patch, HBASE-20295.master.003.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I am using Spark to write data to HBase through the RDD.saveAsNewAPIHadoopDataset function. It works fine with HBase 1.3.1, but after updating the HBase dependency to 1.4.0 in pom.xml it throws java.lang.NullPointerException. This is caused by a logic error in TableOutputFormat.checkOutputSpecs; the details are below.
> First, let's take a look at the SparkHadoopMapReduceWriter.write function in SparkHadoopMapReduceWriter.scala:
> {code:java}
> // SparkHadoopMapReduceWriter.write (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
>     rdd: RDD[(K, V)],
>     hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
>     // FileOutputFormat ignores the filesystem parameter
>     val jobFormat = format.newInstance
>     jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
>     className = classOf[HadoopMapReduceCommitProtocol].getName,
>     jobId = stageId.toString,
>     outputPath = conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
>     isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
>   ...{code}
> In "write", if output spec validation is enabled, checkOutputSpecs in the TableOutputFormat class is called. However, the job format is created by a plain "val jobFormat = format.newInstance", which does NOT initialize the "conf" member variable of TableOutputFormat. Now look at checkOutputSpecs in TableOutputFormat:
> {code:java}
> // TableOutputFormat.checkOutputSpecs (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
>     InterruptedException {
>   try (Admin admin = ConnectionFactory.createConnection(getConf()).getAdmin()) {
>     TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
>     if (!admin.tableExists(tableName)) {
>       throw new TableNotFoundException("Can't write, table does not exist:" +
>           tableName.getNameAsString());
>     }
>     if (!admin.isTableEnabled(tableName)) {
>       throw new TableNotEnabledException("Can't write, table is not enabled: " +
>           tableName.getNameAsString());
>     }
>   }
> }
> {code}
> In "ConnectionFactory.createConnection(getConf())", as mentioned above, the "conf" class member is not initialized, so getConf() returns null, and the subsequent UserProvider instance creation throws the NullPointerException (see the partial stack trace at the end). It is a little confusing that the context passed as a function parameter is actually properly constructed and contains a Configuration object, yet is never used. So I suggest the following code to partly fix this issue:
> {code:java}
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException,
>     InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if (hConf == null) {
>     hConf = this.conf;
>   }
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
>     TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
>     if (!admin.tableExists(tableName)) {
>       throw new TableNotFoundException("Can't write, table does not exist:" +
>           tableName.getNameAsString());
>     }
>     ...
> {code}
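The root cause described above can be reproduced without HBase or Spark. Below is a minimal self-contained sketch, using hypothetical stand-in classes rather than the real Hadoop/HBase APIs: a bare `newInstance()` never calls `setConf`, so the stored configuration stays null, while the patched logic rescues the call by falling back to the configuration carried by the job context.

```java
// Minimal sketch of the bug; the class names below are hypothetical stand-ins,
// not the real org.apache.hadoop types.
import java.util.HashMap;
import java.util.Map;

public class ConfDemo {
    // Stand-in for org.apache.hadoop.conf.Configuration.
    static class Configuration {
        private final Map<String, String> props = new HashMap<>();
        String get(String key) { return props.get(key); }
        void set(String key, String value) { props.put(key, value); }
    }

    // Stand-in for TableOutputFormat: holds a conf field that is only set
    // when setConf() is explicitly called.
    static class TableOutputFormatSketch {
        private Configuration conf;               // stays null after newInstance()
        void setConf(Configuration c) { this.conf = c; }
        Configuration getConf() { return conf; }

        // Mirrors the suggested fix: prefer the job context's configuration,
        // fall back to the instance field.
        String resolveOutputTable(Configuration contextConf) {
            Configuration hConf = contextConf != null ? contextConf : this.conf;
            if (hConf == null) {
                throw new NullPointerException("no configuration available");
            }
            return hConf.get("hbase.mapred.outputtable");
        }
    }

    public static void main(String[] args) throws Exception {
        // Spark does: val jobFormat = format.newInstance -- setConf is never called.
        TableOutputFormatSketch viaNewInstance =
            TableOutputFormatSketch.class.getDeclaredConstructor().newInstance();
        System.out.println(viaNewInstance.getConf()); // null: the NPE's origin

        // With the context-based fallback, the call succeeds anyway.
        Configuration contextConf = new Configuration();
        contextConf.set("hbase.mapred.outputtable", "my_table");
        System.out.println(viaNewInstance.resolveOutputTable(contextConf)); // my_table
    }
}
```

The sketch also shows why the bug only appeared in 1.4.0: earlier versions read the configuration from the context, so an uninitialized `conf` field was never dereferenced during output-spec validation.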
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461747#comment-16461747 ]

Ted Yu commented on HBASE-20295:
--------------------------------

Please open new issue.
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461740#comment-16461740 ]

Michael Jin commented on HBASE-20295:
-------------------------------------

[~mdrob], just verified locally; it works on my side.

[~yuzhih...@gmail.com], [~elserj], [~stack]: should we fork a new issue, or just add a new patch in this thread? What do you think?
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461670#comment-16461670 ]

Michael Jin commented on HBASE-20295:
-------------------------------------

[~mdrob], sorry for the failures this brought you. The change to the checking sequence is OK with me. I will verify my original case locally; please wait a while and I will update with the verification result.
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461605#comment-16461605 ]

Mike Drob commented on HBASE-20295:
-----------------------------------

My proposed patch (against branch-2.0) would look like this:
{noformat}
diff --git a/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java b/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
index 0a1928b21f..cdf6ccd9b3 100644
--- a/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
+++ b/hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
@@ -174,9 +174,9 @@ implements Configurable {
   @Override
   public void checkOutputSpecs(JobContext context) throws IOException,
       InterruptedException {
-    Configuration hConf = context.getConfiguration();
+    Configuration hConf = getConf();
     if (hConf == null) {
-      hConf = this.conf;
+      hConf = context.getConfiguration();
     }
     try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
{noformat}
But I don't know how to verify whether this fix still works for the original Spark use case. We probably need to fork off into a new issue, since this has been released already.
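The ordering change in the diff above can be expressed as a tiny helper; this is a hypothetical sketch of the decision logic, not the actual HBase code. It prefers the format's own configuration (set via `setConf()`, e.g. by Pig) and only falls back to the one carried by the job context (the Spark `newInstance()` case):

```java
// Sketch of the fallback order proposed in the diff above.
// "preferOwnConf" is a hypothetical helper name, used only for illustration.
public final class ConfFallback {
    // Mirrors the patched code:
    //   Configuration hConf = getConf();
    //   if (hConf == null) { hConf = context.getConfiguration(); }
    public static <T> T preferOwnConf(T ownConf, T contextConf) {
        return ownConf != null ? ownConf : contextConf;
    }

    public static void main(String[] args) {
        String own = null;                  // format created via bare newInstance()
        String fromContext = "context-conf";
        System.out.println(preferOwnConf(own, fromContext));   // context-conf

        own = "explicit-conf";              // setConf() was called, e.g. by Pig
        System.out.println(preferOwnConf(own, fromContext));   // explicit-conf
    }
}
```

The design point is that an explicitly injected configuration should win when present, so callers that follow the Configurable contract are unaffected, while the context acts only as a rescue path.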
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461602#comment-16461602 ]

Mike Drob commented on HBASE-20295:
-----------------------------------

[~tedyu], [~menjin], [~elserj], [~stack] - Question about this implementation, sorry for showing up a month late to the issue...

The patch adds:
{noformat}
Configuration hConf = context.getConfiguration();
if (hConf == null) {
  hConf = this.conf;
}
{noformat}
Are we doing this backwards? If the concern is that getConf() sometimes returns null, shouldn't we be using that first and checking its output, instead of pulling the conf from the context?

I have a Pig script that fails against hbase-2.0 with this patch included; if I revert the patch, the script works again.
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16430056#comment-16430056 ]

Hudson commented on HBASE-20295:
--------------------------------

Results for branch HBASE-19064
[build #90 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/90/]: (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/90//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/90//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/90//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429569#comment-16429569 ] Michael Jin commented on HBASE-20295: - Correct, official 2.0 release plan.
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429564#comment-16429564 ] Michael Jin commented on HBASE-20295: - Thanks [~stack], [~yuzhih...@gmail.com], [~elserj]. I am glad and excited to contribute to the HBase community; it has been a wonderful experience. If there is anything else I can help with, please feel free to let me know. BTW, could you tell me the official release date of 2.0? I can't wait to try the hbase-spark module.
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429457#comment-16429457 ] stack commented on HBASE-20295: --- Thanks for kick. Pushed to branch-2.0 and branch-2. Thanks for patch [~menjin]
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16429043#comment-16429043 ] Josh Elser commented on HBASE-20295: {quote} stack, might be interested in this one. Best as I understand, this might affect our OutputFormat via Spark directly (as opposed to the hbase-spark integration). Seems pretty low risk to me. {quote} [~stack], gentle ping -- guessing this fell off your radar.
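For anyone hitting this path from plain Spark before a fixed release is available, one user-side mitigation is to disable output-spec validation so `checkOutputSpecs` is never invoked. This is a sketch only, assuming Spark 2.x, whose writer gates the call behind the `spark.hadoop.validateOutputSpecs` setting; note it also skips the useful table-exists/table-enabled checks:

{code:java}
// Sketch (assumed workaround, not part of the patch): turning off Spark's
// output-spec validation means the broken checkOutputSpecs is never called.
// This also drops the table-exists / table-enabled safety checks, so prefer
// upgrading once an HBase release containing this fix is available.
SparkConf sparkConf = new SparkConf()
    .setAppName("hbase-write")
    .set("spark.hadoop.validateOutputSpecs", "false");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
// ... build the JavaPairRDD and call saveAsNewAPIHadoopDataset as before ...
{code}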
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421304#comment-16421304 ] Hudson commented on HBASE-20295: Results for branch master [build #279 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/279/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/279//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/279//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/279//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details.
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420540#comment-16420540 ] Michael Jin commented on HBASE-20295: - [~yuzhih...@gmail.com], COOL, Thanks!
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420522#comment-16420522 ] Ted Yu commented on HBASE-20295: Pushed to master branch. Waiting for Stack's decision on branch-2.0
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420257#comment-16420257 ] Michael Jin commented on HBASE-20295: - :D QA build is OK now.
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420255#comment-16420255 ] Hadoop QA commented on HBASE-20295: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 3m 37s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 53s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 40s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 21s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 44s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 40s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 13s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 34s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 19s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 25s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 23m 9s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green} 16m 2s{color} | {color:green} hbase-mapreduce in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 17s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 66m 59s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f | | JIRA Issue | HBASE-20295 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12916965/HBASE-20295.master.003.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 889508d9af06 4.4.0-98-generic #121-Ubuntu SMP Tue Oct 10 14:24:03 UTC 2017 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / 522b8075f3 | | maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC3 | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/12232/testReport/ | | Max. process+thread count | 4586 (vs. ulimit of 1) | | modules | C: hbase-mapreduce U: hbase-mapreduce | | Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/12232/console | |
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419771#comment-16419771 ] Josh Elser commented on HBASE-20295: [~stack], might be interested in this one. Best as I understand, this might affect our OutputFormat via Spark directly (as opposed to the hbase-spark integration). Seems pretty low risk to me.
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419218#comment-16419218 ] Michael Jin commented on HBASE-20295: - [~yuzhih...@gmail.com], I am watching HBASE-20314 now, waiting for Sean Busbey's reply. Thanks!
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16419150#comment-16419150 ] Ted Yu commented on HBASE-20295: Michael: Please watch HBASE-20314
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418734#comment-16418734 ] Michael Jin commented on HBASE-20295: - [~yuzhih...@gmail.com], I've checked both the Jenkins job console and the UT build log. The only suspicious error that I think might affect the build is: "*09:20:49* Can not write to /root/.m2/copy_reference_file.log. Wrong volume permissions? Carrying on ..." in [https://builds.apache.org/job/PreCommit-HBASE-Build/12202/console], but I don't think it is the root cause. As for the "Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?" exception in the UT log: according to the surefire FAQ, [http://maven.apache.org/surefire/maven-surefire-plugin/faq.html], this exception occurs either when System.exit is called or when a critical crash happens. I've searched for System.exit calls under the hbase-mapreduce module: there are none in the test cases, and TableOutputFormat.checkOutputSpecs is not called in any test case either. No idea why this exception happens; can anyone else help?
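The SurefireBooterForkException discussed in the comment above generally means the forked test JVM died or exited before reporting back. A few standard maven-surefire-plugin options can help narrow that down; the commands below are a hedged sketch, not steps prescribed by this issue (the module name hbase-mapreduce comes from the QA report, and the flags are generic Maven/surefire options):

```shell
# Re-run only the hbase-mapreduce module's tests with untrimmed stack traces.
mvn -pl hbase-mapreduce test -DtrimStackTrace=false

# Run tests in-process (no fork) so a JVM crash surfaces in the main Maven JVM.
mvn -pl hbase-mapreduce test -DforkCount=0

# When a fork dies unexpectedly, surefire writes *.dumpstream files capturing
# the fork's last output; inspect them for the real crash reason.
ls hbase-mapreduce/target/surefire-reports/*.dumpstream
```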
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418683#comment-16418683 ] Hadoop QA commented on HBASE-20295: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 8s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 36s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 28s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 5s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 40s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 17s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 51s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 35s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 54s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 20m 18s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 47s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 16s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 1m 2s{color} | {color:red} hbase-mapreduce in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 8s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 39m 48s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f | | JIRA Issue | HBASE-20295 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12916778/HBASE-20295.master.002.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux c302a64ffe29 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / d8b550fabc | | maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC3 | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/12202/artifact/patchprocess/patch-unit-hbase-mapreduce.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/12202/testReport/ | | modules | C: hbase-mapreduce U: hbase-mapreduce | | Console output |
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418649#comment-16418649 ] Michael Jin commented on HBASE-20295: - [~yuzhih...@gmail.com], thanks for your suggestion. I attached the patch again; is the QA run triggered automatically when a new patch is attached?
> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
> Issue Type: Bug
> Components: mapreduce
> Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
> Reporter: Michael Jin
> Assignee: Michael Jin
> Priority: Major
> Attachments: HBASE-20295.branch-1.4.001.patch, HBASE-20295.master.001.patch, HBASE-20295.master.002.patch
>
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> I am using Spark to write data to HBase through RDD.saveAsNewAPIHadoopDataset. It works fine with HBase 1.3.1, but after updating the HBase dependency to 1.4.0 in pom.xml it throws java.lang.NullPointerException, caused by a logic error in TableOutputFormat.checkOutputSpecs. The details are below.
> First, let's look at the SparkHadoopMapReduceWriter.write function in SparkHadoopMapReduceWriter.scala:
> {code:java}
> // SparkHadoopMapReduceWriter.write (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
>     rdd: RDD[(K, V)],
>     hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
>     // FileOutputFormat ignores the filesystem parameter
>     val jobFormat = format.newInstance
>     jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
>     className = classOf[HadoopMapReduceCommitProtocol].getName,
>     jobId = stageId.toString,
>     outputPath = conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
>     isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
>   ...{code}
> In "write", if output spec validation is enabled, checkOutputSpecs of the TableOutputFormat class is called, but the job format is created simply by "val jobFormat = format.newInstance", which does NOT initialize the "conf" member variable of TableOutputFormat. Let's continue with the checkOutputSpecs function in TableOutputFormat:
> {code:java}
> // TableOutputFormat.checkOutputSpecs (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException, InterruptedException {
>   try (Admin admin = ConnectionFactory.createConnection(getConf()).getAdmin()) {
>     TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
>     if (!admin.tableExists(tableName)) {
>       throw new TableNotFoundException("Can't write, table does not exist:" +
>           tableName.getNameAsString());
>     }
>     if (!admin.isTableEnabled(tableName)) {
>       throw new TableNotEnabledException("Can't write, table is not enabled: " +
>           tableName.getNameAsString());
>     }
>   }
> }
> {code}
> In "ConnectionFactory.createConnection(getConf())", as mentioned above, the "conf" class member is not initialized, so getConf() returns null, and the subsequent UserProvider instantiation throws the NullPointerException (see part of the stack trace at the end). It is a little confusing that the context passed as a function parameter is actually properly constructed and contains a Configuration object, yet it is never used. So I suggest the code below to partly fix this issue:
> {code:java}
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException, InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if (hConf == null) {
>     hConf = this.conf;
>   }
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
>     TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
>     if (!admin.tableExists(tableName)) {
>       throw new TableNotFoundException("Can't write, table does
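The key mechanism in the report above — that a bare `format.newInstance` bypasses configuration injection, leaving `conf` null — can be illustrated without Spark or HBase at all. The sketch below is hypothetical: the `Configurable` stand-in and `MiniFormat` class are invented for illustration (the real contract lives in `org.apache.hadoop.conf`, where `ReflectionUtils.newInstance` is what performs the `setConf` call that Spark's plain reflection skips):

```java
// Minimal stand-in for Hadoop's Configurable contract (illustration only;
// the real interface is org.apache.hadoop.conf.Configurable).
interface Configurable {
    void setConf(String conf);
    String getConf();
}

// Hypothetical analogue of TableOutputFormat: "conf" is populated only
// when setConf() is called, never by the no-arg constructor.
class MiniFormat implements Configurable {
    private String conf; // stays null unless setConf() runs

    @Override
    public void setConf(String conf) { this.conf = conf; }

    @Override
    public String getConf() { return conf; }
}

public class NewInstanceDemo {
    public static void main(String[] args) throws Exception {
        // Spark's writer did the equivalent of "format.newInstance":
        // plain reflection, with no setConf() call afterwards.
        MiniFormat bare = MiniFormat.class.getDeclaredConstructor().newInstance();
        System.out.println("bare conf: " + bare.getConf());
        // getConf() is null here, which is exactly what later blows up
        // inside ConnectionFactory.createConnection(getConf()).

        // Call paths that wire in the job configuration (as Hadoop's
        // ReflectionUtils.newInstance does) see a non-null conf.
        MiniFormat wired = MiniFormat.class.getDeclaredConstructor().newInstance();
        wired.setConf("job-configuration");
        System.out.println("wired conf: " + wired.getConf());
    }
}
```

This is why the proposed patch reads the configuration from the `JobContext` parameter instead of relying on the instance field: the context is fully constructed regardless of how the format object itself was instantiated.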
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418636#comment-16418636 ] Ted Yu commented on HBASE-20295: You can attach the master patch again to trigger a QA run. Thanks
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418465#comment-16418465 ] Michael Jin commented on HBASE-20295: - [~uagashe], [~yuzhih...@gmail.com], would you please help launch a build again? I think the UT error may be caused by Docker. Thanks
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418286#comment-16418286 ] Michael Jin commented on HBASE-20295: -
{code:java}
[INFO] Running org.apache.hadoop.hbase.snapshot.TestExportSnapshotNoCluster
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.171 s - in org.apache.hadoop.hbase.snapshot.TestExportSnapshotNoCluster
[INFO] Running org.apache.hadoop.hbase.mapred.TestTableOutputFormatConnectionExhaust
[INFO] Running org.apache.hadoop.hbase.mapreduce.TestWALRecordReader
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.145 s - in org.apache.hadoop.hbase.mapred.TestTableOutputFormatConnectionExhaust
[INFO] Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.747 s - in org.apache.hadoop.hbase.mapreduce.TestWALRecordReader
[INFO] Running org.apache.hadoop.hbase.mapreduce.TestHRegionPartitioner
[INFO] Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 3.96 s - in org.apache.hadoop.hbase.mapreduce.TestHRegionPartitioner
[INFO] Running org.apache.hadoop.hbase.mapreduce.TestImportExport
[INFO] Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 63.956 s - in org.apache.hadoop.hbase.mapreduce.TestImportExport
[INFO]
[INFO] Results:
[INFO]
[INFO] Tests run: 19, Failures: 0, Errors: 0, Skipped: 0
{code}
All unit tests passed locally.
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418264#comment-16418264 ] Michael Jin commented on HBASE-20295: -
{code:java}
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.awaitResultsDone(ForkStarter.java:496)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.runSuitesForkPerTestSet(ForkStarter.java:443)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:295)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.run(ForkStarter.java:246)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeProvider(AbstractSurefireMojo.java:1124)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.executeAfterPreconditionsChecked(AbstractSurefireMojo.java:954)
[ERROR] at org.apache.maven.plugin.surefire.AbstractSurefireMojo.execute(AbstractSurefireMojo.java:832)
[ERROR] at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:137)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:154)
[ERROR] at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:146)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:117)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:81)
[ERROR] at org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:56)
[ERROR] at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:128)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:305)
[ERROR] at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:192)
[ERROR] at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:105)
[ERROR] at org.apache.maven.cli.MavenCli.execute(MavenCli.java:956)
[ERROR] at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:290)
[ERROR] at org.apache.maven.cli.MavenCli.main(MavenCli.java:194)
[ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[ERROR] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
[ERROR] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[ERROR] at java.lang.reflect.Method.invoke(Method.java:498)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
[ERROR] at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
[ERROR] Caused by: org.apache.maven.surefire.booter.SurefireBooterForkException: The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
[ERROR] Command was /bin/sh -c cd /testptch/hbase/hbase-mapreduce && /usr/lib/jvm/java-8-openjdk-amd64/jre/bin/java -enableassertions -Dhbase.build.id=2018-03-28T23:51:24Z -Xmx2800m -Djava.security.egd=file:/dev/./urandom -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -jar /testptch/hbase/hbase-mapreduce/target/surefire/surefirebooter8410814171354767862.jar /testptch/hbase/hbase-mapreduce/target/surefire 2018-03-28T23-51-40_553-jvmRun1 surefire7620982142840600885tmp surefire_521935780032859030481tmp
[ERROR] Error occurred in starting fork, check output in log
[ERROR] Process Exit Code: 1
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:686)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.fork(ForkStarter.java:535)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter.access$700(ForkStarter.java:116)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:431)
[ERROR] at org.apache.maven.plugin.surefire.booterclient.ForkStarter$2.call(ForkStarter.java:408)
[ERROR] at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[ERROR] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[ERROR] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[ERROR] at java.lang.Thread.run(Thread.java:748)
[ERROR]
{code}
This error
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418258#comment-16418258 ] Hadoop QA commented on HBASE-20295: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 2m 1s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} master Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 5m 19s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 33s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 18s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 5m 8s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 35s{color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 18s{color} | {color:green} master passed {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 4m 40s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 29s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 4m 49s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 19m 55s{color} | {color:green} Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 42s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 15s{color} | {color:green} the patch passed {color} | || || || || {color:brown} Other Tests {color} || | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 59s{color} | {color:red} hbase-mapreduce in the patch failed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 9s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} | | {color:black}{color} | {color:black} {color} | {color:black} 41m 37s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f | | JIRA Issue | HBASE-20295 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12916702/HBASE-20295.master.001.patch | | Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile | | uname | Linux 9a690f291054 3.13.0-139-generic #188-Ubuntu SMP Tue Jan 9 14:43:09 UTC 2018 x86_64 GNU/Linux | | Build tool | maven | | Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh | | git revision | master / d8b550fabc | | maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) | | Default Java | 1.8.0_162 | | findbugs | v3.1.0-RC3 | | unit | https://builds.apache.org/job/PreCommit-HBASE-Build/12195/artifact/patchprocess/patch-unit-hbase-mapreduce.txt | | Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/12195/testReport/ | | modules | C: hbase-mapreduce U: hbase-mapreduce | | Console output |
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418236#comment-16418236 ] Umesh Agashe commented on HBASE-20295: -- +1, lgtm > TableOutputFormat.checkOutputSpecs throw NullPointerException Exception > --- > > Key: HBASE-20295 > URL: https://issues.apache.org/jira/browse/HBASE-20295 > Project: HBase > Issue Type: Bug > Components: mapreduce >Affects Versions: 1.4.0 > Environment: Spark 2.2.1, HBase 1.4.0 >Reporter: Michael Jin >Assignee: Michael Jin >Priority: Major > Attachments: HBASE-20295.branch-1.4.001.patch, > HBASE-20295.master.001.patch > > Original Estimate: 168h > Remaining Estimate: 168h > > I am using spark write data to HBase by using RDD. > saveAsNewAPIHadoopDataset function, it works fine with hbase 1.3.1, but when > update my hbase dependency to 1.4.0 in pom.xml, it throw > java.lang.NullPointerException, it is caused by a logic error in > TableOutputFormat.checkOutputSpecs function, please check below details: > first let's take a look at SparkHadoopMapReduceWriter.write function in > SparkHadoopMapReduceWriter.scala > {code:java} > // SparkHadoopMapReduceWriter.write > (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala) > def write[K, V: ClassTag]( > rdd: RDD[(K, V)], > hadoopConf: Configuration): Unit = { > // Extract context and configuration from RDD. > val sparkContext = rdd.context > val stageId = rdd.id > val sparkConf = rdd.conf > val conf = new SerializableConfiguration(hadoopConf) > // Set up a job. 
> val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date()) > val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, > 0, 0) > val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId) > val format = jobContext.getOutputFormatClass > if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) { > // FileOutputFormat ignores the filesystem parameter > val jobFormat = format.newInstance > jobFormat.checkOutputSpecs(jobContext) > } > val committer = FileCommitProtocol.instantiate( > className = classOf[HadoopMapReduceCommitProtocol].getName, > jobId = stageId.toString, > outputPath = > conf.value.get("mapreduce.output.fileoutputformat.outputdir"), > isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol] > committer.setupJob(jobContext) > ...{code} > in "write" function if output spec validation is enabled, it will call > checkOutputSpec function in TableOutputFormat class, but the job format is > simply created by "vall jobFormat = format.newInstance", this will NOT > initialize "conf" member variable in TableOutputFormat class, let's continue > check checkOutputSpecs function in TableOutputFormat class > > {code:java} > // TableOutputFormat.checkOutputSpecs > (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0 > @Override > public void checkOutputSpecs(JobContext context) throws IOException, > InterruptedException { > try (Admin admin = > ConnectionFactory.createConnection(getConf()).getAdmin()) { > TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE)); > if (!admin.tableExists(tableName)) { > throw new TableNotFoundException("Can't write, table does not exist:" + > tableName.getNameAsString()); > } > if (!admin.isTableEnabled(tableName)) { > throw new TableNotEnabledException("Can't write, table is not enabled: > " + > tableName.getNameAsString()); > } > } > } > {code} > > "ConnectionFactory.createConnection(getConf())", as mentioned above "conf" > class member is not 
initialized, so getConf() returns null, and the subsequent > UserProvider instantiation throws the NullPointerException (see part of the > stack trace at the end). It is a little confusing that the context passed as a > function parameter is actually properly constructed and contains a > Configuration object, yet it is never used. So I suggest the code below to > partly fix this issue: > > {code:java} > @Override > public void checkOutputSpecs(JobContext context) throws IOException, > InterruptedException { > Configuration hConf = context.getConfiguration(); > if (hConf == null) > hConf = this.conf; > try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) { > TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE)); > if (!admin.tableExists(tableName)) { > throw new TableNotFoundException("Can't write, table does not exist:" + > tableName.getNameAsString()); > } > if (!admin.isTableEnabled(tableName)) { > throw new
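The failure mode described above can be reproduced without HBase at all. The sketch below uses hypothetical MiniConf/MiniTableOutputFormat stand-ins (not the real Hadoop/HBase classes) to show why a bare newInstance call leaves the conf field null, and why falling back to the configuration carried by the job context avoids the NPE:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Configuration, reduced to a string map.
class MiniConf {
    private final Map<String, String> values = new HashMap<>();
    void set(String key, String value) { values.put(key, value); }
    String get(String key) { return values.get(key); }
}

// Hypothetical stand-in for TableOutputFormat, reduced to the pattern under discussion.
class MiniTableOutputFormat {
    static final String OUTPUT_TABLE = "hbase.mapred.outputtable";
    private MiniConf conf;                       // stays null unless setConf() runs

    void setConf(MiniConf c) { this.conf = c; }  // the injection step newInstance skips
    MiniConf getConf() { return conf; }

    // Mirrors the 1.4.0 behavior: reads this.conf, which was never initialized.
    String checkOutputSpecsBuggy(MiniConf context) {
        return getConf().get(OUTPUT_TABLE);      // NullPointerException here
    }

    // Mirrors the proposed fix: prefer the configuration carried by the context.
    String checkOutputSpecsFixed(MiniConf context) {
        MiniConf c = (context != null) ? context : this.conf;
        return c.get(OUTPUT_TABLE);
    }
}

public class NewInstanceDemo {
    public static void main(String[] args) {
        MiniConf jobContextConf = new MiniConf();
        jobContextConf.set(MiniTableOutputFormat.OUTPUT_TABLE, "my_table");

        // Equivalent of Spark's `val jobFormat = format.newInstance`:
        // a bare constructor call, so setConf() is never invoked on the instance.
        MiniTableOutputFormat fmt = new MiniTableOutputFormat();

        try {
            fmt.checkOutputSpecsBuggy(jobContextConf);
        } catch (NullPointerException e) {
            System.out.println("buggy path: NPE, conf was never initialized");
        }
        System.out.println("fixed path resolves table: "
                + fmt.checkOutputSpecsFixed(jobContextConf));
    }
}
```

The same shape applies to the real classes: the job context handed to checkOutputSpecs already carries a fully built Configuration, so reading it there makes the method independent of whether the caller remembered to call setConf.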
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418227#comment-16418227 ] Michael Jin commented on HBASE-20295: - [~yuzhih...@gmail.com], patch for master branch is attached, please check. Thanks
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418016#comment-16418016 ] Ted Yu commented on HBASE-20295: It seems compilation error was not related to the patch: {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile (default-compile) on project hbase-server: Execution default-compile of goal org.apache.maven.plugins:maven-compiler-plugin:3.6.1:compile failed: Plugin org.apache.maven.plugins:maven-compiler-plugin:3.6.1 or one of its dependencies could not be resolved: Failure to find org.apache.hbase:hbase-error-prone:jar:1.4.3 in http://repository.apache.org/snapshots was cached in the local repository, resolution will not be reattempted until the update interval of Nexus has elapsed or updates are forced -> [Help 1] {code} Michael: Please attach patch for master branch. Thanks
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418011#comment-16418011 ] Hadoop QA commented on HBASE-20295: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 21s{color} | {color:blue} Docker mode activated. {color} | || || || || {color:brown} Prechecks {color} || | {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 0s{color} | {color:blue} Findbugs executables are not available. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | || || || || {color:brown} branch-1.4 Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 59s{color} | {color:green} branch-1.4 passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 20s{color} | {color:red} hbase-server in branch-1.4 failed with JDK v1.8.0_163. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 16s{color} | {color:red} hbase-server in branch-1.4 failed with JDK v1.7.0_171. 
{color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 20s{color} | {color:green} branch-1.4 passed {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 3m 58s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 32s{color} | {color:green} branch-1.4 passed with JDK v1.8.0_163 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 34s{color} | {color:green} branch-1.4 passed with JDK v1.7.0_171 {color} | || || || || {color:brown} Patch Compile Tests {color} || | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 33s{color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 13s{color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_163. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 13s{color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_163. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 16s{color} | {color:red} hbase-server in the patch failed with JDK v1.7.0_171. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 16s{color} | {color:red} hbase-server in the patch failed with JDK v1.7.0_171. {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 17s{color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 2m 35s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. 
{color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 8m 42s{color} | {color:green} Patch does not cause any errors with Hadoop 2.4.1 2.5.2 2.6.5 2.7.4. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 25s{color} | {color:green} the patch passed with JDK v1.8.0_163 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} the patch passed with JDK v1.7.0_171 {color} | || || || || {color:brown} Other Tests {color} || | {color:green}+1{color} | {color:green} unit {color} | {color:green}103m 6s{color} | {color:green} hbase-server in the patch passed. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 22s{color} | {color:green} The patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black}130m 46s{color} | {color:black} {color} | \\ \\ || Subsystem || Report/Notes || | Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:74e3133 | | JIRA Issue | HBASE-20295 | | JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12916549/HBASE-20295.branch-1.4.001.patch | | Optional Tests | asflicense javac
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417811#comment-16417811 ] Ted Yu commented on HBASE-20295: Ran the following tests with patch ported to master branch: {code} Running org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat2 Tests run: 18, Failures: 0, Errors: 0, Skipped: 14, Time elapsed: 97.535 sec - in org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat2 Running org.apache.hadoop.hbase.mapred.TestTableOutputFormatConnectionExhaust Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 34.129 sec - in org.apache.hadoop.hbase.mapred.TestTableOutputFormatConnectionExhaust {code} +1 from my side.
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16417759#comment-16417759 ] Ted Yu commented on HBASE-20295: Michael: When you come up with patch for master branch, note the location for the class: ./hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java Thanks
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416923#comment-16416923 ] Michael Jin commented on HBASE-20295: - [~yuzhih...@gmail.com], thanks for your suggestion. I created a review board account and ran submit-patch.py again; the review request was created: [https://reviews.apache.org/r/66328/]
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416914#comment-16416914 ] Ted Yu commented on HBASE-20295:

[~apurtell]: Can you take a look at the patch? Thanks.

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
> Reporter: Michael Jin
> Priority: Major
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> I am writing data to HBase from Spark with RDD.saveAsNewAPIHadoopDataset. It works fine with HBase 1.3.1, but after updating the HBase dependency to 1.4.0 in pom.xml it throws java.lang.NullPointerException. The cause is a logic error in TableOutputFormat.checkOutputSpecs; details below.
>
> First, look at SparkHadoopMapReduceWriter.write in SparkHadoopMapReduceWriter.scala:
> {code:java}
> // SparkHadoopMapReduceWriter.write (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
>     rdd: RDD[(K, V)],
>     hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
>     // FileOutputFormat ignores the filesystem parameter
>     val jobFormat = format.newInstance
>     jobFormat.checkOutputSpecs(jobContext)
>   }
>
>   val committer = FileCommitProtocol.instantiate(
>     className = classOf[HadoopMapReduceCommitProtocol].getName,
>     jobId = stageId.toString,
>     outputPath = conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
>     isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
>   ...{code}
> In write, if output spec validation is enabled, checkOutputSpecs is called on a TableOutputFormat created by a bare "val jobFormat = format.newInstance". That does NOT initialize the "conf" member of TableOutputFormat. Now look at checkOutputSpecs in TableOutputFormat:
>
> {code:java}
> // TableOutputFormat.checkOutputSpecs (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException, InterruptedException {
>   try (Admin admin = ConnectionFactory.createConnection(getConf()).getAdmin()) {
>     TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
>     if (!admin.tableExists(tableName)) {
>       throw new TableNotFoundException("Can't write, table does not exist:" +
>           tableName.getNameAsString());
>     }
>     if (!admin.isTableEnabled(tableName)) {
>       throw new TableNotEnabledException("Can't write, table is not enabled: " +
>           tableName.getNameAsString());
>     }
>   }
> }
> {code}
>
> In "ConnectionFactory.createConnection(getConf())", the "conf" member is not initialized, so getConf() returns null, and the subsequent UserProvider instantiation throws the NullPointerException (part of the stack trace is at the end). It is a little confusing that the context passed as a parameter is properly constructed and contains a Configuration object, yet is never used. I suggest the following code to partly fix this issue:
>
> {code:java}
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException, InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if (hConf == null) {
>     hConf = this.conf;
>   }
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
>     TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
>     if (!admin.tableExists(tableName)) {
>       throw new TableNotFoundException("Can't write, table does not exist:" +
>           tableName.getNameAsString());
>     }
>     if (!admin.isTableEnabled(tableName)) {
>       throw new TableNotEnabledException("Can't write, table is not enabled: " +
>           tableName.getNameAsString());
>     }
>   }
> }
> {code}
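The bare newInstance behavior described above comes from Hadoop's Configurable pattern: ReflectionUtils.newInstance injects the Configuration via setConf after construction, while a plain Class.newInstance leaves the conf field null. A self-contained sketch of the difference, using plain-Java stand-ins rather than the real HBase/Hadoop classes:

```java
import java.lang.reflect.Constructor;

public class ConfDemo {
    // Stand-in for org.apache.hadoop.conf.Configuration.
    public static class Configuration { }

    // Stand-in for a Configurable output format such as TableOutputFormat.
    public static class TableOutputFormatLike {
        private Configuration conf;                    // stays null unless setConf runs
        public void setConf(Configuration c) { this.conf = c; }
        public Configuration getConf() { return this.conf; }
    }

    public static void main(String[] args) throws Exception {
        Constructor<TableOutputFormatLike> ctor =
            TableOutputFormatLike.class.getDeclaredConstructor();

        // What Spark's writer effectively does: bare reflective instantiation.
        TableOutputFormatLike bare = ctor.newInstance();
        System.out.println(bare.getConf());            // null -> NPE when dereferenced later

        // What Hadoop's ReflectionUtils.newInstance additionally does: inject the conf.
        TableOutputFormatLike wired = ctor.newInstance();
        wired.setConf(new Configuration());
        System.out.println(wired.getConf() != null);   // true
    }
}
```

This is why the proposed fix reads the Configuration from the JobContext parameter instead of relying on the instance field: the context is fully constructed regardless of how the output format was instantiated.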
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416913#comment-16416913 ] Ted Yu commented on HBASE-20295:

There is a description of Review Board in the refguide: http://hbase.apache.org/book.html#reviewboard Since the patch is small, a Review Board entry is good to have but not required. Normally we start with the master branch first.
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416848#comment-16416848 ] Michael Jin commented on HBASE-20295:

Hi [~yuzhih...@gmail.com], I attached the patch file (HBASE-20295.branch-1.4.003.patch) to this post using submit-patch.py, but I had no idea what Review Board user name to use (I googled Review Board but did not find a solution), so I passed "-srb" to skip creating/updating the Review Board entry. If the option is mandatory, please let me know how I can register/add a Review Board user.
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416583#comment-16416583 ] Michael Jin commented on HBASE-20295:

[~yuzhih...@gmail.com], I am working on it; sorry for my mistake.
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416539#comment-16416539 ] Ted Yu commented on HBASE-20295:

Have you read the comment I mentioned? Why not use submit-patch.py?
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416521#comment-16416521 ] Michael Jin commented on HBASE-20295:

[~yuzhih...@gmail.com], I've made a patch following your post, but the patch is very large: 78.5M. Is there anything wrong with my commands?

{noformat}
michael@menjin:~/MJ/Projects/hbase/hbase$ git checkout branch-1.4
Branch branch-1.4 set up to track remote branch branch-1.4 from origin.
Switched to a new branch 'branch-1.4'
michael@menjin:~/MJ/Projects/hbase/hbase$ git checkout -B HBASE-20295
Switched to a new branch 'HBASE-20295'
michael@menjin:~/MJ/Projects/hbase/hbase$ git add .
michael@menjin:~/MJ/Projects/hbase/hbase$ git status
On branch HBASE-20295
Changes to be committed:
  (use "git reset HEAD ..." to unstage)
  modified: hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.java
michael@menjin:~/MJ/Projects/hbase/hbase$ git commit -a -m "HBASE-20295 fix NullPointException in TableOutputFormat.checkOutputSpecs"
[HBASE-20295 d49fb97b1e] HBASE-20295 fix NullPointException in TableOutputFormat.checkOutputSpecs
 1 file changed, 5 insertions(+), 2 deletions(-)
michael@menjin:~/MJ/Projects/hbase/hbase$ dev-support/make_patch.sh
git_dirty is 0
Patch directory not specified. Falling back to ~/patches/.
/home/michael/patches does not exist. Creating it.
3503 commits exist only in your local branch. Interactive rebase?
Creating patch /home/michael/patches/HBASE-20295.patch using git format-patch
{noformat}
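The 78.5M patch follows from the make_patch.sh warning in the transcript ("3503 commits exist only in your local branch"): git format-patch diffed the branch's entire local history, which typically means the topic branch was cut from a stale or unrelated base. A minimal sketch of generating a patch that contains only the fix commit, simulated here in a throwaway repo (paths and messages are illustrative; in a real checkout you would diff against origin/branch-1.4):

```shell
#!/bin/sh
# Simulate: one upstream commit, one fix commit; the patch covers only the fix.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo && cd repo
git config user.email dev@example.com
git config user.name "Dev"

echo base > TableOutputFormat.java
git add TableOutputFormat.java
git commit -qm "upstream base"            # stands in for origin/branch-1.4 HEAD

git checkout -qb HBASE-20295              # topic branch cut from the real base
echo fix >> TableOutputFormat.java
git commit -qam "HBASE-20295 fix NPE in TableOutputFormat.checkOutputSpecs"

# Emit only the commits past the branch point; in a real checkout this would be:
#   git format-patch origin/branch-1.4 --stdout > HBASE-20295.branch-1.4.patch
git format-patch -1 --stdout > HBASE-20295.patch
grep -c '^From ' HBASE-20295.patch        # one commit -> one patch header
```

If the branch really does show thousands of local-only commits, re-fetching origin and recreating the topic branch from origin/branch-1.4 before committing the fix keeps the patch down to the intended few-line diff.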
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416501#comment-16416501 ] Michael Jin commented on HBASE-20295:

Got it, thanks!
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416499#comment-16416499 ] Ted Yu commented on HBASE-20295:

See this comment: https://issues.apache.org/jira/browse/HBASE-19985?focusedCommentId=16384696=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16384696

> TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
> ---
>
> Key: HBASE-20295
> URL: https://issues.apache.org/jira/browse/HBASE-20295
> Project: HBase
> Issue Type: Bug
> Components: Client
> Affects Versions: 1.4.0
> Environment: Spark 2.2.1, HBase 1.4.0
> Reporter: Michael Jin
> Priority: Major
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> I am using Spark to write data to HBase through the RDD.saveAsNewAPIHadoopDataset function. It works fine with HBase 1.3.1, but after updating the HBase dependency to 1.4.0 in pom.xml it throws java.lang.NullPointerException. This is caused by a logic error in the TableOutputFormat.checkOutputSpecs function; please see the details below.
> First, let's take a look at the SparkHadoopMapReduceWriter.write function in SparkHadoopMapReduceWriter.scala:
> {code:java}
> // SparkHadoopMapReduceWriter.write
> // (org.apache.spark.internal.io.SparkHadoopMapReduceWriter.scala)
> def write[K, V: ClassTag](
>     rdd: RDD[(K, V)],
>     hadoopConf: Configuration): Unit = {
>   // Extract context and configuration from RDD.
>   val sparkContext = rdd.context
>   val stageId = rdd.id
>   val sparkConf = rdd.conf
>   val conf = new SerializableConfiguration(hadoopConf)
>   // Set up a job.
>   val jobTrackerId = SparkHadoopWriterUtils.createJobTrackerID(new Date())
>   val jobAttemptId = new TaskAttemptID(jobTrackerId, stageId, TaskType.MAP, 0, 0)
>   val jobContext = new TaskAttemptContextImpl(conf.value, jobAttemptId)
>   val format = jobContext.getOutputFormatClass
>   if (SparkHadoopWriterUtils.isOutputSpecValidationEnabled(sparkConf)) {
>     // FileOutputFormat ignores the filesystem parameter
>     val jobFormat = format.newInstance
>     jobFormat.checkOutputSpecs(jobContext)
>   }
>   val committer = FileCommitProtocol.instantiate(
>     className = classOf[HadoopMapReduceCommitProtocol].getName,
>     jobId = stageId.toString,
>     outputPath = conf.value.get("mapreduce.output.fileoutputformat.outputdir"),
>     isAppend = false).asInstanceOf[HadoopMapReduceCommitProtocol]
>   committer.setupJob(jobContext)
>   ...
> {code}
> In the "write" function, if output spec validation is enabled, it calls the checkOutputSpecs function of the TableOutputFormat class. But the job format is simply created by "val jobFormat = format.newInstance", which will NOT initialize the "conf" member variable of the TableOutputFormat class. Let's continue with the checkOutputSpecs function of the TableOutputFormat class:
>
> {code:java}
> // TableOutputFormat.checkOutputSpecs
> // (org.apache.hadoop.hbase.mapreduce.TableOutputFormat.java) HBASE 1.4.0
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException, InterruptedException {
>   try (Admin admin = ConnectionFactory.createConnection(getConf()).getAdmin()) {
>     TableName tableName = TableName.valueOf(this.conf.get(OUTPUT_TABLE));
>     if (!admin.tableExists(tableName)) {
>       throw new TableNotFoundException("Can't write, table does not exist:" +
>           tableName.getNameAsString());
>     }
>     if (!admin.isTableEnabled(tableName)) {
>       throw new TableNotEnabledException("Can't write, table is not enabled: " +
>           tableName.getNameAsString());
>     }
>   }
> }
> {code}
>
> In "ConnectionFactory.createConnection(getConf())", because the "conf" class member was never initialized (as noted above), getConf() returns null, and the subsequent UserProvider instance creation throws the NullPointerException (please see part of the stack trace at the end). It is a little confusing that the context passed as a function parameter is actually properly constructed and contains a Configuration object, yet context is never used. So I suggest the code below to partly fix this issue:
>
> {code:java}
> @Override
> public void checkOutputSpecs(JobContext context) throws IOException, InterruptedException {
>   Configuration hConf = context.getConfiguration();
>   if (hConf == null) {
>     hConf = this.conf;
>   }
>   try (Admin admin = ConnectionFactory.createConnection(hConf).getAdmin()) {
>     TableName tableName = TableName.valueOf(hConf.get(OUTPUT_TABLE));
>     if (!admin.tableExists(tableName)) {
>       throw new TableNotFoundException("Can't write, table does not exist:" +
>           tableName.getNameAsString());
>     }
>     if (!admin.isTableEnabled(tableName)) {
>       throw new TableNotEnabledException("Can't write, table is not enabled: " +
>           tableName.getNameAsString());
>     }
>   }
> }
> {code}
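The root cause described above can be reproduced without HBase at all. Below is a minimal sketch (the class is an illustrative stand-in, not HBase's real API, and String stands in for Configuration): Hadoop's Configurable contract expects setConf() to be called after construction, but a bare reflective newInstance, which is what Spark's write path does, never calls it, so getConf() stays null.

```java
// Illustrative stand-in for TableOutputFormat (NOT the real HBase class).
public class NewInstanceConfDemo {

    public static class FakeTableOutputFormat {
        // Stands in for org.apache.hadoop.conf.Configuration.
        private String conf;

        public void setConf(String conf) { this.conf = conf; }
        public String getConf() { return conf; }

        public void checkOutputSpecs() {
            // Mirrors ConnectionFactory.createConnection(getConf()):
            // dereferencing the never-initialized conf is the NPE.
            if (getConf() == null) {
                throw new NullPointerException("conf was never initialized");
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Effectively what Spark's write() does: reflective construction,
        // with no setConf() call afterwards.
        FakeTableOutputFormat fmt =
            FakeTableOutputFormat.class.getDeclaredConstructor().newInstance();
        System.out.println(fmt.getConf() == null); // prints "true"
    }
}
```

For comparison, Hadoop's ReflectionUtils.newInstance calls setConf for Configurable classes, which is why the same output format works in plain MapReduce jobs but not when instantiated bare.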
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416493#comment-16416493 ] Michael Jin commented on HBASE-20295:

[~yuzhih...@gmail.com], I am glad to fix this issue, but I have never fixed a bug or committed code to HBase before. Would you please give me some guidelines or a brief introduction on how to attach a patch?
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416483#comment-16416483 ] Ted Yu commented on HBASE-20295:

Can you attach patch to this issue ? Thanks
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416477#comment-16416477 ] Michael Jin commented on HBASE-20295:

[~yuzhih...@gmail.com], I've verified in local, it works
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16416471#comment-16416471 ] Michael Jin commented on HBASE-20295:

Hi Ted, sorry for the late reply. I've verified it locally, and it works. Thanks
[jira] [Commented] (HBASE-20295) TableOutputFormat.checkOutputSpecs throw NullPointerException Exception
[ https://issues.apache.org/jira/browse/HBASE-20295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16415759#comment-16415759 ] Ted Yu commented on HBASE-20295:

Have you verified that the Spark job can continue with the proposed fix ? thanks
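The conf-resolution step in the proposed patch can be sketched HBase-free as follows (String again stands in for Configuration, and the class and method names are illustrative): prefer the JobContext's Configuration, and fall back to the instance field only when the context carries none.

```java
// Illustrative sketch of the proposed patch's fallback logic
// (NOT the real HBase code; String stands in for Configuration).
public class ConfFallbackDemo {

    public static String resolveConf(String contextConf, String instanceConf) {
        // Prefer the properly constructed JobContext configuration;
        // fall back to the (possibly uninitialized) instance field.
        return (contextConf != null) ? contextConf : instanceConf;
    }

    public static void main(String[] args) {
        // Spark path: newInstance left the field null, but the JobContext
        // carries a valid Configuration.
        System.out.println(resolveConf("context-conf", null));  // prints "context-conf"
        // Plain MapReduce path: setConf was called on the instance.
        System.out.println(resolveConf(null, "instance-conf")); // prints "instance-conf"
    }
}
```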