[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257579#comment-13257579 ] Zhihong Yu commented on HBASE-5741: --- Integrated to 0.94 as well. ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Fix For: 0.96.0, 0.94.1 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257659#comment-13257659 ] Hudson commented on HBASE-5741: --- Integrated in HBase-0.94 #133 (See [https://builds.apache.org/job/HBase-0.94/133/]) HBASE-5741 ImportTsv does not check for table existence (Himanshu) (Revision 1328032) Result = FAILURE tedyu : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Fix For: 0.96.0, 0.94.1 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257714#comment-13257714 ] Hudson commented on HBASE-5741: --- Integrated in HBase-0.94-security #17 (See [https://builds.apache.org/job/HBase-0.94-security/17/]) HBASE-5741 ImportTsv does not check for table existence (Himanshu) (Revision 1328032) Result = FAILURE tedyu : Files : * /hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java * /hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Fix For: 0.96.0, 0.94.1 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256231#comment-13256231 ] Hudson commented on HBASE-5741: --- Integrated in HBase-TRUNK-security #174 (See [https://builds.apache.org/job/HBase-TRUNK-security/174/]) HBASE-5741 ImportTsv does not check for table existence (Himanshu) (Revision 1327338) Result = FAILURE tedyu : Files : * /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java * /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Fix For: 0.96.0, 0.94.1 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255995#comment-13255995 ] Clint Heath commented on HBASE-5741: Lars, et al, thank you for your questions and comments. I may be missing something, so please correct me if I'm wrong, but here's how I see the situation: 1) importtsv will auto-split the input data in region-server-sized Hfiles, based on hbase.hregion.max.filesize (because we use the HFileOutputFormat and it checks this parameter from the client configs). 2) completebulkload then distributes the resulting HFiles to the region servers, thereby populating your cluster with all the regions needed to host your data. Alleviating all the memstore flushes, compactions, splitting, and other overhead associated with running billions of individual put()s to incrementally load your data. This is why we recommend the bulk load process to initially populate HBase tables. 3) pre-splitting your table (based on rowkeys, as is the only mechanism) is useless in this scenario, because it will not take into account the size of data associated with each key. The customer would literally have to walk through their data and determine how many Gigs of data are associated with each rowkey in order to preemptively split their tables in a bulk load scenario...and even then, HBase only takes those pre-splits as suggestions and if the data ends up needing to be split differently, that will happen automatically. So what purpose does pre-splitting serve in a bulk load? Think initial load of massive amounts of data. 4) Lars had a good point about compression, yes, typically a customer *should* set up their table with compression first, but they typically don't know this and they can do it after the fact without consequence. In conclusion, we provide this handy command-line tool to help our HBase customers (most of whom are complete newbies and don't know what they're doing) get their HBase tables up and running, yet we give them conflicting information about how to use the tool. If we need to tell them to pre-create the table, then we should tell them so...clearly. However, I think it is just simpler and more intuitive if the tool did that automatically, as the javadocs indicate it will. Since region splits are determined internally by the HFileOutputFormat anyway (if they use the bulkload option), what's the harm? This enables people to get their data loaded into HBase with the least amount of education, ramp-up, java development, data mining, etc., etc. ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Fix For: 0.96.0, 0.94.1 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly,
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256083#comment-13256083 ] Lars Hofhansl commented on HBASE-5741: -- Thanks Clint. No harm, I just think it will hide complexity that should not be hidden. I also don't think it's too much to ask a user to create the table first (but we should definitely fix the Javadoc in that case). That all said, I am +-0 on this. It's a simple change, and v3 looks good. ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Fix For: 0.96.0, 0.94.1 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256107#comment-13256107 ] Zhihong Yu commented on HBASE-5741: --- Integrated to trunk. Thanks for the patch Himanshu. Will wait for 0.94.0 to come out before integrating to 0.94 ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Fix For: 0.96.0, 0.94.1 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253637#comment-13253637 ] Himanshu Vashishtha commented on HBASE-5741: I am waiting for Clint to give me some numbers of such use cases to make the cut; (Apparently, he is on vacation these days and will be back this Monday) Thanks. ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Fix For: 0.96.0, 0.94.1 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251851#comment-13251851 ] jirapos...@reviews.apache.org commented on HBASE-5741: -- --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4700/ --- (Updated 2012-04-11 19:01:39.660787) Review request for hbase. Changes --- Changes done as per Ted. Summary --- There is a bulk output option in the importtsv workload. It outputs the HFiles in a user defined directory. The current code assumes that a table with its name equal to the given output directory exists, and throws an exception otherwise. Here is a patch for creating a table in case it doesn't exist. This addresses bug HBase-5741. https://issues.apache.org/jira/browse/HBase-5741 Diffs (updated) - src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java ab22fc4 src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java ac30a62 Diff: https://reviews.apache.org/r/4700/diff Testing --- Added a new test for bulkoutput; All importtsv tests pass. Thanks, Himanshu ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Attachments: HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252025#comment-13252025 ] Hadoop QA commented on HBASE-5741: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12522338/5741-94.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. -1 patch. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1486//console This message is automatically generated. ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Fix For: 0.94.0, 0.96.0 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252036#comment-13252036 ] Hadoop QA commented on HBASE-5741: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12522330/5741-v3.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1485//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1485//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1485//console This message is automatically generated. ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Fix For: 0.94.0, 0.96.0 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252038#comment-13252038 ] Zhihong Yu commented on HBASE-5741: --- I would integrate patches to trunk and 0.94 tomorrow morning if there is no objection. ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Fix For: 0.94.0, 0.96.0 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252064#comment-13252064 ] Lars Hofhansl commented on HBASE-5741: -- Sorry for chiming in late here, but do we actually want ImportTsv to create the table for us? Personally I'd rather have it fail, which forces me to think about how I want create the table (pre split, compression, etc). ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Fix For: 0.94.0, 0.96.0 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252071#comment-13252071 ] Zhihong Yu commented on HBASE-5741: --- How about adding an option to ImportTsv, default to false, allowing user to auto-create the table ? ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Fix For: 0.94.0, 0.96.0 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252076#comment-13252076 ] Lars Hofhansl commented on HBASE-5741: -- Import does not do that either. At least we should be consistent between the various importing tools. It seems overkill to burden these tools with that. That said, I am not opposed, just questioning the usefulness. ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Fix For: 0.94.0, 0.96.0 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252081#comment-13252081 ] Hadoop QA commented on HBASE-5741: -- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12522343/5741-v3.txt against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/1487//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/1487//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/1487//console This message is automatically generated. ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Fix For: 0.94.0, 0.96.0 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252087#comment-13252087 ] Himanshu Vashishtha commented on HBASE-5741: As Clint mentioned, there are some workloads that use ImportTsv to create HFiles, which later on are used in the bulk import method : my motivation for the patch. What you suggest Lars, failing the importtsv, or adding a similar thing in the Import tool too? I think later makes it consistent. ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Fix For: 0.94.0, 0.96.0 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252095#comment-13252095 ] Lars Hofhansl commented on HBASE-5741: -- Even the created HFileOutputFormat use the table in HBase to find out about splits. We will rarely bulk import into an empty table that is not pre-split...? (I'm assuming here, you folks at Cloudera see many more use cases that I do) If you believe strongly that we should add this, let's do it :) We can do Import in a separate patch, or not do that as we see fit. ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha Fix For: 0.94.0, 0.96.0 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, HBase-5741.patch The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250813#comment-13250813 ] Himanshu Vashishtha commented on HBASE-5741: You are right. I am on it now. Thanks. ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249966#comment-13249966 ] Clint Heath commented on HBASE-5741: Thanks for the feedback. We I have customers who are using the workflow you mentioned. This is why I discovered the confusing javadoc reference. ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249442#comment-13249442 ] Himanshu Vashishtha commented on HBASE-5741: Yes, the javadoc says it half correct when it says that the table must exist in HBase if you are not using the option importtsv.bulk.output. The behavior, in absence of the above property is to do inserts into the provided table; so it throws an exception if the table doesn't exist. When we use this option, importtsv tries to configure job by making an attempt to read all the regions of this table and then use the start keys of these regions as partition-markers for the TotalOrderPartitioner class. This usage eliminates the first start-row (which is always a EMPTY_BYTE_ARRAY), so even if we create a new table, it will be useless as for using it for configuring a job. For the case with LoadIncrementalHFile, the destination of the directory containing the hFiles is assumed to be in a specific format: tableName/cf1/listOfHFiles /cf2/listOfHFiles --where we are storing the hfiles for a specific column family in a separate sub-directory. If we do create a table based on the parameters given to the importtsv command, it will not be useful for the case of bulkload usecase as in the importtsv job, we dump hfiles based on rows; so all coulmn famlies for a specific row lands in one hfile. It would be great to know if we actually use this workflow: create HFiles from importtsv job, and then use bulkload to insert those HFiles in HBase table. I think we should change the javadoc. Please let me know if you have any questions; hbase-mapreduce use cases are exciting. ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence
[ https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249443#comment-13249443 ] Himanshu Vashishtha commented on HBASE-5741: btw, by making an attempt to read all the regions of this table and then use the start keys of these regions, I mean get the start rows for all the existing regions of the table and then use them for job configuration. ImportTsv does not check for table existence - Key: HBASE-5741 URL: https://issues.apache.org/jira/browse/HBASE-5741 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 0.90.4 Reporter: Clint Heath Assignee: Himanshu Vashishtha The usage statement for the importtsv command to hbase claims this: Note: if you do not use this option, then the target table must already exist in HBase (in reference to the importtsv.bulk.output command-line option) The truth is, the table must exist no matter what, importtsv cannot and will not create it for you. This is the case because the createSubmittableJob method of ImportTsv does not even attempt to check if the table exists already, much less create it: (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java) 305 HTable table = new HTable(conf, tableName); The HTable method signature in use there assumes the table exists and runs a meta scan on it: (From org.apache.hadoop.hbase.client.HTable.java) 142 * Creates an object to access a HBase table. ... 151 public HTable(Configuration conf, final String tableName) What we should do inside of createSubmittableJob is something similar to what the completebulkloads command would do: (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java) 690 boolean tableExists = this.doesTableExist(tableName); 691 if (!tableExists) this.createTable(tableName,dirPath); Currently the docs are misleading, the table in fact must exist prior to running importtsv. We should check if it exists rather than assume it's already there and throw the below exception: 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: Encountered problems when prefetch META table: org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for table: myTable2, row=myTable2,,99 at org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150) ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira