[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-19 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257579#comment-13257579
 ] 

Zhihong Yu commented on HBASE-5741:
---

Integrated to 0.94 as well.

 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.1

 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, 
 HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-19 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257659#comment-13257659
 ] 

Hudson commented on HBASE-5741:
---

Integrated in HBase-0.94 #133 (See 
[https://builds.apache.org/job/HBase-0.94/133/])
HBASE-5741 ImportTsv does not check for table existence (Himanshu) 
(Revision 1328032)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java


 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.1

 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, 
 HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-19 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13257714#comment-13257714
 ] 

Hudson commented on HBASE-5741:
---

Integrated in HBase-0.94-security #17 (See 
[https://builds.apache.org/job/HBase-0.94-security/17/])
HBASE-5741 ImportTsv does not check for table existence (Himanshu) 
(Revision 1328032)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/branches/0.94/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
* 
/hbase/branches/0.94/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java


 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.1

 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, 
 HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-18 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256231#comment-13256231
 ] 

Hudson commented on HBASE-5741:
---

Integrated in HBase-TRUNK-security #174 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/174/])
HBASE-5741 ImportTsv does not check for table existence (Himanshu) 
(Revision 1327338)

 Result = FAILURE
tedyu : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java


 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.1

 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, 
 HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-17 Thread Clint Heath (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13255995#comment-13255995
 ] 

Clint Heath commented on HBASE-5741:


Lars, et al, thank you for your questions and comments.  I may be missing 
something, so please correct me if I'm wrong, but here's how I see the 
situation:

1) importtsv will auto-split the input data in region-server-sized Hfiles, 
based on hbase.hregion.max.filesize (because we use the HFileOutputFormat and 
it checks this parameter from the client configs).

2) completebulkload then distributes the resulting HFiles to the region 
servers, thereby populating your cluster with all the regions needed to host 
your data.  Alleviating all the memstore flushes, compactions, splitting, and 
other overhead associated with running billions of individual put()s to 
incrementally load your data.  This is why we recommend the bulk load process 
to initially populate HBase tables.

3) pre-splitting your table (based on rowkeys, as is the only mechanism) is 
useless in this scenario, because it will not take into account the size of 
data associated with each key.  The customer would literally have to walk 
through their data and determine how many Gigs of data are associated with each 
rowkey in order to preemptively split their tables in a bulk load 
scenario...and even then, HBase only takes those pre-splits as suggestions 
and if the data ends up needing to be split differently, that will happen 
automatically. So what purpose does pre-splitting serve in a bulk load?  Think 
initial load of massive amounts of data.

4) Lars had a good point about compression, yes, typically a customer *should* 
set up their table with compression first, but they typically don't know this 
and they can do it after the fact without consequence.

In conclusion, we provide this handy command-line tool to help our HBase 
customers (most of whom are complete newbies and don't know what they're doing) 
get their HBase tables up and running, yet we give them conflicting information 
about how to use the tool.  If we need to tell them to pre-create the table, 
then we should tell them so...clearly.  However, I think it is just simpler and 
more intuitive if the tool did that automatically, as the javadocs indicate it 
will.  Since region splits are determined internally by the HFileOutputFormat 
anyway (if they use the bulkload option), what's the harm?  This enables people 
to get their data loaded into HBase with the least amount of education, 
ramp-up, java development, data mining, etc., etc.

 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.1

 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, 
 HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, 

[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-17 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256083#comment-13256083
 ] 

Lars Hofhansl commented on HBASE-5741:
--

Thanks Clint. No harm, I just think it will hide complexity that should not be 
hidden. I also don't think it's too much to ask a user to create the table 
first (but we should definitely fix the Javadoc in that case).

That all said, I am +-0 on this. It's a simple change, and v3 looks good.

 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.1

 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, 
 HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-17 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13256107#comment-13256107
 ] 

Zhihong Yu commented on HBASE-5741:
---

Integrated to trunk.

Thanks for the patch Himanshu.

Will wait for 0.94.0 to come out before integrating to 0.94

 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.1

 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, 
 HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-13 Thread Himanshu Vashishtha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13253637#comment-13253637
 ] 

Himanshu Vashishtha commented on HBASE-5741:


I am waiting for Clint to give me some numbers of such use cases to make the 
cut; (Apparently, he is on vacation these days and will be back this Monday)
Thanks.

 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0, 0.94.1

 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, 
 HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-11 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251851#comment-13251851
 ] 

jirapos...@reviews.apache.org commented on HBASE-5741:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4700/
---

(Updated 2012-04-11 19:01:39.660787)


Review request for hbase.


Changes
---

Changes done as per Ted.


Summary
---

There is a bulk output option in the importtsv workload. It outputs the HFiles 
in a user defined directory. The current code assumes that a table with its 
name equal to the given output directory exists, and throws an exception 
otherwise. Here is a patch for creating a table in case it doesn't exist.


This addresses bug HBase-5741.
https://issues.apache.org/jira/browse/HBase-5741


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/mapreduce/ImportTsv.java ab22fc4 
  src/test/java/org/apache/hadoop/hbase/mapreduce/TestImportTsv.java ac30a62 

Diff: https://reviews.apache.org/r/4700/diff


Testing
---

Added a new test for bulkoutput; All importtsv tests pass.


Thanks,

Himanshu



 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Attachments: HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-11 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252025#comment-13252025
 ] 

Hadoop QA commented on HBASE-5741:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522338/5741-94.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1486//console

This message is automatically generated.

 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Fix For: 0.94.0, 0.96.0

 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, 
 HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-11 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252036#comment-13252036
 ] 

Hadoop QA commented on HBASE-5741:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522330/5741-v3.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1485//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1485//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1485//console

This message is automatically generated.

 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Fix For: 0.94.0, 0.96.0

 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, 
 HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-11 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252038#comment-13252038
 ] 

Zhihong Yu commented on HBASE-5741:
---

I would integrate patches to trunk and 0.94 tomorrow morning if there is no 
objection.

 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Fix For: 0.94.0, 0.96.0

 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, 
 HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-11 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252064#comment-13252064
 ] 

Lars Hofhansl commented on HBASE-5741:
--

Sorry for chiming in late here, but do we actually want ImportTsv to create the 
table for us?
Personally I'd rather have it fail, which forces me to think about how I want 
create the table (pre split, compression, etc).


 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Fix For: 0.94.0, 0.96.0

 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, 
 HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-11 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252071#comment-13252071
 ] 

Zhihong Yu commented on HBASE-5741:
---

How about adding an option to ImportTsv, default to false, allowing user to 
auto-create the table ?

 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Fix For: 0.94.0, 0.96.0

 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, 
 HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-11 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252076#comment-13252076
 ] 

Lars Hofhansl commented on HBASE-5741:
--

Import does not do that either. At least we should be consistent between the 
various importing tools.
It seems overkill to burden these tools with that.

That said, I am not opposed, just questioning the usefulness.

 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Fix For: 0.94.0, 0.96.0

 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, 
 HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-11 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252081#comment-13252081
 ] 

Hadoop QA commented on HBASE-5741:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522343/5741-v3.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 2 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1487//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1487//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1487//console

This message is automatically generated.

 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Fix For: 0.94.0, 0.96.0

 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, 
 HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-11 Thread Himanshu Vashishtha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252087#comment-13252087
 ] 

Himanshu Vashishtha commented on HBASE-5741:


As Clint mentioned, there are some workloads that use ImportTsv to create 
HFiles, which later on are used in the bulk import method : my motivation for 
the patch.

What you suggest Lars, failing the importtsv, or adding a similar thing in the 
Import tool too? I think later makes it consistent.

 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Fix For: 0.94.0, 0.96.0

 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, 
 HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-11 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13252095#comment-13252095
 ] 

Lars Hofhansl commented on HBASE-5741:
--

Even the created HFileOutputFormat use the table in HBase to find out about 
splits. We will rarely bulk import into an empty table that is not 
pre-split...? (I'm assuming here, you folks at Cloudera see many more use cases 
that I do)

If you believe strongly that we should add this, let's do it :)
We can do Import in a separate patch, or not do that as we see fit.


 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha
 Fix For: 0.94.0, 0.96.0

 Attachments: 5741-94.txt, 5741-v3.txt, HBase-5741-v2.patch, 
 HBase-5741.patch


 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-10 Thread Himanshu Vashishtha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250813#comment-13250813
 ] 

Himanshu Vashishtha commented on HBASE-5741:


You are right. I am on it now. Thanks.

 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha

 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-09 Thread Clint Heath (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249966#comment-13249966
 ] 

Clint Heath commented on HBASE-5741:


Thanks for the feedback.  We I have customers who are using the workflow you 
mentioned.  This is why I discovered the confusing javadoc reference.

 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha

 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-07 Thread Himanshu Vashishtha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249442#comment-13249442
 ] 

Himanshu Vashishtha commented on HBASE-5741:


Yes, the javadoc says it half correct when it says that the table must exist 
in HBase if you are not using the option importtsv.bulk.output.
The behavior, in absence of the above property is to do inserts into the 
provided table; so it throws an exception if the table doesn't exist.

When we use this option, importtsv tries to configure job by making an attempt 
to read all the regions of this table and then use the start keys of these 
regions as partition-markers for the TotalOrderPartitioner class. This usage 
eliminates the first start-row (which is always a EMPTY_BYTE_ARRAY), so even if 
we create a new table, it will be useless as for using it for configuring a 
job. 

For the case with LoadIncrementalHFile, the destination of the directory 
containing the hFiles is assumed to be in a specific format:
tableName/cf1/listOfHFiles
   /cf2/listOfHFiles

 --where we are storing the hfiles for a specific column family in a separate 
sub-directory.

If we do create a table based on the parameters given to the importtsv command, 
it will not be useful for the case of bulkload usecase as in the importtsv job, 
we dump hfiles based on rows; so all coulmn famlies for a specific row lands in 
one hfile. 

It would be great to know if we actually use this workflow: create HFiles from 
importtsv job, and then use bulkload to insert those HFiles in HBase table.

I think we should change the javadoc.

Please let me know if you have any questions; hbase-mapreduce use cases are 
exciting.

 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha

 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5741) ImportTsv does not check for table existence

2012-04-07 Thread Himanshu Vashishtha (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249443#comment-13249443
 ] 

Himanshu Vashishtha commented on HBASE-5741:


btw, by making an attempt to read all the regions of this table and then use 
the start keys of these regions, I mean get the start rows for all the 
existing regions of the table and then use them for job configuration.

 ImportTsv does not check for table existence 
 -

 Key: HBASE-5741
 URL: https://issues.apache.org/jira/browse/HBASE-5741
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Affects Versions: 0.90.4
Reporter: Clint Heath
Assignee: Himanshu Vashishtha

 The usage statement for the importtsv command to hbase claims this:
 Note: if you do not use this option, then the target table must already 
 exist in HBase (in reference to the importtsv.bulk.output command-line 
 option)
 The truth is, the table must exist no matter what, importtsv cannot and will 
 not create it for you.
 This is the case because the createSubmittableJob method of ImportTsv does 
 not even attempt to check if the table exists already, much less create it:
 (From org.apache.hadoop.hbase.mapreduce.ImportTsv.java)
 305 HTable table = new HTable(conf, tableName);
 The HTable method signature in use there assumes the table exists and runs a 
 meta scan on it:
 (From org.apache.hadoop.hbase.client.HTable.java)
 142 * Creates an object to access a HBase table.
 ...
 151 public HTable(Configuration conf, final String tableName)
 What we should do inside of createSubmittableJob is something similar to what 
 the completebulkloads command would do:
 (Taken from org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles.java)
 690 boolean tableExists = this.doesTableExist(tableName);
 691 if (!tableExists) this.createTable(tableName,dirPath);
 Currently the docs are misleading, the table in fact must exist prior to 
 running importtsv. We should check if it exists rather than assume it's 
 already there and throw the below exception:
 12/03/14 17:15:42 WARN client.HConnectionManager$HConnectionImplementation: 
 Encountered problems when prefetch META table: 
 org.apache.hadoop.hbase.TableNotFoundException: Cannot find row in .META. for 
 table: myTable2, row=myTable2,,99
   at 
 org.apache.hadoop.hbase.client.MetaScanner.metaScan(MetaScanner.java:150)
 ...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira