[jira] [Commented] (HBASE-5604) M/R tools to replay WAL files

2012-04-10 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250959#comment-13250959
 ] 

Lars Hofhansl commented on HBASE-5604:
--

* Let's not go overboard on the exceptions. This is an inner class of WALPlayer, 
and WALPlayer will always pass the correct arguments. Maybe I'll make the mapper 
classes private and remove all exception handling.
* Thought about using SimpleDateFormat, then punted. I guess you're right, I 
should make it a bit more user friendly.
* The TODO was left over. Not sure that it even makes sense.


 M/R tools to replay WAL files
 -

 Key: HBASE-5604
 URL: https://issues.apache.org/jira/browse/HBASE-5604
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Attachments: 5604-v4.txt, 5604-v6.txt, 5604-v7.txt, 5604-v8.txt, 
 HLog-5604-v3.txt


 Just an idea I had. Might be useful for restoring a backup using the HLogs.
 This could be an M/R job (with a mapper per HLog file).
 The tool would get a time range and a (set of) table(s). We'd pick the right 
 HLogs based on time before the M/R job is started and then have a mapper per 
 HLog file.
 The mapper would then go through the HLog, filter out all WALEdits that don't 
 fit into the time range or don't belong to any of the tables, and then use 
 HFileOutputFormat to generate HFiles.
 Would need to indicate the splits we want, probably from a live table.
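
The filtering step in the description above could be sketched roughly as follows. This is only an illustration and not code from any of the attached patches: `Entry` is a hypothetical stand-in for a WAL entry (a table name plus a write time in ms), and treating both ends of the time range as inclusive is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch only: Entry is a hypothetical stand-in for a WAL entry, not an HBase class.
final class WalEntryFilterSketch {

  static final class Entry {
    final String table;      // table the edit belongs to
    final long writeTimeMs;  // write time of the edit, in ms since the epoch

    Entry(String table, long writeTimeMs) {
      this.table = table;
      this.writeTimeMs = writeTimeMs;
    }
  }

  // Keep an entry only if its table was requested and its write time lies in
  // [startMs, endMs]. Inclusive bounds are an assumption of this sketch.
  static boolean accept(Entry e, Set<String> tables, long startMs, long endMs) {
    return tables.contains(e.table)
        && e.writeTimeMs >= startMs
        && e.writeTimeMs <= endMs;
  }

  // Apply the predicate to every entry, preserving the original order.
  static List<Entry> filter(List<Entry> entries, Set<String> tables,
                            long startMs, long endMs) {
    List<Entry> kept = new ArrayList<>();
    for (Entry e : entries) {
      if (accept(e, tables, startMs, endMs)) {
        kept.add(e);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Set<String> tables = new HashSet<>(Arrays.asList("t1"));
    List<Entry> entries = Arrays.asList(
        new Entry("t1", 50L), new Entry("t2", 50L), new Entry("t1", 500L));
    System.out.println(filter(entries, tables, 0L, 100L).size() + " entries kept");
  }
}
```

In a real mapper the surviving edits would be handed to the output format rather than collected in a list; the list is used here only to keep the sketch self-contained.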

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-5604) M/R tools to replay WAL files

2012-04-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250974#comment-13250974
 ] 

stack commented on HBASE-5604:
--

bq. Thought about using SimpleDateFormat, then punted. I guess you're right, 
should make it bit more user friendly.

Passing dates instead of ms will be a pain doing the parse.  Outputting a date 
instead of ms will be of little use when hbase only shows ms.







[jira] [Commented] (HBASE-5604) M/R tools to replay WAL files

2012-04-10 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251007#comment-13251007
 ] 

stack commented on HBASE-5604:
--

bq. The parsing would be done with SimpleDateFormat.

That only works if the user enters it exactly right. They will probably have 
looked at hbase, done the math in ms, then converted to a date to pass to your 
tool, only to have it convert back to ms.






[jira] [Commented] (HBASE-5604) M/R tools to replay WAL files

2012-04-10 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251053#comment-13251053
 ] 

Lars Hofhansl commented on HBASE-5604:
--

@Stack: Yeah, probably. You'd vote for leaving it the way it is?





[jira] [Commented] (HBASE-5604) M/R tools to replay WAL files

2012-04-10 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13251117#comment-13251117
 ] 

Hadoop QA commented on HBASE-5604:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522168/5604-v9.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 3 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1470//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1470//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1470//console

This message is automatically generated.





[jira] [Commented] (HBASE-5604) M/R tools to replay WAL files

2012-04-09 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250393#comment-13250393
 ] 

Zhihong Yu commented on HBASE-5604:
---

{code}
+public class WALPlayer extends Configured implements Tool {
{code}
Javadoc for the above new class is desirable.
{code}
+public void setup(Context context) {
+  table = Bytes.toBytes(context.getConfiguration().getStrings(TABLES_KEY)[0]);
+}
{code}
Why is index 0 always used above?
{code}
+public void setup(Context context) {
+  String[] tableMap = context.getConfiguration().getStrings(TABLE_MAP_KEY);
+  int i = 0;
+  for (String table : context.getConfiguration().getStrings(TABLES_KEY)) {
+    tables.put(Bytes.toBytes(table), Bytes.toBytes(tableMap[i++]));
{code}
I think validation on the lengths of the two String[] should be performed. If 
they don't match, bail out early.
{code}
+// Aggregate as much as possible into a single Put/Delete
+// operation before writing to the context.
{code}
Shall we utilize Put.heapSize() and track the aggregate size of the Put so 
that we can write to the context when a certain threshold is reached?





[jira] [Commented] (HBASE-5604) M/R tools to replay WAL files

2012-04-09 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250439#comment-13250439
 ] 

Lars Hofhansl commented on HBASE-5604:
--

Thanks Ted.

* I'll add Javadoc.
* Index 0 is used here because only a single table is currently allowed when 
creating HFiles for bulk import (that is also documented in usage(), but 
perhaps not clearly enough...?). I'll add a comment to that effect.
* That the two arrays are of the same size is guaranteed in 
WALPlayer.createSubmittableJob(), but perhaps it is better to double check here.
* Checking heapSize() seems unnecessary. After all, this is a single WALEdit.






[jira] [Commented] (HBASE-5604) M/R tools to replay WAL files

2012-04-09 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250450#comment-13250450
 ] 

Zhihong Yu commented on HBASE-5604:
---

{code}
+  if (tablesToUse == null || tableMap == null || tablesToUse.length != tableMap.length) {
+    // this can only happen when HLogMapper is used directly by a class other than WALPlayer
+    throw new IOException("No tables or incorrect table mapping specified.");
{code}
I think if we provide separate exceptions for the first two checks and for the 
last check, it would be easier for the user to understand.
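
One way the suggestion above might look, as a minimal sketch rather than the code in the patch: the class name, the `buildTableMap` helper, and the exact message wording are all made up for illustration. Only the two array names and the use of IOException come from the snippet under review.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: class, method name, and messages are hypothetical.
final class TableMappingCheckSketch {

  // Pairs each source table with its target, failing early with a message
  // that says which precondition was violated.
  static Map<String, String> buildTableMap(String[] tablesToUse, String[] tableMap)
      throws IOException {
    if (tablesToUse == null || tableMap == null) {
      throw new IOException("No tables or no table mapping specified.");
    }
    if (tablesToUse.length != tableMap.length) {
      throw new IOException("Incorrect table mapping: " + tablesToUse.length
          + " table(s) but " + tableMap.length + " mapping(s) specified.");
    }
    Map<String, String> result = new LinkedHashMap<>();
    for (int i = 0; i < tablesToUse.length; i++) {
      result.put(tablesToUse[i], tableMap[i]);
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(buildTableMap(new String[] {"a", "b"}, new String[] {"x", "y"}));
  }
}
```

Splitting the check means a caller that passes mismatched arrays sees the two lengths in the message instead of a generic error.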
{code}
+System.err.println("  -D" + HLogInputFormat.START_TIME_KEY + "=ms (only apply edit after this time)");
+System.err.println("  -D" + HLogInputFormat.END_TIME_KEY + "=ms (only apply edit before this time)");
{code}
The user would have to resort to a conversion tool to find the ms value for 
the desired date/time. Can we make this more user-friendly?
E.g., in TimeStampingFileContext.java we have:
{code}
this.sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss");
{code}
See also 
http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html#parse%28java.lang.String,%20java.text.ParsePosition%29
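
A possible shape for a more forgiving time argument, sketched under assumptions: the value is first tried as plain ms and only then as a date. The class and method names are hypothetical, the `yyyy-MM-dd'T'HH:mm:ss` pattern and the UTC time zone are choices made for this example, and nothing here is taken from the patch.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Sketch only: names, pattern, and time zone are assumptions of this example.
final class TimeArgSketch {

  static final String PATTERN = "yyyy-MM-dd'T'HH:mm:ss";

  // Accepts either epoch milliseconds or a date in PATTERN and returns ms.
  static long parseTimeMs(String value) {
    try {
      return Long.parseLong(value);  // plain ms, as HBase itself displays
    } catch (NumberFormatException notMs) {
      // SimpleDateFormat is not thread-safe, so create one per call.
      SimpleDateFormat sdf = new SimpleDateFormat(PATTERN);
      sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
      sdf.setLenient(false);  // reject out-of-range fields such as month 13
      try {
        return sdf.parse(value).getTime();
      } catch (ParseException badDate) {
        throw new IllegalArgumentException(
            "Expected ms or a date like " + PATTERN + " but got: " + value, badDate);
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(parseTimeMs("1334016000000"));
    System.out.println(parseTimeMs("2012-04-10T00:00:00"));
  }
}
```

Accepting both forms keeps ms usable for anyone copying timestamps straight out of HBase while still allowing a readable date.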
{code}
+public String[] getLocations() throws IOException, InterruptedException {
+  // TODO: Find the data node with the most blocks for this HLog?
{code}
Would the above be addressed in a separate JIRA ?
{code}
+if (i>0) LOG.info("Skipped " + i + " entries.");
{code}
Minor: add spaces around '>'







[jira] [Commented] (HBASE-5604) M/R tools to replay WAL files

2012-04-09 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13250453#comment-13250453
 ] 

Hadoop QA commented on HBASE-5604:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12522068/5604-v8.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 3 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestFromClientSide

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1460//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1460//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/1460//console

This message is automatically generated.
