Re: Review Request: SQOOP-683 Documenting sqoop.mysql.export.sleep.ms - easy throttling feature for direct MySQL exports
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7880/
---

(Updated Nov. 8, 2012, 6:33 p.m.)

Review request for Sqoop.

Changes
---
Thank you for the suggestions! All of them make sense to me, see new patch :)

Description
---
Code review for SQOOP-683, see https://issues.apache.org/jira/browse/SQOOP-683.

Diffs (updated)
- src/docs/user/compatibility.txt 3576fd7

Diff: https://reviews.apache.org/r/7880/diff/

Testing
---
Converted to XML with asciidoc, the affected part:

<simpara>Sometimes you need to export large data with Sqoop to a live MySQL cluster that is under a high load serving random queries from the users of your product. While data consistency issues during the export can be easily solved with a staging table, there is still a problem: the performance impact caused by the heavy export.</simpara>
<simpara>First off, the resources of MySQL dedicated to the import process can affect the performance of the live product, both on the master and on the slaves. Second, even if the servers can handle the import with no significant performance impact (mysqlimport should be relatively cheap), importing big tables can cause serious replication lag in the cluster, risking data inconsistency.</simpara>
<simpara>With <literal>-D sqoop.mysql.export.sleep.ms=time</literal>, where <emphasis>time</emphasis> is a value in milliseconds, you can let the server relax between checkpoints and the replicas catch up by pausing the export process after transferring the number of bytes specified in <literal>sqoop.mysql.export.checkpoint.bytes</literal>. Experiment with different settings of these two parameters to achieve an export pace that doesn't endanger the stability of your MySQL cluster.</simpara>
<important><simpara>Note that any arguments to Sqoop that are of the form <literal>-D parameter=value</literal> are Hadoop <emphasis>generic arguments</emphasis> and must appear before any tool-specific arguments (for example, <literal>--connect</literal>, <literal>--table</literal>, etc). Don't forget that these parameters only work with the <literal>--direct</literal> flag set.</simpara></important>

Thanks,
Zoltán Tóth-Czifra
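The ordering rule in the documentation above (generic `-D` arguments first, tool-specific arguments after, `--direct` required) can be illustrated with a small sketch. The class `ThrottledExportArgs` and its parameters are hypothetical and not part of Sqoop; the sketch only assembles an argument list in the order the text describes, and the caller supplies the connection string, table name and export directory.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Sqoop): assembles export arguments in the documented order. */
final class ThrottledExportArgs {
    private ThrottledExportArgs() {}

    static List<String> build(long checkpointBytes, long sleepMs,
                              String jdbcUrl, String table, String exportDir) {
        List<String> args = new ArrayList<>();
        args.add("sqoop");
        args.add("export");
        // Hadoop generic arguments go first, before any tool-specific argument.
        args.add("-D");
        args.add("sqoop.mysql.export.checkpoint.bytes=" + checkpointBytes);
        args.add("-D");
        args.add("sqoop.mysql.export.sleep.ms=" + sleepMs);
        // Tool-specific arguments follow.
        args.add("--connect");
        args.add(jdbcUrl);
        args.add("--table");
        args.add(table);
        args.add("--export-dir");
        args.add(exportDir);
        // The two throttling parameters only take effect in direct mode.
        args.add("--direct");
        return args;
    }
}
```

Joining the returned list with spaces gives a command line of the form `sqoop export -D ... -D ... --connect ... --table ... --export-dir ... --direct`.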
Review Request: SQOOP-683 Documenting sqoop.mysql.export.sleep.ms - easy throttling feature for direct MySQL exports
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7880/
---

Review request for Sqoop.

Description
---
Code review for SQOOP-683, see https://issues.apache.org/jira/browse/SQOOP-683.

Diffs
- src/docs/user/compatibility.txt 3576fd7

Diff: https://reviews.apache.org/r/7880/diff/

Testing
---
Converted to XML with asciidoc, the affected part:

<simpara>Sometimes you need to export large data with Sqoop to a live MySQL cluster that is under a high load serving random queries from the users of your product. While data consistency issues during the export can be easily solved with a staging table, there is still a problem: the performance impact caused by the heavy export.</simpara>
<simpara>First off, the resources of MySQL dedicated to the import process can affect the performance of the live product, both on the master and on the slaves. Second, even if the servers can handle the import with no significant performance impact (mysqlimport should be relatively cheap), importing big tables can cause serious replication lag in the cluster, risking data inconsistency.</simpara>
<simpara>With <literal>-D sqoop.mysql.export.sleep.ms=time</literal>, where <emphasis>time</emphasis> is a value in milliseconds, you can let the server relax between checkpoints and the replicas catch up by pausing the export process after transferring the number of bytes specified in <literal>sqoop.mysql.export.checkpoint.bytes</literal>. Experiment with different settings of these two parameters to achieve an export pace that doesn't endanger the stability of your MySQL cluster.</simpara>
<important><simpara>Note that any arguments to Sqoop that are of the form <literal>-D parameter=value</literal> are Hadoop <emphasis>generic arguments</emphasis> and must appear before any tool-specific arguments (for example, <literal>--connect</literal>, <literal>--table</literal>, etc). Don't forget that these parameters only work with the <literal>--direct</literal> flag set.</simpara></important>

Thanks,
Zoltán Tóth-Czifra
Re: Review Request: SQOOP-604 Easy throttling feature for MySQL exports
On Nov. 3, 2012, 5:18 a.m., Abhijeet Gaikwad wrote:
Looks good :)
ant checkstyle - no errors
ant test - success

Thank you for your help Abhijeet!

- Zoltán

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7135/#review13075
---

On Nov. 2, 2012, 12:32 p.m., Zoltán Tóth-Czifra wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7135/
---

(Updated Nov. 2, 2012, 12:32 p.m.)

Review request for Sqoop.

Description
---
Code review for SQOOP-604, see https://issues.apache.org/jira/browse/SQOOP-604

The solution in short: use the existing checkpoint feature of direct (--direct) MySQL exports (the export process is restarted every X bytes written) and extend it with a new config value that makes the thread sleep for Y milliseconds at the checkpoints. With a low enough byte-count limit, this can be a simple yet powerful throttling mechanism.

Diffs
- src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java a4e8b88

Diff: https://reviews.apache.org/r/7135/diff/

Testing
---
Executing with different settings of sqoop.mysql.export.checkpoint.bytes and sqoop.mysql.export.sleep.ms:

33554432B / 0ms: Transferred 4.7579 MB in 8.7175 seconds (558.8826 KB/sec)
102400B / 500ms: Transferred 4.7579 MB in 35.7794 seconds (136.1698 KB/sec)
51200B / 500ms: Transferred 4.758 MB in 57.8675 seconds (84.1959 KB/sec)
51200B / 250ms: Transferred 4.7579 MB in 35.0293 seconds (139.0854 KB/sec)

I have not added unit tests yet; because the change involves calling Thread.sleep, I find it difficult to test. Unfortunately, there is no machine or environment object that could be injected into these classes as a mock to take care of time-related fixtures.

Thanks,
Zoltán Tóth-Czifra
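The measurements above are roughly what a simple additive model predicts: the unthrottled run time plus one sleep per completed checkpoint. The sketch below is only a back-of-the-envelope estimate, not code from the patch. It assumes a single mapper, that 1 MB is 1,048,576 bytes (so 4.7579 MB is about 4,989,020 bytes), and that restarting the export process costs nothing; the class and method names are hypothetical.

```java
/** Hypothetical estimator: unthrottled time plus one pause per completed checkpoint. */
final class ThrottleEstimate {
    private ThrottleEstimate() {}

    static double estimateSeconds(long totalBytes, long checkpointBytes,
                                  long sleepMs, double baselineSeconds) {
        if (checkpointBytes <= 0) {
            throw new IllegalArgumentException("checkpointBytes must be positive");
        }
        // Integer division: only fully completed checkpoints trigger a pause.
        long checkpoints = totalBytes / checkpointBytes;
        return baselineSeconds + checkpoints * (sleepMs / 1000.0);
    }
}
```

With the 8.7175-second unthrottled run as the baseline, this model gives about 32.7 s for 102400B / 500ms, 57.2 s for 51200B / 500ms and 33.0 s for 51200B / 250ms, against measured values of 35.8 s, 57.9 s and 35.0 s. The remaining gap is plausibly the restart overhead that the model ignores.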
Re: Review Request: SQOOP-604 Easy throttling feature for MySQL exports
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7135/
---

(Updated Nov. 2, 2012, 12:32 p.m.)

Review request for Sqoop.

Changes
---
Sure thing, I'm sorry. Checkstyle passes now with my changes.

Description
---
Code review for SQOOP-604, see https://issues.apache.org/jira/browse/SQOOP-604

The solution in short: use the existing checkpoint feature of direct (--direct) MySQL exports (the export process is restarted every X bytes written) and extend it with a new config value that makes the thread sleep for Y milliseconds at the checkpoints. With a low enough byte-count limit, this can be a simple yet powerful throttling mechanism.

Diffs (updated)
- src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java a4e8b88

Diff: https://reviews.apache.org/r/7135/diff/

Testing
---
Executing with different settings of sqoop.mysql.export.checkpoint.bytes and sqoop.mysql.export.sleep.ms:

33554432B / 0ms: Transferred 4.7579 MB in 8.7175 seconds (558.8826 KB/sec)
102400B / 500ms: Transferred 4.7579 MB in 35.7794 seconds (136.1698 KB/sec)
51200B / 500ms: Transferred 4.758 MB in 57.8675 seconds (84.1959 KB/sec)
51200B / 250ms: Transferred 4.7579 MB in 35.0293 seconds (139.0854 KB/sec)

I have not added unit tests yet; because the change involves calling Thread.sleep, I find it difficult to test. Unfortunately, there is no machine or environment object that could be injected into these classes as a mock to take care of time-related fixtures.

Thanks,
Zoltán Tóth-Czifra
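One way around the testing difficulty mentioned above is to put the sleep behind a small interface so a test can substitute a fake. The following is a minimal sketch of that idea and of the checkpoint-and-sleep mechanism from the description. It is not the code of MySQLExportMapper, and `Sleeper`, `CheckpointThrottle` and `RecordingSleeper` are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical seam around Thread.sleep so tests can substitute a fake. */
interface Sleeper {
    void sleep(long millis) throws InterruptedException;
}

/** Counts bytes written and pauses at each checkpoint; a sketch, not the Sqoop mapper. */
final class CheckpointThrottle {
    private final long checkpointBytes;
    private final long sleepMs;
    private final Sleeper sleeper;
    private long bytesSinceCheckpoint = 0;

    CheckpointThrottle(long checkpointBytes, long sleepMs, Sleeper sleeper) {
        this.checkpointBytes = checkpointBytes;
        this.sleepMs = sleepMs;
        this.sleeper = sleeper;
    }

    /** Call after each record is written; pauses once the checkpoint size is reached. */
    void recordWritten(int numBytes) throws InterruptedException {
        bytesSinceCheckpoint += numBytes;
        if (checkpointBytes > 0 && bytesSinceCheckpoint >= checkpointBytes) {
            // Per the description, this is the point where the export process is restarted.
            bytesSinceCheckpoint = 0;
            if (sleepMs > 0) {
                sleeper.sleep(sleepMs);
            }
        }
    }
}

/** Test double that records the requested pauses instead of blocking. */
final class RecordingSleeper implements Sleeper {
    final List<Long> pauses = new ArrayList<>();

    @Override
    public void sleep(long millis) {
        pauses.add(millis);
    }
}
```

In production the real clock would be passed in as `new CheckpointThrottle(51200, 250, Thread::sleep)`, while a unit test passes a `RecordingSleeper` and asserts on the recorded pauses without waiting for any of them.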
Re: Review Request: SQOOP-604 Easy throttling feature for MySQL exports
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7135/
---

(Updated Oct. 4, 2012, 12:25 p.m.)

Review request for Sqoop.

Changes
---
Sorry, I'm retarded.

Description
---
Code review for SQOOP-604, see https://issues.apache.org/jira/browse/SQOOP-604

The solution in short: use the existing checkpoint feature of direct (--direct) MySQL exports (the export process is restarted every X bytes written) and extend it with a new config value that makes the thread sleep for Y milliseconds at the checkpoints. With a low enough byte-count limit, this can be a simple yet powerful throttling mechanism.

Diffs (updated)
- src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java a4e8b88

Diff: https://reviews.apache.org/r/7135/diff/

Testing
---
Executing with different settings of sqoop.mysql.export.checkpoint.bytes and sqoop.mysql.export.sleep.ms:

33554432B / 0ms: Transferred 4.7579 MB in 8.7175 seconds (558.8826 KB/sec)
102400B / 500ms: Transferred 4.7579 MB in 35.7794 seconds (136.1698 KB/sec)
51200B / 500ms: Transferred 4.758 MB in 57.8675 seconds (84.1959 KB/sec)
51200B / 250ms: Transferred 4.7579 MB in 35.0293 seconds (139.0854 KB/sec)

I have not added unit tests yet; because the change involves calling Thread.sleep, I find it difficult to test. Unfortunately, there is no machine or environment object that could be injected into these classes as a mock to take care of time-related fixtures.

Thanks,
Zoltán Tóth-Czifra
Re: Review Request: SQOOP-604 Easy throttling feature for MySQL exports
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7135/
---

(Updated Oct. 2, 2012, 10:08 a.m.)

Review request for Sqoop.

Changes
---
You are right, I was in a hurry and here is the result. Anyway, I have attached the fixed patch. Compiled with no checkstyle warnings. Output of test:

2012-10-02 12:03:17,575 WARN com.cloudera.sqoop.mapreduce.MySQLExportMapper: Value for sqoop.mysql.export.sleep.ms has to be smaller than mapred.task.timeout

Description
---
Code review for SQOOP-604, see https://issues.apache.org/jira/browse/SQOOP-604

The solution in short: use the existing checkpoint feature of direct (--direct) MySQL exports (the export process is restarted every X bytes written) and extend it with a new config value that makes the thread sleep for Y milliseconds at the checkpoints. With a low enough byte-count limit, this can be a simple yet powerful throttling mechanism.

Diffs (updated)
- src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java a4e8b88

Diff: https://reviews.apache.org/r/7135/diff/

Testing
---
Executing with different settings of sqoop.mysql.export.checkpoint.bytes and sqoop.mysql.export.sleep.ms:

33554432B / 0ms: Transferred 4.7579 MB in 8.7175 seconds (558.8826 KB/sec)
102400B / 500ms: Transferred 4.7579 MB in 35.7794 seconds (136.1698 KB/sec)
51200B / 500ms: Transferred 4.758 MB in 57.8675 seconds (84.1959 KB/sec)
51200B / 250ms: Transferred 4.7579 MB in 35.0293 seconds (139.0854 KB/sec)

I have not added unit tests yet; because the change involves calling Thread.sleep, I find it difficult to test. Unfortunately, there is no machine or environment object that could be injected into these classes as a mock to take care of time-related fixtures.

Thanks,
Zoltán Tóth-Czifra
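The warning in the test output above suggests a guard that compares the configured sleep with the task timeout. The sketch below shows one possible policy, falling back to no sleep when the value is too large. What the actual patch does after logging the warning is not stated in this thread, so the fallback is an assumption, as is treating a timeout of 0 or less as "no timeout". `SleepValidation` is a hypothetical name, and the values are passed in directly rather than read from a Hadoop configuration.

```java
/** Hypothetical guard: keeps the checkpoint sleep below the task timeout. */
final class SleepValidation {
    private SleepValidation() {}

    static long effectiveSleepMs(long requestedSleepMs, long taskTimeoutMs) {
        // Assumption: a timeout of 0 or less means the timeout is disabled,
        // so any sleep value is acceptable.
        if (taskTimeoutMs > 0 && requestedSleepMs >= taskTimeoutMs) {
            System.err.println("Value for sqoop.mysql.export.sleep.ms has to be "
                + "smaller than mapred.task.timeout");
            // Assumed policy: disable throttling rather than risk the task being killed.
            return 0L;
        }
        // Negative values are treated as "no sleep".
        return Math.max(0L, requestedSleepMs);
    }
}
```

For example, a 250 ms sleep against a 600000 ms timeout is returned unchanged, while a 700000 ms sleep against the same timeout logs the warning and yields 0.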
Re: Review Request: SQOOP-604 Easy throttling feature for MySQL exports
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7135/
---

(Updated Oct. 2, 2012, 4:08 p.m.)

Review request for Sqoop.

Changes
---
Sorry!

Description
---
Code review for SQOOP-604, see https://issues.apache.org/jira/browse/SQOOP-604

The solution in short: use the existing checkpoint feature of direct (--direct) MySQL exports (the export process is restarted every X bytes written) and extend it with a new config value that makes the thread sleep for Y milliseconds at the checkpoints. With a low enough byte-count limit, this can be a simple yet powerful throttling mechanism.

Diffs (updated)
- src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java a4e8b88

Diff: https://reviews.apache.org/r/7135/diff/

Testing
---
Executing with different settings of sqoop.mysql.export.checkpoint.bytes and sqoop.mysql.export.sleep.ms:

33554432B / 0ms: Transferred 4.7579 MB in 8.7175 seconds (558.8826 KB/sec)
102400B / 500ms: Transferred 4.7579 MB in 35.7794 seconds (136.1698 KB/sec)
51200B / 500ms: Transferred 4.758 MB in 57.8675 seconds (84.1959 KB/sec)
51200B / 250ms: Transferred 4.7579 MB in 35.0293 seconds (139.0854 KB/sec)

I have not added unit tests yet; because the change involves calling Thread.sleep, I find it difficult to test. Unfortunately, there is no machine or environment object that could be injected into these classes as a mock to take care of time-related fixtures.

Thanks,
Zoltán Tóth-Czifra
Re: Review Request: SQOOP-604 Easy throttling feature for MySQL exports
On Sept. 28, 2012, 9:59 a.m., Abhijeet Gaikwad wrote:
src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java, line 329
https://reviews.apache.org/r/7135/diff/1/?file=155911#file155911line329

What happens when MYSQL_CHECKPOINT_SLEEP_KEY is greater than mapred.task.timeout? If the job is killed, we need to handle the scenario.

That's a good point! Given that the default value of mapred.task.timeout is 600000 ms (10 minutes), I consider this very unlikely; the ideal value of the new config key is on the order of a few hundred ms. However, in some extreme cases (or when this feature is clearly misused) this case may need to be handled. Do you have any suggestions? For example, limiting sqoop.mysql.export.sleep.ms to a maximum of the value in mapred.task.timeout?

- Zoltán

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7135/#review12019
---

On Sept. 27, 2012, 3:47 p.m., Zoltán Tóth-Czifra wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/7135/
---

(Updated Sept. 27, 2012, 3:47 p.m.)

Review request for Sqoop.

Description
---
Code review for SQOOP-604, see https://issues.apache.org/jira/browse/SQOOP-604

The solution in short: use the existing checkpoint feature of direct (--direct) MySQL exports (the export process is restarted every X bytes written) and extend it with a new config value that makes the thread sleep for Y milliseconds at the checkpoints. With a low enough byte-count limit, this can be a simple yet powerful throttling mechanism.

Diffs
- src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java a4e8b88

Diff: https://reviews.apache.org/r/7135/diff/

Testing
---
Executing with different settings of sqoop.mysql.export.checkpoint.bytes and sqoop.mysql.export.sleep.ms:

33554432B / 0ms: Transferred 4.7579 MB in 8.7175 seconds (558.8826 KB/sec)
102400B / 500ms: Transferred 4.7579 MB in 35.7794 seconds (136.1698 KB/sec)
51200B / 500ms: Transferred 4.758 MB in 57.8675 seconds (84.1959 KB/sec)
51200B / 250ms: Transferred 4.7579 MB in 35.0293 seconds (139.0854 KB/sec)

I have not added unit tests yet; because the change involves calling Thread.sleep, I find it difficult to test. Unfortunately, there is no machine or environment object that could be injected into these classes as a mock to take care of time-related fixtures.

Thanks,
Zoltán Tóth-Czifra