Btw, Christian, can you tell me how you fed the custom log4j config to sqoop? I am interested in debugging the MySQLManager class.
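The route I was going to try myself (untested on my side, so treat it as a sketch rather than a known-good recipe; the placeholders are just copied from your original command) was to raise the map-task log level through Hadoop's generic -D options, since the MySQLExportMapper debug lines are emitted inside the map tasks rather than in the client JVM, and to point the client JVM at a custom log4j.properties for the client-side classes like MySQLManager:

# Sketch only: the -D generic options are the new part and must come before
# the tool-specific arguments; everything else is the original export call.
sqoop export \
  -D mapreduce.map.log.level=DEBUG \
  -D yarn.app.mapreduce.am.log.level=DEBUG \
  --direct \
  --connect <host> \
  --table <table> \
  --export-dir /user/hive/warehouse/<table> \
  --num-mappers 8 \
  --username <username> \
  --password <password> \
  --input-fields-terminated-by ',' \
  --verbose

# For the client-side JVM (where MySQLManager runs), a custom log4j config
# can be supplied via the hadoop client options, for example:
#   export HADOOP_CLIENT_OPTS="-Dlog4j.configuration=file:///path/to/log4j.properties"

Is that roughly what you did, or did you have to push the log4j.properties out to the task nodes as well?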
Thanks,
~Pratik

On Mon, Sep 15, 2014 at 10:39 AM, pratik khadloya <[email protected]> wrote:
> You're right, they are import-only arguments; I misread your original
> question.
> I am surprised that there are no logs in the JT. You should be able to see
> the logs for attempt attempt_1410271365435_0034_m_000000_0 and also see
> which machine ran that map job. You can click on the machine name and then
> on the "Local logs" link at the bottom left to finally see the local
> mapper logs for that task tracker.
>
> The general URL to get directly to those logs is:
> http://<task-tracker-machine-name>:50060/tasktracker.jsp
>
> I suspect the loading command may be failing due to a column mismatch or
> a delimiter problem.
>
> ~Pratik
>
> On Mon, Sep 15, 2014 at 10:18 AM, Christian Verkerk <[email protected]> wrote:
>> Hi,
>>
>> The jobtracker logs are all empty. The --split-by and --boundary-query
>> options are sqoop import-only arguments AFAICT. The split sizes, as in
>> the size of the file that is loaded into MySQL, are about 32MB.
>>
>> The sqoop export job I posted _does_ get data into MySQL; it just
>> stops after a while (due to load, presumably), so running just one
>> query against MySQL will work just fine and will not reproduce the
>> error.
>>
>> The key is that I need some way to get more information on the exact
>> error mysqlimport hits.
>>
>> Kind regards,
>>
>> Christian
>>
>> On Mon, Sep 15, 2014 at 7:41 AM, pratik khadloya <[email protected]> wrote:
>> > Is there any reason given for the termination in the jobtracker logs?
>> > Also, I see that you have not specified any --split-by and/or
>> > --boundary-query option. Does sqoop take time to determine the splits?
>> > If yes, then specifying these settings might help.
>> >
>> > Also, check what the split sizes are; you may be running into data skew
>> > depending on the splitting column used (generally the primary key of
>> > the table).
>> > The query is printed in the sqoop logs; try running the same directly
>> > on mysql and see how mysql responds.
>> >
>> > ~Pratik
>> >
>> > On Mon, Sep 15, 2014 at 7:14 AM, Christian Verkerk <[email protected]> wrote:
>> >> Hi,
>> >>
>> >> I'm trying to run a sqoop export for a large dataset (at least 1B
>> >> rows) with the following sqoop export call:
>> >>
>> >> sqoop export --direct \
>> >>   --connect <host> \
>> >>   --table <table> \
>> >>   --export-dir /user/hive/warehouse/<table> \
>> >>   --num-mappers 8 \
>> >>   --username <username> \
>> >>   --password <password> \
>> >>   --input-fields-terminated-by ',' \
>> >>   --verbose
>> >>
>> >> Behind the scenes, I've found that sqoop export does what you'd expect
>> >> it to: it farms out the work to (num-mappers) nodes with a NodeManager
>> >> role, gets about 32MB worth of HDFS data into a temp file on each of
>> >> the nodes and sends it along to mysqlimport, which generates a
>> >> LOAD DATA LOCAL INFILE for the temp file into the MySQL table.
>> >>
>> >> The following error occurs depending on the level of parallelism used
>> >> (via num-mappers); that is, 2 mappers don't trigger it but 10
>> >> definitely will:
>> >>
>> >> 14/09/14 17:34:25 INFO mapreduce.Job: map 25% reduce 0%
>> >> 14/09/14 17:34:27 INFO mapreduce.Job: Task Id : attempt_1410271365435_0034_m_000000_0, Status : FAILED
>> >> Error: java.io.IOException: mysqlimport terminated with error code 1
>> >>   at org.apache.sqoop.mapreduce.MySQLExportMapper.closeExportHandles(MySQLExportMapper.java:313)
>> >>   at org.apache.sqoop.mapreduce.MySQLExportMapper.run(MySQLExportMapper.java:250)
>> >>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>> >>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>> >>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
>> >>   at java.security.AccessController.doPrivileged(Native Method)
>> >>   at javax.security.auth.Subject.doAs(Subject.java:415)
>> >>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
>> >>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
>> >>
>> >> I understand there is some limit to the level of parallelism that can
>> >> be achieved in the job -- mysqld can get tied up processing too many
>> >> things at once, etc. -- but I'd like to know how to turn on debugging
>> >> for the org.apache.sqoop.mapreduce.MySQLExportMapper class so that I
>> >> can actually see the mysqlimport error.
>> >>
>> >> Reading through the following code[0] (not sure if this is the
>> >> relevant version, BTW), I see that a logger is set up that should be
>> >> giving a lot of information[1] about the mysqlimport calls, but I
>> >> don't seem to be getting any of this fun in my logs.
>> >>
>> >> [0] https://svn.apache.org/repos/asf/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java
>> >> [1] `LOG.debug("Starting mysqlimport with arguments:");`
>> >>
>> >> Additional info:
>> >>
>> >> I have log4j.properties set up in the following basic way:
>> >>
>> >> log4j.rootLogger=${root.logger}
>> >> root.logger=INFO,console
>> >>
>> >> log4j.logger.org.apache.hadoop.mapred=TRACE
>> >> log4j.logger.org.apache.sqoop.mapreduce=TRACE
>> >> log4j.logger.org.apache.sqoop.mapreduce.MySQLExportMapper=TRACE
>> >>
>> >> log4j.appender.console=org.apache.log4j.ConsoleAppender
>> >> log4j.appender.console.target=System.err
>> >> log4j.appender.console.layout=org.apache.log4j.PatternLayout
>> >> log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
>> >>
>> >> What I have found is that the `max_allowed_packet` setting in MySQL
>> >> seems to affect this behaviour somewhat, but I'd rather get more
>> >> information about the actual error than attempt to tweak a setting
>> >> "blind".
>> >>
>> >> Relevant versioning:
>> >>
>> >> Cloudera Hadoop Distribution (5.1.2)
>> >> mysqlimport: Ver 3.7 Distrib 5.5.38, for debian-linux-gnu
>> >> sqoop version: 1.4.4
>> >>
>> >> Kind regards,
>> >>
>> >> Christian Verkerk
>> >>
>> >> --
>> >> Christian Verkerk
>> >> Software Engineer, Tubular Labs
>> >> [email protected]
>> >
>>
>> --
>> Christian Verkerk
>> Software Engineer, Tubular Labs
>> [email protected]
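For anyone following along: one way to see the actual mysqlimport failure behind "error code 1" (a rough, untested sketch; <db>, <part-file> and the other placeholders follow the convention used above, and the exact options Sqoop passes may differ slightly) is to replay a single mapper's load by hand on one of the NodeManager hosts, since mysqlimport then prints the underlying LOAD DATA error to stderr:

# Rough, untested sketch: reproduce one mapper's load manually to surface
# the real LOAD DATA error instead of just the exit code.

# Pull a small slice of the export dir into a file whose basename matches
# the target table (mysqlimport derives the table name from the file name).
hadoop fs -cat /user/hive/warehouse/<table>/<part-file> | head -n 5000 > /tmp/<table>.txt

# Load it the way the --direct mapper does; errors are printed to stderr.
mysqlimport --local --verbose \
  --host=<host> --user=<username> --password \
  --fields-terminated-by=',' \
  <db> /tmp/<table>.txt

# Roughly equivalent to the statement the server executes:
#   LOAD DATA LOCAL INFILE '/tmp/<table>.txt' INTO TABLE <table>
#   FIELDS TERMINATED BY ','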
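And since `max_allowed_packet` seems to be in play, a quick way to check it and raise it temporarily while testing (again only a sketch; 256MB is an arbitrary example value, not a recommendation, and SET GLOBAL requires SUPER privileges and only affects new connections):

# Check the current value, then bump it for new connections only.
mysql -h <host> -u <username> -p -e "SHOW VARIABLES LIKE 'max_allowed_packet';"
mysql -h <host> -u <username> -p -e "SET GLOBAL max_allowed_packet = 256*1024*1024;"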
