You're right, they are import-only arguments; I misread your original question. I am surprised that there are no logs in the JT. You should be able to see the logs for attempt_1410271365435_0034_m_000000_0 and also see which machine ran that map task. You can click on the machine name, then on the bottom left there is a "Local logs" link, which you can click to finally see the local mapper logs for that task tracker.
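If the web UI really shows nothing, the container logs for that failed attempt can usually also be pulled from the command line. A minimal sketch, assuming the job ran on YARN (the YarnChild frames in the stack trace suggest it did), that log aggregation is enabled, and that the application id is the usual one embedded in the attempt id:

    # application id corresponding to attempt_1410271365435_0034_m_000000_0
    yarn logs -applicationId application_1410271365435_0034 | less

That dumps stdout, stderr and syslog for every container in the job, including the mapper that invoked mysqlimport.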
The general URL to get directly to those logs is: http://<task-tracker-machine-name>:50060/tasktracker.jsp

I suspect the loading command may be failing due to some column mismatch or some delimiter problems.

~Pratik

On Mon, Sep 15, 2014 at 10:18 AM, Christian Verkerk <[email protected]> wrote:
> Hi,
>
> The jobtracker logs are all empty. The --split-by and --boundary-query
> are sqoop import only arguments AFAICT. The split size, as in the
> size of the file that is loaded into MySQL, is about 32MB.
>
> The sqoop export job I posted _does_ get data into MySQL, it just
> stops after a while (due to load, presumably), so running just one
> query against MySQL will work just fine and will not reproduce the
> error.
>
> The key is that I need some way to get more information on the exact
> error mysqlimport hits.
>
> Kind regards,
>
> Christian
>
> On Mon, Sep 15, 2014 at 7:41 AM, pratik khadloya <[email protected]> wrote:
> > Is there any reason given for the termination in the jobtracker logs?
> > Also, I see that you have not specified any --split-by and/or
> > --boundary-query option. Does sqoop take time to determine the splits?
> > If yes, then specifying these settings might help.
> >
> > Also, check what the split sizes are; you may be running into data skew
> > depending on the splitting column used (generally the primary key of the
> > table). The query is printed in the sqoop logs; try running the same
> > directly on mysql and see how mysql responds.
> >
> > ~Pratik
> >
> > On Mon, Sep 15, 2014 at 7:14 AM, Christian Verkerk <[email protected]> wrote:
> >> Hi,
> >>
> >> I'm trying to run a sqoop export for a large dataset (at least 1B
> >> rows) with the following sqoop export call:
> >>
> >> sqoop export --direct \
> >>   --connect <host> \
> >>   --table <table> \
> >>   --export-dir /user/hive/warehouse/<table> \
> >>   --num-mappers 8 \
> >>   --username <username> \
> >>   --password <password> \
> >>   --input-fields-terminated-by ',' \
> >>   --verbose
> >>
> >> Behind the scenes, I've found that sqoop export does what you'd expect
> >> it to: it farms out the work to a (num-mappers) number of different
> >> nodes with a NodeManager role, gets about 32MB worth of HDFS data into
> >> a temp file on each of the nodes, and sends it along to mysqlimport,
> >> which generates a LOAD DATA LOCAL INFILE for the temp file into the
> >> MySQL table.
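To make that hand-off concrete: what each mapper ultimately runs is, roughly, a mysqlimport invocation along these lines (an illustrative sketch only; the actual temp-file path, database name and option set are generated by Sqoop, and <mysql-host>, <database>, <username>, <password> and <table> are placeholders):

    # mysqlimport takes the target table name from the file's base name,
    # which is why the per-mapper temp file is named after the table
    mysqlimport --local \
      --host=<mysql-host> \
      --user=<username> \
      --password=<password> \
      --fields-terminated-by=',' \
      <database> /path/to/mapper/tmpdir/<table>.txt

"mysqlimport terminated with error code 1" is just the generic failure exit status; the actual MySQL error text is written to mysqlimport's stderr, so if it surfaces anywhere it should be in the task-attempt logs rather than on the client console.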
> >> The following error occurs depending on the level of parallelism used
> >> (via num-mappers); that is, 2 mappers doesn't trigger it but 10
> >> definitely will:
> >>
> >> 14/09/14 17:34:25 INFO mapreduce.Job:  map 25% reduce 0%
> >> 14/09/14 17:34:27 INFO mapreduce.Job: Task Id : attempt_1410271365435_0034_m_000000_0, Status : FAILED
> >> Error: java.io.IOException: mysqlimport terminated with error code 1
> >>     at org.apache.sqoop.mapreduce.MySQLExportMapper.closeExportHandles(MySQLExportMapper.java:313)
> >>     at org.apache.sqoop.mapreduce.MySQLExportMapper.run(MySQLExportMapper.java:250)
> >>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> >>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
> >>     at java.security.AccessController.doPrivileged(Native Method)
> >>     at javax.security.auth.Subject.doAs(Subject.java:415)
> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
> >>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> >>
> >> I understand there is some limit to the level of parallelism that can
> >> be achieved in the job -- mysqld can get tied up processing too many
> >> things at once, etc. -- but I'd like to know how to turn on debugging
> >> for the org.apache.sqoop.mapreduce.MySQLExportMapper class so that I
> >> can actually see the mysqlimport error.
> >>
> >> Reading through the following code[0] (not sure if this is the
> >> relevant version, BTW), I see that a logger is set up that should be
> >> giving a lot of information[1] about the mysqlimport calls, but I
> >> don't seem to be getting any of this fun in my logs.
> >>
> >> [0] https://svn.apache.org/repos/asf/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java
> >> [1] `LOG.debug("Starting mysqlimport with arguments:");`
> >>
> >> Additional info:
> >>
> >> I have log4j.properties set up in the following basic way:
> >>
> >> log4j.rootLogger=${root.logger}
> >> root.logger=INFO,console
> >>
> >> log4j.logger.org.apache.hadoop.mapred=TRACE
> >> log4j.logger.org.apache.sqoop.mapreduce=TRACE
> >> log4j.logger.org.apache.sqoop.mapreduce.MySQLExportMapper=TRACE
> >>
> >> log4j.appender.console=org.apache.log4j.ConsoleAppender
> >> log4j.appender.console.target=System.err
> >> log4j.appender.console.layout=org.apache.log4j.PatternLayout
> >> log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
> >>
> >> What I have found is that the `max_allowed_packet` setting in MySQL
> >> seems to affect this behaviour somewhat, but I'd rather get more
> >> information about the actual error than attempt to tweak a setting
> >> "blind".
> >>
> >> Relevant versioning:
> >>
> >> Cloudera Hadoop Distribution (5.1.2)
> >> mysqlimport: Ver 3.7 Distrib 5.5.38, for debian-linux-gnu
> >> sqoop version: 1.4.4
> >>
> >> Kind regards,
> >>
> >> Christian Verkerk
> >>
> >> --
> >> Christian Verkerk
> >> Software Engineer, Tubular Labs
> >> [email protected]
>
> --
> Christian Verkerk
> Software Engineer, Tubular Labs
> [email protected]
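On the logging question in the original message: the log4j.properties shown above only configures the client-side Sqoop process, while org.apache.sqoop.mapreduce.MySQLExportMapper runs inside the YARN child JVMs, so its LOG.debug output is governed by the job configuration on the cluster side. A hedged sketch of one way to turn that up, assuming MRv2 and using the standard Hadoop 2 property mapreduce.map.log.level passed through Sqoop's generic -D options (everything else is the original command unchanged):

    sqoop export -D mapreduce.map.log.level=DEBUG \
      --direct \
      --connect <host> \
      --table <table> \
      --export-dir /user/hive/warehouse/<table> \
      --num-mappers 8 \
      --username <username> \
      --password <password> \
      --input-fields-terminated-by ',' \
      --verbose

With that in place, the DEBUG lines from MySQLExportMapper (including the "Starting mysqlimport with arguments:" message quoted at [1]) should appear in the syslog of the failing map attempt rather than on the client console.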
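On Pratik's suspicion about column mismatches or delimiter problems: it is cheap to eyeball a few records of the export directory and confirm they really are comma-delimited with the expected number of columns; Hive tables written with default settings use \001 (Ctrl-A) rather than a comma as the field delimiter, which is worth ruling out. A small sketch against the same placeholder path as in the original command:

    # print the first few records of the export directory
    hadoop fs -cat /user/hive/warehouse/<table>/part-* | head -5
    # count comma-separated fields on the first record
    hadoop fs -cat /user/hive/warehouse/<table>/part-* | head -1 | awk -F',' '{print NF}'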
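And on the max_allowed_packet observation: rather than tweaking the setting blind, it is worth checking what the server currently allows and, if needed, raising it for a test run. A sketch of the relevant checks via the mysql client (the placeholders and the 256MB value are examples only; SET GLOBAL requires the SUPER privilege and only affects new connections):

    # check the current server-side limit
    mysql -h <mysql-host> -u <username> -p -e "SHOW GLOBAL VARIABLES LIKE 'max_allowed_packet'"
    # example only: raise it to 256MB for subsequent connections
    mysql -h <mysql-host> -u <username> -p -e "SET GLOBAL max_allowed_packet = 268435456"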
