You're right, they are import-only arguments; I misread your original
question.
I am surprised that there are no logs in the JT. You should be able to see
the logs for attempt attempt_1410271365435_0034_m_000000_0 and also see
which machine ran that map task. You can click on the machine name, then on
the "Local logs" link at the bottom left, and finally see the local mapper
logs for that task tracker.

The general url to directly get to those logs is:
http://<task-tracker-machine-name>:50060/tasktracker.jsp
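
Since the stack trace mentions YarnChild, this looks like MR2 on YARN; if
log aggregation is enabled on your cluster, you may also be able to pull
the aggregated task logs from the command line. A rough sketch (the
application id below is derived from the failed attempt id, adjust as
needed):

# dump the aggregated logs for the application and look for the
# mysqlimport output around the failure
yarn logs -applicationId application_1410271365435_0034 | grep -B 5 -A 20 mysqlimport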

I suspect the loading command may be failing due to a column mismatch or
delimiter problems.
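
If the attempt logs really are empty, one way to surface the actual
mysqlimport error might be to run it by hand on one of the mapper nodes
against a small sample of the export data. A rough sketch with placeholder
host/db/file names (mysqlimport expects the file's basename to match the
target table name):

# pull a small sample of the exported data out of HDFS
hadoop fs -cat /user/hive/warehouse/<table>/<part-file> | head -n 1000 > /tmp/<table>.txt

# run mysqlimport directly so its error message lands on your terminal
mysqlimport --local --verbose --fields-terminated-by=',' \
  -h <mysql-host> -u <username> -p \
  <database> /tmp/<table>.txt

Also, the log4j.properties you posted may only be picked up by the Sqoop
client JVM, while the DEBUG output from MySQLExportMapper is written in the
map task JVMs. Raising the task log level might help; something like this
(mapreduce.map.log.level is the MR2 property name, adjust for your
version):

sqoop export -D mapreduce.map.log.level=DEBUG --direct --connect <host> ...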

~Pratik

On Mon, Sep 15, 2014 at 10:18 AM, Christian Verkerk <
[email protected]> wrote:

> Hi,
>
> The jobtracker logs are all empty. The --split-by and --boundary-query
> are sqoop import-only arguments AFAICT. The split size, as in the size
> of the file that is loaded into MySQL, is about 32MB.
>
> The sqoop export job I posted _does_ get data into MySQL; it just
> stops after a while (presumably due to load), so running just one
> query against MySQL will work just fine and will not reproduce the
> error.
>
> The key is that I need some way to get more information on the exact
> error mysqlimport hits.
>
> Kind regards,
>
> Christian
>
> On Mon, Sep 15, 2014 at 7:41 AM, pratik khadloya <[email protected]>
> wrote:
> > Is there any reason given for the termination in the jobtracker logs?
> > Also, I see that you have not specified any --split-by and/or
> > --boundary-query option.
> > Does Sqoop take time to determine the splits? If yes, then specifying
> > these settings might help.
> >
> > Also, check what the split sizes are; you may be running into data skew
> > depending on the splitting column used (generally the primary key of the
> > table).
> > The query is printed in the sqoop logs; try running the same query
> > directly against MySQL and see how it responds.
> >
> > ~Pratik
> >
> > On Mon, Sep 15, 2014 at 7:14 AM, Christian Verkerk
> > <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> I'm trying to run a sqoop export for a large dataset (at least 1B
> >> rows) with the following sqoop export call:
> >>
> >> sqoop export --direct \
> >> --connect <host> \
> >> --table <table> \
> >> --export-dir /user/hive/warehouse/<table> \
> >> --num-mappers 8 \
> >> --username <username> \
> >> --password <password> \
> >> --input-fields-terminated-by ',' \
> >> --verbose
> >>
> >> Behind the scenes, I've found that sqoop export does what you'd expect
> >> it to: it farms out the work to (num-mappers) different nodes with a
> >> NodeManager role, gets about 32MB worth of HDFS data into a temp file
> >> on each of the nodes, and sends it along to mysqlimport, which issues
> >> a LOAD DATA LOCAL INFILE for the temp file into the MySQL table.
> >>
> >> The following error occurs depending on the level of parallelism used
> >> (via num-mappers), that is, 2 mappers doesn't trigger it but 10
> >> definitely will:
> >>
> >> 14/09/14 17:34:25 INFO mapreduce.Job:  map 25% reduce 0%
> >> 14/09/14 17:34:27 INFO mapreduce.Job: Task Id : attempt_1410271365435_0034_m_000000_0, Status : FAILED
> >> Error: java.io.IOException: mysqlimport terminated with error code 1
> >>     at org.apache.sqoop.mapreduce.MySQLExportMapper.closeExportHandles(MySQLExportMapper.java:313)
> >>     at org.apache.sqoop.mapreduce.MySQLExportMapper.run(MySQLExportMapper.java:250)
> >>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> >>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> >>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
> >>     at java.security.AccessController.doPrivileged(Native Method)
> >>     at javax.security.auth.Subject.doAs(Subject.java:415)
> >>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
> >>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
> >>
> >> I understand there is some limit to the level of parallelism that can
> >> be achieved in the job -- mysqld can get tied up processing too many
> >> things at once, etc. -- but I'd like to know how to turn debugging on
> >> for the org.apache.sqoop.mapreduce.MySQLExportMapper class so that I
> >> can actually see the mysqlimport error.
> >>
> >> Reading through the following code[0] (not sure if this is the
> >> relevant version BTW), I see that a logger is set up that should be
> >> giving a lot of information[1] about the mysqlimport calls, but I
> >> don't seem to be getting any of this fun in my logs.
> >>
> >> [0]
> >>
> https://svn.apache.org/repos/asf/sqoop/trunk/src/java/org/apache/sqoop/mapreduce/MySQLExportMapper.java
> >> [1] `LOG.debug("Starting mysqlimport with arguments:");`
> >>
> >>
> >> Additional info:
> >>
> >> I have log4j.properties setup in the following basic way:
> >>
> >> log4j.rootLogger=${root.logger}
> >> root.logger=INFO,console
> >>
> >> log4j.logger.org.apache.hadoop.mapred=TRACE
> >> log4j.logger.org.apache.sqoop.mapreduce=TRACE
> >> log4j.logger.org.apache.sqoop.mapreduce.MySQLExportMapper=TRACE
> >>
> >> log4j.appender.console=org.apache.log4j.ConsoleAppender
> >> log4j.appender.console.target=System.err
> >> log4j.appender.console.layout=org.apache.log4j.PatternLayout
> >> log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
> >>
> >> What I have found is that the `max_allowed_packet` setting in MySQL
> >> seems to affect this behaviour somewhat, but I'd rather get more
> >> information about the actual error than attempt to tweak a setting
> >> "blind".
> >>
> >> Relevant versioning:
> >>
> >> Cloudera Hadoop Distribution (5.1.2)
> >> mysqlimport: Ver 3.7 Distrib 5.5.38, for debian-linux-gnu
> >> sqoop version: 1.4.4
> >>
> >> Kind regards,
> >>
> >> Christian Verkerk
> >>
> >> --
> >> Christian Verkerk
> >> Software Engineer, Tubular Labs
> >> [email protected]
> >
> >
>
>
>
> --
> Christian Verkerk
> Software Engineer, Tubular Labs
> [email protected]
>