I am having an issue getting Sqoop to update data correctly using --last-value. After creating the job, the first run goes great, and the job config store shows the newly updated date. On the second run the job again performs the incremental update you would expect, but at the end, when it would normally write the new last value back to the job, it fails with a ClassNotFoundException. After that, the last-value date never changes.
The class it reports as missing has the same name as the job. In the jar file that gets generated, the only class is one named codegen_sqooptest.class. I have tried explicitly specifying --class-name codegen_sqooptest, but that changes neither the behavior nor the name of the class it looks for.
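For what it's worth, listing the generated jar (path taken from the error below) gives output along these lines, with only the codegen class present:

jar tf /root/sqoop_jobs/codegen_sqooptest.jar
META-INF/
META-INF/MANIFEST.MF
codegen_sqooptest.class

There is no class named sqooptest in it.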
Is there more that I need to do to set this up correctly?
Below are the definition of the MySQL table being sqooped and the job descriptions printed by `sqoop job --show sqooptest` before each run.
Please help me understand what I am missing.
Thank you very kindly in advance!
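For context, the job was created with a command along these lines (reconstructed from the job config shown below, so the exact invocation may have differed slightly):

sqoop job --create sqooptest -- import \
    --connect jdbc:mysql://mydb.notyours.bs/sandbox \
    --username notyou \
    --password-file /.sqooppass_khan \
    --table sqooptest \
    --target-dir /sqooptest \
    --as-parquetfile \
    --incremental lastmodified \
    --check-column last_update \
    --last-value "1987-05-22 02:02:02" \
    --merge-key id \
    --class-name codegen_sqooptest \
    --outdir . \
    --bindir /root/sqoop_jobs/

Each run is then kicked off with:

sqoop job --exec sqooptest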
My Table:
CREATE TABLE `sqooptest` (
`id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
`value` varchar(255) DEFAULT NULL,
`name` varchar(255) DEFAULT NULL,
`last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`),
KEY `name` (`name`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
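The records I add between runs are plain inserts with made-up values; MySQL maintains last_update on its own, so the check column always reflects the latest change:

INSERT INTO sqooptest (value, name) VALUES ('some value', 'some name');
-- last_update gets CURRENT_TIMESTAMP on insert and is bumped on every UPDATE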
Initial sqoop job:
17/01/26 18:04:58 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Job: sqooptest
Tool: import
Options:
----------------------------
verbose = false
incremental.last.value = 1987-05-22 02:02:02
db.connect.string = jdbc:mysql://mydb.notyours.bs/sandbox
codegen.java.classname = codegen_sqooptest
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
hbase.create.table = false
hdfs.append.dir = false
db.table = sqooptest
codegen.input.delimiters.escape = 0
accumulo.create.table = false
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
db.username = notyou
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
incremental.col = last_update
codegen.input.delimiters.record = 0
db.password.file = /.sqooppass_khan
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = false
relaxed.isolation = false
mapreduce.num.mappers = 4
accumulo.max.latency = 5000
import.direct.split.size = 0
codegen.output.delimiters.field = 44
export.new.update = UpdateOnly
incremental.mode = DateLastModified
hdfs.file.format = ParquetFile
codegen.compile.dir = /root/sqoop_jobs/
direct.import = false
hdfs.target.dir = /sqooptest
hive.fail.table.exists = false
merge.key.col = id
db.batch = false
The results of the first run:
17/01/26 18:07:14 INFO mapred.LocalJobRunner: map task executor complete.
17/01/26 18:07:15 INFO mapreduce.Job: Job job_local146376087_0001 completed successfully
17/01/26 18:07:16 INFO mapreduce.Job: Counters: 20
File System Counters
FILE: Number of bytes read=72124136
FILE: Number of bytes written=73909624
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=89728
HDFS: Number of bytes written=5442596
HDFS: Number of read operations=626
HDFS: Number of large read operations=0
HDFS: Number of write operations=130
Map-Reduce Framework
Map input records=59269
Map output records=59269
Input split bytes=433
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=345
Total committed heap usage (bytes)=1811939328
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
17/01/26 18:07:16 INFO mapreduce.ImportJobBase: Transferred 5.1905 MB in 14.2325 seconds (373.4433 KB/sec)
17/01/26 18:07:16 INFO mapreduce.ImportJobBase: Retrieved 59269 records.
17/01/26 18:07:16 INFO tool.ImportTool: Saving incremental import state to the metastore
17/01/26 18:07:16 INFO tool.ImportTool: Updated data for job: sqooptest
The config after the first run:
17/01/26 18:08:09 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Job: sqooptest
Tool: import
Options:
----------------------------
verbose = false
incremental.last.value = 2017-01-26 18:07:00.0
db.connect.string = jdbc:mysql://mydb.notyours.bs/sandbox
codegen.java.classname = codegen_sqooptest
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
hbase.create.table = false
hdfs.append.dir = false
db.table = sqooptest
codegen.input.delimiters.escape = 0
accumulo.create.table = false
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
db.username = notyou
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
incremental.col = last_update
codegen.input.delimiters.record = 0
db.password.file = /.sqooppass_khan
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = false
relaxed.isolation = false
mapreduce.num.mappers = 4
accumulo.max.latency = 5000
import.direct.split.size = 0
codegen.output.delimiters.field = 44
export.new.update = UpdateOnly
incremental.mode = DateLastModified
hdfs.file.format = ParquetFile
codegen.compile.dir = /root/sqoop_jobs/
direct.import = false
hdfs.target.dir = /sqooptest
hive.fail.table.exists = false
merge.key.col = id
db.batch = false
Everything looks right, but if I add a few records to the table and run the job again, it fails while saving the job state:
17/01/26 18:09:33 INFO mapred.LocalJobRunner: map task executor complete.
17/01/26 18:09:35 INFO mapreduce.Job: Job job_local1796782311_0001 completed successfully
17/01/26 18:09:35 INFO mapreduce.Job: Counters: 20
File System Counters
FILE: Number of bytes read=72124136
FILE: Number of bytes written=73915944
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=89818
HDFS: Number of bytes written=52253
HDFS: Number of read operations=626
HDFS: Number of large read operations=0
HDFS: Number of write operations=130
Map-Reduce Framework
Map input records=100
Map output records=100
Input split bytes=433
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=141
Total committed heap usage (bytes)=1958739968
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
17/01/26 18:09:35 INFO mapreduce.ImportJobBase: Transferred 51.0283 KB in 11.2063 seconds (4.5535 KB/sec)
17/01/26 18:09:35 INFO mapreduce.ImportJobBase: Retrieved 100 records.
17/01/26 18:09:35 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Could not load jar /root/sqoop_jobs/codegen_sqooptest.jar into JVM. (Could not find class sqooptest.)
    at org.apache.sqoop.util.ClassLoaderStack.addJarFile(ClassLoaderStack.java:92)
    at com.cloudera.sqoop.util.ClassLoaderStack.addJarFile(ClassLoaderStack.java:36)
    at org.apache.sqoop.tool.ImportTool.loadJars(ImportTool.java:114)
    at org.apache.sqoop.tool.ImportTool.lastModifiedMerge(ImportTool.java:449)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:506)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    at org.apache.sqoop.tool.JobTool.execJob(JobTool.java:228)
    at org.apache.sqoop.tool.JobTool.run(JobTool.java:283)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
Caused by: java.lang.ClassNotFoundException: sqooptest
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:789)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.sqoop.util.ClassLoaderStack.addJarFile(ClassLoaderStack.java:88)
    ... 13 more
Note in the trace that loadJars is invoked from lastModifiedMerge, and the class it cannot find is sqooptest (the table/job name) rather than codegen_sqooptest, which is the class actually in the jar. After this failure, the job config does not update the last-value:
17/01/26 18:10:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
Job: sqooptest
Tool: import
Options:
----------------------------
verbose = false
incremental.last.value = 2017-01-26 18:07:00.0
db.connect.string = jdbc:mysql://mydb.notyours.bs/sandbox
codegen.java.classname = codegen_sqooptest
codegen.output.delimiters.escape = 0
codegen.output.delimiters.enclose.required = false
codegen.input.delimiters.field = 0
hbase.create.table = false
hdfs.append.dir = false
db.table = sqooptest
codegen.input.delimiters.escape = 0
accumulo.create.table = false
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
db.username = notyou
reset.onemapper = false
codegen.output.delimiters.record = 10
import.max.inline.lob.size = 16777216
hbase.bulk.load.enabled = false
hcatalog.create.table = false
db.clear.staging.table = false
incremental.col = last_update
codegen.input.delimiters.record = 0
db.password.file = /.sqooppass_khan
enable.compression = false
hive.overwrite.table = false
hive.import = false
codegen.input.delimiters.enclose = 0
accumulo.batch.size = 10240000
hive.drop.delims = false
codegen.output.delimiters.enclose = 0
hdfs.delete-target.dir = false
codegen.output.dir = .
codegen.auto.compile.dir = false
relaxed.isolation = false
mapreduce.num.mappers = 4
accumulo.max.latency = 5000
import.direct.split.size = 0
codegen.output.delimiters.field = 44
export.new.update = UpdateOnly
incremental.mode = DateLastModified
hdfs.file.format = ParquetFile
codegen.compile.dir = /root/sqoop_jobs/
direct.import = false
hdfs.target.dir = /sqooptest
hive.fail.table.exists = false
merge.key.col = id
db.batch = false
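For quick comparison between runs, the saved value can be pulled straight out of the job description:

sqoop job --show sqooptest | grep incremental.last.value

It still reports 2017-01-26 18:07:00.0 no matter how many times the job runs after the failure.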