Hi,
I downloaded and compiled Atlas 0.7.
The Hive hook is working: create table [tablename] as select * from [src
tablename] works and data lineage is generated in Atlas.
Next I tried the Sqoop hook, following
http://atlas.incubator.apache.org/Bridge-Sqoop.html
Command:
sqoop-import --connect jdbc:mysql://mysqlhost/test --table sqoop_test
--split-by id --hive-import -hive-table sqoop_test19 --username margusja --P
This creates a new table in Hive, and the new table also appears in Atlas,
but no data lineage is generated.
In the tutorial at
http://hortonworks.com/hadoop-tutorial/cross-component-lineage-apache-atlas/
I can see (in the picture
https://raw.githubusercontent.com/hortonworks/tutorials/atlas-ranger-tp/assets/cross-component-lineage-with-atlas/8-sqoop-import-finish.png)
that extra config parameters are loaded and a Kafka producer is creating
output, but with my command:
sqoop-import --connect jdbc:mysql://mysqlhost/test --table sqoop_test
--split-by id --hive-import -hive-table sqoop_test19 --username margusja --P
there is no extra output, only:
Warning: /usr/hdp/2.4.0.0-169/accumulo does not exist! Accumulo imports
will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
16/06/20 21:25:47 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6.2.4.0.0-169
16/06/20 21:25:47 WARN tool.BaseSqoopTool: Setting your password on the
command-line is insecure. Consider using -P instead.
16/06/20 21:25:47 INFO tool.BaseSqoopTool: Using Hive-specific
delimiters for output. You can override
16/06/20 21:25:47 INFO tool.BaseSqoopTool: delimiters with
--fields-terminated-by, etc.
16/06/20 21:25:47 INFO manager.MySQLManager: Preparing to use a MySQL
streaming resultset.
16/06/20 21:25:47 INFO tool.CodeGenTool: Beginning code generation
16/06/20 21:25:47 INFO manager.SqlManager: Executing SQL statement:
SELECT t.* FROM `sqoop_test` AS t LIMIT 1
16/06/20 21:25:47 INFO manager.SqlManager: Executing SQL statement:
SELECT t.* FROM `sqoop_test` AS t LIMIT 1
16/06/20 21:25:47 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is
/usr/hdp/2.4.0.0-169/hadoop-mapreduce
Note:
/tmp/sqoop-root/compile/49b525e14ebd68542d86b68dc399bd84/sqoop_test.java
uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/06/20 21:25:48 INFO orm.CompilationManager: Writing jar file:
/tmp/sqoop-root/compile/49b525e14ebd68542d86b68dc399bd84/sqoop_test.jar
16/06/20 21:25:48 WARN manager.MySQLManager: It looks like you are
importing from mysql.
16/06/20 21:25:48 WARN manager.MySQLManager: This transfer can be
faster! Use the --direct
16/06/20 21:25:48 WARN manager.MySQLManager: option to exercise a
MySQL-specific fast path.
16/06/20 21:25:48 INFO manager.MySQLManager: Setting zero DATETIME
behavior to convertToNull (mysql)
16/06/20 21:25:48 INFO mapreduce.ImportJobBase: Beginning import of
sqoop_test
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in
[jar:file:/usr/hdp/2.4.0.0-169/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in
[jar:file:/usr/hdp/2.4.0.0-169/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
16/06/20 21:25:50 INFO impl.TimelineClientImpl: Timeline service
address: http://bigdata21.webmedia.int:8188/ws/v1/timeline/
16/06/20 21:25:50 INFO client.RMProxy: Connecting to ResourceManager at
bigdata21.webmedia.int/192.168.81.110:8050
16/06/20 21:25:52 INFO db.DBInputFormat: Using read commited transaction
isolation
16/06/20 21:25:52 INFO db.DataDrivenDBInputFormat: BoundingValsQuery:
SELECT MIN(`id`), MAX(`id`) FROM `sqoop_test`
16/06/20 21:25:52 INFO mapreduce.JobSubmitter: number of splits:2
16/06/20 21:25:52 INFO mapreduce.JobSubmitter: Submitting tokens for
job: job_1460979043517_0118
16/06/20 21:25:53 INFO impl.YarnClientImpl: Submitted application
application_1460979043517_0118
16/06/20 21:25:53 INFO mapreduce.Job: The url to track the job:
http://bigdata21.webmedia.int:8088/proxy/application_1460979043517_0118/
16/06/20 21:25:53 INFO mapreduce.Job: Running job: job_1460979043517_0118
16/06/20 21:25:58 INFO mapreduce.Job: Job job_1460979043517_0118 running
in uber mode : false
16/06/20 21:25:58 INFO mapreduce.Job: map 0% reduce 0%
16/06/20 21:26:02 INFO mapreduce.Job: map 50% reduce 0%
16/06/20 21:26:03 INFO mapreduce.Job: map 100% reduce 0%
16/06/20 21:26:03 INFO mapreduce.Job: Job job_1460979043517_0118
completed successfully
16/06/20 21:26:03 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=310818
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=197
HDFS: Number of bytes written=20
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Job Counters
Launched map tasks=2
Other local map tasks=2
Total time spent by all maps in occupied slots (ms)=4353
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=4353
Total vcore-seconds taken by all map tasks=4353
Total megabyte-seconds taken by all map tasks=2785920
Map-Reduce Framework
Map input records=2
Map output records=2
Input split bytes=197
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=70
CPU time spent (ms)=1780
Physical memory (bytes) snapshot=355676160
Virtual memory (bytes) snapshot=4937265152
Total committed heap usage (bytes)=154140672
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=20
16/06/20 21:26:03 INFO mapreduce.ImportJobBase: Transferred 20 bytes in
13.8509 seconds (1.444 bytes/sec)
16/06/20 21:26:03 INFO mapreduce.ImportJobBase: Retrieved 2 records.
16/06/20 21:26:03 INFO manager.SqlManager: Executing SQL statement:
SELECT t.* FROM `sqoop_test` AS t LIMIT 1
16/06/20 21:26:03 INFO hive.HiveImport: Loading uploaded data into Hive
Logging initialized using configuration in
jar:file:/usr/hdp/2.4.0.0-169/hive/lib/hive-common-1.2.1000.2.4.0.0-169.jar!/hive-log4j.properties
OK
Time taken: 2.035 seconds
Loading data to table default.sqoop_test19
Table default.sqoop_test19 stats: [numFiles=4, totalSize=40]
OK
Time taken: 1.043 seconds
I suspect that atlas-application.properties or sqoop-site.xml is not being
read during the sqoop import command. How can I debug this?
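
One way to narrow this down might be to check the Sqoop conf directory
directly. The sketch below is only an illustration under stated assumptions:
the function name check_sqoop_atlas_conf is made up, the conf directory is
passed in as an argument, and the property name
sqoop.job.data.publish.class is the one I understand the Bridge-Sqoop page
to require in sqoop-site.xml (please verify it against the page for your
version). It only checks that the files exist, not that Sqoop actually
loads them.

```shell
# Hypothetical helper: checks a Sqoop conf directory for the two pieces of
# configuration that the Atlas Sqoop bridge setup is assumed to need.
# Usage: check_sqoop_atlas_conf <sqoop-conf-dir>
check_sqoop_atlas_conf() {
    conf_dir="$1"
    status=0

    # Assumption: the hook is enabled through this property in sqoop-site.xml.
    if grep -qF 'sqoop.job.data.publish.class' "$conf_dir/sqoop-site.xml" 2>/dev/null; then
        echo "OK: sqoop.job.data.publish.class found in sqoop-site.xml"
    else
        echo "MISSING: sqoop.job.data.publish.class in $conf_dir/sqoop-site.xml"
        status=1
    fi

    # Assumption: atlas-application.properties must sit in the same conf directory.
    if [ -f "$conf_dir/atlas-application.properties" ]; then
        echo "OK: atlas-application.properties present"
    else
        echo "MISSING: $conf_dir/atlas-application.properties"
        status=1
    fi

    # Non-zero return means at least one of the two checks failed.
    return "$status"
}
```

It would be called with the directory that the sqoop-import command really
uses, for example check_sqoop_atlas_conf /etc/sqoop/conf (that path is a
guess, not taken from my setup). Re-running the import with --verbose may
also show more about which configuration gets loaded.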
--
Margus (margusja) Roo
http://margus.roo.ee
skype: margusja
+372 51 48 780