Hi, I have some data in a text file on HDFS and I want to export this data into a MySQL database. But I want Sqoop to use "|" as the record delimiter instead of the default "\n" record delimiter.
So I am specifying the --input-lines-terminated-by "|" option in my Sqoop export command. The export succeeds, but it reports only 1 record exported, and when I check the MySQL target table, I see only one row. It looks like only the record before the first "|" is getting exported.

Sample data on HDFS:

1,Hello|2,How|3,Are|4,You|5,I|6,am|7,fine|

Sqoop export command:

bin/sqoop export --connect 'jdbc:mysql://localhost/mydb' -password pwd --username usr --table mytable --export-dir data --input-lines-terminated-by "|"

Console logs:

12/05/17 03:32:02 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
12/05/17 03:32:02 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
12/05/17 03:32:02 INFO tool.CodeGenTool: Beginning code generation
12/05/17 03:32:02 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `mytable` AS t LIMIT 1
12/05/17 03:32:03 INFO orm.CompilationManager: HADOOP_HOME is /home/tushar/hadoop-0.20.2-cdh3u4
Note: /tmp/sqoop-tushar/compile/fd6d3bfd4c2ed7f2e19a2de418993dfc/mytable.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
12/05/17 03:32:04 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-tushar/compile/fd6d3bfd4c2ed7f2e19a2de418993dfc/mytable.jar
12/05/17 03:32:04 INFO mapreduce.ExportJobBase: Beginning export of mytable
12/05/17 03:32:06 INFO input.FileInputFormat: Total input paths to process : 1
12/05/17 03:32:06 INFO input.FileInputFormat: Total input paths to process : 1
12/05/17 03:32:06 INFO mapred.JobClient: Running job: job_201205110542_0432
12/05/17 03:32:07 INFO mapred.JobClient:  map 0% reduce 0%
12/05/17 03:32:13 INFO mapred.JobClient:  map 100% reduce 0%
12/05/17 03:32:14 INFO mapred.JobClient: Job complete: job_201205110542_0432
12/05/17 03:32:14 INFO mapred.JobClient: Counters: 16
12/05/17 03:32:14 INFO mapred.JobClient:   Job Counters
12/05/17 03:32:14 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6685
12/05/17 03:32:14 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/05/17 03:32:14 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/05/17 03:32:14 INFO mapred.JobClient:     Launched map tasks=1
12/05/17 03:32:14 INFO mapred.JobClient:     Data-local map tasks=1
12/05/17 03:32:14 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
12/05/17 03:32:14 INFO mapred.JobClient:   FileSystemCounters
12/05/17 03:32:14 INFO mapred.JobClient:     HDFS_BYTES_READ=166
12/05/17 03:32:14 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=79082
12/05/17 03:32:14 INFO mapred.JobClient:   Map-Reduce Framework
12/05/17 03:32:14 INFO mapred.JobClient:     Map input records=1
12/05/17 03:32:14 INFO mapred.JobClient:     Physical memory (bytes) snapshot=68677632
12/05/17 03:32:14 INFO mapred.JobClient:     Spilled Records=0
12/05/17 03:32:14 INFO mapred.JobClient:     CPU time spent (ms)=1130
12/05/17 03:32:14 INFO mapred.JobClient:     Total committed heap usage (bytes)=39911424
12/05/17 03:32:14 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=392290304
12/05/17 03:32:14 INFO mapred.JobClient:     Map output records=1
12/05/17 03:32:14 INFO mapred.JobClient:     SPLIT_RAW_BYTES=117
12/05/17 03:32:14 INFO mapreduce.ExportJobBase: Transferred 166 bytes in 9.6013 seconds (17.2893 bytes/sec)
12/05/17 03:32:14 INFO mapreduce.ExportJobBase: Exported 1 records.

On the MySQL side:

mysql> select * from mytable;
+------+-------+
| i    | name  |
+------+-------+
|    1 | Hello |
+------+-------+
1 row in set (0.00 sec)

Sqoop version: sqoop-1.4.1-incubating__hadoop-1.0.0
Hadoop version: CDH3u4

Doesn't Sqoop support any record delimiter other than "\n", or am I missing something? Please suggest a solution.

Thanks,
Tushar
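In case it helps, here is the workaround I am considering (my own idea, not something from the Sqoop docs): pre-converting the file to newline-terminated records with tr, so that Sqoop's default "\n" record delimiter applies and I can drop --input-lines-terminated-by entirely:

```shell
# Workaround sketch (hypothetical): translate the "|" record delimiter
# to "\n" before handing the data to Sqoop, so each record ends up on
# its own line. The echo stands in for reading my actual HDFS file.
echo '1,Hello|2,How|3,Are|4,You|5,I|6,am|7,fine|' | tr '|' '\n'
# Each record ("1,Hello", "2,How", ..., "7,fine") is now on its own line.
```

On HDFS I suppose this could be done by streaming the file through tr and writing it back, e.g. hadoop fs -cat data/part-* | tr '|' '\n' | hadoop fs -put - data_nl/part-00000 (the data_nl path is made up for illustration). But I would still prefer Sqoop to honor the "|" delimiter directly.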
