Hi All,

We are using the string '<EOL>' (--hive-delims-replacement '<EOL>') to replace newline characters in Oracle fields while importing data into Hive using Sqoop. According to the Sqoop documentation (http://sqoop.apache.org/docs/1.4.3/SqoopUserGuide.html#_large_objects), this parameter should replace only \n, \r, or \01 (^A) characters with '<EOL>'. But we are seeing that some special characters are also being replaced with '<EOL>'.
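For reference, the documented replacement can be emulated with a small Python sketch. This is an illustration of the behavior described in the user guide, not Sqoop's actual code; the function names and the sample value are hypothetical. Note that the ASCII control character SOH (Start of Heading) has code 0x01, which is the same character as \01 / ^A, so a helper that lists control characters in a value can show whether an invisible character falls into the documented set.

```python
import re

# Characters the Sqoop user guide lists for --hive-delims-replacement:
# line feed (\n), carriage return (\r) and \01 (Ctrl-A, ASCII SOH, byte 0x01).
HIVE_DELIMS = re.compile(r"[\n\r\x01]")


def replace_hive_delims(value, replacement="<EOL>"):
    """Emulate the documented replacement (hypothetical helper, not Sqoop code)."""
    # A callable avoids any backslash processing in the replacement string.
    return HIVE_DELIMS.sub(lambda match: replacement, value)


def show_control_chars(value):
    """Return (index, hex code) for every control character below 0x20."""
    return [(i, hex(ord(ch))) for i, ch in enumerate(value) if ord(ch) < 0x20]


# Hypothetical field value with an invisible 0x01 byte between the two words.
sample = "ACME\x01CORP"
print(replace_hive_delims(sample))  # ACME<EOL>CORP
print(show_control_chars(sample))   # [(4, '0x1')]
```

Running show_control_chars on a value exported from the source database is one way to check which code point the invisible character actually has.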
Our scenario:

Oracle Field        Hive Field             Notepad++          Word
MEIKICOMPANY,LTD    MEIKI<EOL>COMPANY,LTD  [Screen capture]   MEIKI__COMPANY,LTD
AVENTIS@PHARMA      AVENTIS<EOL>@PHARMA    [Screen capture]   AVENTIS_@PHARMA

In the samples above, there is a character that is NOT visible in Oracle but shows up as 'SOH' in Notepad++ and as '_' in Word, and Sqoop converts it to <EOL>.

Please help us understand this behavior. What do these characters mean to Sqoop/Hive? Is Sqoop expected to replace characters that do not fall under \n, \r, or \01 (^A)?

Vikash Talanki
Engineer - Software
Cisco Systems
