Hi,

I have a scenario where a newline character exists in the data. Because of the 
newline character, the number of records in the target is higher than in the 
source. Every record that has a newline character in the data is broken and 
appears as 2 records in Hive. When I use cat and pipe it to wc -l, I get the 
right counts, but when I use Hadoop streaming to get the counts from the HDFS 
files, I get more records because of the newline character. Also, when I query 
the record count in a Hive external table, it is higher, and the affected 
record is split into 2 records at the newline position. Is there a workaround 
in Sqoop/Hive to handle this scenario, so that Hive can ignore the newline 
character if it is part of the data?
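
A minimal sketch of how the two counts could diverge, under the assumption 
(not confirmed for this data) that the embedded character is a carriage return 
(\r) rather than a line feed (\n). wc -l counts only \n bytes, while a reader 
that treats \r, \n and \r\n all as line terminators sees an extra record. The 
function names and the sample data are hypothetical and only illustrate the 
effect.

```python
def count_like_wc(data: bytes) -> int:
    """Count records the way `wc -l` does: one per line-feed byte."""
    return data.count(b"\n")


def count_like_line_reader(data: bytes) -> int:
    """Count records the way a reader does when it treats \\r, \\n and
    \\r\\n all as line terminators (bytes.splitlines has that behaviour)."""
    return len(data.splitlines())


# Hypothetical sample: 3 source rows, the second has a \r inside a field.
sample = b"1,first row\n2,second\rrow\n3,third row\n"

print(count_like_wc(sample))           # count based on \n only
print(count_like_line_reader(sample))  # count when \r also ends a line
```

If the two numbers differ on a real file, the rows containing the stray 
character are the ones being split.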

We are on HDP 2.1 with Sqoop 1.4.4 and Hive 0.13.

Appreciate your help with this.

Thanks,
Karthik
