Perhaps someone here can point me in the direction of an answer.

I loaded a 123.5M-row table onto Hadoop using a SAS data step. After 
completion, Hadoop reports 212M rows. On investigation, the extra 89M rows 
come from embedded ASCII 13 (carriage return) characters. If the table is 
first cleaned of "off-keyboard" characters (ASCII < 32 and > 126), the data 
step loads and the correct row count is reported.
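For reference, this is the kind of cleaning step I mean -- a minimal sketch, 
assuming hypothetical input and output tables WORK.SRC and WORK.CLEAN, using 
COMPRESS with the 'kw' modifiers to keep only printable characters:

data work.clean;
  set work.src;
  array _c {*} _character_;            /* every character variable */
  do _i = 1 to dim(_c);
    /* keep ('k') only writable/printable ('w') characters,
       which drops ASCII < 32 and > 126, including ASCII 13 */
    _c[_i] = compress(_c[_i], , 'kw');
  end;
  drop _i;
run;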

We cannot clean hundreds of TB of data this way. Is there a system parameter 
on Hadoop that could help?

Thanks so much,
Steve

