As Ravi noted, non-numeric split keys are not reliable and can result in both duplicate and missing rows. When you use a non-numeric column with --split-by, you should see a warning in the debug console output.
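For contrast with the non-numeric case, here is a minimal Python sketch of the numeric splitting described in the original question: boundaries roughly (max - min) / number of tasks apart, each turned into a range predicate. The helper names integer_splits and where_clauses are hypothetical, and this is a simplified illustration assuming evenly spaced integer boundaries, not Sqoop's actual splitter code.

```python
from __future__ import annotations


def integer_splits(lo: int, hi: int, num_tasks: int) -> list[tuple[int, int]]:
    """Divide the closed key range [lo, hi] into up to num_tasks contiguous ranges.

    Hypothetical helper: boundary i is lo + (hi - lo) * i // num_tasks,
    i.e. boundaries are roughly (max - min) / num_tasks apart.
    """
    if num_tasks < 1:
        raise ValueError("num_tasks must be at least 1")
    if hi < lo:
        return []  # no usable range (e.g. empty table)
    # Integer boundaries; duplicates collapse when the range is narrower than num_tasks.
    bounds = sorted({lo + (hi - lo) * i // num_tasks for i in range(num_tasks + 1)})
    if len(bounds) == 1:
        return [(lo, hi)]  # lo == hi: one range covers the single key value
    return list(zip(bounds, bounds[1:]))


def where_clauses(column: str, splits: list[tuple[int, int]]) -> list[str]:
    """Render each split as a half-open predicate; the last is closed so MAX is included."""
    clauses = []
    for i, (start, end) in enumerate(splits):
        upper_op = "<=" if i == len(splits) - 1 else "<"
        clauses.append(f"{column} >= {start} AND {column} {upper_op} {end}")
    return clauses


if __name__ == "__main__":
    # Four tasks over ids 1..1000 (hypothetical table and column).
    for clause in where_clauses("id", integer_splits(1, 1000, 4)):
        print(clause)
```

Every integer in [min, max] satisfies exactly one of these predicates, so no row is read twice or skipped. Balanced work per task still assumes the key values are spread fairly evenly across that range; a sparse or clustered numeric column can leave some tasks with far more rows than others.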
Markus Kemper
Customer Operations Engineer
http://www.cloudera.com

On Fri, Sep 23, 2016 at 10:11 AM, Ravi, Chandramouli <[email protected]> wrote:

> It won't work well when the primary key is alphanumeric. I think the data
> will be skewed or won't come back as expected, creating unbalanced split
> files.
>
> Specify a different numeric index as the split key if a numeric primary key
> is not present.
>
> *From:* Selvam Raman [mailto:[email protected]]
> *Sent:* Friday, September 23, 2016 10:09 AM
> *To:* [email protected]
> *Subject:* sqoop import for UUID(primary key)
>
> Hi,
>
> In Sqoop, if I have a numeric primary key and a number of parallel tasks,
> it works ((max - min) / number of tasks) to pull the data from the table
> into HDFS.
>
> Suppose I have a UUID (alphanumeric value) as the primary key: how will the
> load be distributed?
>
> Thank you for your help.
>
> --
> Selvam Raman
> "Shun bribery, hold your head high"
