I am seeing an odd problem where, very rarely, sqoop (version 1.4.7.3.1.0.0-78) appears to bring down one more record than what is counted in the source. My sqoop code looks like this:

{
  sqoop import \
    -Dmapreduce.map.memory.mb=3144 -Dmapreduce.map.java.opts=-Xmx1048m \
    -Dyarn.app.mapreduce.am.log.level=DEBUG \
    -Dmapreduce.map.log.level=DEBUG \
    -Dmapreduce.reduce.log.level=DEBUG \
    -Dmapred.job.name="Ora import table $tablename" \
    -Djava.security.egd=file:///dev/urandom \
    -Doraoop.timestamp.string=false \
    -Dmapreduce.map.max.attempts=10 \
    -Dmapreduce.task.timeout=1500000 \
    -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
    --connect $DBCNXN --username $DBUSER --password $DBPASSWORD \
    --as-parquetfile \
    --target-dir $importdir \
    --query "$sqoop_query" \
    --split-by $splitby \
    --where "1=1" \
    --num-mappers 12 \
    --class-name "QueryResult_$tablename" \
    --delete-target-dir
} || { echo -e "\nFailed to sqoop data from source DB"; exit 255; }

The tables in question are overwritten on each run, not imported incrementally (though I cannot compare the actual data between the source and the sqoop sink, because the source gains more rows every day). Does anyone with more sqoop experience know what could be happening here? Are there any further debugging tips for looking into this?

* Note that of the two tables this is happening to, table A has a non-numeric (varchar) split-by column, and table B has a composite PK (though my split-by column there is numeric). Both of these issues would complicate sqoop operations under normal circumstances, which makes this even harder to debug, since they could be causing other problems (though I am not totally sure what such side effects might be; see <https://community.cloudera.com/t5/Support-Questions/Sqoop-import-composite-primary-key-and-textual-primary-key/td-p/145994> and <https://community.cloudera.com/t5/Support-Questions/Sqoop-imported-more-records-than-source/td-p/174724>).