Seeing an odd problem where, very rarely, sqoop (version 1.4.7.3.1.0.0-78)
appears to bring down one more record than is counted in the source. My
sqoop code looks like this:

{
sqoop import \
      -Dmapreduce.map.memory.mb=3144 -Dmapreduce.map.java.opts=-Xmx1048m \
      -Dyarn.app.mapreduce.am.log.level=DEBUG \
      -Dmapreduce.map.log.level=DEBUG \
      -Dmapreduce.reduce.log.level=DEBUG \
      -Dmapred.job.name="Ora import table $tablename" \
      -Djava.security.egd=file:///dev/urandom \
      -Doraoop.timestamp.string=false \
      -Dmapreduce.map.max.attempts=10 \
      -Dmapreduce.task.timeout=1500000 \
      -Dorg.apache.sqoop.splitter.allow_text_splitter=true \
      --connect "$DBCNXN" --username "$DBUSER" --password "$DBPASSWORD" \
      --as-parquetfile \
      --target-dir "$importdir" \
      --query "$sqoop_query" \
      --split-by "$splitby" \
      --where "1=1" \
      --num-mappers 12 \
      --class-name "QueryResult_$tablename" \
      --delete-target-dir
} || { echo -e "\nFailed to sqoop data from source DB"; exit 255; }

The tables in question are full overwrites, not incremental imports
(though I cannot compare the actual data between the source and the sqoop
sink, because the source gains more rows every day).

Anyone with more sqoop experience know what could be happening here? Any
further debugging tips for looking into this?

* Note that of the two tables this is happening to, table A has a
non-numeric (varchar) split-by column, and table B has a composite PK
(though my split-by column there is numeric). Both of these issues would
complicate sqoop operations under normal circumstances, which makes this
even harder to debug, since they could be causing other problems (though
I am not totally sure what such side effects might be; see
<https://community.cloudera.com/t5/Support-Questions/Sqoop-import-composite-primary-key-and-textual-primary-key/td-p/145994>
and
<https://community.cloudera.com/t5/Support-Questions/Sqoop-imported-more-records-than-source/td-p/174724>
).
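
A minimal sketch of one check that might narrow this down, assuming the
source row count can be taken immediately before and immediately after the
import (for example with `sqoop eval` and a COUNT(*) over the same query).
The function name `classify_count` and its three count arguments are
hypothetical and not part of sqoop. If the imported count falls between
the two source counts, the extra record is at least consistent with rows
arriving while the mappers were running; if it does not, something else,
such as the split boundaries, may be involved.

```shell
# Hypothetical helper (not part of sqoop): compare the imported row count
# against source counts taken just before and just after the import.
#   $1 = source count before the import
#   $2 = source count after the import
#   $3 = number of records sqoop reported as imported
classify_count() {
  if [ "$3" -ge "$1" ] && [ "$3" -le "$2" ]; then
    # Consistent with rows being inserted while the import was running.
    echo "within-bracket"
  else
    # Not explained by source growth alone.
    echo "outside-bracket"
  fi
}

# Example with made-up numbers: the source had 1000 rows before and 1001
# after, and sqoop reported 1001 imported records.
classify_count 1000 1001 1001
```

This only brackets the count; it does not identify which record is the
extra one.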
