Yogesh, is Unique_value in this case SAL? I'm a bit confused by your query: --check-column is Unique_value, but the range in the WHERE clause is on SAL.
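If Unique_value is the column you actually want to track, I'd expect the command to look more like the sketch below. It is untested against Sybase and assumes Unique_value only ever increases. SYBASE_JDBC_URL and SYBASE_JDBC_JAR are placeholder names I made up, since the original script has no --connect and no jar after -libjars. As I understand append mode, Sqoop adds the "Unique_value > last-value" condition (plus an upper bound at the column's current maximum) to the query itself, so the hand-written SAL range shouldn't be needed. Single quotes keep the shell from expanding $CONDITIONS.

```shell
# Sketch only: SYBASE_JDBC_JAR and SYBASE_JDBC_URL are hypothetical placeholders;
# the other variables come from the original script. No WHERE range on SAL:
# in append mode Sqoop is expected to add the Unique_value bounds itself.
sqoop import -libjars "${SYBASE_JDBC_JAR}" \
  --verbose \
  --driver com.sybase.jdbc3.jdbc.SybDriver \
  --connect "${SYBASE_JDBC_URL}" \
  --username "${SYBASE_USERNAME}" \
  --password "${SYBASE_PASSWORD}" \
  --query 'select * from EMP where $CONDITIONS' \
  --check-column Unique_value \
  --incremental append \
  --last-value 201401200 \
  --split-by DEPT \
  --fields-terminated-by ',' \
  --target-dir "${TARGET_DIR}/${INC}"
```

At the end of the run Sqoop should log the value to pass as --last-value next time; with --verbose you should also see the exact query it sends for each split.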
Do you have the option of running this query on a separate database somewhere to track down the issue? I think it would be useful to see the initial state and then the state after running an incremental import. That would tell us how many rows are imported each time Sqoop runs, and we could validate each step. Also, please use the --verbose flag to get the most out of the logs.

-Abe

On Mon, Jan 13, 2014 at 4:38 AM, yogesh kumar <[email protected]> wrote:
> Hello All,
>
> I am trying to do incremental import on daily basis and after importing I
> am finding huge data loss.
>
> I have used this script for incremental import from RDBMS to HDFS
>
> sqoop import -libjars
> --driver com.sybase.jdbc3.jdbc.SybDriver \
> --query "select * from
> from EMP where \$CONDITIONS and SAL > 201401200 and SAL <= 201401204 \
> --check-column Unique_value \
> --incremental append \
> --last-value 201401200 \
> --split-by DEPT \
> --fields-terminated-by ',' \
> --target-dir ${TARGET_DIR}/${INC} \
> --username ${SYBASE_USERNAME} \
> --password ${SYBASE_PASSWORD} \
>
>
> now I have imported newly inserted data into RDBMS to HDFS
>
> but when I do
>
> *select count(*) , unique_value from EMP group by unique_value (both in
> RDBMS and in HIVE)*
>
> I can find huge data loss.
>
> 1) in RDBMS
>
> Count(*)   Unique_value
> 1000       201401201
> 5000       201401202
> 10000      201401203
>
>
> 2) in HIVE
>
> Count(*)   Unique_value
> 189        201401201
> 421        201401202
> 50         201401203
>
>
> If I do
>
> select Unique value from emp ;
>
> Result :
> 201401201
> 201401201
> 201401201
> 201401201
> 201401201
> .
> .
> 201401202
> .
> .
> and so on...
>
>
> *Pls help and suggest why is it so *
>
>
> *Many thanks in advance*
>
> *Yogesh kumar*
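To make the "validate each step" idea concrete: dump the same select count(*), unique_value ... group by unique_value result from Sybase and from Hive into two text files before and after each run, then diff them. A minimal sketch, assuming whitespace-separated "count value" lines with no header; compare_counts is a hypothetical helper, not part of Sqoop or Hive.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare per-Unique_value row counts exported from the
# RDBMS (first file) and from Hive (second file). Each file holds one
# "count value" pair per line, with no header. Prints one line per value whose
# counts differ, sorted by value. Values present only in Hive are not reported.
compare_counts() {
  local rdbms_file="$1" hive_file="$2"
  awk '
    # First file: remember the RDBMS count for each value.
    NR == FNR { src[$2] = $1; next }
    # Second file: remember the Hive count for each value.
    { dst[$2] = $1 }
    END {
      for (v in src) {
        # A value missing from Hive entirely counts as 0 rows there.
        d = (v in dst) ? dst[v] : 0
        if (src[v] != d)
          printf "%s rdbms=%d hive=%d missing=%d\n", v, src[v], d, src[v] - d
      }
    }
  ' "$rdbms_file" "$hive_file" | sort
}
```

Fed the counts from your mail, it would report all three values, starting with "201401201 rdbms=1000 hive=189 missing=811". Running it after each import shows at which step the rows go missing. Note that awk's NR == FNR trick assumes the RDBMS file is not empty.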
