In my opinion, this is not a Sqoop problem. It is related to the RDBMS and the 
way it handles high-volume updates. Those updates might be coming from Sqoop, 
or they might be coming from a realtime stock market price feed.

I would go ahead and test the system as is. Let Sqoop do all its updates. If 
you actually have a problem with inconsistencies or poor performance, then I 
would deal with it as a purely MySQL issue.

(A low-tech approach: run the Sqoop jobs at night?)

Chuck


From: Zoltán Tóth-Czifra [mailto:[email protected]]
Sent: Wednesday, September 12, 2012 10:48 AM
To: [email protected]
Subject: Throttling inserts to avoid replication lags

Hi guys,

We are using Sqoop (cdh3u3) to export Hive tables to relational databases. 
Usually these databases are only used by business intelligence to further 
analyze and filter the data. However, in certain cases we need to export to 
relational databases that are heavily accessed by our products and users.

Our concern is that Sqoop exports would interfere with this random access by 
our users. Temporal inconsistency of the data can be solved with a staging 
table and an atomic swap. However, we are concerned about the replication lag 
between the master and the slaves.
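
To make the staging-table idea concrete, here is a minimal sketch of the swap step. It relies on the fact that MySQL's RENAME TABLE with several renames in one statement is applied atomically. The table names and the helper `swap_statement` are hypothetical, chosen only for illustration; the statement would be run by whatever client you use once the export into the staging table has finished.

```python
import re


def swap_statement(live: str, staging: str, retired: str) -> str:
    """Build one RENAME TABLE statement that swaps staging into place.

    All three table names are hypothetical examples. Names are restricted
    to letters, digits and underscores so they can be quoted safely.
    """
    for name in (live, staging, retired):
        if not re.fullmatch(r"[A-Za-z0-9_]+", name):
            raise ValueError(f"unsupported table name: {name!r}")
    # Both renames happen in a single statement, so readers see either
    # the old table or the new one, never a missing table.
    return (
        f"RENAME TABLE `{live}` TO `{retired}`, "
        f"`{staging}` TO `{live}`"
    )


if __name__ == "__main__":
    print(swap_statement("prices", "prices_staging", "prices_old"))
```

The swap only addresses what readers see on one server. The bulk load into the staging table still has to pass through replication, which is the concern described next.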

If we write a large amount of data quickly with Sqoop to the master (even to a 
staging table), it takes minutes to be replicated to the slaves. That causes 
an inconsistency we can't allow: other writes from our users get queued up 
behind the export. Have any of you had similar problems? We are talking about 
a MySQL cluster, by the way.

As far as I know, Sqoop doesn't have any built-in throttle functionality (for 
example a delay between inserts). We have been thinking of solving this with a 
proxy, but the existing solutions on the market are very incomplete.
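
For illustration, this is roughly the kind of throttle meant here, sketched as a standalone Python loop rather than anything Sqoop provides. It writes rows in batches and pauses while the replicas are too far behind. The names `throttled_write`, `write_batch` and `get_lag_seconds` are hypothetical; in practice `get_lag_seconds` might read Seconds_Behind_Master from SHOW SLAVE STATUS on a slave, and `write_batch` would perform the actual multi-row INSERT. The thresholds are assumptions to be tuned.

```python
import time
from typing import Callable, Iterable, Iterator, List


def batches(rows: Iterable, size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `size` rows."""
    batch: List = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def throttled_write(
    rows: Iterable,
    write_batch: Callable[[List], None],     # hypothetical: does the INSERT
    get_lag_seconds: Callable[[], float],    # hypothetical: replica lag probe
    batch_size: int = 1000,
    max_lag_seconds: float = 5.0,
    poll_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Write rows in batches, waiting whenever replica lag is too high."""
    written = 0
    for batch in batches(rows, batch_size):
        # Hold the next batch back until the replicas have caught up.
        while get_lag_seconds() > max_lag_seconds:
            sleep(poll_seconds)
        write_batch(batch)
        written += len(batch)
    return written
```

Checking the lag before each batch, instead of sleeping for a fixed interval, lets the export run at full speed when the slaves keep up and slow down only when they do not.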

Any other ideas? The more transparent, the better.

Thanks!
