Chuck, Zoltán,

For Sqoop 2, there has been discussion of letting Connections specify a resource policy: resources would be managed by limiting the total number of physical Connections open at one time, with an option to disable Connections.
More info: https://blogs.apache.org/sqoop/entry/apache_sqoop_highlights_of_sqoop

Regards,
Kathleen

On Wed, Sep 12, 2012 at 8:08 AM, Connell, Chuck <[email protected]> wrote:
> In my opinion, this is not a Sqoop problem. It is related to the RDBMS and
> the way it handles high-volume updates. Those updates might be coming from
> Sqoop, or they might be coming from a realtime stock market price feed.
>
> I would go ahead and test the system as is. Let Sqoop do all its updates. If
> you actually have a problem with inconsistencies or poor performance, then I
> would deal with it as a purely MySQL issue.
>
> (A low-tech approach… run the sqoop jobs at night??)
>
> Chuck
>
> From: Zoltán Tóth-Czifra [mailto:[email protected]]
> Sent: Wednesday, September 12, 2012 10:48 AM
> To: [email protected]
> Subject: Throttling inserts to avoid replication lags
>
> Hi guys,
>
> We are using Sqoop (cdh3u3) to export Hive tables to relational databases.
> Usually these databases are only used by business intelligence to further
> analyze and filter the data. However, in certain cases we need to export to
> relational databases that are heavily accessed by our products and users.
>
> Our concern is that Sqoop exports would interfere with this random access of
> our users. Temporal inconsistency of the data can be solved with a staging
> table and an atomic swap; however, we are concerned about the replication
> lag between the master and the slaves.
>
> If we write large data quickly with Sqoop to the master (even to a staging
> table), that takes time to be replicated to the slaves (minutes) and causes
> an inconsistency we can't allow, that is, other writes from our users will
> be queued up. I wonder if any of you had similar problems. We are talking
> about a MySQL cluster by the way.
>
> As far as I know, Sqoop doesn't have any built-in throttle functionality (for
> example a delay between inserts). We have been thinking of solving this with a
> proxy, but the existing solutions on the market are very incomplete.
>
> Any other idea? The more transparent, the better.
>
> Thanks!
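
For illustration only, the "delay between inserts" idea from the original question could look roughly like the sketch below. This is not Sqoop functionality. The names `throttled_insert`, `batches`, and `get_lag_s` are hypothetical, and `sqlite3` is used only as a stand-in so that the example runs by itself. Against MySQL, `get_lag_s` might read `Seconds_Behind_Master` from `SHOW SLAVE STATUS` on a replica, which is an assumption about the setup rather than something stated in the thread.

```python
import sqlite3
import time
from itertools import islice


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def throttled_insert(conn, sql, rows, batch_size=1000, delay_s=0.5,
                     get_lag_s=None, max_lag_s=5.0, sleep=time.sleep):
    """Insert rows in small committed batches, pausing between them.

    get_lag_s is a hypothetical callable returning replica lag in seconds.
    """
    total = 0
    for chunk in batches(rows, batch_size):
        conn.executemany(sql, chunk)
        conn.commit()  # small transactions replicate in small pieces
        total += len(chunk)
        sleep(delay_s)  # fixed delay between batches
        # Back off further while the replicas report too much lag.
        while get_lag_s is not None and get_lag_s() > max_lag_s:
            sleep(delay_s)
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the real target database
    conn.execute("CREATE TABLE staging (id INTEGER, value TEXT)")
    data = ((i, f"row-{i}") for i in range(2500))
    inserted = throttled_insert(conn, "INSERT INTO staging VALUES (?, ?)",
                                data, batch_size=1000, delay_s=0.01)
    print(inserted)
```

The batch size and delay trade export time against replication pressure, so they would need tuning against the real cluster. The lag check is optional and simply extends the pause while the replicas are behind.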
