For MySQL you would want either Debezium's connector (which can handle a bulk dump plus incremental CDC, but requires direct access to the binlog) or the JDBC connector (which does an initial bulk dump plus incremental queries, but has limitations compared to a "true" CDC solution).
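To make the JDBC case concrete, a source connector config for bulk-then-incremental import looks roughly like this (a sketch only; the connection URL, table, and column names are placeholders, and timestamp+incrementing mode assumes your tables have suitable id/timestamp columns):

    name=mysql-jdbc-source
    connector.class=io.confluent.connect.jdbc.JdbcSourceConnector
    # placeholder connection details
    connection.url=jdbc:mysql://mysql-host:3306/mydb?user=connect&password=secret
    # timestamp+incrementing does an initial dump, then picks up new and
    # updated rows by querying on these two columns
    mode=timestamp+incrementing
    incrementing.column.name=id
    timestamp.column.name=updated_at
    table.whitelist=orders
    topic.prefix=mysql-
    poll.interval.ms=5000

Since this only sees the current state of a row each time it polls, deletes and intermediate updates between polls are missed.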
Sqoop and the JDBC connector will have largely similar limitations since they do not look at the database's transaction logs directly. They also get similar benefits as a result: they don't need the same access rights, don't need to be colocated with the database, etc. A direct CDC connector gets the opposite set of tradeoffs: it needs direct colocation (though it could be on a replica), but it also gets direct access to transactions and doesn't run into the limitations on transaction duration described in http://docs.confluent.io/3.1.2/connect/connect-jdbc/docs/source_connector.html#incremental-query-modes.

-Ewen

On Thu, Jan 26, 2017 at 12:43 PM, Buntu Dev <buntu...@gmail.com> wrote:
> I'm looking for ways to do bulk/incremental imports from a MySQL database
> to HDFS. Currently I use Sqoop, which does the bulk import and creates a
> Hive table.
>
> I wanted to know the pros/cons of using the JDBC connector instead of
> Sqoop, and whether any MySQL config changes are expected (like binlog
> configuration in the case of CDC connectors) to import insert/alter/delete
> statements.
>
> Thanks!
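On the binlog question in the quoted message: the JDBC connector needs no MySQL config changes, but a log-based CDC connector like Debezium does expect row-based binary logging to be enabled. A minimal my.cnf sketch (values are placeholders; Debezium's docs have the authoritative list):

    [mysqld]
    server-id        = 223344      # any unique, non-zero server id
    log_bin          = mysql-bin   # enable the binary log
    binlog_format    = ROW         # Debezium requires row-based events
    binlog_row_image = FULL        # full before/after images of each row
    expire_logs_days = 10          # keep enough history for the connector

The connector also needs a MySQL user with replication privileges (e.g. REPLICATION SLAVE and REPLICATION CLIENT), which is the "access rights" tradeoff mentioned above.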