For MySQL you would either want to use Debezium's connector (which can
handle bulk dump + incremental CDC, but requires direct access to the
binlog) or the JDBC connector (does an initial bulk dump + incremental
queries, but has limitations compared to a "true" CDC solution).
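For illustration, registering a Debezium MySQL source with a Connect worker might look roughly like this (a sketch -- hostnames, credentials, database and topic names are placeholders, and exact property names depend on the Debezium version). Note that Debezium also requires the MySQL server to have the binlog enabled in row format and a unique server id:

```shell
# Hypothetical sketch: register a Debezium MySQL connector via the Kafka
# Connect REST API (worker assumed to be listening on localhost:8083).
# Prerequisite MySQL settings (my.cnf): log-bin enabled, binlog_format=ROW,
# and a unique server-id.
curl -X POST -H "Content-Type: application/json" \
  http://localhost:8083/connectors \
  -d '{
    "name": "mysql-cdc-source",
    "config": {
      "connector.class": "io.debezium.connector.mysql.MySqlConnector",
      "database.hostname": "mysql.example.com",
      "database.port": "3306",
      "database.user": "debezium",
      "database.password": "secret",
      "database.server.id": "184054",
      "database.server.name": "dbserver1",
      "database.whitelist": "inventory",
      "database.history.kafka.bootstrap.servers": "kafka:9092",
      "database.history.kafka.topic": "schema-changes.inventory"
    }
  }'
```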

Sqoop and the JDBC connector have largely similar limitations since
neither looks at the database's transaction logs directly. They also get
similar benefits as a result: they don't need the same access rights,
don't need to be colocated with the database, etc. A direct CDC connector gets the opposite set of
tradeoffs -- needs direct colocation (though it could be on a replica), but
also gets direct access to transactions, doesn't end up with limitations on
transaction duration as described in
http://docs.confluent.io/3.1.2/connect/connect-jdbc/docs/source_connector.html#incremental-query-modes,
etc.
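To make the query-based approach concrete, a JDBC source connector set up for an initial bulk load followed by incremental queries might be configured roughly like this (again a sketch -- the connection details, table, and column names are placeholders you'd adapt to your schema):

```shell
# Hypothetical sketch: a Confluent JDBC source connector using
# timestamp+incrementing mode, which keys incremental queries on a
# modification-timestamp column plus an auto-increment id column.
curl -X POST -H "Content-Type: application/json" \
  http://localhost:8083/connectors \
  -d '{
    "name": "mysql-jdbc-source",
    "config": {
      "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
      "connection.url": "jdbc:mysql://mysql.example.com:3306/inventory?user=connect&password=secret",
      "mode": "timestamp+incrementing",
      "timestamp.column.name": "updated_at",
      "incrementing.column.name": "id",
      "topic.prefix": "mysql-",
      "table.whitelist": "orders"
    }
  }'
```

One consequence of this mode worth noting: it only sees rows the query can observe, so inserts and updates (via the timestamp column) are captured, but deletes are not -- one of the limitations relative to a log-based CDC connector.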

-Ewen

On Thu, Jan 26, 2017 at 12:43 PM, Buntu Dev <buntu...@gmail.com> wrote:

> I'm looking for ways to bulk/incremental import from MySQL database to
> HDFS. Currently I got Sqoop that does the bulk import creating a Hive
> table.
>
> Wanted to know the pros/cons of using the JDBC connector instead of Sqoop and
> whether any MySQL config changes are expected (like binlog configuration in
> the case of CDC connectors) to import insert/alter/delete statements.
>
>
> Thanks!
>
