Hello,

I am working on utilizing Sqoop to export data from HDFS to SQL Server 2012 but 
cannot seem to get very good performance.

Below are some details about what I’m trying to do:

  *   Export 100 million, 3-column records per day (totaling about 4GB)
  *   Records are stored in CSV format in part files of about 20MB
  *   Utilize our current cluster running Sqoop 1.4.2 and Hadoop 0.20
  *   Eventually utilize our new cluster running Sqoop 1.4.5 and Hadoop 2.4 
(Currently doesn’t have YARN, so we have had to run HADOOP_MAPREDUCE_MODE as 
Classic)

While running Sqoop 1.4.2 using the sqljdbc4.jar, Sqoop only transfers about 1 
million records per hour (50MB).
Running Sqoop 1.4.5 using sqljdbc41.jar, Sqoop only transfers about 2 million 
records per hour (100MB).

Below is the Sqoop command I’m executing for this job.  Based on many 
executions, the records.per.statement, and statements.per.transaction, never 
seem to change performance.

sqoop export -D sqoop.export.records.per.statement=100 -D 
sqoop.export.statements.per.transaction=100 --connect 
'jdbc:sqlserver://[SERVER_NAME];username=[USERNAME];password=[PASSWORD];database=[DB
 NAME]' —table=[TABLE_NAME] --export-dir /[DATA DIR] 
--input-fields-terminated-by ',' --input-lines-terminated-by '\n' —verbose

Adding the batch or direct flags have only ever made performance worse so at 
the moment, they are not included.



When running the above statement, we have recorded network captures of the 
Sqoop traffic to SQL server, as well as recorded SQL Server activity via the 
Profiler.  Both of these captures show only single records being inserted at a 
time into SQL Server.

Our SQL Server activity ends up looking like this:

RPC:Completed   exec sp_execute 
116097,N'ffdbb68a84686edd029bde9766192b15',N'north_america',37  Microsoft JDBC 
Driver for SQL Server
RPC:Completed   exec sp_execute 
116097,N'ffdbdc2bb770a13a3b4eb86836341ab3',N'south_america',1   Microsoft JDBC 
Driver for SQL Server
RPC:Completed   exec sp_execute 
116097,N'ffdbfefc63e77861d708f3f5d82086fa',N'eastern_asia',1    Microsoft JDBC 
Driver for SQL Server
RPC:Completed   exec sp_execute 
116097,N'ffdc59a3a42aa2ef0fb60fd3f604a96c',N'southeast_asia',1  Microsoft JDBC 
Driver for SQL Server
RPC:Completed   exec sp_execute 
116097,N'ffdcbd667479a412923d03c6f305eb11',N'eastern_asia',12   Microsoft JDBC 
Driver for SQL Server
RPC:Completed   exec sp_execute 
116097,N'ffdd1cc279a54921c97cef463c6773fb',N'western_europe',37 Microsoft JDBC 
Driver for SQL Server
RPC:Completed   exec sp_execute 
116097,N'ffdd437610d1dc1cb8aa77f04bf21581',N'south_america',10  Microsoft JDBC 
Driver for SQL Server
RPC:Completed   exec sp_execute 
116097,N'ffdd8204335231b2138dab577bc8d906',N'northern_europe',1 Microsoft JDBC 
Driver for SQL Server
RPC:Completed   exec sp_execute 
116097,N'ffde22f7f6a9d2913336f44d9db269e4',N'north_america',1   Microsoft JDBC 
Driver for SQL Server
RPC:Completed   exec sp_execute 
116097,N'ffde2e3ae2a90ad5796e1d6f5fb3e1f2',N'eastern_asia',13   Microsoft JDBC 
Driver for SQL Server
RPC:Completed   exec sp_execute 
116097,N'ffde56a01846c01225145ed043b2f43b',N'north_america',2   Microsoft JDBC 
Driver for SQL Server
RPC:Completed   exec sp_execute 
116097,N'ffde90f6f14a00533327cc39562f086f',N'south_america',1   Microsoft JDBC 
Driver for SQL Server
RPC:Completed   exec sp_execute 
116097,N'ffdea65ab1b1091150924b5d89d441c2',N'eastern_asia',1    Microsoft JDBC 
Driver for SQL Server
RPC:Completed   exec sp_execute 
116097,N'ffdeef3353f992fc74d5cfc54533aa49',N'north_america',1   Microsoft JDBC 
Driver for SQL Server
RPC:Completed   exec sp_execute 
116097,N'ffdefae31deecb4ff80e76bea129e4dd',N'western_europe',1  Microsoft JDBC 
Driver for SQL Server
RPC:Completed   exec sp_execute 
116097,N'ffdf11270af0e9ee7841355e966d99b7',N'south_asia',1      Microsoft JDBC 
Driver for SQL Server
RPC:Completed   exec sp_execute 
116097,N'ffdf71938015bb64e371aaedb9700ab1',N'eastern_europe',1  Microsoft JDBC 
Driver for SQL Server



Now, if I export the same data to MySQL using Sqoop, it exports data at closer 
to 15 million records an hour (roughly 7 times faster).

Is this a known limitation to Sqoop or the Microsoft JDBC driver, or is there 
something I can change on the Sqoop or SQL server side to speed up this 
transfer?

Thanks,

Greg

Reply via email to