Hi,

It looks like sqoop will bundle 100 records into a transaction by default. I would try setting sqoop.export.statements.per.transaction down from that default of 100 to see if that works.
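As a rough sketch only (not part of the original reply): the property can be passed as a generic -D option right after the export keyword. The value 50 is an arbitrary illustrative choice, not a tested setting, and the connection string and table name are simply copied from Chalcy's command quoted below.

  # Illustrative sketch: lower the per-transaction batching Brock mentions.
  # The value 50 is an assumption, not a verified fix.
  sqoop export -D sqoop.export.statements.per.transaction=50 \
    --connect 'jdbc:sqlserver://mymserverconnetion;username=myuserid;password=mypwd;database=mydb' \
    --table mysqltable -m 1 \
    --input-fields-terminated-by '\t' --input-lines-terminated-by '\n' \
    --export-dir /warehouse/stg_chalcy/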
Brock

On Fri, Mar 30, 2012 at 1:01 PM, Chalcy <[email protected]> wrote:
> It is not bad formatting; I checked that. The following is what I do.
>
> I tried creating the hive table like this:
>
> Create table mytable (……….) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
> LINES TERMINATED BY '\n' STORED AS TEXTFILE;
>
> Then I add about 95 rows into this hive table and use the sqoop command
> below to export. It works, so we know the data is intact. If I change the
> number of rows in the hive table to 100 and run the same export, it fails
> with the error below.
>
> sqoop export --connect
> 'jdbc:sqlserver://mymserverconnetion;username=myuserid;password=mypwd;database=mydb'
> --table mysqltable -m 1 --input-fields-terminated-by '\t'
> --input-lines-terminated-by '\n' --export-dir /warehouse/stg_chalcy/
>
> With 95 rows, if I check the database table, I do get 95 rows with the data
> intact. It is as if all the data were being treated as one record
> (the math: 22 columns * 100 rows = 2200 > 2100).
>
> Thanks,
> Chalcy
>
> On Fri, Mar 30, 2012 at 6:51 AM, Steve Souza <[email protected]> wrote:
>>
>> I'm no sqoop expert, but could it be bad formatting in your data? Have
>> you looked at the rows and seen, for example, whether they have embedded
>> commas? I have used export successfully on bigger tables. Copy the first
>> 90 rows, or even just the first row, enough times to exceed the original
>> table's 2 million rows; since those rows look OK, it would be good to see
>> whether that job succeeds.
>>
>> On Fri, Mar 30, 2012 at 11:46 AM, Chalcy <[email protected]> wrote:
>>>
>>> Hello,
>>>
>>> I am trying to do a sqoop export (data from HDFS/Hadoop to a database).
>>> The table I am trying to export has 2 million rows and 20 fields.
>>> The sqoop command succeeds with anywhere from 10 up to 95 rows. When I
>>> try anything more than 95, the sqoop export fails with the following error.
>>>
>>> From googling, I gather this is a DBMS limitation. Is there any way to
>>> configure around this error? I am surprised that it works for a few rows.
>>>
>>> Any help is appreciated.
>>>
>>> Thanks,
>>> Chalcy
>>>
>>> 12/03/29 09:00:59 INFO mapred.JobClient: Task Id :
>>> attempt_201203230811_0539_m_000000_0, Status : FAILED
>>> java.io.IOException: com.microsoft.sqlserver.jdbc.SQLServerException: The
>>> incoming tabular data stream (TDS) remote procedure call (RPC) protocol
>>> stream is incorrect. Too many parameters were provided in this RPC request.
>>> The maximum is 2100.
>>>         at com.cloudera.sqoop.mapreduce.AsyncSqlRecordWriter.close(AsyncSqlRecordWriter.java:189)
>>>         at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:540)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:649)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:264)
>>> Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The incoming
>>> tabular data stream (TDS) remote procedure call (RPC) protocol stream is
>>> incorrect.
>>>
>>> 12/03/29 09:01:05 INFO mapred.JobClient: Task Id :
>>> attempt_201203230811_0539_m_000000_1, Status : FAILED
>>> java.io.IOException: com.microsoft.sqlserver.jdbc.SQLServerException: The
>>> incoming tabular data stream (TDS) remote procedure call (RPC) protocol
>>> stream is incorrect. Too many parameters were provided in this RPC request.
>>> The maximum is 2100.
>>>         at com.cloudera.sqoop.mapreduce.AsyncSqlRecordWriter.close(AsyncSqlRecordWriter.java:189)
>>>         at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:540)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:649)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:264)
>>> Caused by: com.microsoft.sqlserver.jdbc.SQLServerException: The incoming
>>> tabular data stream (TDS) remote procedure call (RPC) protocol stream is
>>> incorrect.
>>>
>>> 12/03/29 09:01:10 INFO mapred.JobClient: Task Id :
>>> attempt_201203230811_0539_m_000000_2, Status : FAILED
>>> java.io.IOException: com.microsoft.sqlserver.jdbc.SQLServerException: The
>>> incoming tabular data stream (TDS) remote procedure call (RPC) protocol
>>> stream is incorrect. Too many parameters were provided in this RPC request.
>>> The maximum is 2100.
>>>         at com.cloudera.sqoop.mapreduce.AsyncSqlRecordWriter.close(AsyncSqlRecordWriter.java:189)
>>>         at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:540)
>>>         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:649)
>>>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
>>>         at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>>>         at java.security.AccessController.doPrivileged(Native Method)
>>>         at javax.security.auth.Subject.doAs(Subject.java:396)
>>>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)
>>>         at org.apache.hadoop.mapred.Child.main(Child.java:264)
>>>
>>
>

--
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/
