As far as I understand: it is loading 1000 records because there are 1000 records in your input file. We cannot relate this to the batch size - firing a select count(*) query gives you the total number of records inserted into the table, not the batches in which they were inserted.
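If you want to observe the batches themselves rather than infer them from the final count, one option is to pin both export properties explicitly and watch the table count grow in steps. A minimal sketch (connection details, table name, and export path are placeholders, and it assumes the Teradata JDBC driver honours both properties):

    sqoop export \
        -Dsqoop.export.records.per.statement=100 \
        -Dsqoop.export.statements.per.transaction=1 \
        --driver com.teradata.jdbc.TeraDriver \
        --connect jdbc:teradata://dbhost/database=mydb \
        --username myuser --password mypass \
        --table my_table \
        --export-dir /user/me/input.txt \
        -m 1

With statements.per.transaction pinned to 1, each commit covers exactly one INSERT of 100 rows, so select count(*) should advance in increments of 100 while the job runs, instead of you having to guess the batch size from the final total.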
Cannot comment on the last statement (#3) in your mail below - especially in the context of Teradata. Open for discussion :)

Thanks,
Abhijeet Gaikwad

On Sat, Jan 28, 2012 at 11:28 PM, Srinivas Surasani <vas...@gmail.com> wrote:

> Abhijeet --
>
> 1) By default it is loading 1000 records and, as you mentioned earlier,
> this can be tweaked using "sqoop.export.records.per.statement".
> 2) I just ran select count(*) on the Teradata table and saw 1000 records
> inserted in one go.
> 3) I believe setting the number of mappers > 1 works only for
> non-parallel databases.
>
> -- Srinivas
>
> On Sat, Jan 28, 2012 at 12:47 PM, abhijeet gaikwad <abygaikwa...@gmail.com> wrote:
>
>> Hi Srinivas,
>> Export with multiple mappers is allowed in SQOOP. I have exported data
>> into SQL Server as well as MySQL using multiple mappers.
>>
>> Regarding the issue you are facing, I have a few questions:
>> 1. How did you set the batch size of 1000 you are talking about?
>> 2. Can you share the SQOOP logs in detail?
>> 3. The deadlock issue seems to have been raised by Teradata. There is a
>> good chance that a Teradata admin will be able to resolve this issue.
>>
>> Thanks,
>> Abhijeet Gaikwad
>> On 28 Jan 2012 22:16, "Srinivas Surasani" <vas...@gmail.com> wrote:
>>
>>> Hi Abhijeet,
>>>
>>> Thanks for the information. I have one more question: is the export
>>> always done with one mapper? (It runs into table deadlocks if the
>>> number of mappers is set to more than one.)
>>> Also, FYI: I have observed that the default number of rows inserted
>>> is 1000.
>>>
>>> Thanks,
>>> -- Srinivas
>>>
>>> On Sat, Jan 28, 2012 at 10:12 AM, abhijeet gaikwad <abygaikwa...@gmail.com> wrote:
>>>
>>>> Hi Srinivas,
>>>> I haven't played with the Teradata connector, but in general there are
>>>> two properties that drive insertions (SQOOP export) into a table, namely:
>>>>
>>>> 1. "sqoop.export.records.per.statement": specifies the number of
>>>> records/rows to be inserted with a single INSERT statement. The
>>>> default value is 100.
>>>> 2. "sqoop.export.statements.per.transaction": specifies the number of
>>>> INSERT statements before a commit is fired - which you can call the
>>>> batch size. The default value is 100.
>>>>
>>>> You can use the -D hadoop argument to specify these properties on the
>>>> command line, e.g. -Dsqoop.export.statements.per.transaction=50
>>>>
>>>> NOTE: Make sure you use this argument (-D) before any of the SQOOP
>>>> tool-specific arguments. See the SQOOP User Guide for more details.
>>>>
>>>> Thanks,
>>>> Abhijeet Gaikwad
>>>>
>>>> On 1/26/12, Srinivas Surasani <vas...@gmail.com> wrote:
>>>> > Kathleen,
>>>> >
>>>> > Any information on the request below?
>>>> >
>>>> > Hi All,
>>>> >
>>>> > I'm working on Hadoop CDH3 U0 and Sqoop CDH3 U2.
>>>> >
>>>> > I'm trying to export csv files from HDFS to Teradata. It works well
>>>> > with the number of mappers set to "1" (with batch loading of 1000
>>>> > records at a time). When I increase the number of mappers to more
>>>> > than one, I get the following error. Also, is it possible to
>>>> > configure the batch size at export time (from the command line)?
>>>> >
>>>> > sqoop export --verbose --driver com.teradata.jdbc.TeraDriver
>>>> > --connect jdbc:teradata://xxxx/database=xxxx --username xxxxx --password
>>>> > xxxxx --table xxxx --export-dir /user/surasani/10minutes.txt
>>>> > --fields-terminated-by '|' -m 4 --batch
>>>> >
>>>> > 12/01/24 16:17:21 INFO mapred.JobClient: map 3% reduce 0%
>>>> > 12/01/24 16:17:48 INFO mapred.JobClient: Task Id :
>>>> > attempt_201112211106_68553_m_000001_2, Status : FAILED
>>>> > *java.io.IOException: java.sql.BatchUpdateException: [Teradata Database]
>>>> > [TeraJDBC 13.00.00.07] [Error 2631] [SQLState 40001] Transaction ABORTed
>>>> > due to DeadLock*.
>>>> >
>>>> > Srinivas --
>>>> >
>>>> > On Wed, Jan 25, 2012 at 8:01 PM, Kathleen Ting <kathl...@cloudera.com> wrote:
>>>> >
>>>> >> Srinivas, as it happens, the Cloudera Connector for Teradata supports
>>>> >> staging tables. It is freely available here:
>>>> >> https://ccp.cloudera.com/display/con/Cloudera+Connector+for+Teradata+Download
>>>> >>
>>>> >> Regards, Kathleen
>>>> >>
>>>> >> On Wed, Jan 25, 2012 at 3:36 PM, Srinivas Surasani <vas...@gmail.com> wrote:
>>>> >>
>>>> >>> Hi Kathleen,
>>>> >>>
>>>> >>> Same issue with Teradata.
>>>> >>>
>>>> >>> Srinivas --
>>>> >>>
>>>> >>> On Mon, Jan 23, 2012 at 8:26 PM, Kathleen Ting <kathl...@cloudera.com> wrote:
>>>> >>>
>>>> >>>> Hi Weihua -
>>>> >>>>
>>>> >>>> Unfortunately, the generic JDBC manager does not support staging.
>>>> >>>> As a result, I've filed
>>>> >>>> https://issues.apache.org/jira/browse/SQOOP-431 on your behalf.
>>>> >>>>
>>>> >>>> Regards, Kathleen
>>>> >>>>
>>>> >>>> On Mon, Jan 23, 2012 at 3:10 PM, Weihua Zhu <w...@adconion.com> wrote:
>>>> >>>>
>>>> >>>>> Hi Guys,
>>>> >>>>>
>>>> >>>>> Good afternoon!
>>>> >>>>> I have a question. I was trying to sqoop export from HDFS to
>>>> >>>>> PostgreSQL, using the --staging-table option for transactional
>>>> >>>>> reasons, but it gives me the error below.
>>>> >>>>> I am wondering if the staging table is supported by
>>>> >>>>> GenericJdbcManager? If not, what kind of manager should I use?
>>>> >>>>> Thanks very much!
>>>> >>>>>
>>>> >>>>> -Weihua
>>>> >>>>>
>>>> >>>>> error message:
>>>> >>>>>
>>>> >>>>> 12/01/23 15:00:39 ERROR tool.ExportTool: Error during export: The
>>>> >>>>> active connection manager (org.apache.sqoop.manager.GenericJdbcManager)
>>>> >>>>> does not support staging of data for export. Please retry without
>>>> >>>>> specifying the --staging-table option.
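For completeness, here is a minimal sketch of what the staging-table variant of the export looks like once you are on a connection manager that supports staging (for example the Cloudera Connector for Teradata linked above). All connection details and table names below are placeholders:

    sqoop export \
        --connect jdbc:teradata://dbhost/database=mydb \
        --username myuser --password mypass \
        --table target_table \
        --staging-table target_table_stage \
        --clear-staging-table \
        --export-dir /user/me/export_data

The staging table has to exist beforehand with the same schema as the target; Sqoop inserts into it first and then moves the rows to target_table in a single transaction, which avoids leaving partially-exported data behind when a mapper fails.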