Moving the discussion to the Apache Sqoop mailing list. Please continue it here.
Regards
Bejoy K S

-----Original Message-----
From: bejo...@gmail.com
Date: Tue, 9 Aug 2011 16:54:44
To: <sqoop-u...@cloudera.org>
Reply-To: bejo...@gmail.com
Subject: Re: [sqoop-user] Re: Sqoop export not having reduce jobs, even table is partitioned

Yes, Sqoop imports and exports are entirely parallel, map-only processes; no reduce operation is required in such scenarios. You are not doing any sort of aggregation while performing imports and exports, so reducers hardly come into play.

As for Sqoop with a reduce job, I don't have a clue. Are you looking for some specific implementation? If so, please share more details.

Regards
Bejoy K S
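A map-only job in the Hadoop MapReduce API is simply one submitted with zero reduce tasks, which is why the console output in the original message below never advances past "reduce 0%". A minimal, generic sketch of such a job follows; the class, mapper, and path names are made up for illustration and are not Sqoop's actual internals.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    // Generic map-only job: records flow through the mappers and are consumed
    // there; with zero reduce tasks there is no shuffle and no reduce phase.
    public class MapOnlySketch {

        public static class PassThroughMapper
                extends Mapper<Object, Text, NullWritable, NullWritable> {
            @Override
            protected void map(Object key, Text value, Context context) {
                // A real export mapper would turn 'value' into a database
                // INSERT here instead of discarding it.
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "map-only sketch");
            job.setJarByClass(MapOnlySketch.class);
            job.setMapperClass(PassThroughMapper.class);
            job.setOutputFormatClass(NullOutputFormat.class);
            job.setNumReduceTasks(0);  // zero reducers: the job stays at "reduce 0%"
            FileInputFormat.addInputPath(job, new Path(args[0]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }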
-----Original Message-----
From: Sonal <imsonalku...@gmail.com>
Date: Tue, 9 Aug 2011 07:52:55
To: Sqoop Users <sqoop-u...@cloudera.org>
Reply-To: sqoop-u...@cloudera.org
Subject: [sqoop-user] Re: Sqoop export not having reduce jobs, even table is partitioned

Hi,

Thanks for the reply. So Sqoop is just parallel processing, even if you have a primary key/unique index/partition on the table? Is there any case in which Sqoop can make use of a reduce job? And is there any way to set the batch size/fetch size in Sqoop?

Thanks & Regards,
Sonal Kumar
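On the batch size/fetch size question: at the plain JDBC level these are standard knobs, so it may help to see where they sit. Statement.setFetchSize() controls how many rows the driver pulls per round trip on reads, and PreparedStatement.addBatch()/executeBatch() group inserts on writes. The sketch below shows batched inserts, roughly the pattern an export task performs for its split; the connection URL, credentials, column names, and batch size are placeholders, not Sqoop's actual export code.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // Batched JDBC inserts: the generic shape of an export task's work.
    // URL, credentials, table columns, and batch size are illustrative only;
    // an Oracle JDBC driver jar is assumed to be on the classpath.
    public class BatchedInsertSketch {
        public static void main(String[] args) throws Exception {
            final int batchSize = 100;  // illustrative; tune for the target DB
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@dbhost:1521:orcl", "sh", "sh")) {
                conn.setAutoCommit(false);
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO SALES_OLH_RANGE (ID, AMOUNT) VALUES (?, ?)")) {
                    for (int i = 0; i < 1000; i++) {  // stand-in for one input split
                        ps.setInt(1, i);
                        ps.setDouble(2, i * 1.5);
                        ps.addBatch();
                        if ((i + 1) % batchSize == 0) {
                            ps.executeBatch();        // flush one batch of inserts
                            conn.commit();
                        }
                    }
                    ps.executeBatch();                // flush any remainder
                    conn.commit();
                }
            }
        }
    }

Whether these knobs are exposed on the Sqoop command line depends on the version in use; later Sqoop 1.x releases document a --batch argument for exports and a --fetch-size argument for imports, so check the documentation for your release.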
On Aug 9, 7:44 pm, bejo...@gmail.com wrote:
> Hi Sonal,
>
> AFAIK, Sqoop import and export jobs kick off map tasks alone; both are
> map-only jobs. In imports, the data set to be imported is distributed
> equally across the mappers, and each mapper is responsible for firing its
> corresponding SQL query and fetching its share of the data into HDFS. No
> reduce operation is required here, as it is just parallel processing
> (parallel fetching of data) happening under the hood. A similar case
> applies to Sqoop export as well: parallel inserts happening under the
> hood. For parallel processing, map tasks alone are fine; no reduce
> operation is needed.
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: Sonal <imsonalku...@gmail.com>
> Date: Tue, 9 Aug 2011 04:02:10
> To: Sqoop Users <sqoop-u...@cloudera.org>
> Reply-To: sqoop-u...@cloudera.org
> Subject: [sqoop-user] Sqoop export not having reduce jobs, even table is partitioned
>
> Hi,
>
> I am trying to load the data into the db using sqoop export with the
> following command:
>
>     sqoop export --connect jdbc:oracle:thin:@adc2190481.us.oracle.com:45773:dooptry \
>         --username sh --password sh \
>         --export-dir $ORACLE_HOME/work/SALES_input \
>         --table SALES_OLH_RANGE -m 4
>
> It is able to insert the data, but it runs only map jobs:
>
> 11/08/09 03:57:42 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
> 11/08/09 03:57:42 INFO tool.CodeGenTool: Beginning code generation
> 11/08/09 03:57:42 INFO manager.OracleManager: Time zone has been set to GMT
> 11/08/09 03:57:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SALES_OLH_RANGE t
> 11/08/09 03:57:42 WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on
> 11/08/09 03:57:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SALES_OLH_RANGE t
> 11/08/09 03:57:42 WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on
> 11/08/09 03:57:42 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop
> 11/08/09 03:57:42 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop/hadoop-0.20.2+737-core.jar
> Note: /net/adc2190481/scratch/sonkumar/view_storage/sonkumar_hadooptry/work/./SALES_OLH_RANGE.java uses or overrides a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> 11/08/09 03:57:43 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop/compile/SALES_OLH_RANGE.jar
> 11/08/09 03:57:43 INFO mapreduce.ExportJobBase: Beginning export of SALES_OLH_RANGE
> 11/08/09 03:57:44 INFO manager.OracleManager: Time zone has been set to GMT
> 11/08/09 03:57:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SALES_OLH_RANGE t
> 11/08/09 03:57:44 WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on
> 11/08/09 03:57:44 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 11/08/09 03:57:44 INFO input.FileInputFormat: Total input paths to process : 1
> 11/08/09 03:57:44 INFO input.FileInputFormat: Total input paths to process : 1
> 11/08/09 03:57:44 INFO mapred.JobClient: Running job: job_local_0001
> 11/08/09 03:57:45 INFO mapred.JobClient:  map 0% reduce 0%
> 11/08/09 03:57:50 INFO mapred.LocalJobRunner:
> 11/08/09 03:57:51 INFO mapred.JobClient:  map 24% reduce 0%
> 11/08/09 03:57:53 INFO mapred.LocalJobRunner:
> 11/08/09 03:57:54 INFO mapred.JobClient:  map 41% reduce 0%
> 11/08/09 03:57:56 INFO mapred.LocalJobRunner:
> 11/08/09 03:57:57 INFO mapred.JobClient:  map 58% reduce 0%
> 11/08/09 03:57:59 INFO mapred.LocalJobRunner:
> 11/08/09 03:58:00 INFO mapred.JobClient:  map 75% reduce 0%
> 11/08/09 03:58:02 INFO mapred.LocalJobRunner:
> 11/08/09 03:58:02 INFO mapred.JobClient:  map 92% reduce 0%
> 11/08/09 03:58:03 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
> 11/08/09 03:58:03 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
> 11/08/09 03:58:03 INFO mapred.LocalJobRunner:
> 11/08/09 03:58:03 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
> 11/08/09 03:58:03 WARN mapred.FileOutputCommitter: Output path is null in cleanup
> 11/08/09 03:58:04 INFO mapred.JobClient:  map 100% reduce 0%
> 11/08/09 03:58:04 INFO mapred.JobClient: Job complete: job_local_0001
> 11/08/09 03:58:04 INFO mapred.JobClient: Counters: 6
> 11/08/09 03:58:04 INFO mapred.JobClient:   FileSystemCounters
> 11/08/09 03:58:04 INFO mapred.JobClient:     FILE_BYTES_READ=41209592
> 11/08/09 03:58:04 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=309754
> 11/08/09 03:58:04 INFO mapred.JobClient:   Map-Reduce Framework
> 11/08/09 03:58:04 INFO mapred.JobClient:     Map input records=918843
> 11/08/09 03:58:04 INFO mapred.JobClient:     Spilled Records=0
> 11/08/09 03:58:04 INFO mapred.JobClient:     SPLIT_RAW_BYTES=154
> 11/08/09 03:58:04 INFO mapred.JobClient:     Map output records=918843
> 11/08/09 03:58:04 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 20.3677 seconds (0 bytes/sec)
> 11/08/09 03:58:04 INFO mapreduce.ExportJobBase: Exported 918843 records.
>
> Why are reduce jobs not coming up? Do I have to pass some other option
> as well?
>
> A quick reply will be appreciated.
>
> Thanks & Regards,
> Sonal Kumar
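To make the import-side claim in the quoted reply above concrete ("each mapper is responsible for firing its corresponding SQL query"): splitting on a numeric key amounts to range-partitioning it, with each mapper issuing one bounded SELECT. A toy sketch, with an invented table, column, and key range:

    // Toy illustration of carving a numeric key range into per-mapper
    // queries; the table, column, and bounds are invented for this sketch.
    public class SplitSketch {
        public static void main(String[] args) {
            long min = 1, max = 1_000_000;  // e.g. from SELECT MIN(id), MAX(id)
            int numMappers = 4;             // number of parallel map tasks (-m)
            long chunk = (max - min + 1) / numMappers;
            for (int i = 0; i < numMappers; i++) {
                long lo = min + i * chunk;
                long hi = (i == numMappers - 1) ? max + 1 : lo + chunk;
                System.out.printf(
                    "mapper %d: SELECT * FROM SALES WHERE id >= %d AND id < %d%n",
                    i, lo, hi);
            }
        }
    }

For exports the parallelism comes from splitting the input files under the export directory instead, but the map-only character of the job is the same.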