Moving the discussion to the Apache Sqoop mailing list. Please continue it here.
Regards
Bejoy K S

-----Original Message-----
From: bejo...@gmail.com
Date: Tue, 9 Aug 2011 16:54:44
To: <sqoop-u...@cloudera.org>
Reply-To: bejo...@gmail.com
Subject: Re: [sqoop-user] Re: Sqoop export not having reduce jobs, even table is partitioned

Yes, Sqoop imports and exports are entirely parallel, map-only processes; no reduce operation is required in such scenarios. You are not doing any sort of aggregation while performing imports and exports, so reducers hardly come into play.

As for Sqoop with a reduce job, I don't have a clue. Are you looking for some specific implementation? If so, please share more details.

Regards
Bejoy K S
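A map-only job in the Hadoop MapReduce API is simply one submitted with zero reduce tasks, which is why the console output in the original message below never advances past "reduce 0%". A minimal, generic sketch of such a job follows; the class, mapper, and path names are made up for illustration and are not Sqoop's actual internals.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    // Generic map-only job: records flow through the mappers and are consumed
    // there; with zero reduce tasks there is no shuffle and no reduce phase.
    public class MapOnlySketch {

        public static class PassThroughMapper
                extends Mapper<Object, Text, NullWritable, NullWritable> {
            @Override
            protected void map(Object key, Text value, Context context) {
                // A real export mapper would turn 'value' into a database
                // INSERT here instead of discarding it.
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "map-only sketch");
            job.setJarByClass(MapOnlySketch.class);
            job.setMapperClass(PassThroughMapper.class);
            job.setOutputFormatClass(NullOutputFormat.class);
            job.setNumReduceTasks(0);  // zero reducers: the job stays at "reduce 0%"
            FileInputFormat.addInputPath(job, new Path(args[0]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }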
-----Original Message-----
From: Sonal <imsonalku...@gmail.com>
Date: Tue, 9 Aug 2011 07:52:55
To: Sqoop Users <sqoop-u...@cloudera.org>
Reply-To: sqoop-u...@cloudera.org
Subject: [sqoop-user] Re: Sqoop export not having reduce jobs, even table is partitioned

Hi,

Thanks for the reply. So Sqoop is just parallel processing, even if you have a primary key/unique index/partition on the table? Is there any case in which Sqoop can make use of a reduce job? And is there any way to set the batch size/fetch size in Sqoop?

Thanks & Regards,
Sonal Kumar
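On the batch size/fetch size question: at the plain JDBC level these are standard knobs, so it may help to see where they sit. Statement.setFetchSize() controls how many rows the driver pulls per round trip on reads, and PreparedStatement.addBatch()/executeBatch() group inserts on writes. The sketch below shows batched inserts, roughly the pattern an export task performs for its split; the connection URL, credentials, column names, and batch size are placeholders, not Sqoop's actual export code.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // Batched JDBC inserts: the generic shape of an export task's work.
    // URL, credentials, table columns, and batch size are illustrative only;
    // an Oracle JDBC driver jar is assumed to be on the classpath.
    public class BatchedInsertSketch {
        public static void main(String[] args) throws Exception {
            final int batchSize = 100;  // illustrative; tune for the target DB
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@dbhost:1521:orcl", "sh", "sh")) {
                conn.setAutoCommit(false);
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO SALES_OLH_RANGE (ID, AMOUNT) VALUES (?, ?)")) {
                    for (int i = 0; i < 1000; i++) {  // stand-in for one input split
                        ps.setInt(1, i);
                        ps.setDouble(2, i * 1.5);
                        ps.addBatch();
                        if ((i + 1) % batchSize == 0) {
                            ps.executeBatch();        // flush one batch of inserts
                            conn.commit();
                        }
                    }
                    ps.executeBatch();                // flush any remainder
                    conn.commit();
                }
            }
        }
    }

Whether these knobs are exposed on the Sqoop command line depends on the version in use; later Sqoop 1.x releases document a --batch argument for exports and a --fetch-size argument for imports, so check the documentation for your release.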
On Aug 9, 7:44 pm, bejo...@gmail.com wrote:
> Hi Sonal,
>
> AFAIK, Sqoop import and export jobs kick off map tasks alone; both are
> map-only jobs. In imports, the data set to be imported is distributed
> equally across the mappers, and each mapper is responsible for firing its
> corresponding SQL query and fetching its share of the data into HDFS. No
> reduce operation is required here, as it is just parallel processing
> (parallel fetching of data) happening under the hood. A similar case
> applies to Sqoop export as well: parallel inserts happening under the
> hood. For parallel processing, map tasks alone are fine; no reduce
> operation is needed.
>
> Regards
> Bejoy K S
>
> -----Original Message-----
> From: Sonal <imsonalku...@gmail.com>
> Date: Tue, 9 Aug 2011 04:02:10
> To: Sqoop Users <sqoop-u...@cloudera.org>
> Reply-To: sqoop-u...@cloudera.org
> Subject: [sqoop-user] Sqoop export not having reduce jobs, even table is partitioned
>
> Hi,
>
> I am trying to load the data into the db using sqoop export with the
> following command:
>
>     sqoop export --connect jdbc:oracle:thin:@adc2190481.us.oracle.com:45773:dooptry \
>         --username sh --password sh \
>         --export-dir $ORACLE_HOME/work/SALES_input \
>         --table SALES_OLH_RANGE -m 4
>
> It is able to insert the data, but it runs only map jobs:
>
> 11/08/09 03:57:42 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
> 11/08/09 03:57:42 INFO tool.CodeGenTool: Beginning code generation
> 11/08/09 03:57:42 INFO manager.OracleManager: Time zone has been set to GMT
> 11/08/09 03:57:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SALES_OLH_RANGE t
> 11/08/09 03:57:42 WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on
> 11/08/09 03:57:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SALES_OLH_RANGE t
> 11/08/09 03:57:42 WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on
> 11/08/09 03:57:42 INFO orm.CompilationManager: HADOOP_HOME is /usr/lib/hadoop
> 11/08/09 03:57:42 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop/hadoop-0.20.2+737-core.jar
> Note: /net/adc2190481/scratch/sonkumar/view_storage/sonkumar_hadooptry/work/./SALES_OLH_RANGE.java uses or overrides a deprecated API.
> Note: Recompile with -Xlint:deprecation for details.
> 11/08/09 03:57:43 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop/compile/SALES_OLH_RANGE.jar
> 11/08/09 03:57:43 INFO mapreduce.ExportJobBase: Beginning export of SALES_OLH_RANGE
> 11/08/09 03:57:44 INFO manager.OracleManager: Time zone has been set to GMT
> 11/08/09 03:57:44 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM SALES_OLH_RANGE t
> 11/08/09 03:57:44 WARN manager.SqlManager: SQLException closing ResultSet: java.sql.SQLException: Could not commit with auto-commit set on
> 11/08/09 03:57:44 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
> 11/08/09 03:57:44 INFO input.FileInputFormat: Total input paths to process : 1
> 11/08/09 03:57:44 INFO input.FileInputFormat: Total input paths to process : 1
> 11/08/09 03:57:44 INFO mapred.JobClient: Running job: job_local_0001
> 11/08/09 03:57:45 INFO mapred.JobClient:  map 0% reduce 0%
> 11/08/09 03:57:50 INFO mapred.LocalJobRunner:
> 11/08/09 03:57:51 INFO mapred.JobClient:  map 24% reduce 0%
> 11/08/09 03:57:53 INFO mapred.LocalJobRunner:
> 11/08/09 03:57:54 INFO mapred.JobClient:  map 41% reduce 0%
> 11/08/09 03:57:56 INFO mapred.LocalJobRunner:
> 11/08/09 03:57:57 INFO mapred.JobClient:  map 58% reduce 0%
> 11/08/09 03:57:59 INFO mapred.LocalJobRunner:
> 11/08/09 03:58:00 INFO mapred.JobClient:  map 75% reduce 0%
> 11/08/09 03:58:02 INFO mapred.LocalJobRunner:
> 11/08/09 03:58:02 INFO mapred.JobClient:  map 92% reduce 0%
> 11/08/09 03:58:03 INFO mapreduce.AutoProgressMapper: Auto-progress thread is finished. keepGoing=false
> 11/08/09 03:58:03 INFO mapred.Task: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
> 11/08/09 03:58:03 INFO mapred.LocalJobRunner:
> 11/08/09 03:58:03 INFO mapred.Task: Task 'attempt_local_0001_m_000000_0' done.
> 11/08/09 03:58:03 WARN mapred.FileOutputCommitter: Output path is null in cleanup
> 11/08/09 03:58:04 INFO mapred.JobClient:  map 100% reduce 0%
> 11/08/09 03:58:04 INFO mapred.JobClient: Job complete: job_local_0001
> 11/08/09 03:58:04 INFO mapred.JobClient: Counters: 6
> 11/08/09 03:58:04 INFO mapred.JobClient:   FileSystemCounters
> 11/08/09 03:58:04 INFO mapred.JobClient:     FILE_BYTES_READ=41209592
> 11/08/09 03:58:04 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=309754
> 11/08/09 03:58:04 INFO mapred.JobClient:   Map-Reduce Framework
> 11/08/09 03:58:04 INFO mapred.JobClient:     Map input records=918843
> 11/08/09 03:58:04 INFO mapred.JobClient:     Spilled Records=0
> 11/08/09 03:58:04 INFO mapred.JobClient:     SPLIT_RAW_BYTES=154
> 11/08/09 03:58:04 INFO mapred.JobClient:     Map output records=918843
> 11/08/09 03:58:04 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 20.3677 seconds (0 bytes/sec)
> 11/08/09 03:58:04 INFO mapreduce.ExportJobBase: Exported 918843 records.
>
> Why are reduce jobs not coming up? Do I have to pass some other option
> as well?
>
> A quick reply will be appreciated.
>
> Thanks & Regards,
> Sonal Kumar
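To make the import-side claim in the quoted reply above concrete ("each mapper is responsible for firing its corresponding SQL query"): splitting on a numeric key amounts to range-partitioning it, with each mapper issuing one bounded SELECT. A toy sketch, with an invented table, column, and key range:

    // Toy illustration of carving a numeric key range into per-mapper
    // queries; the table, column, and bounds are invented for this sketch.
    public class SplitSketch {
        public static void main(String[] args) {
            long min = 1, max = 1_000_000;  // e.g. from SELECT MIN(id), MAX(id)
            int numMappers = 4;             // number of parallel map tasks (-m)
            long chunk = (max - min + 1) / numMappers;
            for (int i = 0; i < numMappers; i++) {
                long lo = min + i * chunk;
                long hi = (i == numMappers - 1) ? max + 1 : lo + chunk;
                System.out.printf(
                    "mapper %d: SELECT * FROM SALES WHERE id >= %d AND id < %d%n",
                    i, lo, hi);
            }
        }
    }

For exports the parallelism comes from splitting the input files under the export directory instead, but the map-only character of the job is the same.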