reducers and data locality

2012-04-27 Thread mete
Hello folks,

I have a lot of input splits (10k-50k, 128 MB blocks) which contain text
files. I need to process them line by line, then copy the result into
roughly equal-sized shards.

So I generate a random key (from the range [0:numberOfShards]) which is
used to route the map output to different reducers, and the shard sizes
come out more or less equal.

I know that this is not really efficient, and I was wondering if I could
somehow control how the keys are routed.
For example, could I generate the random keys with hostname prefixes and
control which keys are sent to each reducer? What do you think?
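
Roughly what I do today, as a simplified sketch (the shard count and the
per-line processing step are placeholders, just for illustration):

import java.io.IOException;
import java.util.Random;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RandomShardMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
  private static final int NUMBER_OF_SHARDS = 32; // placeholder value
  private final Random random = new Random();
  private final IntWritable shardKey = new IntWritable();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    // Per-line processing, elided here.
    String processed = process(line.toString());
    // Random shard id in [0, numberOfShards) so reducer input sizes stay roughly equal.
    shardKey.set(random.nextInt(NUMBER_OF_SHARDS));
    context.write(shardKey, new Text(processed));
  }

  private String process(String line) {
    return line; // stands in for the real per-line transformation
  }
}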

Kind regards
Mete


Re: Hbql with Hbase-0.90.4

2012-04-27 Thread Manu S
Hi,

I am trying to install Hbql on a pseudo-distributed node. I am not sure how
to build *hbase-trx-0.90.0-DEV-2.jar* from the hbase-transactional package
downloaded from
https://github.com/hbase-trx/hbase-transactional-tableindexed

Appreciate your help on the same.
-- 
Thanks & Regards

*Manu S*
SI Engineer - OpenSource & HPC
Wipro Infotech
Mob: +91 8861302855 | Skype: manuspkd
www.opensourcetalk.co.in


Re: reducers and data locality

2012-04-27 Thread Bejoy KS
Hi Mete

A custom Partitioner class can control the flow of keys to the desired reducer.
It gives you more control over which key goes to which reducer.
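
A minimal sketch of the idea (assuming, purely for illustration, keys of the
form "hostname#shardId" as you suggested; the class name is made up):

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class HostPrefixPartitioner extends Partitioner<Text, Text> {
  @Override
  public int getPartition(Text key, Text value, int numReduceTasks) {
    // Keys sharing the same hostname prefix all land on the same reducer.
    String host = key.toString().split("#", 2)[0];
    return (host.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
  }
}

It is registered on the job with job.setPartitionerClass(HostPrefixPartitioner.class).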


Regards
Bejoy KS

Sent from handheld, please excuse typos.

-Original Message-
From: mete efk...@gmail.com
Date: Fri, 27 Apr 2012 09:19:21 
To: common-user@hadoop.apache.org
Reply-To: common-user@hadoop.apache.org
Subject: reducers and data locality

Hello folks,

I have a lot of input splits (10k-50k, 128 MB blocks) which contain text
files. I need to process them line by line, then copy the result into
roughly equal-sized shards.

So I generate a random key (from the range [0:numberOfShards]) which is
used to route the map output to different reducers, and the shard sizes
come out more or less equal.

I know that this is not really efficient, and I was wondering if I could
somehow control how the keys are routed.
For example, could I generate the random keys with hostname prefixes and
control which keys are sent to each reducer? What do you think?

Kind regards
Mete



Re: Namenode not formatted after format

2012-04-27 Thread Harsh J
Unfortunately in 1.x the format command's prompt is case-sensitive
(Fixed in 2.x):

You had:
Re-format filesystem in /app/hadoop/name ? (Y or N) y
Format aborted in /app/hadoop/name

Answer with a capital Y instead and it won't abort.

On Fri, Apr 27, 2012 at 3:07 PM, Mathias Schnydrig smath...@ee.ethz.ch wrote:
 Hi

 I am setting up a Hadoop cluster with 5 slaves and a master. After the
 single-node installation everything worked fine, but when I moved to a
 multi-node cluster the namenode prints the message below even after
 formatting the namenode.
 I also added the config files.

 The folders I am using exist on all nodes, the hadoop directory is placed
 in the same location on every node, and the config files are all identical.

 I suppose it is some silly mistake on my part, as I am quite new to Hadoop.

 Regards
 Mathias

 hduser@POISN-server:/usr/local/hadoop$ bin/hadoop namenode -format
 Warning: $HADOOP_HOME is deprecated.

 12/04/27 10:39:52 INFO namenode.NameNode: STARTUP_MSG:
 /
 STARTUP_MSG: Starting NameNode
 STARTUP_MSG:   host = POISN-server/127.0.1.1
 STARTUP_MSG:   args = [-format]
 STARTUP_MSG:   version = 1.0.2
 STARTUP_MSG:   build =
 https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r
 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
 /
 Re-format filesystem in /app/hadoop/name ? (Y or N) y
 Format aborted in /app/hadoop/name
 12/04/27 10:39:55 INFO namenode.NameNode: SHUTDOWN_MSG:
 /
 SHUTDOWN_MSG: Shutting down NameNode at POISN-server/127.0.1.1
 /
 hduser@POISN-server:/usr/local/hadoop$ bin/hadoop namenode
 Warning: $HADOOP_HOME is deprecated.

 12/04/27 10:40:04 INFO namenode.NameNode: STARTUP_MSG:
 /
 STARTUP_MSG: Starting NameNode
 STARTUP_MSG:   host = POISN-server/127.0.1.1
 STARTUP_MSG:   args = []
 STARTUP_MSG:   version = 1.0.2
 STARTUP_MSG:   build =
 https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0.2 -r
 1304954; compiled by 'hortonfo' on Sat Mar 24 23:58:21 UTC 2012
 /
 12/04/27 10:40:04 INFO impl.MetricsConfig: loaded properties from
 hadoop-metrics2.properties
 12/04/27 10:40:04 INFO impl.MetricsSourceAdapter: MBean for source
 MetricsSystem,sub=Stats registered.
 12/04/27 10:40:04 INFO impl.MetricsSystemImpl: Scheduled snapshot period at
 10 second(s).
 12/04/27 10:40:04 INFO impl.MetricsSystemImpl: NameNode metrics system
 started
 12/04/27 10:40:05 INFO impl.MetricsSourceAdapter: MBean for source ugi
 registered.
 12/04/27 10:40:05 WARN impl.MetricsSystemImpl: Source name ugi already
 exists!
 12/04/27 10:40:05 INFO impl.MetricsSourceAdapter: MBean for source jvm
 registered.
 12/04/27 10:40:05 INFO impl.MetricsSourceAdapter: MBean for source NameNode
 registered.
 12/04/27 10:40:05 INFO util.GSet: VM type       = 64-bit
 12/04/27 10:40:05 INFO util.GSet: 2% max memory = 17.77875 MB
 12/04/27 10:40:05 INFO util.GSet: capacity      = 2^21 = 2097152 entries
 12/04/27 10:40:05 INFO util.GSet: recommended=2097152, actual=2097152
 12/04/27 10:40:05 INFO namenode.FSNamesystem: fsOwner=hduser
 12/04/27 10:40:05 INFO namenode.FSNamesystem: supergroup=supergroup
 12/04/27 10:40:05 INFO namenode.FSNamesystem: isPermissionEnabled=true
 12/04/27 10:40:05 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
 12/04/27 10:40:05 INFO namenode.FSNamesystem: isAccessTokenEnabled=false
 accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
 12/04/27 10:40:05 INFO namenode.FSNamesystem: Registered
 FSNamesystemStateMBean and NameNodeMXBean
 12/04/27 10:40:05 INFO namenode.NameNode: Caching file names occuring more
 than 10 times
 12/04/27 10:40:05 ERROR namenode.FSNamesystem: FSNamesystem initialization
 failed.
 java.io.IOException: NameNode is not formatted.
    at
 org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:325)
    at
 org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:100)
 ...

 12/04/27 10:40:05 INFO namenode.NameNode: SHUTDOWN_MSG:
 /
 SHUTDOWN_MSG: Shutting down NameNode at POISN-server/127.0.1.1
 /


 hduser@POISN-server:/usr/local/hadoop/conf$ cat *-site.xml

 core-site.xml  hdfs-site.xml  mapred-site.xml

 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 <!-- Put site-specific property overrides in this file. -->

 <configuration>
 <property>
 <name>fs.default.name</name>
 <value>hdfs://ClusterMaster:9000</value>
 </property>
 </configuration>
 <?xml version="1.0"?>
 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

 <!-- Put site-specific property overrides in 

cygwin single node setup

2012-04-27 Thread Onder SEZGIN
Hi,

I am pretty much a newbie, and I am following the quick start guide for a
single-node setup on Windows using Cygwin.

In this step,

$ bin/hadoop fs -put conf input

I am getting the following errors.

There are no files
under /user/EXT0125622/input/conf/capacity-scheduler.xml. That might be the
reason for the errors I get, but why does Hadoop look for such a directory
when I have not configured anything like that? So, supposedly, Hadoop is
making up and then looking for that file and directory?

Any ideas and help are welcome.

Cheers
Onder

12/04/27 13:44:37 WARN hdfs.DFSClient: DataStreamer Exception:
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/EXT0125622/input/conf/capacity-scheduler.xml could only be replicated
to 0 nodes, instead of 1
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

at org.apache.hadoop.ipc.Client.call(Client.java:1066)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3507)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3370)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2700(DFSClient.java:2586)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2826)

12/04/27 13:44:37 WARN hdfs.DFSClient: Error Recovery for block null bad
datanode[0] nodes == null
12/04/27 13:44:37 WARN hdfs.DFSClient: Could not get block locations.
Source file /user/EXT0125622/input/conf/capacity-scheduler.xml -
Aborting...
put: java.io.IOException: File
/user/EXT0125622/input/conf/capacity-scheduler.xml could only be replicated
to 0 nodes, instead of 1
12/04/27 13:44:37 ERROR hdfs.DFSClient: Exception closing file
/user/EXT0125622/input/conf/capacity-scheduler.xml :
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/EXT0125622/input/conf/capacity-scheduler.xml could only be replicated
to 0 nodes, instead of 1
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:563)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1388)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1384)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1382)

org.apache.hadoop.ipc.RemoteException: java.io.IOException: File
/user/EXT0125622/input/conf/capacity-scheduler.xml could only be replicated
to 0 nodes, instead of 1
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1558)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:696)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at 

Re: DFSClient error

2012-04-27 Thread John George
Can you run a regular 'hadoop fs' (put or ls or get) command?
If yes, how about a wordcount example?
'<path>/hadoop jar <path>hadoop-*examples*.jar wordcount input output'


-Original Message-
From: Mohit Anchlia mohitanch...@gmail.com
Reply-To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Date: Fri, 27 Apr 2012 14:36:49 -0700
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Subject: Re: DFSClient error

I even tried to reduce the number of jobs, but it didn't help. This is what I see:

datanode logs:

Initializing secure datanode resources
Successfully obtained privileged resources (streaming port =
ServerSocket[addr=/0.0.0.0,localport=50010] ) (http listener port =
sun.nio.ch.ServerSocketChannelImpl[/0.0.0.0:50075])
Starting regular datanode initialization
26/04/2012 17:06:51 9858 jsvc.exec error: Service exit with a return value
of 143

userlogs:

2012-04-26 19:35:22,801 WARN
org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library is
available
2012-04-26 19:35:22,801 INFO
org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library
loaded
2012-04-26 19:35:22,808 INFO
org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
initialized native-zlib library
2012-04-26 19:35:22,903 INFO org.apache.hadoop.hdfs.DFSClient: Failed to
connect to /125.18.62.197:50010, add to deadNodes and continue
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:298)
at
org.apache.hadoop.hdfs.DFSClient$RemoteBlockReader.newBlockReader(DFSClien
t.java:1664)
at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.getBlockReader(DFSClient.j
ava:2383)
at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java
:2056)
at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2170)
at java.io.DataInputStream.read(DataInputStream.java:132)
at
org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(Decompr
essorStream.java:97)
at
org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorSt
ream.java:87)
at
org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.j
ava:75)
at java.io.InputStream.read(InputStream.java:85)
at
org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)
at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)
at
org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRe
cordReader.java:114)
at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:109)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordRead
er.nextKeyValue(PigRecordReader.java:187)
at
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapT
ask.java:456)
at
org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.
java:1157)
at org.apache.hadoop.mapred.Child.main(Child.java:264)
2012-04-26 19:35:22,906 INFO org.apache.hadoop.hdfs.DFSClient: Failed to
connect to /125.18.62.204:50010, add to deadNodes and continue
java.io.EOFException

namenode logs:

2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobTracker: Job
job_201204261140_0244 added successfully for user 'hadoop' to queue
'default'
2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobTracker:
Initializing job_201204261140_0244
2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.AuditLogger:
USER=hadoop  IP=125.18.62.196  OPERATION=SUBMIT_JOB
TARGET=job_201204261140_0244  RESULT=SUCCESS
2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobInProgress:
Initializing job_201204261140_0244
2012-04-26 16:12:53,581 INFO org.apache.hadoop.hdfs.DFSClient: Exception
in
createBlockOutputStream 125.18.62.198:50010 java.io.IOException: Bad
connect ack with firstBadLink as 125.18.62.197:50010
2012-04-26 16:12:53,581 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
block blk_2499580289951080275_22499
2012-04-26 16:12:53,582 INFO org.apache.hadoop.hdfs.DFSClient: Excluding
datanode 125.18.62.197:50010
2012-04-26 16:12:53,594 INFO org.apache.hadoop.mapred.JobInProgress:
jobToken generated and stored with users keys in
/data/hadoop/mapreduce/job_201204261140_0244/jobToken
2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress: Input
size for job job_201204261140_0244 = 73808305. Number of splits = 1
2012-04-26 16:12:53,598 INFO org.apache.hadoop.mapred.JobInProgress:

Re: DFSClient error

2012-04-27 Thread Mohit Anchlia
After all the jobs fail I can't run anything. Once I restart the cluster I
am able to run other jobs with no problems; hadoop fs and other IO-intensive
jobs run just fine.

On Fri, Apr 27, 2012 at 3:12 PM, John George john...@yahoo-inc.com wrote:

 Can you run a regular 'hadoop fs' (put or ls or get) command?
 If yes, how about a wordcount example?
 '<path>/hadoop jar <path>hadoop-*examples*.jar wordcount input output'


 -Original Message-
 From: Mohit Anchlia mohitanch...@gmail.com
 Reply-To: common-user@hadoop.apache.org common-user@hadoop.apache.org
 Date: Fri, 27 Apr 2012 14:36:49 -0700
 To: common-user@hadoop.apache.org common-user@hadoop.apache.org
 Subject: Re: DFSClient error

 I even tried to reduce the number of jobs, but it didn't help. This is what I see:
 
 datanode logs:
 
 Initializing secure datanode resources
 Successfully obtained privileged resources (streaming port =
 ServerSocket[addr=/0.0.0.0,localport=50010] ) (http listener port =
 sun.nio.ch.ServerSocketChannelImpl[/0.0.0.0:50075])
 Starting regular datanode initialization
 26/04/2012 17:06:51 9858 jsvc.exec error: Service exit with a return value
 of 143
 
 userlogs:
 
 2012-04-26 19:35:22,801 WARN
 org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library is
 available
 2012-04-26 19:35:22,801 INFO
 org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library
 loaded
 2012-04-26 19:35:22,808 INFO
 org.apache.hadoop.io.compress.zlib.ZlibFactory: Successfully loaded &
 initialized native-zlib library
 2012-04-26 19:35:22,903 INFO org.apache.hadoop.hdfs.DFSClient: Failed to
 connect to /125.18.62.197:50010, add to deadNodes and continue
 java.io.EOFException
 at java.io.DataInputStream.readShort(DataInputStream.java:298)
 at
 org.apache.hadoop.hdfs.DFSClient$RemoteBlockReader.newBlockReader(DFSClien
 t.java:1664)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.getBlockReader(DFSClient.j
 ava:2383)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java
 :2056)
 at
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:2170)
 at java.io.DataInputStream.read(DataInputStream.java:132)
 at
 org.apache.hadoop.io.compress.DecompressorStream.getCompressedData(Decompr
 essorStream.java:97)
 at
 org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorSt
 ream.java:87)
 at
 org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.j
 ava:75)
 at java.io.InputStream.read(InputStream.java:85)
 at
 org.apache.hadoop.util.LineReader.readDefaultLine(LineReader.java:205)
 at org.apache.hadoop.util.LineReader.readLine(LineReader.java:169)
 at
 org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRe
 cordReader.java:114)
 at org.apache.pig.builtin.PigStorage.getNext(PigStorage.java:109)
 at
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordRead
 er.nextKeyValue(PigRecordReader.java:187)
 at
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapT
 ask.java:456)
 at
 org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.
 java:1157)
 at org.apache.hadoop.mapred.Child.main(Child.java:264)
 2012-04-26 19:35:22,906 INFO org.apache.hadoop.hdfs.DFSClient: Failed to
 connect to /125.18.62.204:50010, add to deadNodes and continue
 java.io.EOFException
 
 namenode logs:
 
 2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobTracker: Job
 job_201204261140_0244 added successfully for user 'hadoop' to queue
 'default'
 2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobTracker:
 Initializing job_201204261140_0244
 2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.AuditLogger:
 USER=hadoop  IP=125.18.62.196  OPERATION=SUBMIT_JOB
 TARGET=job_201204261140_0244  RESULT=SUCCESS
 2012-04-26 16:12:53,562 INFO org.apache.hadoop.mapred.JobInProgress:
 Initializing job_201204261140_0244
 2012-04-26 16:12:53,581 INFO org.apache.hadoop.hdfs.DFSClient: Exception
 in
 createBlockOutputStream 125.18.62.198:50010 java.io.IOException: Bad
 connect ack with firstBadLink as 125.18.62.197:50010
 2012-04-26 16:12:53,581 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning
 block blk_2499580289951080275_22499
 2012-04-26 16:12:53,582 INFO org.apache.hadoop.hdfs.DFSClient: Excluding
 datanode 125.18.62.197:50010
 2012-04-26 16:12:53,594 INFO 

Node-wide Combiner

2012-04-27 Thread Superymk

Hi all,

I am a newbie to Hadoop and I like the system. I have one question: is there
a node-wide combiner or something similar in Hadoop? I think it could further
reduce the number of intermediate results. Any hint?
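
For context, the combiner I know of is the ordinary per-map-task one, set on
the job roughly like the sketch below (a word-count style example; the class
name is made up). What I am wondering is whether the same partial aggregation
could also happen once per node, across all map tasks on that node.

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
  private final IntWritable sum = new IntWritable();

  @Override
  protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
      throws IOException, InterruptedException {
    int total = 0;
    for (IntWritable c : counts) {
      total += c.get();
    }
    sum.set(total);
    // Emit partial sums; the real reducer merges them again.
    context.write(word, sum);
  }
}

Wired up per job with job.setCombinerClass(WordCountCombiner.class).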


Thanks a lot!

Superymk