Re: migrate cluster to different datacenter

2012-08-07 Thread Patrick Angeles
It would help to know your data ingest and processing patterns (and any
applicable SLAs).

In most cases, you'd only need to move the raw ingested data, then you can
derive the rest in the other cluster. Assuming that you have some sort of
date-based partitioning on the ingest, then it's easy to define a cut-off
point.

Depending on your read SLAs, you could tee writes to both clusters for a
period of time, or simply switch over to the new cluster once the majority
of the data has been moved.

Finally, you would want to do a consistency check to make sure everything
made it to the other side... maybe run a checksum on derived data on both
clusters and compare. Something like that...
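
For the checksum comparison, something like the following minimal sketch could
work (class name, NameNode URIs and the path argument are placeholders, not
anything the OP has described; also note the comparison is only meaningful if
block size and bytes-per-checksum match on both clusters):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ChecksumCompare {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical NameNode addresses for the old and new datacenters
    FileSystem oldFs = FileSystem.get(URI.create("hdfs://old-nn:8020"), conf);
    FileSystem newFs = FileSystem.get(URI.create("hdfs://new-nn:8020"), conf);
    Path p = new Path(args[0]);                  // e.g. a derived output file
    FileChecksum a = oldFs.getFileChecksum(p);   // may be null on non-HDFS filesystems
    FileChecksum b = newFs.getFileChecksum(p);
    System.out.println(p + ": " + (a != null && a.equals(b) ? "match" : "MISMATCH"));
  }
}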

- P


On Fri, Aug 3, 2012 at 5:19 PM, Patai Sangbutsarakum 
silvianhad...@gmail.com wrote:

 Thanks for the response.
 A physical move is not a choice in this case. We are purely looking at
 copying the data, and at how to catch up with updates to a file while it
 is being migrated.

 On Fri, Aug 3, 2012 at 12:40 PM, Chen He airb...@gmail.com wrote:
  sometimes, physically moving hard drives helps.   :)
  On Aug 3, 2012 1:50 PM, Patai Sangbutsarakum silvianhad...@gmail.com
  wrote:
 
  Hi Hadoopers,
 
  We have a plan to migrate our Hadoop cluster to a different datacenter
  where we can triple the size of the cluster.
  Currently, our 0.20.2 cluster has around 1PB of data. We use only
  Java/Pig.
 
  I would like to get some input on how to handle transferring 1PB of
  data to the new site, and also on how to keep up with the new files
  that are thrown into the cluster all the time.
 
  Happy friday !!
 
  P
 



Re: migrate cluster to different datacenter

2012-08-07 Thread Michael Segel
The OP hasn't provided enough information to even start trying to make a real 
recommendation on how to solve this problem. 

On Aug 4, 2012, at 7:32 AM, Nitin Kesarwani bumble@gmail.com wrote:

 Given the size of data, there can be several approaches here:
 
 1. Moving the boxes
 
 Not possible, as I suppose the data must be needed for 24x7 analytics.
 
 2. Mirroring the data.
 
 This is a good solution. However, if you have data being written/removed
 continuously (as part of a live system), there is a chance of losing some
 of the data while the mirroring happens, unless
 a) you block writes/updates during that time (if you do so, that would be
 as good as unplugging and moving the machines around), or
 b) you keep track of what was modified since you started the mirroring
 process.
 
 I would recommend going with 2b) because it minimizes downtime. Here is
 how I think you can do it, using some of the tools provided by Hadoop
 itself.
 
 a) You can use some fast distributed copying tool to copy large chunks of
 data. Before you kick this off, you can create a utility that tracks
 modifications made to your live system while the copying is going on
 in the background. The utility logs those modifications into an audit
 trail (see the sketch further below).
 b) Once you're done copying the files, let the new data store catch up
 by replaying the real-time modifications recorded in your utility's log
 file. Once it is synced up, you can begin the minimal downtime by
 switching off the JobTracker on the live cluster so that no new files
 are created.
 c) As soon as you reach the last chunk of copying, change the DNS entries
 so that the hostnames referenced by the Hadoop jobs point to the new
 location.
 d) Turn on the JobTracker for the new cluster.
 e) Enjoy a drink with the money you saved by not using paid third-party
 solutions, and pat yourself on the back! ;)
 
 The key to the above solution is to make the data copying in step a) as fast
 as possible. The less time it takes, the smaller the audit trail, and the
 shorter the overall downtime.
 
 You can develop an in-house solution for this, or use DistCp, which ships
 with Hadoop and copies the data over using MapReduce.
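 
 As a rough illustration of the audit-trail / catch-up idea in steps a) and b),
 here is a minimal sketch (class name and arguments are made up for the example)
 that walks a directory tree on the source cluster and prints files modified
 after a given cut-off timestamp; the resulting list could drive a second, much
 smaller copy pass:
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileStatus;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 
 public class ModifiedSince {
   public static void main(String[] args) throws Exception {
     long cutoff = Long.parseLong(args[1]);      // epoch millis when the first copy started
     FileSystem fs = FileSystem.get(new Configuration());
     listNewer(fs, new Path(args[0]), cutoff);   // e.g. the ingest root directory
   }
 
   static void listNewer(FileSystem fs, Path dir, long cutoff) throws Exception {
     FileStatus[] entries = fs.listStatus(dir);
     if (entries == null) return;                // path vanished since the scan started
     for (FileStatus s : entries) {
       if (s.isDir()) {
         listNewer(fs, s.getPath(), cutoff);     // recurse into subdirectories
       } else if (s.getModificationTime() > cutoff) {
         System.out.println(s.getPath());        // candidate for the catch-up copy
       }
     }
   }
 }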
 
 
 On Sat, Aug 4, 2012 at 3:27 AM, Michael Segel 
 michael_se...@hotmail.comwrote:
 
 Sorry, at 1PB of disk... compression isn't going to really help a whole
 heck of a lot. Your network bandwidth will be your bottleneck.
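 To put rough numbers on it: 1 PB is about 10^15 bytes, so even a dedicated
 10 Gb/s link running flat out (~1.25 GB/s) needs on the order of 800,000
 seconds, roughly nine days of sustained transfer, before any protocol or
 replication overhead.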
 
 So let's look at the problem.
 
 How much down time can you afford?
 What does your hardware look like?
 How much space do you have in your current data center?
 
 You have 1PB of data. OK, what does the access pattern look like?
 
 There are a couple of ways to slice and dice this. How many trucks do you
 have?
 
 On Aug 3, 2012, at 4:24 PM, Harit Himanshu harit.subscripti...@gmail.com
 wrote:
 
 Moving 1 PB of data would take loads of time,
 - Check if the new data center provides something similar to
 http://aws.amazon.com/importexport/
 - Consider multi-part uploading of the data
 - Consider compressing the data
 
 
 On Aug 3, 2012, at 2:19 PM, Patai Sangbutsarakum wrote:
 
 thanks for response.
 Physical move is not a choice in this case. Purely looking for copying
 data and how to catch up with the update of a file while it is being
 migrated.
 
 On Fri, Aug 3, 2012 at 12:40 PM, Chen He airb...@gmail.com wrote:
 sometimes, physically moving hard drives helps.   :)
 On Aug 3, 2012 1:50 PM, Patai Sangbutsarakum 
 silvianhad...@gmail.com
 wrote:
 
 Hi Hadoopers,
 
 We have a plan to migrate Hadoop cluster to a different datacenter
 where we can triple the size of the cluster.
 Currently, our 0.20.2 cluster have around 1PB of data. We use only
 Java/Pig.
 
 I would like to get some input how we gonna handle with transferring
 1PB of data to a new site, and also keep up with
 new files that thrown into cluster all the time.
 
 Happy friday !!
 
 P
 
 
 
 



Re: Basic Question

2012-08-07 Thread Harsh J
Each write call registers (writes) a KV pair to the output. The output
collector does not look for similarities, nor does it try to de-dupe
pairs, and even if the object is the same, its value is copied, so that
doesn't matter.

So you will get two KV pairs in your output - duplication is
allowed and is normal in several MR cases. Think of wordcount, where a
map() call may emit lots of ("is", 1) pairs if there are multiple "is"
tokens in the line it processes, and can use set() calls to its benefit
to avoid creating too many objects.
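
For illustration, a minimal wordcount-style mapper in the old mapred API
(class and variable names are mine, not from the thread) that reuses a single
Text key exactly as described - the collector serializes/copies the bytes on
every collect(), so repeated keys all survive:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WordCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {
  private final Text word = new Text();            // reused for every token
  private final IntWritable one = new IntWritable(1);

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    for (String token : value.toString().split("\\s+")) {
      word.set(token);             // overwrite the same object
      output.collect(word, one);   // bytes are copied here, so duplicates are safe
    }
  }
}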

On Tue, Aug 7, 2012 at 11:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
 In my Mapper I often use a global Text object and throughout the map processing
 I just call set on it. My question is, what happens if the collector receives
 the same byte array value? Does the last one overwrite the value in the
 collector? So if I did

 Text zip = new Text();
 zip.set("9099");
 collector.write(zip, value);
 zip.set("9099");
 collector.write(zip, value1);

 should I expect to receive both values in the reducer, or just one?



-- 
Harsh J


Re: Basic Question

2012-08-07 Thread Mohit Anchlia
On Tue, Aug 7, 2012 at 11:33 AM, Harsh J ha...@cloudera.com wrote:

 Each write call registers (writes) a KV pair to the output. The output
 collector does not look for similarities, nor does it try to de-dupe
 pairs, and even if the object is the same, its value is copied, so that
 doesn't matter.

 So you will get two KV pairs in your output - duplication is
 allowed and is normal in several MR cases. Think of wordcount, where a
 map() call may emit lots of ("is", 1) pairs if there are multiple "is"
 tokens in the line it processes, and can use set() calls to its benefit
 to avoid creating too many objects.


Thanks!


 On Tue, Aug 7, 2012 at 11:56 PM, Mohit Anchlia mohitanch...@gmail.com
 wrote:
  In Mapper I often use a Global Text object and througout the map
 processing
  I just call set on it. My question is, what happens if collector
 receives
  similar byte array value. Does the last one overwrite the value in
  collector? So if I did
 
  Text zip = new Text();
  zip.set(9099);
  collector.write(zip,value);
  zip.set(9099);
  collector.write(zip,value1);
 
  Should I expect to receive both values in reducer or just one?



 --
 Harsh J



Setting Configuration for local file:///

2012-08-07 Thread Mohit Anchlia
I am trying to write a test on the local file system, but the test keeps picking
up the xml config files on the classpath even though I am setting a different
Configuration object. Is there a way for me to override it? I thought the way I
am doing it below overrides the configuration, but it doesn't seem to be working:

 @Test
 public void testOnLocalFS() throws Exception {
  Configuration conf = new Configuration();
  conf.set("fs.default.name", "file:///");
  conf.set("mapred.job.tracker", "local");
  Path input = new Path("geoinput/geo.dat");
  Path output = new Path("geooutput/");
  FileSystem fs = FileSystem.getLocal(conf);
  fs.delete(output, true);

  log.info("Here");
  GeoLookupConfigRunner configRunner = new GeoLookupConfigRunner();
  configRunner.setConf(conf);
  int exitCode = configRunner.run(new String[]{input.toString(), output.toString()});
  Assert.assertEquals(exitCode, 0);
 }


Re: Setting Configuration for local file:///

2012-08-07 Thread Harsh J
What is GeoLookupConfigRunner and how do you utilize the setConf(conf)
object within it?

On Wed, Aug 8, 2012 at 1:10 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
 I am trying to write a test on local file system but this test keeps taking
 xml files in the path even though I am setting a different Configuration
 object. Is there a way for me to override it? I thought the way I am doing
 overwrites the configuration but doesn't seem to be working:

  @Test
  public void testOnLocalFS() throws Exception {
   Configuration conf = new Configuration();
   conf.set("fs.default.name", "file:///");
   conf.set("mapred.job.tracker", "local");
   Path input = new Path("geoinput/geo.dat");
   Path output = new Path("geooutput/");
   FileSystem fs = FileSystem.getLocal(conf);
   fs.delete(output, true);

   log.info("Here");
   GeoLookupConfigRunner configRunner = new GeoLookupConfigRunner();
   configRunner.setConf(conf);
   int exitCode = configRunner.run(new String[]{input.toString(), output.toString()});
   Assert.assertEquals(exitCode, 0);
  }



-- 
Harsh J


Re: Setting Configuration for local file:///

2012-08-07 Thread Mohit Anchlia
On Tue, Aug 7, 2012 at 12:50 PM, Harsh J ha...@cloudera.com wrote:

 What is GeoLookupConfigRunner and how do you utilize the setConf(conf)
 object within it?


Thanks for the pointer - I wasn't setting my JobConf object with the conf
that I passed. Just one more related question: if I use JobConf conf = new
JobConf(getConf()) and I don't pass in any configuration, is the data
from the xml files on the classpath used? I want this to work in all scenarios.



 On Wed, Aug 8, 2012 at 1:10 AM, Mohit Anchlia mohitanch...@gmail.com
 wrote:
  I am trying to write a test on local file system but this test keeps
 taking
  xml files in the path even though I am setting a different Configuration
  object. Is there a way for me to override it? I thought the way I am
 doing
  overwrites the configuration but doesn't seem to be working:
 
   @Test
   public void testOnLocalFS() throws Exception{
Configuration conf = new Configuration();
conf.set(fs.default.name, file:///);
conf.set(mapred.job.tracker, local);
Path input = new Path(geoinput/geo.dat);
Path output = new Path(geooutput/);
FileSystem fs = FileSystem.getLocal(conf);
fs.delete(output, true);
 
log.info(Here);
GeoLookupConfigRunner configRunner = new GeoLookupConfigRunner();
configRunner.setConf(conf);
int exitCode = configRunner.run(new String[]{input.toString(),
  output.toString()});
Assert.assertEquals(exitCode, 0);
   }



 --
 Harsh J



Local jobtracker in test env?

2012-08-07 Thread Mohit Anchlia
I just wrote a test where fs.default.name is file:/// and
mapred.job.tracker is set to local. The test ran fine, and I can see that the
mapper and reducer were invoked, but what I am trying to understand is how this
ran without specifying the job tracker port, and on which port the task
tracker connected to the job tracker. It's not clear from the output below.

Also, what's the difference between this and bringing up a MiniDFS cluster?

INFO  org.apache.hadoop.mapred.FileInputFormat [main]: Total input paths to process : 1
INFO  org.apache.hadoop.mapred.JobClient [main]: Running job: job_local_0001
INFO  org.apache.hadoop.mapred.Task [Thread-11]:  Using ResourceCalculatorPlugin : null
INFO  org.apache.hadoop.mapred.MapTask [Thread-11]: numReduceTasks: 1
INFO  org.apache.hadoop.mapred.MapTask [Thread-11]: io.sort.mb = 100
INFO  org.apache.hadoop.mapred.MapTask [Thread-11]: data buffer = 79691776/99614720
INFO  org.apache.hadoop.mapred.MapTask [Thread-11]: record buffer = 262144/327680
INFO  com.i.cg.services.dp.analytics.hadoop.mapred.GeoLookup [Thread-11]: zip 92127
INFO  com.i.cg.services.dp.analytics.hadoop.mapred.GeoLookup [Thread-11]: zip 1
INFO  com.i.cg.services.dp.analytics.hadoop.mapred.GeoLookup [Thread-11]: zip 92127
INFO  com.i.cg.services.dp.analytics.hadoop.mapred.GeoLookup [Thread-11]: zip 1
INFO  org.apache.hadoop.mapred.MapTask [Thread-11]: Starting flush of map output
INFO  org.apache.hadoop.mapred.MapTask [Thread-11]: Finished spill 0
INFO  org.apache.hadoop.mapred.Task [Thread-11]: Task:attempt_local_0001_m_00_0 is done. And is in the process of commiting
INFO  org.apache.hadoop.mapred.LocalJobRunner [Thread-11]: file:/c:/upb/dp/manchlia-dp/depot/services/data-platform/trunk/analytics/geoinput/geo.dat:0+18
INFO  org.apache.hadoop.mapred.Task [Thread-11]: Task 'attempt_local_0001_m_00_0' done.
INFO  org.apache.hadoop.mapred.Task [Thread-11]:  Using ResourceCalculatorPlugin : null
INFO  org.apache.hadoop.mapred.LocalJobRunner [Thread-11]:
INFO  org.apache.hadoop.mapred.Merger [Thread-11]: Merging 1 sorted segments
INFO  org.apache.hadoop.mapred.Merger [Thread-11]: Down to the last merge-pass, with 1 segments left of total size: 26 bytes
INFO  org.apache.hadoop.mapred.LocalJobRunner [Thread-11]:
INFO  com.i.cg.services.dp.analytics.hadoop.mapred.GeoLookup [Thread-11]: Inside reduce
INFO  com.i.cg.services.dp.analytics.hadoop.mapred.GeoLookup [Thread-11]: Outside reduce
INFO  org.apache.hadoop.mapred.Task [Thread-11]: Task:attempt_local_0001_r_00_0 is done. And is in the process of commiting
INFO  org.apache.hadoop.mapred.LocalJobRunner [Thread-11]:
INFO  org.apache.hadoop.mapred.Task [Thread-11]: Task attempt_local_0001_r_00_0 is allowed to commit now
INFO  org.apache.hadoop.mapred.FileOutputCommitter [Thread-11]: Saved output of task 'attempt_local_0001_r_00_0' to file:/c:/upb/dp/manchlia-dp/depot/services/data-platform/trunk/analytics/geooutput
INFO  org.apache.hadoop.mapred.LocalJobRunner [Thread-11]: reduce  reduce
INFO  org.apache.hadoop.mapred.Task [Thread-11]: Task 'attempt_local_0001_r_00_0' done.
INFO  org.apache.hadoop.mapred.JobClient [main]:  map 100% reduce 100%
INFO  org.apache.hadoop.mapred.JobClient [main]: Job complete: job_local_0001
INFO  org.apache.hadoop.mapred.JobClient [main]: Counters: 15
INFO  org.apache.hadoop.mapred.JobClient [main]:   FileSystemCounters
INFO  org.apache.hadoop.mapred.JobClient [main]:     FILE_BYTES_READ=458
INFO  org.apache.hadoop.mapred.JobClient [main]:     FILE_BYTES_WRITTEN=96110
INFO  org.apache.hadoop.mapred.JobClient [main]:   Map-Reduce Framework
INFO  org.apache.hadoop.mapred.JobClient [main]:     Map input records=2
INFO  org.apache.hadoop.mapred.JobClient [main]:     Reduce shuffle bytes=0
INFO  org.apache.hadoop.mapred.JobClient [main]:     Spilled Records=4
INFO  org.apache.hadoop.mapred.JobClient [main]:     Map output bytes=20
INFO  org.apache.hadoop.mapred.JobClient [main]:     Total committed heap usage (bytes)=321527808
INFO  org.apache.hadoop.mapred.JobClient [main]:     Map input bytes=18
INFO  org.apache.hadoop.mapred.JobClient [main]:     SPLIT_RAW_BYTES=142
INFO  org.apache.hadoop.mapred.JobClient [main]:     Combine input records=0
INFO  org.apache.hadoop.mapred.JobClient [main]:     Reduce input records=2
INFO  org.apache.hadoop.mapred.JobClient [main]:     Reduce input groups=1
INFO  org.apache.hadoop.mapred.JobClient [main]:     Combine output records=0
INFO  org.apache.hadoop.mapred.JobClient [main]:     Reduce output records=1
INFO  org.apache.hadoop.mapred.JobClient [main]:     Map output records=2
INFO  com.i.cg.services.dp.analytics.hadoop.mapred.GeoLookup [main]: Inside reduce
INFO  com.i.cg.services.dp.analytics.hadoop.mapred.GeoLookup [main]: Outside reduce
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 4.547 sec
Results :
Tests run: 4, Failures: 0, Errors: 0, Skipped: 0


Re: Setting Configuration for local file:///

2012-08-07 Thread Harsh J
If you instantiate the JobConf with your existing conf object, then
you needn't have that fear.
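
In other words (a minimal sketch of the pattern only, not the actual
GeoLookupConfigRunner, whose internals weren't posted): build the JobConf from
the Configuration handed to the Tool, so whatever the test set on that conf
takes effect, and the *-site.xml resources only supply defaults for anything
left unset.

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class GeoLookupConfigRunner extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    // Start from getConf() so the caller's settings (fs.default.name,
    // mapred.job.tracker, ...) are preserved.
    JobConf job = new JobConf(getConf(), GeoLookupConfigRunner.class);
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // mapper/reducer classes etc. would be set here
    JobClient.runJob(job);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new GeoLookupConfigRunner(), args));
  }
}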

On Wed, Aug 8, 2012 at 1:40 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
 On Tue, Aug 7, 2012 at 12:50 PM, Harsh J ha...@cloudera.com wrote:

 What is GeoLookupConfigRunner and how do you utilize the setConf(conf)
 object within it?


 Thanks for the pointer - I wasn't setting my JobConf object with the conf
 that I passed. Just one more related question: if I use JobConf conf = new
 JobConf(getConf()) and I don't pass in any configuration, is the data
 from the xml files on the classpath used? I want this to work in all scenarios.



 On Wed, Aug 8, 2012 at 1:10 AM, Mohit Anchlia mohitanch...@gmail.com
 wrote:
  I am trying to write a test on local file system but this test keeps
 taking
  xml files in the path even though I am setting a different Configuration
  object. Is there a way for me to override it? I thought the way I am
 doing
  overwrites the configuration but doesn't seem to be working:
 
   @Test
   public void testOnLocalFS() throws Exception{
Configuration conf = new Configuration();
conf.set(fs.default.name, file:///);
conf.set(mapred.job.tracker, local);
Path input = new Path(geoinput/geo.dat);
Path output = new Path(geooutput/);
FileSystem fs = FileSystem.getLocal(conf);
fs.delete(output, true);
 
log.info(Here);
GeoLookupConfigRunner configRunner = new GeoLookupConfigRunner();
configRunner.setConf(conf);
int exitCode = configRunner.run(new String[]{input.toString(),
  output.toString()});
Assert.assertEquals(exitCode, 0);
   }



 --
 Harsh J




-- 
Harsh J


Re: Local jobtracker in test env?

2012-08-07 Thread Harsh J
It used the local mode of operation: org.apache.hadoop.mapred.LocalJobRunner

A JobTracker (via MiniMRCluster) is only required for simulating
distributed tests.
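
For contrast, a rough sketch (using the 0.20-era hadoop test jar; class and
variable names are mine) of what the mini-cluster alternative looks like: real
NameNode/DataNode and JobTracker/TaskTracker daemons on ephemeral ports, versus
LocalJobRunner, which runs everything in-process with no JobTracker at all.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MiniMRCluster;

public class MiniClusterSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    MiniDFSCluster dfs = new MiniDFSCluster(conf, 1, true, null);   // 1 DataNode
    MiniMRCluster mr = new MiniMRCluster(1, dfs.getFileSystem().getUri().toString(), 1);
    try {
      JobConf job = mr.createJobConf();   // already points at the mini JobTracker's port
      // configure and submit the job under test with this JobConf ...
    } finally {
      mr.shutdown();
      dfs.shutdown();
    }
  }
}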

On Wed, Aug 8, 2012 at 2:27 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
 I just wrote a test where fs.default.name is file:/// and
 mapred.job.tracker is set to local. The test ran fine, I also see mapper
 and reducer were invoked but what I am trying to understand is that how did
 this run without specifying the job tracker port and which port task
 tracker connected with job tracker. It's not clear from the output:

 Also what's the difference between this and bringing up miniDFS cluster?

 INFO  org.apache.hadoop.mapred.FileInputFormat [main]: Total input paths to
 proc
 ess : 1
 INFO  org.apache.hadoop.mapred.JobClient [main]: Running job: job_local_0001
 INFO  org.apache.hadoop.mapred.Task [Thread-11]:  Using
 ResourceCalculatorPlugin
  : null
 INFO  org.apache.hadoop.mapred.MapTask [Thread-11]: numReduceTasks: 1
 INFO  org.apache.hadoop.mapred.MapTask [Thread-11]: io.sort.mb = 100
 INFO  org.apache.hadoop.mapred.MapTask [Thread-11]: data buffer =
 79691776/99614
 720
 INFO  org.apache.hadoop.mapred.MapTask [Thread-11]: record buffer =
 262144/32768
 0
 INFO  com.i.cg.services.dp.analytics.hadoop.mapred.GeoLookup [Thread-11]: z
 ip 92127
 INFO  com.i.cg.services.dp.analytics.hadoop.mapred.GeoLookup [Thread-11]: z
 ip 1
 INFO  com.i.cg.services.dp.analytics.hadoop.mapred.GeoLookup [Thread-11]: z
 ip 92127
 INFO  com.i.cg.services.dp.analytics.hadoop.mapred.GeoLookup [Thread-11]: z
 ip 1
 INFO  org.apache.hadoop.mapred.MapTask [Thread-11]: Starting flush of map
 output
 INFO  org.apache.hadoop.mapred.MapTask [Thread-11]: Finished spill 0
 INFO  org.apache.hadoop.mapred.Task [Thread-11]:
 Task:attempt_local_0001_m_0
 0_0 is done. And is in the process of commiting
 INFO  org.apache.hadoop.mapred.LocalJobRunner [Thread-11]:
 file:/c:/upb/dp/manch
 lia-dp/depot/services/data-platform/trunk/analytics/geoinput/geo.dat:0+18
 INFO  org.apache.hadoop.mapred.Task [Thread-11]: Task
 'attempt_local_0001_m_
 00_0' done.
 INFO  org.apache.hadoop.mapred.Task [Thread-11]:  Using
 ResourceCalculatorPlugin
  : null
 INFO  org.apache.hadoop.mapred.LocalJobRunner [Thread-11]:
 INFO  org.apache.hadoop.mapred.Merger [Thread-11]: Merging 1 sorted segments
 INFO  org.apache.hadoop.mapred.Merger [Thread-11]: Down to the last
 merge-pass,
 with 1 segments left of total size: 26 bytes
 INFO  org.apache.hadoop.mapred.LocalJobRunner [Thread-11]:
 INFO  com.i.cg.services.dp.analytics.hadoop.mapred.GeoLookup [Thread-11]: I
 nside reduce
 INFO  com.i.cg.services.dp.analytics.hadoop.mapred.GeoLookup [Thread-11]: O
 utside reduce
 INFO  org.apache.hadoop.mapred.Task [Thread-11]:
 Task:attempt_local_0001_r_0
 0_0 is done. And is in the process of commiting
 INFO  org.apache.hadoop.mapred.LocalJobRunner [Thread-11]:
 INFO  org.apache.hadoop.mapred.Task [Thread-11]: Task
 attempt_local_0001_r_0
 0_0 is allowed to commit now
 INFO  org.apache.hadoop.mapred.FileOutputCommitter [Thread-11]: Saved
 output of
 task 'attempt_local_0001_r_00_0' to
 file:/c:/upb/dp/manchlia-dp/depot/servic
 es/data-platform/trunk/analytics/geooutput
 INFO  org.apache.hadoop.mapred.LocalJobRunner [Thread-11]: reduce  reduce
 INFO  org.apache.hadoop.mapred.Task [Thread-11]: Task
 'attempt_local_0001_r_
 00_0' done.
 INFO  org.apache.hadoop.mapred.JobClient [main]:  map 100% reduce 100%
 INFO  org.apache.hadoop.mapred.JobClient [main]: Job complete:
 job_local_0001
 INFO  org.apache.hadoop.mapred.JobClient [main]: Counters: 15
 INFO  org.apache.hadoop.mapred.JobClient [main]:   FileSystemCounters
 INFO  org.apache.hadoop.mapred.JobClient [main]: FILE_BYTES_READ=458
 INFO  org.apache.hadoop.mapred.JobClient [main]:
 FILE_BYTES_WRITTEN=96110
 INFO  org.apache.hadoop.mapred.JobClient [main]:   Map-Reduce Framework
 INFO  org.apache.hadoop.mapred.JobClient [main]: Map input records=2
 INFO  org.apache.hadoop.mapred.JobClient [main]: Reduce shuffle bytes=0
 INFO  org.apache.hadoop.mapred.JobClient [main]: Spilled Records=4
 INFO  org.apache.hadoop.mapred.JobClient [main]: Map output bytes=20
 INFO  org.apache.hadoop.mapred.JobClient [main]: Total committed heap
 usage
 (bytes)=321527808
 INFO  org.apache.hadoop.mapred.JobClient [main]: Map input bytes=18
 INFO  org.apache.hadoop.mapred.JobClient [main]: SPLIT_RAW_BYTES=142
 INFO  org.apache.hadoop.mapred.JobClient [main]: Combine input records=0
 INFO  org.apache.hadoop.mapred.JobClient [main]: Reduce input records=2
 INFO  org.apache.hadoop.mapred.JobClient [main]: Reduce input groups=1
 INFO  org.apache.hadoop.mapred.JobClient [main]: Combine output
 records=0
 INFO  org.apache.hadoop.mapred.JobClient [main]: Reduce output records=1
 INFO  org.apache.hadoop.mapred.JobClient [main]: Map output records=2
 INFO  com.i.cg.services.dp.analytics.hadoop.mapred.GeoLookup [main]: 

Re: [ANNOUNCE] - New user@ mailing list for hadoop users in-lieu of (common,hdfs,mapreduce)-user@

2012-08-07 Thread Arun C Murthy
Apologies (again) for the cross-post, I've filed 
https://issues.apache.org/jira/browse/INFRA-5123 to close down (common, hdfs, 
mapreduce)-user@ since user@ is functional now.

thanks,
Arun

On Aug 4, 2012, at 9:59 PM, Arun C Murthy wrote:

 All,
 
  Given our recent discussion (http://s.apache.org/hv), the new 
 u...@hadoop.apache.org mailing list has been created and all existing users 
 in (common,hdfs,mapreduce)-user@ have been migrated over.
 
  I'm in the process of changing the website to reflect this (HADOOP-8652). 
 
  Henceforth, please use the new mailing list for all user-related discussions.
 
 thanks,
 Arun
 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/