NullPointerException running jps

2009-06-17 Thread Richa Khandelwal
Hi,

I am getting a NullPointerException when trying to run the jps command, which is
kind of weird. Does anyone have any idea what is causing this?

Thanks,

-- 
Richa Khandelwal
University of California,
Santa Cruz
CA


Re: org.apache.hadoop.ipc.client : trying connect to server failed

2009-06-16 Thread Richa Khandelwal
Hi,
I faced the same problem. Try deleting the hadoop pids from the logs
directory. That worked for me.

Thanks,
Richa

On Mon, Jun 15, 2009 at 10:28 PM, ashish pareek  wrote:

> Hi,
> I am trying to set up a Hadoop cluster on a 3 GB machine, using Hadoop
> 0.18.3, and have followed the procedure given on the Apache Hadoop site
> for a Hadoop cluster.
> In conf/slaves I have added two datanodes, i.e. the namenode virtual
> machine and the other virtual machine (datanode), and have set up
> passwordless ssh between both virtual machines. But now the problem is
> that when I run the command:
>
> bin/hadoop start-all.sh
>
> It starts only one datanode, on the same namenode virtual machine, but it
> doesn't start the datanode on the other machine.
>
> In logs/hadoop-datanode.log I get the message:
>
>  INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop1/192.168.1.28:9000. Already tried 1 time(s).
>
>  2009-05-09 18:35:14,266 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop1/192.168.1.28:9000. Already tried 2 time(s).
>
>  2009-05-09 18:35:14,266 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop1/192.168.1.28:9000. Already tried 3 time(s).
>
> ...
>
> I have tried formatting and starting the cluster again, but I still get
> the same error.
>
> So can anyone help in solving this problem? :)
>
> Thanks
>
> Regards
>
> Ashish Pareek
>



-- 
Richa Khandelwal
University of California,
Santa Cruz
CA


Numbers of mappers and reducers

2009-03-17 Thread Richa Khandelwal
Hi,
I was going through the Hadoop FAQs to optimize the performance of
map/reduce. There is a suggestion to set the number of reducers to the prime
number closest to the number of nodes, and the number of mappers to the prime
number closest to several times the number of nodes in the cluster.
What performance advantage do these numbers give? Doing so did improve the
performance of my map/reduce jobs considerably; I am interested to know the
principles behind it.
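For reference, a minimal sketch of where those counts get set with the
0.18/0.19-era JobConf API; the node count and the primes below are purely
illustrative, and whether prime values actually help is exactly the question:

import org.apache.hadoop.mapred.JobConf;

public class TaskCountHints {
  // Illustrative only: assumes a ~10-node cluster, so 11 and 53 are the
  // primes closest to 1x and ~5x the node count, per the FAQ suggestion.
  public static void apply(JobConf conf) {
    conf.setNumReduceTasks(11); // reducers: prime near the node count
    conf.setNumMapTasks(53);    // mappers: a hint only; input splits ultimately decide
  }
}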

Thanks,
Richa Khandelwal


University Of California,
Santa Cruz.
Ph:425-241-7763


Re: Changing logging level

2009-03-13 Thread Richa Khandelwal
Two ways:
In hadoop-site.xml, add:

<property>
  <name>mapred.task.profile</name>
  <value>true</value>
  <description>Set profiling option to true.</description>
</property>

<property>
  <name>mapred.task.profile.maps</name>
  <value>1</value>
  <description>Profiling level of maps.</description>
</property>

<property>
  <name>mapred.task.profile.reduces</name>
  <value>1</value>
  <description>Profiling level of reducers.</description>
</property>

Or add the equivalent calls in your job configuration code:

jobConf.setProfileEnabled(true);
jobConf.setProfileTaskRange(true, "0-2");
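
Roughly, that fits into a driver like the following sketch (old-API
org.apache.hadoop.mapred; the class name, job name, and paths are
placeholders):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ProfiledJobDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(ProfiledJobDriver.class);
    conf.setJobName("profiled-job");

    // Equivalent to the hadoop-site.xml properties above.
    conf.setProfileEnabled(true);           // mapred.task.profile
    conf.setProfileTaskRange(true, "0-2");  // profile map tasks 0-2
    conf.setProfileTaskRange(false, "0-2"); // profile reduce tasks 0-2

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf); // per-task profile output is written alongside the task logs
  }
}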

Cheers,
Richa


On Fri, Mar 13, 2009 at 11:08 AM, Amandeep Khurana  wrote:

> I am using the DistributedFileSystem class to read data from HDFS (with
> some HDFS source code modified by me). When I read a file, I'm getting all
> debug-level log messages on stdout in the client that I wrote. How can
> I change the level to info? I haven't set the debug level anywhere. On a
> different Hadoop instance (which has no code modifications from 0.19.0), it
> runs at info level by default.
>
> Amandeep
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>



-- 
Richa Khandelwal


University Of California,
Santa Cruz.
Ph:425-241-7763


Re: null value output from map...

2009-03-13 Thread Richa Khandelwal
You can initialize IntWritable with an empty constructor:

IntWritable i = new IntWritable();
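
A minimal sketch of that approach in an old-API mapper (the class name and
key/value types here are illustrative, not from the thread):

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class KeyOnlyMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  // Reused placeholder value, so map() never hands a null to the collector.
  private final IntWritable empty = new IntWritable();

  public void map(LongWritable offset, Text line,
                  OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    // Only the key matters here; the empty value avoids the
    // NullPointerException raised by MapOutputBuffer.collect() on null values.
    output.collect(line, empty);
  }
}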

On Fri, Mar 13, 2009 at 2:21 PM, Andy Sautins
wrote:

>
>
>   In writing a Map/Reduce job I ran across something I found a little
> strange.  I have a situation where I don't need a value output from map.
> If I set the value passed to the OutputCollector to null, I get the
> following exception:
>
>
>
> java.lang.NullPointerException
>   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:562)
>
>
>
>    Looking at the code in MapTask.java (Hadoop 0.19.1), it makes sense
> why it would throw the exception:
>
>
>
>  if (value.getClass() != valClass) {
>    throw new IOException("Type mismatch in value from map: expected "
>      + valClass.getName() + ", recieved "
>      + value.getClass().getName());
>  }
>
>
>
>  I guess my question is as follows: is it a bad idea/not normal to
> collect a null value in map?  Outputting from reduce through
> TextOutputFormat with a null value works as I expect: if the value is null,
> only the key and a newline are output.
>
>
>
>   Any thoughts would be appreciated.
>


-- 
Richa Khandelwal


University Of California,
Santa Cruz.
Ph:425-241-7763


Child Nodes processing jobs?

2009-03-12 Thread Richa Khandelwal
Hi,
I am running map/reduce jobs on a cluster. How do I confirm that the slaves
are actually executing the map/reduce tasks spawned by the JobTracker on the
master? All the slaves are running their datanodes and tasktrackers fine.

Thanks,
Richa Khandelwal


University Of California,
Santa Cruz.
Ph:425-241-7763


Re: Why is large number of [(heavy) keys , (light) value] faster than (light)key , (heavy) value

2009-03-12 Thread Richa Khandelwal
I am running the same test: the job that completes in 10 minutes in the
(hk, lv) case is still running after 30 minutes have passed in the (lk, hv)
case. It would be interesting to pinpoint the reason behind it.
On Wed, Mar 11, 2009 at 1:27 PM, Gyanit  wrote:

>
> Here are exact numbers:
> # of (k,v) pairs = 1.2 Mil this is same.
> # of unique k = 1000, k is integer.
> # of  unique v = 1Mil, v is a big big string.
> For a given k, cumulative size of all v associated to it is about 30 Mb.
> (That is each v is about 25~30Kb)
> # of Mappers = 30
> # of Reducers = 10
>
> (v,k) is at least 4-5 times faster than (k,v).
>
> -Gyanit
>
>
> Scott Carey wrote:
> >
> > Well if the smaller keys are producing fewer unique values, there should
> > be some more significant differences.
> >
> > I had assumed that your test produced the same number of unique values.
> >
> > I'm still not sure why there would be that significant of a difference as
> > long as the total number of unique values in the small key test is a good
> > deal larger than the number of reducers and there is not too much skew in
> > the bucket sizes.  If there are a small subset of keys in the small key
> > test that contain a large subset of the values, then the reducers will
> > have very skewed work sizes and this could explain your observation.
> >
> >
> > On 3/11/09 11:50 AM, "Gyanit"  wrote:
> >
> >
> >
> > I noticed one more thing: lighter keys tend to produce a smaller number of
> > unique keys.
> > For example, there may be 10 million (key, value) pairs, but if the key is
> > light there might be just 1,000 unique keys; if the keys are heavier, there
> > might be 5 million unique keys.
> > I think this might have something to do with it.
> > Bottom line: if your reduce is a simple dump with no combining, then put
> > the data in keys rather than values.
> >
> > I need to put data in values. Any suggestions on how to make it faster?
> >
> > -Gyanit.
> >
> >
> > Scott Carey wrote:
> >>
> >> That is a fascinating question.  I would also love to know the reason
> >> behind this.
> >>
> >> If I were to guess, I would have thought that smaller keys and heavier
> >> values would slightly outperform, rather than significantly underperform
> >> (assuming the total pair count at each phase is the same). Perhaps there
> >> is room for optimization here?
> >>
> >>
> >>
> >> On 3/10/09 6:44 PM, "Gyanit"  wrote:
> >>
> >>
> >>
> >> I have a large number of (key, value) pairs. I don't actually care
> >> whether the data goes in the value or the key. Let me be more exact:
> >> the number of (k, v) pairs after the combiner is about 1 million, with
> >> approximately 1 KB of data per pair, which I can put in either keys or
> >> values.
> >> I have experimented with both options, (heavy key, light value) vs.
> >> (light key, heavy value). It turns out that the (hk, lv) option is much,
> >> much better than (lk, hv).
> >> Has someone else also noticed this?
> >> Is there a way to make things faster in the (light key, heavy value)
> >> option, as some applications will need that too?
> >> Remember that in both cases we are talking about at least a dozen or so
> >> million pairs.
> >> There is a difference of time in the shuffle phase, which is weird, as
> >> the amount of data transferred is the same.
> >>
> >> -gyanit
> >
>


-- 
Richa Khandelwal


University Of California,
Santa Cruz.
Ph:425-241-7763


Profiling Map/Reduce Tasks

2009-03-08 Thread Richa Khandelwal
Hi,
Does Map/Reduce profile jobs down to milliseconds? From what I can see in
the logs, there is no time specified for the job. Although CPU time is
information that should be present in the logs, it was not profiled, and the
response time can only be read down to seconds from the runtime progress
of the jobs.
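
One crude workaround is to time the job submission from the driver itself;
a minimal sketch with the old API (class name and paths are illustrative):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TimedRun {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(TimedRun.class);
    conf.setJobName("timed-run");
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    long start = System.currentTimeMillis();
    JobClient.runJob(conf); // blocks until the job completes
    long elapsedMs = System.currentTimeMillis() - start;

    // Wall-clock job time in milliseconds (includes submission overhead).
    System.out.println("Job finished in " + elapsedMs + " ms");
  }
}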

Does someone know how to efficiently profile map reduce jobs?

Thanks,
Richa Khandelwal


University Of California,
Santa Cruz.
Ph:425-241-7763


Re: mapred.input.file returns null

2009-03-06 Thread Richa Khandelwal
I changed "mapred.input.file" to "map.input.file" based on a post by JIRA
which claimed that its a typo

On Fri, Mar 6, 2009 at 11:21 PM, Richa Khandelwal wrote:

> Here's a snippet of my code:
>  private static String inputFile;
>
>   public void configure(JobConf job)
>   {
> inputFile=job.get("map.input.file");
> System.out.println("File "+inputFile);
>   }
>
>
>
> On Fri, Mar 6, 2009 at 11:19 PM, Amandeep Khurana wrote:
>
>> How are you using it?
>>
>>
>> Amandeep Khurana
>> Computer Science Graduate Student
>> University of California, Santa Cruz
>>
>>
>> On Fri, Mar 6, 2009 at 11:18 PM, Richa Khandelwal wrote:
>>
>> > Hi All,
>> > I am trying to retrieve the names of files for each record that I am
>> > processing. Using "mapred.input.file" returns null. Does anyone know the
>> > workaround or the fix to this?
>> >
>> > Thanks,
>> > Richa Khandelwal
>> >
>> >
>> > University Of California,
>> > Santa Cruz.
>> > Ph:425-241-7763
>> >
>>
>
>
>
> --
> Richa Khandelwal
>
>
> University Of California,
> Santa Cruz.
> Ph:425-241-7763
>



-- 
Richa Khandelwal


University Of California,
Santa Cruz.
Ph:425-241-7763


Re: mapred.input.file returns null

2009-03-06 Thread Richa Khandelwal
Here's a snippet of my code:
 private static String inputFile;

  public void configure(JobConf job)
  {
inputFile=job.get("map.input.file");
System.out.println("File "+inputFile);
  }
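
For context, here is roughly how that snippet sits inside an old-API mapper
(a sketch; everything apart from the "map.input.file" lookup is
illustrative):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FileNameMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private String inputFile;

  public void configure(JobConf job) {
    // Set per task attempt; each map task should see the file backing its own split.
    inputFile = job.get("map.input.file");
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Tag every record with the name of the file it came from;
    // String.valueOf guards against the lookup returning null.
    output.collect(new Text(String.valueOf(inputFile)), value);
  }
}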



On Fri, Mar 6, 2009 at 11:19 PM, Amandeep Khurana  wrote:

> How are you using it?
>
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
>
> On Fri, Mar 6, 2009 at 11:18 PM, Richa Khandelwal wrote:
>
> > Hi All,
> > I am trying to retrieve the names of files for each record that I am
> > processing. Using "mapred.input.file" returns null. Does anyone know the
> > workaround or the fix to this?
> >
> > Thanks,
> > Richa Khandelwal
> >
> >
> > University Of California,
> > Santa Cruz.
> > Ph:425-241-7763
> >
>



-- 
Richa Khandelwal


University Of California,
Santa Cruz.
Ph:425-241-7763


mapred.input.file returns null

2009-03-06 Thread Richa Khandelwal
Hi All,
I am trying to retrieve the names of files for each record that I am
processing. Using "mapred.input.file" returns null. Does anyone know the
workaround or the fix to this?

Thanks,
Richa Khandelwal


University Of California,
Santa Cruz.
Ph:425-241-7763


Batch processing map reduce jobs

2009-03-05 Thread Richa Khandelwal
Hi All,
Does anyone know how to run map/reduce jobs using pipes, or how to batch
process map/reduce jobs?
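
For what it's worth, the simplest form of batch processing is just submitting
jobs back to back from a single driver; a minimal sketch with the old API
(class, job, and path names hypothetical; this does not cover Hadoop Pipes):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class BatchDriver {
  public static void main(String[] args) throws Exception {
    // Stage 1: raw input -> intermediate output.
    JobConf first = new JobConf(BatchDriver.class);
    first.setJobName("stage-1");
    FileInputFormat.setInputPaths(first, new Path(args[0]));
    FileOutputFormat.setOutputPath(first, new Path("intermediate"));
    JobClient.runJob(first); // blocks until stage 1 finishes

    // Stage 2: consumes stage 1's output.
    JobConf second = new JobConf(BatchDriver.class);
    second.setJobName("stage-2");
    FileInputFormat.setInputPaths(second, new Path("intermediate"));
    FileOutputFormat.setOutputPath(second, new Path(args[1]));
    JobClient.runJob(second);
  }
}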

Thanks,
Richa Khandelwal


University Of California,
Santa Cruz.
Ph:425-241-7763


Re: Hadoop AMI for EC2

2009-03-05 Thread Richa Khandelwal
Hi All,
I am trying to log map/reduce jobs in HADOOP_LOG_DIR by setting its
value in hadoop-env.sh, but the directory has no log records when the job
finishes running. I am adding JobConf.setProfileEnabled(true) in my job. Can
anyone point out how to enable logging in Hadoop?

Thanks,
Richa

On Thu, Mar 5, 2009 at 8:20 AM, Richa Khandelwal  wrote:

> That's pretty cool. Thanks
>
>
> On Thu, Mar 5, 2009 at 8:17 AM, tim robertson 
> wrote:
>
>> Yeps,
>>
>> A good starting read: http://wiki.apache.org/hadoop/AmazonEC2
>>
>> These are the AMIs:
>>
>> $ ec2-describe-images -a | grep hadoop
>> IMAGE   ami-245db94d   cloudbase-1.1-hadoop-fc64/image.manifest.xml              247610401714   available   public   x86_64   machine
>> IMAGE   ami-791ffb10   cloudbase-hadoop-fc64/cloudbase-hadoop-fc64.manifest.xml  247610401714   available   public   x86_64   machine
>> IMAGE   ami-f73adf9e   cs345-hadoop-EC2-0.15.3/hadoop-0.15.3.manifest.xml        825431212034   available   public   i386     machine
>> IMAGE   ami-c55db8ac   fedora8-hypertable-hadoop-kfs/image.manifest.xml          291354417104   available   public   x86_64   machine   aki-b51cf9dc   ari-b31cf9da
>> IMAGE   ami-ce6b8fa7   hachero-hadoop/hadoop-0.19.0-i386.manifest.xml            118946012109   available   public   i386     machine   aki-a71cf9ce   ari-a51cf9cc
>> IMAGE   ami-dd48acb4   hachero-hadoop/hadoop-0.19.0-x86_64.manifest.xml          118946012109   available   public   x86_64   machine   aki-b51cf9dc   ari-b31cf9da
>> IMAGE   ami-ee53b687   hadoop-ec2-images/hadoop-0.17.0-i386.manifest.xml         111560892610   available   public   i386     machine   aki-a71cf9ce   ari-a51cf9cc
>> IMAGE   ami-f853b691   hadoop-ec2-images/hadoop-0.17.0-x86_64.manifest.xml       111560892610   available   public   x86_64   machine   aki-b51cf9dc   ari-b31cf9da
>> IMAGE   ami-65987c0c   hadoop-images/hadoop-0.17.1-i386.manifest.xml             914733919441   available   public   i386     machine   aki-a71cf9ce   ari-a51cf9cc
>> IMAGE   ami-4b987c22   hadoop-images/hadoop-0.17.1-x86_64.manifest.xml           914733919441   available   public   x86_64   machine   aki-b51cf9dc   ari-b31cf9da
>> IMAGE   ami-b0fe1ad9   hadoop-images/hadoop-0.18.0-i386.manifest.xml             914733919441   available   public   i386     machine   aki-a71cf9ce   ari-a51cf9cc
>> IMAGE   ami-90fe1af9   hadoop-images/hadoop-0.18.0-x86_64.manifest.xml           914733919441   available   public   x86_64   machine   aki-b51cf9dc   ari-b31cf9da
>> IMAGE   ami-ea36d283   hadoop-images/hadoop-0.18.1-i386.manifest.xml             914733919441   available   public   i386     machine   aki-a71cf9ce   ari-a51cf9cc
>> IMAGE   ami-fe37d397   hadoop-images/hadoop-0.18.1-x86_64.manifest.xml           914733919441   available   public   x86_64   machine   aki-b51cf9dc   ari-b31cf9da
>> IMAGE   ami-fa6a8e93   hadoop-images/hadoop-0.19.0-i386.manifest.xml             914733919441   available   public   i386     machine   aki-a71cf9ce   ari-a51cf9cc
>> IMAGE   ami-cd6a8ea4   hadoop-images/hadoop-0.19.0-x86_64.manifest.xml           914733919441   available   public   x86_64   machine   aki-b51cf9dc   ari-b31cf9da
>> IMAGE   ami-15e80f7c   hadoop-images/hadoop-base-20090210-i386.manifest.xml      914733919441   available   public   i386     machine   aki-a71cf9ce   ari-a51cf9cc
>> IMAGE   ami-1ee80f77   hadoop-images/hadoop-base-20090210-x86_64.manifest.xml    914733919441   available   public   x86_64   machine   aki-b51cf9dc   ari-b31cf9da
>> IMAGE   ami-4de30724   hbase-ami/hbase-0.2.0-hadoop-0.17.1-i386.manifest.xml     834125115996   available   public   i386     machine   aki-a71cf9ce   ari-a51cf9cc
>> IMAGE   ami-fe7c9997   radlab-hadoop-4-large/image.manifest.xml                  117716615155   available   public   x86_64   machine
>> IMAGE   ami-7f7f9a16   radlab-hadoop-4/image.manifest.xml                        117716615155   available   public   i386     machine
>> $
>>
>> Cheers,
>>
>> Tim
>>
>>
>>
>> On Thu, Mar 5, 2009 at 5:13 PM, Richa Khandelwal 
>> wrote:
>> > Hi All,
>> > Is there an existing Hadoop AMI for EC2 which has Hadoop set up on it?
>> >
>> > Thanks,
>> > Richa Khandelwal
>> >
>> >
>> > University Of California,
>> > Santa Cruz.
>> > Ph:425-241-7763
>> >
>>
>
>
>
> --
> Richa Khandelwal
>
>
> University Of California,
> Santa Cruz.
> Ph:425-241-7763
>



-- 
Richa Khandelwal


University Of California,
Santa Cruz.
Ph:425-241-7763


Re: Hadoop AMI for EC2

2009-03-05 Thread Richa Khandelwal
That's pretty cool. Thanks

On Thu, Mar 5, 2009 at 8:17 AM, tim robertson wrote:

> Yeps,
>
> A good starting read: http://wiki.apache.org/hadoop/AmazonEC2
>
> These are the AMIs:
>
> $ ec2-describe-images -a | grep hadoop
> IMAGE   ami-245db94d   cloudbase-1.1-hadoop-fc64/image.manifest.xml              247610401714   available   public   x86_64   machine
> IMAGE   ami-791ffb10   cloudbase-hadoop-fc64/cloudbase-hadoop-fc64.manifest.xml  247610401714   available   public   x86_64   machine
> IMAGE   ami-f73adf9e   cs345-hadoop-EC2-0.15.3/hadoop-0.15.3.manifest.xml        825431212034   available   public   i386     machine
> IMAGE   ami-c55db8ac   fedora8-hypertable-hadoop-kfs/image.manifest.xml          291354417104   available   public   x86_64   machine   aki-b51cf9dc   ari-b31cf9da
> IMAGE   ami-ce6b8fa7   hachero-hadoop/hadoop-0.19.0-i386.manifest.xml            118946012109   available   public   i386     machine   aki-a71cf9ce   ari-a51cf9cc
> IMAGE   ami-dd48acb4   hachero-hadoop/hadoop-0.19.0-x86_64.manifest.xml          118946012109   available   public   x86_64   machine   aki-b51cf9dc   ari-b31cf9da
> IMAGE   ami-ee53b687   hadoop-ec2-images/hadoop-0.17.0-i386.manifest.xml         111560892610   available   public   i386     machine   aki-a71cf9ce   ari-a51cf9cc
> IMAGE   ami-f853b691   hadoop-ec2-images/hadoop-0.17.0-x86_64.manifest.xml       111560892610   available   public   x86_64   machine   aki-b51cf9dc   ari-b31cf9da
> IMAGE   ami-65987c0c   hadoop-images/hadoop-0.17.1-i386.manifest.xml             914733919441   available   public   i386     machine   aki-a71cf9ce   ari-a51cf9cc
> IMAGE   ami-4b987c22   hadoop-images/hadoop-0.17.1-x86_64.manifest.xml           914733919441   available   public   x86_64   machine   aki-b51cf9dc   ari-b31cf9da
> IMAGE   ami-b0fe1ad9   hadoop-images/hadoop-0.18.0-i386.manifest.xml             914733919441   available   public   i386     machine   aki-a71cf9ce   ari-a51cf9cc
> IMAGE   ami-90fe1af9   hadoop-images/hadoop-0.18.0-x86_64.manifest.xml           914733919441   available   public   x86_64   machine   aki-b51cf9dc   ari-b31cf9da
> IMAGE   ami-ea36d283   hadoop-images/hadoop-0.18.1-i386.manifest.xml             914733919441   available   public   i386     machine   aki-a71cf9ce   ari-a51cf9cc
> IMAGE   ami-fe37d397   hadoop-images/hadoop-0.18.1-x86_64.manifest.xml           914733919441   available   public   x86_64   machine   aki-b51cf9dc   ari-b31cf9da
> IMAGE   ami-fa6a8e93   hadoop-images/hadoop-0.19.0-i386.manifest.xml             914733919441   available   public   i386     machine   aki-a71cf9ce   ari-a51cf9cc
> IMAGE   ami-cd6a8ea4   hadoop-images/hadoop-0.19.0-x86_64.manifest.xml           914733919441   available   public   x86_64   machine   aki-b51cf9dc   ari-b31cf9da
> IMAGE   ami-15e80f7c   hadoop-images/hadoop-base-20090210-i386.manifest.xml      914733919441   available   public   i386     machine   aki-a71cf9ce   ari-a51cf9cc
> IMAGE   ami-1ee80f77   hadoop-images/hadoop-base-20090210-x86_64.manifest.xml    914733919441   available   public   x86_64   machine   aki-b51cf9dc   ari-b31cf9da
> IMAGE   ami-4de30724   hbase-ami/hbase-0.2.0-hadoop-0.17.1-i386.manifest.xml     834125115996   available   public   i386     machine   aki-a71cf9ce   ari-a51cf9cc
> IMAGE   ami-fe7c9997   radlab-hadoop-4-large/image.manifest.xml                  117716615155   available   public   x86_64   machine
> IMAGE   ami-7f7f9a16   radlab-hadoop-4/image.manifest.xml                        117716615155   available   public   i386     machine
> $
>
> Cheers,
>
> Tim
>
>
>
> On Thu, Mar 5, 2009 at 5:13 PM, Richa Khandelwal 
> wrote:
> > Hi All,
> > Is there an existing Hadoop AMI for EC2 which has Hadoop set up on it?
> >
> > Thanks,
> > Richa Khandelwal
> >
> >
> > University Of California,
> > Santa Cruz.
> > Ph:425-241-7763
> >
>



-- 
Richa Khandelwal


University Of California,
Santa Cruz.
Ph:425-241-7763


Repartitioned Joins

2009-03-04 Thread Richa Khandelwal
Hi All,
Does anyone know of a way to tweak map-reduce joins to optimize them
further, by moving to the reduce phase only those tuples from the two tables
that actually join? There are replicated-join and semi-join strategies, but
they belong more to databases than to map-reduce.

Thanks,
Richa Khandelwal
University Of California,
Santa Cruz.
Ph:425-241-7763