NullPointerException running jps
Hi, I am getting a NullPointerException when trying to run the jps command, which is kind of weird. Does anyone have any idea what might cause this? Thanks, -- Richa Khandelwal University of California, Santa Cruz CA
Re: org.apache.hadoop.ipc.client : trying connect to server failed
Hi, I faced the same problem. Try deleting the hadoop pid files from the logs directory; that worked for me. Thanks, Richa

On Mon, Jun 15, 2009 at 10:28 PM, ashish pareek wrote:
> Hi,
> I am trying to set up a hadoop cluster on 3GB machines, using hadoop 0.18.3, and have followed the procedure given on the apache hadoop site for a hadoop cluster.
> In conf/slaves I have added two datanodes, i.e. including the namenode virtual machine and the other virtual machine (datanode), and I have set up passwordless ssh between both virtual machines. But now the problem is that when I run the command:
>
> bin/hadoop start-all.sh
>
> it starts only one datanode, on the same virtual machine as the namenode, but it doesn't start the datanode on the other machine.
>
> In logs/hadoop-datanode.log I get the message:
>
> INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop1/192.168.1.28:9000. Already tried 1 time(s).
> 2009-05-09 18:35:14,266 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop1/192.168.1.28:9000. Already tried 2 time(s).
> 2009-05-09 18:35:14,266 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hadoop1/192.168.1.28:9000. Already tried 3 time(s).
> ...
>
> I have tried formatting and starting the cluster again, but I still get the same error.
>
> So can anyone help in solving this problem? :)
>
> Thanks,
> Regards,
> Ashish Pareek

-- Richa Khandelwal University of California, Santa Cruz CA
Numbers of mappers and reducers
Hi, I was going through the Hadoop FAQs to optimize the performance of map/reduce. There is a suggestion to set the number of reducers to a prime number closest to the number of nodes, and the number of mappers to a prime number closest to several times the number of nodes in the cluster. What performance advantage do these numbers give? Doing so improved the performance of my map/reduce jobs considerably; I am interested to know the principles behind it. Thanks, Richa Khandelwal University Of California, Santa Cruz. Ph:425-241-7763
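For reference, a minimal sketch of how that advice maps onto the old JobConf API; the class name, node count, and primes below are illustrative assumptions, not values from the thread:

    import org.apache.hadoop.mapred.JobConf;

    public class PrimeTaskCounts {
        public static void main(String[] args) {
            // Hypothetical driver fragment: assume a 10-node cluster, so 11 is
            // the prime closest to the node count and 53 a prime near five
            // times it (both numbers are invented for this example).
            JobConf conf = new JobConf(PrimeTaskCounts.class);
            conf.setNumReduceTasks(11); // reducers: prime closest to #nodes
            conf.setNumMapTasks(53);    // mappers: a hint only; the framework
                                        // derives the real count from input splits
            System.out.println("maps=" + conf.getNumMapTasks()
                    + " reduces=" + conf.getNumReduceTasks());
        }
    }

Note that the map count is only a hint to the InputFormat, whereas the reduce count is honored exactly.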
Re: Changing logging level
Two ways. In hadoop-site.xml add:

  <property>
    <name>mapred.task.profile</name>
    <value>true</value>
    <description>Set profiling option to true.</description>
  </property>
  <property>
    <name>mapred.task.profile.maps</name>
    <value>1</value>
    <description>Profiling level of maps.</description>
  </property>
  <property>
    <name>mapred.task.profile.reduces</name>
    <value>1</value>
    <description>Profiling level of reducers.</description>
  </property>

Or in your code add:

  jobConf.setProfileEnabled(true);
  jobConf.setProfileTaskRange(true, "0-2");

Cheers, Richa

On Fri, Mar 13, 2009 at 11:08 AM, Amandeep Khurana wrote:
> I am using the DistributedFileSystem class to read data from HDFS (with some HDFS source code modified by me). When I read a file, I'm getting all debug-level log messages on stdout in the client that I wrote. How can I change the level to info? I haven't mentioned the debug level anywhere. On a different hadoop instance (which has no code modifications from 0.19.0), it runs at info level by default.
>
> Amandeep
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz

-- Richa Khandelwal University Of California, Santa Cruz. Ph:425-241-7763
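A self-contained sketch of the JobConf route described above; the driver class name and the task ranges are placeholders, not anything from the thread:

    import org.apache.hadoop.mapred.JobConf;

    public class ProfilingSetup {
        public static void main(String[] args) {
            JobConf conf = new JobConf(ProfilingSetup.class);
            // Equivalent to setting mapred.task.profile=true in hadoop-site.xml
            conf.setProfileEnabled(true);
            // First argument selects map (true) or reduce (false) tasks;
            // the second is the range of task IDs to profile.
            conf.setProfileTaskRange(true, "0-2");   // profile map tasks 0-2
            conf.setProfileTaskRange(false, "0-2");  // profile reduce tasks 0-2
            System.out.println("mapred.task.profile = " + conf.get("mapred.task.profile"));
        }
    }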
Re: null value output from map...
You can initialize IntWritable with an empty constructor: IntWritable i = new IntWritable();

On Fri, Mar 13, 2009 at 2:21 PM, Andy Sautins wrote:
> In writing a Map/Reduce job I ran across something I found a little strange. I have a situation where I don't need a value output from map. If I set the value passed to OutputCollector to null, I get the following exception:
>
> java.lang.NullPointerException
> at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:562)
>
> Looking at the code in MapTask.java (Hadoop 0.19.1), it makes sense why it would throw the exception:
>
> if (value.getClass() != valClass) {
>   throw new IOException("Type mismatch in value from map: expected "
>       + valClass.getName() + ", recieved "
>       + value.getClass().getName());
> }
>
> I guess my question is as follows: is it a bad idea/not normal to collect a null value in map? Outputting from reduce through TextOutputFormat with a null value works as I expect: if the value is null, only the key and a newline are output.
>
> Any thoughts would be appreciated.

-- Richa Khandelwal University Of California, Santa Cruz. Ph:425-241-7763
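A minimal sketch of that suggestion in an old-API mapper; the class name and key/value types are illustrative, and the only point is that the collected value object is never null:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class KeyOnlyMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {

        // Reused, empty placeholder value; never null, so
        // MapOutputBuffer.collect() has a real object to serialize.
        private final IntWritable dummy = new IntWritable();

        public void map(LongWritable offset, Text line,
                        OutputCollector<Text, IntWritable> output,
                        Reporter reporter) throws IOException {
            // Emit only the key we care about, plus the dummy value.
            output.collect(line, dummy);
        }
    }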
Child Nodes processing jobs?
Hi, I am running map/reduce jobs on a cluster. How do I confirm that the slaves are actually executing the map/reduce tasks spawned by the JobTracker on the master? All the slaves are running their datanodes and tasktrackers fine. Thanks, Richa Khandelwal University Of California, Santa Cruz. Ph:425-241-7763
Re: Why is large number of [(heavy) keys , (light) value] faster than (light)key , (heavy) value
I am running the same test, and the job that completes in 10 minutes for the (hk, lv) case is still running after 30 minutes for the (lk, hv) case. It would be interesting to pinpoint the reason behind it.

On Wed, Mar 11, 2009 at 1:27 PM, Gyanit wrote:
> Here are exact numbers:
> # of (k,v) pairs = 1.2 Mil; this is the same in both cases.
> # of unique k = 1000, k is an integer.
> # of unique v = 1 Mil, v is a big, big string.
> For a given k, the cumulative size of all v associated with it is about 30 MB. (That is, each v is about 25~30 KB.)
> # of Mappers = 30
> # of Reducers = 10
>
> (v,k) is at least 4/5 times faster than (k,v).
>
> -Gyanit
>
> Scott Carey wrote:
> > Well, if the smaller keys are producing fewer unique values, there should be some more significant differences.
> >
> > I had assumed that your test produced the same number of unique values.
> >
> > I'm still not sure why there would be that significant of a difference as long as the total number of unique values in the small key test is a good deal larger than the number of reducers and there is not too much skew in the bucket sizes. If there is a small subset of keys in the small key test that contains a large subset of the values, then the reducers will have very skewed work sizes and this could explain your observation.
> >
> > On 3/11/09 11:50 AM, "Gyanit" wrote:
> > I noticed one more thing. Lighter keys tend to make a smaller number of unique keys. For example, there may be 10 Mil (key,value) pairs, but if the key is lighter there might be just 1000 unique keys; in the other case, if keys are heavier, there might be 5 mil unique keys. I think this might have something to do with it. Bottom line: if your reduce is a simple dump with no combining, then put the data in keys rather than values.
> >
> > I need to put data in values. Any suggestions on how to make it faster?
> >
> > -Gyanit.
> >
> > Scott Carey wrote:
> >> That is a fascinating question. I would also love to know the reason behind this.
> >>
> >> If I were to guess, I would have thought that smaller keys and heavier values would slightly outperform, rather than significantly underperform (assuming the total pair count at each phase is the same). Perhaps there is room for optimization here?
> >>
> >> On 3/10/09 6:44 PM, "Gyanit" wrote:
> >> I have a large number of key,value pairs. I don't actually care whether the data goes in the value or the key. Let me be more exact. The number of (k,v) pairs after the combiner is about 1 mil. I have approx 1 KB of data for each pair, which I can put in keys or in values. I have experimented with both options, (heavy key, light value) vs (light key, heavy value). It turns out that the (hk,lv) option is much, much better than (lk,hv). Has someone else also noticed this? Is there a way to make things faster in the light key, heavy value option, as some applications will need that as well? Remember, in both cases we are talking about at least a dozen or so million pairs. There is a difference of time in the shuffle phase, which is weird, as the amount of data transferred is the same.
> >>
> >> -gyanit
> >> --
> >> View this message in context: http://www.nabble.com/Why-is-large-number-of---%28heavy%29-keys-%2C-%28light%29-value--faster-than-%28light%29key-%2C-%28heavy%29-value-tp22447877p22447877.html
> >> Sent from the Hadoop core-user mailing list archive at Nabble.com.
> >
> > --
> > View this message in context: http://www.nabble.com/Why-is-large-number-of---%28heavy%29-keys-%2C-%28light%29-value--faster-than-%28light%29key-%2C-%28heavy%29-value-tp22447877p22463049.html
> > Sent from the Hadoop core-user mailing list archive at Nabble.com.
>
> --
> View this message in context: http://www.nabble.com/Why-is-large-number-of---%28heavy%29-keys-%2C-%28light%29-value--faster-than-%28light%29key-%2C-%28heavy%29-value-tp22447877p22463784.html
> Sent from the Hadoop core-user mailing list archive at Nabble.com.

-- Richa Khandelwal University Of California, Santa Cruz. Ph:425-241-7763
Profiling Map/Reduce Tasks
Hi, Does Map/Reduce profile jobs down to milliseconds? From what I can see in the logs, there is no time specified for the job. Although CPU time is information that should be present in the logs, it was not profiled, and the response time can only be read down to seconds from the runtime progress of the jobs. Does someone know how to efficiently profile map/reduce jobs? Thanks, Richa Khandelwal University Of California, Santa Cruz. Ph:425-241-7763
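One possible workaround (not suggested in the thread) is to measure elapsed wall-clock time inside the task itself and publish it as a job counter; a rough sketch, with the mapper class, counter name, and key/value types invented for the example:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class TimedMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, LongWritable> {

        // Arbitrary counter chosen for this sketch.
        enum Timing { MAP_MILLIS }

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, LongWritable> output,
                        Reporter reporter) throws IOException {
            long start = System.currentTimeMillis();
            // ... real per-record work would go here ...
            output.collect(value, key);
            // Accumulate elapsed milliseconds; the total appears with the
            // job's counters when it completes.
            reporter.incrCounter(Timing.MAP_MILLIS, System.currentTimeMillis() - start);
        }
    }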
Re: mapred.input.file returns null
I changed "mapred.input.file" to "map.input.file" based on a post by JIRA which claimed that its a typo On Fri, Mar 6, 2009 at 11:21 PM, Richa Khandelwal wrote: > Here's a snippet of my code: > private static String inputFile; > > public void configure(JobConf job) > { > inputFile=job.get("map.input.file"); > System.out.println("File "+inputFile); > } > > > > On Fri, Mar 6, 2009 at 11:19 PM, Amandeep Khurana wrote: > >> How are you using it? >> >> >> Amandeep Khurana >> Computer Science Graduate Student >> University of California, Santa Cruz >> >> >> On Fri, Mar 6, 2009 at 11:18 PM, Richa Khandelwal > >wrote: >> >> > Hi All, >> > I am trying to retrieve the names of files for each record that I am >> > processing. Using "mapred.input.file" returns null. Does anyone know the >> > workaround or the fix to this? >> > >> > Thanks, >> > Richa Khandelwal >> > >> > >> > University Of California, >> > Santa Cruz. >> > Ph:425-241-7763 >> > >> > > > > -- > Richa Khandelwal > > > University Of California, > Santa Cruz. > Ph:425-241-7763 > -- Richa Khandelwal University Of California, Santa Cruz. Ph:425-241-7763
Re: mapred.input.file returns null
Here's a snippet of my code:

  private static String inputFile;

  public void configure(JobConf job)
  {
      inputFile = job.get("map.input.file");
      System.out.println("File " + inputFile);
  }

On Fri, Mar 6, 2009 at 11:19 PM, Amandeep Khurana wrote:
> How are you using it?
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
> On Fri, Mar 6, 2009 at 11:18 PM, Richa Khandelwal wrote:
> > Hi All,
> > I am trying to retrieve the names of files for each record that I am processing. Using "mapred.input.file" returns null. Does anyone know the workaround or the fix to this?
> >
> > Thanks,
> > Richa Khandelwal
> >
> > University Of California,
> > Santa Cruz.
> > Ph:425-241-7763

-- Richa Khandelwal University Of California, Santa Cruz. Ph:425-241-7763
mapred.input.file returns null
Hi All, I am trying to retrieve the names of files for each record that I am processing. Using "mapred.input.file" returns null. Does anyone know the workaround or the fix to this? Thanks, Richa Khandelwal University Of California, Santa Cruz. Ph:425-241-7763
Batch processing map reduce jobs
Hi All, Does anyone know how to run map reduce jobs using pipes or batch process map reduce jobs? Thanks, Richa Khandelwal University Of California, Santa Cruz. Ph:425-241-7763
Re: Hadoop AMI for EC2
Hi All, I am trying trying to log map reduce jobs in HADOOP_LOG_DIR by setting its value in hadoop-env.sh. But the directory has no log records when the job finishes running. I am adding JobConf.setProfileEnabled(true) in my job. Can anyone point out how to log in hadoop? Thanks, Richa On Thu, Mar 5, 2009 at 8:20 AM, Richa Khandelwal wrote: > Thats pretty cool. Thanks > > > On Thu, Mar 5, 2009 at 8:17 AM, tim robertson > wrote: > >> Yeps, >> >> A good starting read: http://wiki.apache.org/hadoop/AmazonEC2 >> >> These are the AMIs: >> >> $ ec2-describe-images -a | grep hadoop >> IMAGE ami-245db94dcloudbase-1.1-hadoop-fc64/image.manifest.xml >> 247610401714available public x86_64 machine >> IMAGE ami-791ffb10 >> cloudbase-hadoop-fc64/cloudbase-hadoop-fc64.manifest.xml >> 247610401714available public x86_64 machine >> IMAGE ami-f73adf9ecs345-hadoop-EC2-0.15.3/hadoop-0.15.3.manifest.xml >> 825431212034available public i386machine >> IMAGE ami-c55db8acfedora8-hypertable-hadoop-kfs/image.manifest.xml >> 291354417104available public x86_64 machine >> aki-b51cf9dcari-b31cf9da >> IMAGE ami-ce6b8fa7hachero-hadoop/hadoop-0.19.0-i386.manifest.xml >> 118946012109available public i386machine >> aki-a71cf9ceari-a51cf9cc >> IMAGE ami-dd48acb4hachero-hadoop/hadoop-0.19.0-x86_64.manifest.xml >> 118946012109available public x86_64 machine >> aki-b51cf9dcari-b31cf9da >> IMAGE ami-ee53b687hadoop-ec2-images/hadoop-0.17.0-i386.manifest.xml >> 111560892610available public i386machine >> aki-a71cf9ceari-a51cf9cc >> IMAGE ami-f853b691 >> hadoop-ec2-images/hadoop-0.17.0-x86_64.manifest.xml 111560892610 >> available public x86_64 machine aki-b51cf9dc >> ari-b31cf9da >> IMAGE ami-65987c0chadoop-images/hadoop-0.17.1-i386.manifest.xml >> 914733919441available public i386machine aki-a71cf9ce >>ari-a51cf9cc >> IMAGE ami-4b987c22hadoop-images/hadoop-0.17.1-x86_64.manifest.xml >> 914733919441available public x86_64 machine aki-b51cf9dc >>ari-b31cf9da >> IMAGE ami-b0fe1ad9hadoop-images/hadoop-0.18.0-i386.manifest.xml >> 914733919441available public i386machine aki-a71cf9ce >>ari-a51cf9cc >> IMAGE ami-90fe1af9hadoop-images/hadoop-0.18.0-x86_64.manifest.xml >> 914733919441available public x86_64 machine aki-b51cf9dc >>ari-b31cf9da >> IMAGE ami-ea36d283hadoop-images/hadoop-0.18.1-i386.manifest.xml >> 914733919441available public i386machine aki-a71cf9ce >>ari-a51cf9cc >> IMAGE ami-fe37d397hadoop-images/hadoop-0.18.1-x86_64.manifest.xml >> 914733919441available public x86_64 machine aki-b51cf9dc >>ari-b31cf9da >> IMAGE ami-fa6a8e93hadoop-images/hadoop-0.19.0-i386.manifest.xml >> 914733919441available public i386machine aki-a71cf9ce >>ari-a51cf9cc >> IMAGE ami-cd6a8ea4hadoop-images/hadoop-0.19.0-x86_64.manifest.xml >> 914733919441available public x86_64 machine aki-b51cf9dc >>ari-b31cf9da >> IMAGE ami-15e80f7c >> hadoop-images/hadoop-base-20090210-i386.manifest.xml914733919441 >> available public i386machine aki-a71cf9ce >> ari-a51cf9cc >> IMAGE ami-1ee80f77 >> hadoop-images/hadoop-base-20090210-x86_64.manifest.xml 914733919441 >> available public x86_64 machine aki-b51cf9dc >> ari-b31cf9da >> IMAGE ami-4de30724 >> hbase-ami/hbase-0.2.0-hadoop-0.17.1-i386.manifest.xml 834125115996 >> available public i386machine aki-a71cf9ce >> ari-a51cf9cc >> IMAGE ami-fe7c9997radlab-hadoop-4-large/image.manifest.xml >> 117716615155available public x86_64 machine >> IMAGE ami-7f7f9a16radlab-hadoop-4/image.manifest.xml >> 117716615155available public i386machine >> $ >> >> Cheers, >> >> Tim >> >> >> >> On Thu, Mar 5, 2009 at 5:13 PM, Richa 
Khandelwal >> wrote: >> > Hi All, >> > Is there an existing Hadoop AMI for EC2 which had Hadaoop setup on it? >> > >> > Thanks, >> > Richa Khandelwal >> > >> > >> > University Of California, >> > Santa Cruz. >> > Ph:425-241-7763 >> > >> > > > > -- > Richa Khandelwal > > > University Of California, > Santa Cruz. > Ph:425-241-7763 > -- Richa Khandelwal University Of California, Santa Cruz. Ph:425-241-7763
Re: Hadoop AMI for EC2
Thats pretty cool. Thanks On Thu, Mar 5, 2009 at 8:17 AM, tim robertson wrote: > Yeps, > > A good starting read: http://wiki.apache.org/hadoop/AmazonEC2 > > These are the AMIs: > > $ ec2-describe-images -a | grep hadoop > IMAGE ami-245db94dcloudbase-1.1-hadoop-fc64/image.manifest.xml > 247610401714available public x86_64 machine > IMAGE ami-791ffb10 > cloudbase-hadoop-fc64/cloudbase-hadoop-fc64.manifest.xml > 247610401714available public x86_64 machine > IMAGE ami-f73adf9ecs345-hadoop-EC2-0.15.3/hadoop-0.15.3.manifest.xml > 825431212034available public i386machine > IMAGE ami-c55db8acfedora8-hypertable-hadoop-kfs/image.manifest.xml > 291354417104available public x86_64 machine > aki-b51cf9dcari-b31cf9da > IMAGE ami-ce6b8fa7hachero-hadoop/hadoop-0.19.0-i386.manifest.xml > 118946012109available public i386machine > aki-a71cf9ceari-a51cf9cc > IMAGE ami-dd48acb4hachero-hadoop/hadoop-0.19.0-x86_64.manifest.xml > 118946012109available public x86_64 machine > aki-b51cf9dcari-b31cf9da > IMAGE ami-ee53b687hadoop-ec2-images/hadoop-0.17.0-i386.manifest.xml > 111560892610available public i386machine > aki-a71cf9ceari-a51cf9cc > IMAGE ami-f853b691hadoop-ec2-images/hadoop-0.17.0-x86_64.manifest.xml > 111560892610available public x86_64 machine > aki-b51cf9dcari-b31cf9da > IMAGE ami-65987c0chadoop-images/hadoop-0.17.1-i386.manifest.xml > 914733919441available public i386machine aki-a71cf9ce >ari-a51cf9cc > IMAGE ami-4b987c22hadoop-images/hadoop-0.17.1-x86_64.manifest.xml > 914733919441available public x86_64 machine aki-b51cf9dc >ari-b31cf9da > IMAGE ami-b0fe1ad9hadoop-images/hadoop-0.18.0-i386.manifest.xml > 914733919441available public i386machine aki-a71cf9ce >ari-a51cf9cc > IMAGE ami-90fe1af9hadoop-images/hadoop-0.18.0-x86_64.manifest.xml > 914733919441available public x86_64 machine aki-b51cf9dc >ari-b31cf9da > IMAGE ami-ea36d283hadoop-images/hadoop-0.18.1-i386.manifest.xml > 914733919441available public i386machine aki-a71cf9ce >ari-a51cf9cc > IMAGE ami-fe37d397hadoop-images/hadoop-0.18.1-x86_64.manifest.xml > 914733919441available public x86_64 machine aki-b51cf9dc >ari-b31cf9da > IMAGE ami-fa6a8e93hadoop-images/hadoop-0.19.0-i386.manifest.xml > 914733919441available public i386machine aki-a71cf9ce >ari-a51cf9cc > IMAGE ami-cd6a8ea4hadoop-images/hadoop-0.19.0-x86_64.manifest.xml > 914733919441available public x86_64 machine aki-b51cf9dc >ari-b31cf9da > IMAGE ami-15e80f7c > hadoop-images/hadoop-base-20090210-i386.manifest.xml914733919441 > available public i386machine aki-a71cf9ce > ari-a51cf9cc > IMAGE ami-1ee80f77 > hadoop-images/hadoop-base-20090210-x86_64.manifest.xml 914733919441 > available public x86_64 machine aki-b51cf9dc > ari-b31cf9da > IMAGE ami-4de30724 > hbase-ami/hbase-0.2.0-hadoop-0.17.1-i386.manifest.xml 834125115996 > available public i386machine aki-a71cf9ce > ari-a51cf9cc > IMAGE ami-fe7c9997radlab-hadoop-4-large/image.manifest.xml > 117716615155available public x86_64 machine > IMAGE ami-7f7f9a16radlab-hadoop-4/image.manifest.xml > 117716615155available public i386 machine > $ > > Cheers, > > Tim > > > > On Thu, Mar 5, 2009 at 5:13 PM, Richa Khandelwal > wrote: > > Hi All, > > Is there an existing Hadoop AMI for EC2 which had Hadaoop setup on it? > > > > Thanks, > > Richa Khandelwal > > > > > > University Of California, > > Santa Cruz. > > Ph:425-241-7763 > > > -- Richa Khandelwal University Of California, Santa Cruz. Ph:425-241-7763
Repartitioned Joins
Hi All, Does anyone know of tweaks to map-reduce joins that optimize them further by moving to the reduce phase only those tuples from the two tables that actually join? There are replicated-join and semi-join strategies, but those are more database techniques than map-reduce ones. Thanks, Richa Khandelwal University Of California, Santa Cruz. Ph:425-241-7763