Re: hadoop datanode is dead but cannot stop it

2012-04-19 Thread Harsh J
What distro/version of Hadoop are you using? This was a bug fixed quite a while ago. On Fri, Apr 20, 2012 at 7:29 AM, Johnson Chengwu wrote: > I have encountered when there is a disk IO error in a datanode machine, the > datanode will be dead, but in the dead datanode, the datanode daemon is

Re: Help me with architecture of a somewhat non-trivial mapreduce implementation

2012-04-19 Thread Michael Segel
If the file is small enough, you could read it into a java object like a list, write your own input format that takes a list object as its input, and then lets you specify the number of mappers. On Apr 19, 2012, at 11:34 PM, Sky wrote: > My file for the input to mapper is very small - as all
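A ready-made alternative to writing a custom input format, close to what Michael describes, is the old (mapred) API's NLineInputFormat, which assigns a fixed number of input lines to each mapper - so a small file with one manifest URL per line yields one map task per URL. A minimal sketch (ManifestDriver is a hypothetical driver class, not from the thread):

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class ManifestDriver {
    // Configure a job so each line of the small input file
    // (one manifest URL per line) goes to its own mapper.
    public static JobConf configure() {
        JobConf conf = new JobConf(ManifestDriver.class);
        conf.setInputFormat(NLineInputFormat.class);
        // One input line per map task.
        conf.setInt("mapred.line.input.format.linespermap", 1);
        return conf;
    }
}
```

This sidesteps the one-block/one-mapper problem discussed later in the thread, since the split count is driven by line count rather than block size.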

Re: Help me with architecture of a somewhat non-trivial mapreduce implementation

2012-04-19 Thread Sky
My file for the input to the mapper is very small - all it has is URLs to a list of manifests. The task for the mappers is to fetch each manifest, then fetch files using the URLs from the manifests, and then process them. Besides passing around lists of files, I am not really accessing the disk. It sho

Re: Help me with architecture of a somewhat non-trivial mapreduce implementation

2012-04-19 Thread Michael Segel
How 'large' - or rather, in this case, how small - is your file? If you're on a default system, the block size is 64MB. So if your file is <= ~64MB, you end up with 1 block, and you will only have 1 mapper. On Apr 19, 2012, at 10:10 PM, Sky wrote: > Thanks for your reply. After I sent my email, I foun

Re: Help me with architecture of a somewhat non-trivial mapreduce implementation

2012-04-19 Thread Sky
Thanks for your reply. After I sent my email, I found a fundamental defect in my understanding of how MR is distributed. I discovered that even though I was firing off 15 COREs, the map job - which is the most expensive part of my processing - was run only on 1 core. To start my map job, I wa

hadoop datanode is dead but cannot stop it

2012-04-19 Thread Johnson Chengwu
I have encountered that when there is a disk I/O error on a datanode machine, the datanode will be dead, but in the dead datanode the datanode daemon is still alive, and I cannot stop it to restart the datanode. When I check the process, it seems that the linux command "du -sk path/to/datadir" i

Re: Multiple data centre in Hadoop

2012-04-19 Thread Edward Capriolo
Hive is beginning to implement region support, where one metastore will manage multiple filesystems and jobtrackers. When a query creates a table, it will then be copied to one or more datacenters. In addition, the query planner will intelligently attempt to run queries in regions only where all the

Re: Multiple data centre in Hadoop

2012-04-19 Thread Robert Evans
If you want to start an open source project for this, I am sure that there are others with the same problem that might be very willing to help out. :) --Bobby Evans On 4/19/12 4:31 PM, "Michael Segel" wrote: I don't know of any open source solution in doing this... And yeah its something one can

Re: Multiple data centre in Hadoop

2012-04-19 Thread Michael Segel
I don't know of any open source solution for doing this... And yeah, it's something one can't talk about ;-) On Apr 19, 2012, at 4:28 PM, Robert Evans wrote: > Where I work we have done some things like this, but none of them are open > source, and I have not really been directly involved w

Re: Multiple data centre in Hadoop

2012-04-19 Thread Robert Evans
Where I work we have done some things like this, but none of them are open source, and I have not really been directly involved with the details of it. I can guess about what it would take, but that is all it would be at this point. --Bobby On 4/17/12 5:46 PM, "Abhishek Pratap Singh" wrote:

Re: Help me with architecture of a somewhat non-trivial mapreduce implementation

2012-04-19 Thread Robert Evans
From what I can see your implementation seems OK, especially from a performance perspective. Depending on what storage: is, it is likely to be your bottleneck, not the hadoop computations. Because you are writing files directly instead of relying on Hadoop to do it for you, you may need to d

Re: Pre-requisites for hadoop 0.23/CDH4

2012-04-19 Thread praveenesh kumar
Thanks Arun, will try out those settings. Is there good documentation on "configuring/playing with hadoop 0.23" apart from the apache hadoop-0.23 page? I have already looked into that page. Just wondering if there is something more that I don't know. Regards, Praveenesh On Fri, Apr 20, 2012 at 12

Re: Pre-requisites for hadoop 0.23/CDH4

2012-04-19 Thread Arun C Murthy
You are better off trying hadoop-0.23.1 or even hadoop-0.23.2-rc0, since CDH4's version of YARN is very incomplete and you might get nasty surprises there. Settings: # Run 1 NodeManager with yarn.nodemanager.resource.memory-mb -> 1024 # Use CapacityScheduler (significantly better tested) by settin
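The two settings Arun lists would land in yarn-site.xml roughly as follows (property names as used in the 0.23 line; the CapacityScheduler class name is the usual one, but worth double-checking against your build):

```xml
<!-- Give the single NodeManager 1024 MB of container memory -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>1024</value>
</property>
<!-- Switch the ResourceManager to the CapacityScheduler -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
```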

Troubleshoot job failures after upgrade to 1.0.2

2012-04-19 Thread Filippo Diotalevi
Hi, I have an application on Cassandra 1.0.8 + Hadoop. Previously running on the Cloudera distribution hadoop-0.20.2-cdh3u1, I tried today to upgrade to Hadoop 1.0.2 but stumbled into issues with consistent job failures. The hadoop userlogs seem quite clear: 2012-04-19 18:21:09.837 java[57837:190

JobControl run

2012-04-19 Thread Juan Pino
Hi, I wonder why when I call run() on a JobControl object, it loops forever. Namely, this code doesn't work: JobControl jobControl = new JobControl("Name"); // some stuff here (add jobs and dependencies) jobControl.run(); This code works but looks a bit ugly: JobControl jobControl = new JobCont
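The pattern Juan alludes to (JobControl.run() loops until stop() is called, so it is usually run on its own thread while the caller polls for completion) can be sketched like this against the old (mapred) API; the wrapper class name is illustrative:

```java
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class JobControlRunner {
    // Run a JobControl (with jobs and dependencies already added)
    // on a background thread and block until all jobs finish.
    public static void runAll(JobControl jobControl) throws InterruptedException {
        Thread runner = new Thread(jobControl);
        runner.setDaemon(true); // don't keep the JVM alive if we bail out early
        runner.start();
        // run() never returns on its own, so poll for completion ourselves.
        while (!jobControl.allFinished()) {
            Thread.sleep(500);
        }
        jobControl.stop(); // lets run() exit and the thread terminate
    }
}
```

This is the "a bit ugly" shape the message refers to; calling run() directly on the calling thread simply never returns.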

Algorithms used in fairscheduler 0.20.205

2012-04-19 Thread Merto Mertek
The closest doc matching the current implementation of the fairscheduler can be found in this document from Matei Zaharia et al. Another document, on delay scheduling, is from 2010. a) I am inte

Re: How to rebuild NameNode from DataNode.

2012-04-19 Thread Michel Segel
Switch to MapR M5? :-) Just kidding. Simple way of solving this pre-CDH4... NFS mount a directory from your SN and add it to your list of checkpoint directories. You may lose some data, but you should be able to rebuild. There is more to this, but it's the basic idea of how to get a copy of your m
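On a 0.20/1.x cluster, the "list of checkpoint directories" Michel mentions is fs.checkpoint.dir, which takes a comma-separated list, so adding the NFS mount is one extra path (the local and NFS paths below are illustrative):

```xml
<!-- core/hdfs config: checkpoint images are written to every listed dir,
     so an NFS-mounted copy survives loss of the local disk -->
<property>
  <name>fs.checkpoint.dir</name>
  <value>/data/dfs/namesecondary,/mnt/snn-nfs/dfs/namesecondary</value>
</property>
```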

Re: Hadoop and Ubuntu / Java

2012-04-19 Thread madhu phatak
As per Oracle, going forward OpenJDK will be the official Oracle JDK for Linux, which means OpenJDK will be the same as the official one. On Tue, Dec 20, 2011 at 9:12 PM, hadoopman wrote: > http://www.omgubuntu.co.uk/2011/12/java-to-be-removed-from-ubuntu-uninstalled-from-user-machines/