Re: missing job history and strange MR job output

2012-01-16 Thread Ioan Eugen Stan
On 13.01.2012 06:00, Harsh J wrote: Perhaps you aren't writing it properly? It's hard to tell what your problem may be without looking at some code snippets (sensitive/irrelevant parts may be cut out, or even typed-up pseudocode is fine), etc. Hello Harsh and others, it's fixed. After

Re: hadoop filesystem cache

2012-01-16 Thread Rita
Thanks. I believe this is a good feature to have for clients especially if you are reading the same large file over and over. On Sun, Jan 15, 2012 at 7:33 PM, Todd Lipcon t...@cloudera.com wrote: There is some work being done in this area by some folks over at UC Berkeley's AMP Lab in

Re: Username on Hadoop 20.2

2012-01-16 Thread Eli Finkelshteyn
Hi Folks, I'm still lost on this. Has no one wanted or needed to connect to a Hadoop cluster from a client machine under a name other than the client's whoami before? Eli On 1/13/12 11:00 AM, Eli Finkelshteyn wrote: I tried this, and it doesn't seem to work. Specifically, the way I tested

Re: Username on Hadoop 20.2

2012-01-16 Thread Joey Echeverria
(-common-user, +cdh-user) I'm moving the discussion since this is a CDH-specific issue. Setting user.name works for plain 0.20.2, but not for the CDH version, as it's been modified to support enabling Kerberos security. You'll need to modify your code to use something like this:
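The code Joey refers to is cut off in the archive. On security-patched 0.20 builds (such as CDH3), the usual pattern is UserGroupInformation.doAs — the sketch below is an illustration of that pattern, not Joey's actual snippet; the user name "someuser" is a placeholder:

```java
// Sketch: access HDFS as a different user on a security-enabled 0.20 build.
// "someuser" is a placeholder, not a value from this thread.
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class AsUser {
  public static void main(String[] args) throws Exception {
    UserGroupInformation ugi = UserGroupInformation.createRemoteUser("someuser");
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
      public Void run() throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Every FileSystem call inside run() executes as "someuser".
        fs.listStatus(new Path("/"));
        return null;
      }
    });
  }
}
```

This replaces the plain-0.20.2 trick of setting the user.name property, which the Kerberos-aware code path ignores.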

Re: hadoop filesystem cache

2012-01-16 Thread Edward Capriolo
The challenge of this design is that people accessing the same data over and over again is the uncommon use case for Hadoop. Hadoop's bread and butter is streaming through large datasets that do not fit in memory. Also, your shuffle-sort-spill is going to play havoc on any file-system-based

failed to build trunk, what's wrong?

2012-01-16 Thread smith jack
mvn compile failed :( JDK version is 1.6.0_23, Maven version is Apache Maven 3.0.3. [ERROR] Failed to execute goal org.apache.maven.plugins:maven-antrun-plugin:1.6:run (compile-proto) on project hadoop-common: An Ant BuildException has occured: exec returned: 127 - [Help 1]

Re: failed to build trunk, what's wrong?

2012-01-16 Thread Ronald Petty
Hello, If you type protoc on the command line, is it found? Kindest regards. Ron On Sat, Jan 14, 2012 at 5:52 PM, smith jack jameslor...@gmail.com wrote: mvn compile and failed:( jdk version is 1.6.0_23 maven version is Apache Maven 3.0.3 [ERROR] Failed to execute goal
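Ron's hint is on point: exit status 127 from a forked command is the shell's "command not found" code, which here points at a missing protoc binary (the compile-proto step forks it to generate Java sources from .proto files). A quick check:

```shell
# Is protoc visible on the PATH used by the build?
if command -v protoc >/dev/null 2>&1; then
  protoc --version
else
  echo "protoc not found on PATH"
fi

# Demonstration: a missing command yields exit status 127
sh -c 'no-such-command-xyz' 2>/dev/null
echo "exit status: $?"   # prints: exit status: 127
```

If protoc is missing, install the protobuf compiler for your distribution and re-run mvn compile.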

effect on data after topology change

2012-01-16 Thread rk vishu
Hello All, If I change the rack ID for some nodes and restart the namenode, will data be rearranged accordingly? Do I need to run the rebalancer? Any information on this would be appreciated. Thanks and Regards Ravi

small files problem in hdfs

2012-01-16 Thread rk vishu
Hello All, Could anyone give me some information on how Flume handles small files? If Flume agents are set up for text log files, how will Flume ensure that there are not many small files? I believe waiting for a fixed time before pumping to HDFS may not guarantee block-sized files. I am trying

Re: small files problem in hdfs

2012-01-16 Thread W.P. McNeill
Write a Hadoop job that uses the default mapper and reducer. Specify the number of reducers when you run it, and it will produce that many output files, grouping input files together as necessary.
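A minimal sketch of such a consolidation job on the 0.20 mapreduce API (the class name and reducer count are illustrative): with no mapper or reducer set, the defaults are identity pass-throughs, so the inputs are regrouped into as many part files as there are reducers.

```java
// Sketch: consolidate many small input files into a few large output files
// by running an identity MapReduce job with a fixed number of reducers.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Consolidate {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "consolidate-small-files");
    // No setMapperClass/setReducerClass: the defaults pass records through.
    job.setNumReduceTasks(5);   // => five output part files
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

One caveat: with the default TextInputFormat/TextOutputFormat pair, each output line is prefixed with its byte-offset key, so a small custom mapper is needed if you want the lines reproduced verbatim.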

Can you unset a mapred.input.dir configuration value?

2012-01-16 Thread W.P. McNeill
Is it possible to unset a configuration value? I think the answer is no, but I want to be sure. I know that you can set a configuration value to the empty string, but I have a scenario in which that is not an option. I have a top-level Hadoop Tool that launches a series of other Hadoop jobs in

Re: Can you unset a mapred.input.dir configuration value?

2012-01-16 Thread Joey Echeverria
You can use FileInputFormat.setInputPaths(configuration, job1-output). This will overwrite the old input path(s). -Joey On Mon, Jan 16, 2012 at 7:16 PM, W.P. McNeill bill...@gmail.com wrote: Is it possible to unset a configuration value? I think the answer is no, but I want to be sure. I
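On the 0.20.2 mapred API that looks like the following sketch (the paths are placeholders): setInputPaths replaces whatever mapred.input.dir already holds, whereas addInputPath appends to it.

```java
// Sketch: reuse one JobConf across chained jobs by overwriting the
// input paths between runs. Paths are placeholders.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class ChainedInput {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    FileInputFormat.setInputPaths(conf, new Path("/data/job1-input"));
    // ... run job 1, writing to /data/job1-output ...

    // setInputPaths overwrites mapred.input.dir rather than appending,
    // so job 2 sees only the new path:
    FileInputFormat.setInputPaths(conf, new Path("/data/job1-output"));
    System.out.println(conf.get("mapred.input.dir"));
  }
}
```

So there is no need to "unset" mapred.input.dir at all; the setter semantics do the replacement for you.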

Re: How to find out whether a node is Overloaded from Cpu utilization ?

2012-01-16 Thread Amandeep Khurana
Arun, I don't think you'll hear a fixed number. Having said that, I have seen the CPU pegged at 95% during jobs with the cluster working perfectly fine. On the slaves, if you have nothing else going on, Hadoop only runs TaskTrackers and DataNodes. Those two daemons are relatively lightweight in

Best practices to recover from Corrupt Namenode

2012-01-16 Thread praveenesh kumar
Hi guys, I just faced a weird situation in which one of my hard disks on a DN went down. Because of that, when I restarted the namenode, some of the blocks went missing and it was saying my namenode is CORRUPT and in safe mode, which doesn't allow you to add or delete any files on HDFS. I know we can

Re: Best practices to recover from Corrupt Namenode

2012-01-16 Thread Harsh J
You ran into a corrupt-files issue, not namenode corruption (which generally refers to the fsimage or edits getting corrupted). Did your files not have adequate replication to withstand the loss of one DN's disk? What exactly did fsck output? Did all block replicas go

Re: Best practices to recover from Corrupt Namenode

2012-01-16 Thread praveenesh kumar
I have a replication factor of 2, because I can not afford 3 replicas on my cluster. The fsck output said block replicas were missing for some files, which was marking the namenode as corrupt. I don't have the output with me, but the issue was that block replicas were missing. How can we tackle
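For reference, a hedged sketch of the 0.20-era commands involved (the path is a placeholder, and the cluster-specific output is omitted): fsck reports the damaged files, and -move or -delete disposes of them so the namenode can leave safe mode.

```shell
# Report filesystem health, listing blocks and their locations
hadoop fsck / -files -blocks -locations

# Move files with missing blocks to /lost+found, or delete them outright
hadoop fsck / -move
# hadoop fsck / -delete

# If the namenode remains stuck in safe mode after cleanup,
# leave it manually (use with care):
hadoop dfsadmin -safemode leave
```

With replication factor 2, losing a single disk can permanently destroy any file that happened to have both replicas touched by under-replication at the time, which is why fsck reported missing blocks.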