MutableCounterLong and MutableCounterLong class difference in metrics v2

2013-08-08 Thread lei liu
I use hadoop-2.0.5, which has the MutableCounterLong and MutableCounterLong classes in metrics v2. I am studying the metrics v2 code. What is the difference between the MutableCounterLong and MutableCounterLong classes? I find that MutableCounterLong is used to calculate throughput; is that right? How does the metrics
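A plain-Java sketch, not the metrics2 API itself, of the idea behind the throughput question: a counter only ever increases, and throughput is derived by whoever reads it, by dividing the difference between two snapshots by the time between them. The class and method names below are hypothetical; in metrics2, MutableCounterLong itself offers increment-only operations such as `incr()`.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical monotonic counter, loosely modeled on an increment-only metric.
class MonotonicCounter {
    private final AtomicLong value = new AtomicLong();

    void incr() { value.incrementAndGet(); }

    void incr(long delta) {
        if (delta < 0) throw new IllegalArgumentException("counters never decrease");
        value.addAndGet(delta);
    }

    long value() { return value.get(); }
}

// Throughput is computed by the reader from two snapshots of the counter.
class ThroughputCalc {
    static double perSecond(long earlierCount, long laterCount, long elapsedMillis) {
        if (elapsedMillis <= 0) throw new IllegalArgumentException("elapsed time must be positive");
        return (laterCount - earlierCount) * 1000.0 / elapsedMillis;
    }
}

class CounterDemo {
    public static void main(String[] args) {
        MonotonicCounter bytesWritten = new MonotonicCounter();
        long before = bytesWritten.value(); // snapshot at t = 0 ms
        bytesWritten.incr(4096);
        bytesWritten.incr(4096);
        long after = bytesWritten.value();  // snapshot at t = 2000 ms (assumed interval)
        System.out.println(ThroughputCalc.perSecond(before, after, 2000)); // prints 4096.0
    }
}
```

The counter never resets or decreases, so any consumer can compute a rate over whatever interval it samples at.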

Re: alternative to $HADOOP_HOME/lib

2013-08-08 Thread Harsh J
John, I assume you do not wish to use the DistributedCache (or an HDFS location for the DistributedCache), which is the ideal way to ship jars. You can add your jars to the TT classpath by placing them at an arbitrary location such as /opt/jars, and editing the TT's hadoop-env.sh to ext

Re: problem about DN&NM directory design

2013-08-08 Thread Harsh J
This is appropriate. You are making use of both disk mounts and have the directories for each service isolated as well. On Fri, Aug 9, 2013 at 7:37 AM, ch huang wrote: > hi,all: >i plan to put DN together with NM, i want to use 2*1TB disk , one > disk mount on /data/1 and another mount on
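A small hypothetical sanity check (not part of Hadoop) for the two properties the reply approves of: every disk mount is used by both services, and no directory is listed for both. It assumes the mount points /data/1 and /data/2 from the question, and it trims whitespace around commas as a precaution because the values in the question contain a space before the comma.

```java
import java.util.*;

class DirLayoutCheck {
    // Splits a comma-separated directory list, trimming stray whitespace such as "a ,b".
    static List<String> split(String csv) {
        List<String> out = new ArrayList<>();
        for (String s : csv.split(",")) {
            String t = s.trim();
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // True if some directory in dirs lives under the given mount point.
    static boolean usesMount(List<String> dirs, String mount) {
        for (String d : dirs) {
            if (d.equals(mount) || d.startsWith(mount + "/")) return true;
        }
        return false;
    }

    static boolean isBalancedAndIsolated(String dnDirs, String nmDirs, List<String> mounts) {
        List<String> dn = split(dnDirs);
        List<String> nm = split(nmDirs);
        for (String m : mounts) {
            // Every disk should be used by both the DataNode and the NodeManager.
            if (!usesMount(dn, m) || !usesMount(nm, m)) return false;
        }
        // No identical directory in both lists (nested directories are not checked).
        Set<String> shared = new HashSet<>(dn);
        shared.retainAll(nm);
        return shared.isEmpty();
    }

    public static void main(String[] args) {
        String dn = "/data/1/hadoopdataspace ,/data/2/hadoopdataspace";
        String nm = "/data/1/yarn/local ,/data/2/yarn/local";
        System.out.println(isBalancedAndIsolated(dn, nm, List.of("/data/1", "/data/2"))); // prints true
    }
}
```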

Re: issue about resource manager HA

2013-08-08 Thread Harsh J
You are partially incorrect - NameNode is not an SPOF any longer in 2.x releases. Please look at the docs that cover HA in the release you use, such as http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithQJM.html. RM HA is not yet present, but is incoming. S

problem about DN&NM directory design

2013-08-08 Thread ch huang
Hi all: I plan to colocate the DN with the NM. I want to use two 1TB disks, one mounted on /data/1 and the other on /data/2. I set dfs.datanode.data.dir to "/data/1/hadoopdataspace ,/data/2/hadoopdataspace" and yarn.nodemanager.local-dirs to "/data/1/yarn/local ,/data/2/yarn/local", s

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Harsh J
I suppose I should have been clearer. There's no problem out of the box if people stick to the libraries we offer :) Yes, the LRW was marked synchronized at some point over 8 years ago [1] in support of multi-threaded maps, but the framework has changed a lot since then. The MultithreadedMapper/etc. AP

Re: alternative to $HADOOP_HOME/lib

2013-08-08 Thread Sanjeev Verma
On 08/08/2013 09:23 PM, John Hancock wrote: Where else might one put .jar files that a map/reduce job will need? Why do you need an alternative location? Is there a constraint on being able to place your library jars under $HADOOP_HOME/lib?

Re: Mapreduce for beginner

2013-08-08 Thread Shahab Yunus
Given that your questions are very broad and high-level, I would suggest picking up a book or similar to work through them. Hadoop: The Definitive Guide by Tom White is a great book to start with. Meanwhile, some links to start with: http://hadoop.apache.org/docs/stable/mapred_tutoria

alternative to $HADOOP_HOME/lib

2013-08-08 Thread John Hancock
Where else might one put .jar files that a map/reduce job will need?

issue about resource manager HA

2013-08-08 Thread ch huang
Hi all: like the NameNode, the ResourceManager is also a SPOF. Is there any solution for RM HA?

Re: is RM require a lot of memory?

2013-08-08 Thread ch huang
So, from a performance standpoint, I need to separate the RM from the NN, because each of them is memory hungry. On Fri, Aug 9, 2013 at 8:00 AM, Marcos Luis Ortiz Valmaseda < marcosluis2...@gmail.com> wrote: > Remember that in YARN, the two main responsibilities of the JobTracker is > divided in two different compon

Fwd: Mapreduce for beginner

2013-08-08 Thread Olivier Austina
Hi, I am starting to learn MapReduce with Hadoop through the WordCount example. I am a bit confused about the boundary between the map and reduce programs. Is there a standard format for the map output and the reduce input? Is there a full explanation of the Java classes used somewhere? I also appreciate to lea
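An in-memory simulation of the WordCount data flow, written in plain Java rather than against Hadoop's Mapper/Reducer API, to show where the map/reduce boundary sits. The map step emits (word, 1) pairs; the framework's shuffle groups those pairs by key in sorted key order; the reduce step receives each key with the list of its values. In a real Hadoop job the map output key/value types (for WordCount, typically Text and IntWritable) must match the reducer's input types. The class and method names here are hypothetical.

```java
import java.util.*;

class WordCountSimulation {
    // Map phase: one input line -> list of (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.split("\\s+")) {
            if (!token.isEmpty()) out.add(new AbstractMap.SimpleEntry<>(token, 1));
        }
        return out;
    }

    // Shuffle/sort: group all map outputs by key, with keys in sorted order.
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: (word, [1, 1, ...]) -> count.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    static SortedMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> all = new ArrayList<>();
        for (String line : lines) all.addAll(map(line));
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(all).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("hello world", "hello hadoop")));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```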

Re: is RM require a lot of memory?

2013-08-08 Thread Marcos Luis Ortiz Valmaseda
Remember that in YARN, the two main responsibilities of the JobTracker are divided into two different components: - Resource management by the ResourceManager (this is a global component) - Job scheduling and monitoring by the NodeManager (this is a per-node component) - Resource negotiation and task exec

Converting a Path to a full URI String and preserving special characters

2013-08-08 Thread Public Network Services
Is there a reliable way of converting an HDFS Path object into a String? Invoking path.toUri().toString() does not work with special characters (e.g., if there are spaces in the original path string). So, for instance, in the following example String address = ...; // Path string without the hdfs
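A sketch using only `java.net.URI`, assuming (as far as we know) that a Hadoop Path builds its URI with the multi-argument URI constructor. That constructor percent-encodes illegal characters, so `toString()` shows a space as `%20`, while the component getters such as `getPath()` return the decoded form. Reassembling the string from the decoded components keeps the spaces; Hadoop's own `Path.toString()` appears to build its string from these components in a similar way, so it may be worth trying, but check this against the Hadoop version in use. The host `namenode:8020` and the file path are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriSpecialChars {
    // Multi-argument constructor: quotes illegal characters such as spaces.
    static URI build(String scheme, String authority, String path) {
        try {
            return new URI(scheme, authority, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Reassembles a human-readable string from the decoded components instead of toString().
    static String decodedString(URI uri) {
        StringBuilder sb = new StringBuilder();
        if (uri.getScheme() != null) sb.append(uri.getScheme()).append(':');
        if (uri.getAuthority() != null) sb.append("//").append(uri.getAuthority());
        if (uri.getPath() != null) sb.append(uri.getPath());
        return sb.toString();
    }

    public static void main(String[] args) {
        URI uri = build("hdfs", "namenode:8020", "/user/data/my file.txt");
        System.out.println(uri.toString());     // hdfs://namenode:8020/user/data/my%20file.txt
        System.out.println(uri.getPath());      // /user/data/my file.txt
        System.out.println(decodedString(uri)); // hdfs://namenode:8020/user/data/my file.txt
    }
}
```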

scripts for mapred.healthChecker option?

2013-08-08 Thread Jeff Kubina
Are there any standard or recommended scripts for the mapred.healthChecker options in the mapred-site.xml configuration file for a linux box? -Jeff
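In Hadoop 1.x the health checker runs the executable named by mapred.healthChecker.script.path, and the node is treated as unhealthy when the script prints a line beginning with ERROR to standard output. Such scripts are usually shell scripts; the Java sketch below only illustrates one possible check (minimum free disk space) and the required output convention. The threshold, the default directory, and the class name are hypothetical, and it would need a small wrapper script to be launched by the health checker.

```java
import java.io.File;

class DiskHealthCheck {
    // Returns the line to print: it starts with "ERROR" when free space is below the threshold.
    static String check(long usableBytes, long minFreeBytes, String dir) {
        if (usableBytes < minFreeBytes) {
            return "ERROR low disk space on " + dir + ": " + usableBytes + " bytes free";
        }
        return "OK " + dir;
    }

    public static void main(String[] args) {
        // Hypothetical defaults: check the directory given as the first argument and require 10 GB free.
        String dir = args.length > 0 ? args[0] : "/";
        long minFree = 10L * 1024 * 1024 * 1024;
        long usable = new File(dir).getUsableSpace(); // 0 if the path does not name a partition
        System.out.println(check(usable, minFree, dir));
    }
}
```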

Re: Hosting Hadoop

2013-08-08 Thread Dhaval Shah
Thanks for the list, Marcos. I will go through the slides/links. I think that's helpful. Regards, Dhaval From: Marcos Luis Ortiz Valmaseda To: Dhaval Shah Cc: user@hadoop.apache.org Sent: Thursday, 8 August 2013 4:50 PM Subject: Re: Hosting Hadoop Well, a

Re: Hosting Hadoop

2013-08-08 Thread Marcos Luis Ortiz Valmaseda
Well, it all depends, because many companies use cloud computing platforms like Amazon EMR, VMware, and Rackspace Cloud for Hadoop hosting: http://aws.amazon.com/elasticmapreduce http://www.vmware.com/company/news/releases/vmw-mapr-hadoop-062013.html http://bitrefinery.com/services/hadoop-hosting http://

Hosting Hadoop

2013-08-08 Thread Dhaval Shah
We are exploring the possibility of hosting Hadoop outside of our data centers. I am aware that Hadoop in general isn't exactly designed to run on virtual hardware. So a few questions: 1. Are there any providers out there who would host Hadoop on dedicated physical hardware?  2. Has anyone had s

Re: why FairScheduler prefer to schedule MR jobs into the same node?

2013-08-08 Thread Sandy Ryza
Hi devdoer, What version are you using? -Sandy On Thu, Aug 8, 2013 at 4:25 AM, devdoer bird wrote: > HI: > > I configure the FairScheduler with default settings and my job has 19 > reduce tasks. I found that all the reduce tasks are schedule to run in one > node. > > While with default FIFO

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Azuryy Yu
The sequence file writer is also synchronized; I don't think this is bad. If you call the HDFS API to write concurrently, then it's necessary. On Aug 8, 2013 7:53 PM, "Jay Vyas" wrote: > Then is this a bug? Synchronization in absence of any race condition is > normally considered "bad". > > In any case id lik

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Niels Basjes
I would say yes, make this a Jira. The actual change can go in two directions (as proposed by Jay): put synchronization in all implementations, or take it out of all implementations. I think the first thing to determine is why the synchronization was put into the LineRecordWriter in the first

Re: issue about hadoop hardware choose

2013-08-08 Thread Mirko Kämpf
Hello Ch Huang, Do you know this book? "Hadoop Operations" http://shop.oreilly.com/product/0636920025085.do I think it answers most of your questions in detail. For a production cluster, you should consider MRv1. I also suggest going with more hard drives per slave node to get a higher IO b

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Sathwik B P
Hi Harsh, Do you want me to raise a Jira on this? Regards, sathwik On Thu, Aug 8, 2013 at 5:23 PM, Jay Vyas wrote: > Then is this a bug? Synchronization in absence of any race condition is > normally considered "bad". > > In any case id like to know why this writer is synchronized whereas the

Re: Datanode doesn't connect to Namenode

2013-08-08 Thread Felipe Gutierrez
Thanks for the hints Shekhar. My cluster is running well. Felipe On Thu, Aug 8, 2013 at 8:56 AM, Shekhar Sharma wrote: > keep the configuration same in the datanodes as well for the time > being..Only thing that data node or slave machine should know is Masters > files ( that means who is the m

Re: Datanode doesn't connect to Namenode

2013-08-08 Thread Shekhar Sharma
Keep the configuration the same on the datanodes as well for the time being. The only thing the datanode (slave machine) needs to know is the masters file (that is, who the master is), and you need to tell the slave machine where your namenode is running, which you specify in the property fs.defaul

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Jay Vyas
Then is this a bug? Synchronization in the absence of any race condition is normally considered "bad". In any case, I'd like to know why this writer is synchronized whereas the other ones are not. That is, I think, the point at issue: either the other writers should be synchronized or else this one sho

Re: Datanode doesn't connect to Namenode

2013-08-08 Thread Shekhar Sharma
If you have removed this property from the slave machines, then your DN information will be created under the /tmp folder, and once you reboot your datanode machines, the information will be lost. Sorry, I have not seen the logs, but you don't have to play around with the properties... see, the datanode will not c

Re: Datanode doesn't connect to Namenode

2013-08-08 Thread Felipe Gutierrez
Thanks. In all the files I changed it to the master (cloud6), and I removed the hadoop.tmp.dir property. Felipe On Wed, Aug 7, 2013 at 3:20 PM, Shekhar Sharma wrote: > Disable the firewall on data node and namenode machines.. > Regards, > Som Shekhar Sharma > +91-8197243810 > > > On Wed, Aug 7, 2013 at 1
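Before (or instead of) disabling the firewall entirely, a plain TCP check run from a datanode can confirm whether the NameNode address configured in fs.default.name is reachable at all. This is a minimal sketch: the host "cloud6" comes from this thread, while port 8020 is only an assumed NameNode RPC port; use whatever port fs.default.name actually specifies.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class NameNodeReachability {
    // Attempts a plain TCP connection; false usually means a firewall, a wrong host/port,
    // an unresolvable hostname, or nothing listening on that port.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical defaults: "cloud6" is the master named in this thread; 8020 is an assumed port.
        String host = args.length > 0 ? args[0] : "cloud6";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8020;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

A successful connection only proves the port is open; the datanode logs are still the place to look for configuration errors.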

why FairScheduler prefer to schedule MR jobs into the same node?

2013-08-08 Thread devdoer bird
Hi: I configured the FairScheduler with default settings, and my job has 19 reduce tasks. I found that all the reduce tasks are scheduled to run on one node, whereas with the default FIFO scheduler the 19 reduce tasks are scheduled on different nodes. How can I configure the FairScheduler to load more balan

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Niels Basjes
I may be nitpicking here, but if "perhaps the answer is no", then I conclude: perhaps the other implementations of RecordWriter are a race condition/file corruption waiting to happen. On Thu, Aug 8, 2013 at 12:50 PM, Harsh J wrote: > While we don't fork by default, we do provide a MultithreadedMappe

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Harsh J
While we don't fork by default, we do provide a MultithreadedMapper implementation that would require such synchronization. But if you are asking whether it is necessary, then perhaps the answer is no. On Aug 8, 2013 3:43 PM, "Azuryy Yu" wrote: > its not hadoop forked threads, we may create a line record
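A sketch of why a record writer shared by several threads (as with a multithreaded mapper) needs its `write` to be atomic. `SimpleLineWriter` is a hypothetical simplified class, not Hadoop's LineRecordWriter, though it follows the same key, separator (tab), value, newline sequence. Each record takes several separate writes to the underlying stream, so without `synchronized` two threads could interleave those writes and corrupt lines; with it, every output line is a complete record.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified writer: key, tab, value, newline.
class SimpleLineWriter {
    private final Writer out;

    SimpleLineWriter(Writer out) { this.out = out; }

    // synchronized keeps the four separate writes of one record together when threads share this writer.
    synchronized void write(String key, String value) throws IOException {
        out.write(key);
        out.write('\t');
        out.write(value);
        out.write('\n');
    }
}

class SharedWriterDemo {
    static String runThreads(int threads, int recordsPerThread) throws Exception {
        StringWriter sink = new StringWriter();
        SimpleLineWriter writer = new SimpleLineWriter(sink);
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                try {
                    for (int i = 0; i < recordsPerThread; i++) writer.write("thread-" + id, "record-" + i);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread w : workers) w.join();
        return sink.toString();
    }

    public static void main(String[] args) throws Exception {
        String output = runThreads(4, 1000);
        // With synchronized write(), every line is a complete "thread-N<TAB>record-M" record.
        for (String line : output.split("\n")) {
            if (!line.matches("thread-\\d+\trecord-\\d+")) throw new IllegalStateException("interleaved: " + line);
        }
        System.out.println(output.split("\n").length); // prints 4000
    }
}
```

The trade-off discussed in the thread is visible here: the lock costs little when only one thread writes, but it is only needed when a single writer instance is shared across threads.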

Re: Oozie ssh action error

2013-08-08 Thread Kasa V Varun Tej
Logs: 2013-08-08 06:03:51,627 INFO org.apache.oozie.command.wf.ActionStartXCommand: USER[root] GROUP[-] TOKEN[] APP[clickstream-wf] JOB[044-130719141217337-oozie-oozi-W] ACTION[044-130719141217337-oozie-oozi-W@:start:] Start action [044-130719141217337-oozie-oozi-W@:start:] with

Re: Oozie ssh action error

2013-08-08 Thread Kasa V Varun Tej
Hey Jitendra, I checked both of the things you mentioned, but I'm still facing the same issue. Regards, Kasa On Wed, Aug 7, 2013 at 7:32 PM, Jitendra Yadav wrote: > Hi, > > I hope below points might help you. > > *Approach 1#* > > You need to change the sshd_config file in the remote server (probab

Re: issue about hadoop hardware choose

2013-08-08 Thread Azuryy Yu
If you want HA, do you plan to deploy the JournalNodes on the DNs? On Aug 8, 2013 5:09 PM, "ch huang" wrote: > hi,all: > My company need build a 10 node hadoop cluster (2 namenode and > 8 datanode & node manager ,for both data storage and data analysis ) ,we > have hbase ,hive on the

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Azuryy Yu
It's not about Hadoop-forked threads; we may create a line record writer and then call this writer concurrently. On Aug 8, 2013 4:00 PM, "Sathwik B P" wrote: > Hi, > Thanks for your reply. > May I know where does hadoop fork multiple threads to use a single > RecordWriter. > > regards, > sathwik > > On Thu

issue about hadoop hardware choose

2013-08-08 Thread ch huang
Hi all: My company needs to build a 10-node Hadoop cluster (2 NameNodes and 8 DataNode & NodeManager nodes, for both data storage and data analysis). We run HBase and Hive on the Hadoop cluster, with a data increment of 10G per day. We use CDH4.3 (for dual-NameNode HA). My plan is

RE: is it ok? build hadoop cluster on kvm on product envionment?

2013-08-08 Thread Sourygna Luangsay
Hi, In my company we sometimes use KVM to launch test or small demo clusters. Every developer also has a KVM pseudo-distributed cluster on their computer. Nonetheless, I would not recommend using KVM for production clusters. Check this link about all the theory of Hadoop with virtualizat

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Sathwik B P
Hi, Thanks for your reply. May I know where Hadoop forks multiple threads to use a single RecordWriter? Regards, sathwik On Thu, Aug 8, 2013 at 7:06 AM, Azuryy Yu wrote: > because we may use multi-threads to write a single file. > On Aug 8, 2013 2:54 PM, "Sathwik B P" wrote: > >> Hi, >> >>

Re: Why LineRecordWriter.write(..) is synchronized

2013-08-08 Thread Azuryy Yu
Because we may use multiple threads to write a single file. On Aug 8, 2013 2:54 PM, "Sathwik B P" wrote: > Hi, > > LineRecordWriter.write(..) is synchronized. I did not find any other > RecordWriter implementations define the write as synchronized. > Any specific reason for this. > > regards, > sath