Re: advice

2012-11-27 Thread Mahesh Balija
Hi Jamal, Please follow the inline answers. On Wed, Nov 28, 2012 at 10:47 AM, jamal sasha wrote: > Hi, > Lately, I have been writing a lot of algorithms in the map reduce abstraction > in python (hadoop streaming). > I have gotten the hang of it (I think)... > I have a couple of questions:

Re: Hadoop cluster configuration

2012-11-27 Thread Visioner Sadak
Could you tell me which version of hadoop you are using... On Wed, Nov 28, 2012 at 1:45 AM, Michael Namaiandeh < mnamaian...@healthcit.com> wrote: > Hi Hadoop user community, > > I am trying to set up my first Hadoop cluster and I've found most of the > instructions a little confusing.

Re: Assigning reduce tasks to specific nodes

2012-11-27 Thread Hiroyuki Yamada
Hi Harsh, Thank you for the information. I understand the current circumstances. How about for mappers? As far as I tested, the location information in InputSplit is ignored in 0.20.2, so there seems to be no easy way to assign mappers to specific nodes. (I checked the source earlier and noticed that l

advice

2012-11-27 Thread jamal sasha
Hi, Lately, I have been writing a lot of algorithms in the map reduce abstraction in python (hadoop streaming). I have gotten the hang of it (I think)... I have a couple of questions: 1) By not using java libraries, what power of hadoop am I missing? 2) I know that this is just the tip of the iceberg, can so

Re: Assigning reduce tasks to specific nodes

2012-11-27 Thread Harsh J
This is not currently supported/available even in MR2, but take a look at https://issues.apache.org/jira/browse/MAPREDUCE-199. On Wed, Nov 28, 2012 at 9:34 AM, Hiroyuki Yamada wrote: > Hi, > > I am wondering how I can assign reduce tasks to specific nodes. > What I want to do is, for example,

Assigning reduce tasks to specific nodes

2012-11-27 Thread Hiroyuki Yamada
Hi, I am wondering how I can assign reduce tasks to specific nodes. What I want to do is, for example, assigning the reducer which produces part-0 to node xxx000, part-1 to node xxx001, and so on. I think it's about task assignment scheduling, but I am not sure where to customize to achie

Re: Failed to call hadoop API

2012-11-27 Thread ugiwgh
I want to write a program that gets hadoop job information, such as the job ID, job name, owner, and running nodes. -- Original -- From: "Mahesh Balija"; Date: Tue, Nov 27, 2012 10:02 PM To: "user"; Subject: Re: Failed to call hadoop API Hi Hui,
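
For the job-listing part, a minimal sketch using the Hadoop 1.x JobClient API follows (untested here; the JobTracker address is a placeholder, and mapping jobs to their running nodes would additionally require task reports):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobStatus;
    import org.apache.hadoop.mapred.RunningJob;

    public class ListJobs {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        conf.set("mapred.job.tracker", "jobtracker.example.com:8021"); // placeholder address
        JobClient client = new JobClient(conf);
        for (JobStatus status : client.getAllJobs()) {         // all jobs known to the JobTracker
          RunningJob job = client.getJob(status.getJobID());
          System.out.println(status.getJobID() + "\t"
              + (job == null ? "?" : job.getJobName()) + "\t"  // job name via RunningJob
              + status.getUsername());                         // owner
        }
      }
    }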

Re: HDFS Shell documentation 404

2012-11-27 Thread Andy Isaacson
On Tue, Nov 27, 2012 at 1:35 PM, Uri Laserson wrote: > This URL gives me a 404 > http://hadoop.apache.org/docs/current/file_system_shell.html The 2.0 docs are currently not being generated correctly. https://issues.apache.org/jira/browse/HADOOP-8427 As a workaround you can refer to a previous v

HDFS Shell documentation 404

2012-11-27 Thread Uri Laserson
This URL gives me a 404 http://hadoop.apache.org/docs/current/file_system_shell.html Uri -- Uri Laserson, PhD Data Scientist, Cloudera Twitter/GitHub: @laserson +1 617 910 0447 laser...@cloudera.com

Re: Hadoop cluster configuration

2012-11-27 Thread Dino Kečo
Hi Michael, There is a good guide on how to set up a multi-node hadoop cluster: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ Hope it helps. Regards, Dino Kečo msn: xdi...@hotmail.com mail: dino.k...@gmail.com skype: dino.keco phone: +387 61 507 851 On Tue, N

Re: Hadoop cluster configuration

2012-11-27 Thread Mohammad Tariq
Hello Michael, You can use any port of your choice, but I would suggest not using 50030 or 50070, as they are the default ports for the MR and HDFS web UIs. Also, if you are planning to create a distributed cluster (4 nodes as specified by you), do not use localhost anywhere. Instead use appropriate ho
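
To make the port distinction concrete, a minimal core-site.xml sketch (hostname is a placeholder): fs.default.name takes the NameNode RPC address, commonly port 8020 or 9000, while 50070 and 50030 are just the default HTTP ports of the NameNode and JobTracker web UIs and should not appear here.

    <!-- core-site.xml (sketch; hostname is a placeholder) -->
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://namenode.example.com:8020</value>
      </property>
    </configuration>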

Re: Hadoop cluster configuration

2012-11-27 Thread Harsh J
Hi Michael, I'd suggest following Michael Noll's write-up on this topic, at http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/. It should clear some of your confusion, and also get you up and running quickly. P.s. Though the title mentions "Ubuntu", there's v

Hadoop cluster configuration

2012-11-27 Thread Michael Namaiandeh
Hi Hadoop user community, I am trying to set up my first Hadoop cluster and I've found most of the instructions a little confusing. I've seen how-tos that say "core-site.xml" should have hdfs://localhost:8020 and others that say hdfs://localhost:50030. Which one is correct? Can someone please help

Re: block-size vs split-size

2012-11-27 Thread Mohammad Tariq
Harsh, You ruined my hard work ;) I had written a big reply for Andy and suddenly I got a call from a client. Meanwhile you posted your reply. No worries, it was almost the same as yours. Anyway, thank you for the detailed explanation. :) Regards, Mohammad Tariq On Wed, Nov 28, 2012 at 1:20

Re: question about ChecksumFSInputChecker

2012-11-27 Thread Yin Huai
Is there anyone who can answer my question? Thanks, Yin On Fri, Nov 16, 2012 at 4:15 PM, Yin Huai wrote: > Hi All, > > I have a question about the method read in CheckSumFSInputChecker. Can you > let me know why the file will be opened and closed for every read operation? > > thanks, > > Yin >

RE: block-size vs split-size

2012-11-27 Thread Kartashov, Andy
Thanks Harsh. I totally forgot about the locality thing. I take it that, for the best performance, it is better to leave the split size property alone and let the framework handle the splits on the basis of the block size. p.s. There were meant to be only 5 questions. Rgds, AK47 -Original Message--

Re: block-size vs split-size

2012-11-27 Thread Harsh J
Hi, Response inline. On Tue, Nov 27, 2012 at 8:35 PM, Kartashov, Andy wrote: > Guys, > > I understand that if not specified, the default block size of HDFS is 64MB. You > can control this value by altering the dfs.block.size property and increasing the > value to 64MB x 2 or 64MB x 4.. Every time we make
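
The practical upshot: block size is a storage-time property (changing dfs.block.size only affects newly written files, hence the re-import), while split size is a per-job input setting. A hedged sketch of capping split sizes at job-submission time with the Hadoop 1.x "new" API (values are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitSizeDemo {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "split-size-demo");
        // Bound the split size without touching the data or its block size;
        // these set mapred.min.split.size / mapred.max.split.size for this job only.
        FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);   // 64 MB
        FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);  // 128 MB
      }
    }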

RE: Complex MapReduce applications with the streaming API

2012-11-27 Thread Zoltán Tóth-Czifra
Hi, Thanks, the self-referencing subworkflow is a good idea, it never occurred to me. However, I'm still expecting something more lightweight, with no Oozie or external tools. My best idea now is simply abstracting the exec call in my script that submits the job (hadoop jar hadoop-strea
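
One hedged sketch of that "abstract the exec call" idea, as a plain Java driver that loops over streaming job submissions (the jar path, HDFS paths, mapper/reducer scripts, and the convergence check are all hypothetical):

    import java.io.IOException;

    public class IterativeStreamingDriver {
      public static void main(String[] args) throws IOException, InterruptedException {
        String input = "/data/in";                        // hypothetical input path
        for (int i = 0; i < 10; i++) {                    // bounded stand-in for a "loop"
          String output = "/data/out-" + i;
          Process p = new ProcessBuilder(
              "hadoop", "jar", "hadoop-streaming.jar",    // jar path is a placeholder
              "-input", input, "-output", output,
              "-mapper", "mapper.py", "-reducer", "reducer.py")
              .inheritIO()                                // Java 7+; forward job output to console
              .start();
          if (p.waitFor() != 0) throw new IOException("streaming job failed at iteration " + i);
          if (converged(output)) break;                   // hypothetical stopping condition
          input = output;                                 // chain: this output feeds the next pass
        }
      }
      private static boolean converged(String outputDir) { return false; } // stub
    }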

Re: Complex MapReduce applications with the streaming API

2012-11-27 Thread Alejandro Abdelnur
> Using Oozie seems to be overkill for this application, besides, it doesn't support "loops" > so the recursion can't really be implemented. Correct, Oozie does not support loops, this is a restriction by design (early prototypes supported loops). The idea was that you didn't want never-end

Re: Failed To Start SecondaryNameNode in Secure Mode

2012-11-27 Thread Arpit Gupta
Hi AC, Do you have the following property defined in your hdfs-site.xml: dfs.secondary.namenode.kerberos.internal.spnego.principal = HTTP/_HOST@REALM? This principal also needs to be available in your /etc/hadoop/hadoop.keytab. From the logs it looks like you only have the following configured "d
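
As a sketch, the hdfs-site.xml entry Arpit describes would look like this (REALM is a placeholder; the HTTP/_HOST principal must also exist in /etc/hadoop/hadoop.keytab):

    <!-- hdfs-site.xml (sketch; substitute your Kerberos realm) -->
    <property>
      <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
      <value>HTTP/_HOST@REALM</value>
    </property>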

RE: Failed To Start SecondaryNameNode in Secure Mode

2012-11-27 Thread Kartashov, Andy
I have never experienced this problem myself so I do not know a straight answer, but this is what I would look at. Are you running Hadoop in Pseudo or Fully-distributed mode? What is your topology, and where are you running the SNN? - Is the SNN daemon running? (# jps) - If it is on a se

Re: Hadoop 1.0.4 Performance Problem

2012-11-27 Thread Jie Li
Jon: This is interesting and helpful! How did you figure out the cause? And how much time did you spend? Could you share some of your experience with performance diagnosis? Jie On Tuesday, November 27, 2012, Harsh J wrote: > Hi Amit, > > The default scheduler is FIFO, and may not work for all forms of >

RE: MapReduce APIs

2012-11-27 Thread Kartashov, Andy
Super! I did miss the fact that the Job class indeed inherits from JobContext. This clarifies my issue. Thanks Dave and Harsh. From: Dave Beech [mailto:d...@paraliatech.com] Sent: Tuesday, November 27, 2012 9:51 AM To: user@hadoop.apache.org Subject: Re: MapReduce APIs AK - look again at that javado

block-size vs split-size

2012-11-27 Thread Kartashov, Andy
Guys, I understand that if not specified, the default block size of HDFS is 64MB. You can control this value by altering the dfs.block.size property and increasing the value to 64MB x 2 or 64MB x 4.. Every time we make the change to this property we must reimport the data for the changes to take effect

Re: MapReduce APIs

2012-11-27 Thread Dave Beech
AK - look again at that javadoc. Job does have a getConfiguration() method. You may have missed it the first time because it's inherited from a parent class, JobContext. On 27 November 2012 14:23, Kartashov, Andy wrote: > Thanks man for the response. Much appreciated. > > > > Why? Because the Job o
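
A small illustration of the inherited method (sketch only; the property name is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class ConfDemo {
      public static void main(String[] args) throws Exception {
        Job job = new Job();                          // Hadoop 1.x "new" API
        Configuration conf = job.getConfiguration();  // declared on JobContext, inherited by Job
        conf.set("my.custom.key", "some-value");      // hypothetical property
      }
    }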

RE: ClassNotFoundException: org.jdom2.JDOMException

2012-11-27 Thread Kartashov, Andy
A common problem users have is adding values to the classpath AFTER the daemons have been started. If you are getting a ClassNotFound exception and are 100% sure you have correctly specified the path to the jar files, and your jar files actually contain the compiled class, then simply restart the daemons

RE: MapReduce APIs

2012-11-27 Thread Kartashov, Andy
Thanks man for the response. Much appreciated. Why? Because the Job object doesn't have the below method, getConfiguration(). See for yourself under mapreduce.Job: http://hadoop.apache.org/docs/r0.20.2/api/index.html or http://hadoop.apache.org/docs/current/api/index.html So, back to my original ques

Failed To Start SecondaryNameNode in Secure Mode

2012-11-27 Thread a...@hsk.hk
Hi, Please help! I tried to start the SecondaryNameNode in secure mode with the command: ${HADOOP_HOME}/bin/hadoop-daemon.sh start secondarynamenode 1) From the log, I saw "Login successful": 2012-11-27 22:05:23,120 INFO or

Re: ClassNotFoundException: org.jdom2.JDOMException

2012-11-27 Thread Mahesh Balija
Hi Bharath, You may need to copy the jars across all the nodes in your Hadoop cluster. Another good way to handle this is to package all the required jars inside the application.jar that you pass to the Hadoop command line. This will take care of caching it to a
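
For reference, the packaged-jar approach usually means a layout like the sketch below: jars placed under lib/ inside the job jar are added to the task classpath when the job runs (all names here are hypothetical):

    myapp.jar
    |-- com/example/xml/MyXmlDriver.class   (your job classes; names are hypothetical)
    `-- lib/
        `-- jdom-2.0.x.jar                  (third-party dependencies ride along)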

Re: Failed to call hadoop API

2012-11-27 Thread Mahesh Balija
Hi Hui, The JobID constructor is not a public constructor; it has default (package-private) visibility, so an instance can only be created from within the same package. Usually you cannot create a JobID yourself; rather, you get one from the Job instance by invoking getJobID(). If this
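
A small sketch of both legal routes (the job and the ID string are hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobID;

    public class JobIdDemo {
      public static void main(String[] args) throws Exception {
        // Route 1: let the framework assign the ID at submission time.
        Job job = new Job(new Configuration(), "example"); // hypothetical job
        job.submit();
        JobID id = job.getJobID();       // null before submit()

        // Route 2: parse an existing ID string instead of using the constructor.
        JobID parsed = JobID.forName("job_201211270000_0001"); // placeholder ID
      }
    }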

Re: ClassNotFoundException: org.jdom2.JDOMException

2012-11-27 Thread dyuti a
Hi Bharath, Yes, I have added all those jars. Thanks, dti On Tue, Nov 27, 2012 at 6:35 PM, bharath vissapragada < bharathvissapragada1...@gmail.com> wrote: > Hi, > > Did you add the JDOM jar [1] to your class path? > > [1] http://www.jdom.org/downloads/index.html > > Thanks, > > > On Tue, Nov 27, 20

Re: ClassNotFoundException: org.jdom2.JDOMException

2012-11-27 Thread bharath vissapragada
Hi, Did you add the JDOM jar [1] to your class path? [1] http://www.jdom.org/downloads/index.html Thanks, On Tue, Nov 27, 2012 at 6:30 PM, dyuti a wrote: > Hi All, > I am working on XML processing in hadoop and followed the steps from the blog > http://xmlandhadoop.blogspot.in/. I have added all jar

Failed to call hadoop API

2012-11-27 Thread GHui
I call the statement "JobID id = new JobID()" of the hadoop API with JNI, but when my program runs to this statement, it exits, and no errors are output. I can't make any sense of this. The hadoop is hadoop-core-1.0.3.jar. The Java SDK is jdk1.6.0-34. Any help will be appreciated. -GHui

Complex MapReduce applications with the streaming API

2012-11-27 Thread Zoltán Tóth-Czifra
Hi everyone, Thanks in advance for the support. My problem is the following: I'm trying to develop a fairly complex MapReduce application using the streaming API (for demonstration purposes, so unfortunately the "use Java" answer doesn't work :-( ). I can get one single MapReduce phase running f

Re: To get JobID

2012-11-27 Thread ugiwgh
Thanks very much. It worked. -GHui -- Original -- From: "Harsh J"; Date: Mon, Nov 12, 2012 01:32 PM To: "user"; Subject: Re: To get JobID You can get "TaskCompletionEvent" objects and extract the task tracker URLs out of it. Use the array of TaskCom
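
For the archive, a sketch of the TaskCompletionEvent approach Harsh describes (old mapred API; the job ID is a placeholder):

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;
    import org.apache.hadoop.mapred.TaskCompletionEvent;

    public class TrackerUrls {
      public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        RunningJob job = client.getJob(JobID.forName("job_201211270000_0001")); // placeholder
        for (TaskCompletionEvent e : job.getTaskCompletionEvents(0)) { // events from offset 0
          System.out.println(e.getTaskTrackerHttp());                  // task tracker URL
        }
      }
    }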

Re: Hadoop 1.0.4 Performance Problem

2012-11-27 Thread Harsh J
Hi Amit, The default scheduler is FIFO, and may not work for all forms of workloads. Read up on the multiple schedulers available to see if they have features that may benefit your workload: Capacity Scheduler: http://hadoop.apache.org/docs/stable/capacity_scheduler.html FairScheduler: http://h

Re: Hadoop 1.0.4 Performance Problem

2012-11-27 Thread Jon Allen
Yes, it's a fair scheduler thing - I don't think it's right to call it a problem; it's just a change in behaviour that came as a surprise. My reason for using the fair scheduler is that we have multiple users running simultaneous jobs and the default scheduler doesn't allow this. My choice of

Re: Hadoop 1.0.4 Performance Problem

2012-11-27 Thread Jon Allen
Amit, As Harsh says, I was referring to the mapred.fairscheduler.assignmultiple config property. This only shows up as an issue if you're running significantly fewer reduce tasks than available slots. We ran the terasort as a simple smoke test of the upgrade rather than as a useful performance meas

Re: Hadoop 1.0.4 Performance Problem

2012-11-27 Thread Amit Sela
So this is a FairScheduler problem? We are using the default Hadoop scheduler. Is there a reason to use the Fair Scheduler if most of the time we don't have more than 4 jobs running simultaneously? On Tue, Nov 27, 2012 at 12:00 PM, Harsh J wrote: > Hi Amit, > > He means the mapred.fairschedule

Re: Hadoop 1.0.4 Performance Problem

2012-11-27 Thread Harsh J
Hi Amit, He means the mapred.fairscheduler.assignmultiple FairScheduler property. It is true by default, which works well for most workloads, though not for benchmark-style workloads. I would not usually trust that as a base perf. measure of everything that comes out of an upgrade. The other JIRA, MAPREDU
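
For completeness, the mapred-site.xml toggle being discussed would look roughly like this sketch (the property defaults to true):

    <!-- mapred-site.xml (sketch): disable FairScheduler multiple-assignment -->
    <property>
      <name>mapred.fairscheduler.assignmultiple</name>
      <value>false</value>
    </property>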

Re: Hadoop 1.0.4 Performance Problem

2012-11-27 Thread Amit Sela
Hi Jon, I recently upgraded our cluster from Hadoop 0.20.3-append to Hadoop 1.0.4 and I haven't noticed any performance issues. By "multiple assignment feature" do you mean speculative execution (mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution)? On Mon, Nov