Hi Jamal,
Please see the inline answers.
On Wed, Nov 28, 2012 at 10:47 AM, jamal sasha wrote:
> Hi,
> Lately, I have been writing a lot of algorithms in the MapReduce abstraction
> in Python (Hadoop streaming).
> I have got the hang of it (I think)...
> I have a couple of questions:
Could you tell me which version of Hadoop you are using...
On Wed, Nov 28, 2012 at 1:45 AM, Michael Namaiandeh <
mnamaian...@healthcit.com> wrote:
> Hi Hadoop user community,
>
> ** **
>
> I am trying to set up my first Hadoop cluster and I’ve found most of the
> instructions a little confusing.
Hi Harsh,
Thank you for the information.
I understand the current circumstances.
How about for mappers?
As far as I tested, the location information in InputSplit is ignored in 0.20.2,
so there seems to be no easy way to assign mappers to specific nodes.
(I checked the source before and noticed that
Hi,
Lately, I have been writing a lot of algorithms in the MapReduce abstraction
in Python (Hadoop streaming).
I have got the hang of it (I think)...
I have a couple of questions:
1) By not using the Java libraries, what power of Hadoop am I missing?
2) I know that this is just the tip of the iceberg, can so
This is not supported/available currently even in MR2, but take a look at
https://issues.apache.org/jira/browse/MAPREDUCE-199.
On Wed, Nov 28, 2012 at 9:34 AM, Hiroyuki Yamada wrote:
> Hi,
>
> I am wondering how I can assign reduce tasks to specific nodes.
> What I want to do is, for example,
Hi,
I am wondering how I can assign reduce tasks to specific nodes.
What I want to do is, for example, assigning the reducer that produces
part-0 to node xxx000,
and part-1 to node xxx001, and so on.
I think it's about task assignment scheduling but
I am not sure where to customize to achie
I want to write a program to get Hadoop job information, such as the job ID, job name,
owner, and running nodes.
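A minimal sketch of that kind of listing, assuming the Hadoop 1.x "mapred" client
API (JobClient/JobStatus/RunningJob); the JobTracker address is picked up from the
configuration on the classpath:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobStatus;
import org.apache.hadoop.mapred.RunningJob;

public class ListJobs {
  public static void main(String[] args) throws Exception {
    // Connects to the JobTracker named in the configuration on the classpath.
    JobClient client = new JobClient(new JobConf());
    for (JobStatus status : client.getAllJobs()) {
      RunningJob job = client.getJob(status.getJobID());
      System.out.println(status.getJobID()
          + "\t" + (job == null ? "?" : job.getJobName())
          + "\t" + status.getUsername());
    }
  }
}

(For the running nodes, see the TaskCompletionEvent sketch further down in this digest.)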
-- Original --
From: "Mahesh Balija";
Date: Tue, Nov 27, 2012 10:02 PM
To: "user";
Subject: Re: Failed to call hadoop API
Hi Hui,
On Tue, Nov 27, 2012 at 1:35 PM, Uri Laserson wrote:
> This URL gives me a 404
> http://hadoop.apache.org/docs/current/file_system_shell.html
The 2.0 docs are currently not being generated correctly.
https://issues.apache.org/jira/browse/HADOOP-8427
As a workaround you can refer to a previous v
This URL gives me a 404
http://hadoop.apache.org/docs/current/file_system_shell.html
Uri
--
Uri Laserson, PhD
Data Scientist, Cloudera
Twitter/GitHub: @laserson
+1 617 910 0447
laser...@cloudera.com
Hi Michael,
There is a good guide on how to set up a multi-node Hadoop cluster.
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
Hope it helps.
Regards,
Dino Kečo
msn: xdi...@hotmail.com
mail: dino.k...@gmail.com
skype: dino.keco
phone: +387 61 507 851
On Tue, N
Hello Michael,
You can use any port of your choice, but I would suggest not using 50030
or 50070 as they are the default ports for the MR and HDFS web UIs. Also, if you
are planning to create a distributed cluster (4 nodes, as specified by you),
do not use localhost anywhere. Instead, use appropriate ho
Hi Michael,
I'd suggest following Michael Noll's write-up on this topic, at
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/.
It should clear some of your confusion, and also get you up and running
quickly.
P.s. Though the title mentions "Ubuntu", there's v
Hi Hadoop user community,
I am trying to set up my first Hadoop cluster and I've found most of the
instructions a little confusing. I've seen how-tos that say "core-site.xml"
should have hdfs://localhost:8020 and others say hdfs://localhost:50030. Which
one is correct? Can someone please help
Harsh,
You ruined my hard work ;)
I had written a big reply for Andy when suddenly I got a call from a client.
In the meantime, you posted your reply. No worries, it was almost the same as yours.
Anyway, thank you for the detailed explanation. :)
Regards,
Mohammad Tariq
On Wed, Nov 28, 2012 at 1:20
Is there anyone who can answer my question?
Thanks,
Yin
On Fri, Nov 16, 2012 at 4:15 PM, Yin Huai wrote:
> Hi All,
>
> I have a question about the method read in CheckSumFSInputChecker. Can you
> let me know why the file will be opened and closed for every read operation?
>
> thanks,
>
> Yin
>
Thanks Harsh. I totally forgot about the locality thing.
I take it, for the best performance it is better to leave the split size property
alone and let the framework handle the splits on the basis of the block size.
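A rough sketch of how the new-API FileInputFormat derives the split size (the block
size wins unless the min/max split properties override it); the default values shown
are assumptions about an unmodified configuration:

public class SplitSizeSketch {
  // Mirrors FileInputFormat.computeSplitSize() in the mapreduce (new) API.
  static long computeSplitSize(long blockSize, long minSize, long maxSize) {
    return Math.max(minSize, Math.min(maxSize, blockSize));
  }

  public static void main(String[] args) {
    long blockSize = 64L * 1024 * 1024;  // 64 MB HDFS block
    long minSize = 1L;                   // mapred.min.split.size left at its default
    long maxSize = Long.MAX_VALUE;       // mapred.max.split.size left at its default
    // With the split properties left alone, the split size equals the block size.
    System.out.println(computeSplitSize(blockSize, minSize, maxSize));
  }
}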
p.s. There were meant to be only 5 questions.
Rgds,
AK47
-Original Message--
Hi,
Response inline.
On Tue, Nov 27, 2012 at 8:35 PM, Kartashov, Andy wrote:
> Guys,
>
> I understand that if not specified, the default block size of HDFS is 64 MB. You
> can control this value by altering the dfs.block.size property and increasing the
> value to 64 MB x 2 or 64 MB x 4. Every time we make
Hi,
Thanks, the self-referencing subworkflow is a good idea; it never occurred to me.
However, I'm still hoping for something more lightweight, with no Oozie
or external tools.
My best idea now is simply abstracting the exec call in my script that submits
the job (hadoop jar hadoop-strea
> Using Oozie seems to be overkill for this application; besides, it
doesn't support "loops"
> so the recursion can't really be implemented.
Correct, Oozie does not support loops; this is a restriction by design
(early prototypes supported loops). The idea was that you didn't want never
end
Hi AC,
Do you have the following property defined in your hdfs-site.xml?
dfs.secondary.namenode.kerberos.internal.spnego.principal
HTTP/_HOST@REALM
and this principal needs to be available in your /etc/hadoop/hadoop.keytab.
From the logs it looks like you only have the following configured
"d
I have never experienced this problem yet, so I do not know a straight answer. But
this is what I would have looked at.
Are you running Hadoop in pseudo- or fully-distributed mode? What is your topology,
and where are you running the SNN?
- Is the SNN daemon running? (# jps)
- If it is on a se
Jon:
This is interesting and helpful! How did you figure out the cause, and how
much time did you spend? Could you share some of your experience with performance
diagnosis?
Jie
On Tuesday, November 27, 2012, Harsh J wrote:
> Hi Amit,
>
> The default scheduler is FIFO, and may not work for all forms of
>
Super! I did miss the fact that the Job class indeed inherits from JobContext. This
clarifies my issue. Thanks, Dave and Harsh.
From: Dave Beech [mailto:d...@paraliatech.com]
Sent: Tuesday, November 27, 2012 9:51 AM
To: user@hadoop.apache.org
Subject: Re: MapReduce APIs
AK - look again at that javado
Guys,
I understand that if not specified, the default block size of HDFS is 64 MB. You can
control this value by altering the dfs.block.size property and increasing the value
to 64 MB x 2 or 64 MB x 4. Every time we make the change to this property we
must reimport the data for the changes to take effect
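A minimal sketch of that point, assuming Hadoop 1.x: the block size is recorded per
file when it is written, so changing dfs.block.size on the client only affects files
written afterwards (existing files keep their old block size unless re-imported). The
path below is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteWithBlockSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Client-side setting: applies only to files created with this configuration.
    conf.setLong("dfs.block.size", 128L * 1024 * 1024); // 128 MB
    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream out = fs.create(new Path("/data/new-file.txt"));
    out.writeBytes("written with a 128 MB block size\n");
    out.close();
  }
}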
AK - look again at that javadoc. Job does have a getConfiguration() method.
You may have missed it the first time because it's inherited from a parent
class, JobContext.
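A minimal sketch of that inherited method in use (the property tweak is just an
illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobConfigurationExample {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "example");
    // getConfiguration() is declared on JobContext and inherited by Job.
    Configuration conf = job.getConfiguration();
    conf.set("mapred.reduce.tasks", "2");
    System.out.println(conf.get("mapred.reduce.tasks"));
  }
}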
On 27 November 2012 14:23, Kartashov, Andy wrote:
> Thanks, man, for the response. Much appreciated.
>
>
>
> Why? Because Job o
A common problem users have is adding values to the classpath AFTER the
daemons have been started. If you are getting a ClassNotFound exception and are
100% sure you have correctly specified the path to the jar files and your jar files
actually contain the compiled class, then simply restart the daemons
Thanks, man, for the response. Much appreciated.
Why? Because the Job object doesn't have the below method, getConfiguration(). See
for yourself under mapreduce.Job:
http://hadoop.apache.org/docs/r0.20.2/api/index.html or
http://hadoop.apache.org/docs/current/api/index.html
So, back to my original ques
Hi,
Please help!
I tried to start the SecondaryNameNode in secure mode with the command:
${HADOOP_HOME}/bin/hadoop-daemon.sh start secondarynamenode
1) from the log, I saw "Login successful":
2012-11-27 22:05:23,120 INFO
Hi Bharath,
You may need to copy the jars across all the nodes in your
Hadoop cluster.
Another good way to handle this is to package all the required
jars with your application.jar, which you pass on the Hadoop command line.
This will take care of caching it to a
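A minimal sketch of a related approach: shipping an extra jar to every task node via
the DistributedCache (Hadoop 1.x API; the HDFS path below is hypothetical and the jar
must already be in HDFS):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class AddJarToClasspath {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Ships the jar to the task nodes and adds it to the tasks' classpath.
    DistributedCache.addFileToClassPath(new Path("/libs/jdom.jar"), conf);
    Job job = new Job(conf, "xml-processing");
    // ... rest of the job setup omitted ...
  }
}

(Passing -libjars on the hadoop jar command line achieves something similar when the
job uses ToolRunner/GenericOptionsParser.)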
Hi Hui,
The JobID constructor is not a public constructor; it has
default visibility, so you have to create the instance within the same
package.
Usually you cannot create a JobID; rather, you can get one
from the Job instance by invoking getJobID().
If this
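A minimal sketch of that advice, assuming the new (mapreduce) API; the input and
output paths are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobIdExample {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "jobid-example");
    job.setJarByClass(JobIdExample.class);
    // The identity map/reduce defaults are enough to demonstrate the JobID.
    FileInputFormat.addInputPath(job, new Path("/tmp/jobid-in"));    // hypothetical
    FileOutputFormat.setOutputPath(job, new Path("/tmp/jobid-out")); // hypothetical
    job.submit();                // the JobID is assigned at submission time
    JobID id = job.getJobID();   // obtained from the Job, not constructed directly
    System.out.println("Submitted job: " + id);
  }
}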
Hi Bharath,
Yes, I have added all those jars.
Thanks,
dti
On Tue, Nov 27, 2012 at 6:35 PM, bharath vissapragada <
bharathvissapragada1...@gmail.com> wrote:
> Hi,
>
> Did you add the JDOM jar [1] to your classpath?
>
> [1] http://www.jdom.org/downloads/index.html
>
> Thanks,
>
>
> On Tue, Nov 27, 20
Hi,
Did you add the JDOM jar [1] to your classpath?
[1] http://www.jdom.org/downloads/index.html
Thanks,
On Tue, Nov 27, 2012 at 6:30 PM, dyuti a wrote:
> Hi All,
> I am working on XML processing in Hadoop, and followed the steps from the blog
> http://xmlandhadoop.blogspot.in/. I have added all jar
I call the statement "JobID id = new JobID()" of the Hadoop API via JNI. But when
my program runs to this statement, it exits, and no errors are output. I can't make
any sense of this.
The Hadoop version is hadoop-core-1.0.3.jar.
The Java SDK is jdk1.6.0-34.
Any help will be appreciated.
-GHui
Hi everyone,
Thanks in advance for the support. My problem is the following:
I'm trying to develop a fairly complex MapReduce application using the
streaming API (for demonstration purposes, so unfortunately the "use Java"
answer doesn't work :-( ). I can get one single MapReduce phase running f
Thanks very much. It worked.
-GHui
-- Original --
From: "Harsh J";
Date: Mon, Nov 12, 2012 01:32 PM
To: "user";
Subject: Re: To get JobID
You can get "TaskCompletionEvent" objects and extract the task tracker
URLs out of it.
Use the array of TaskCom
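A minimal sketch of that suggestion, assuming a RunningJob handle from the Hadoop 1.x
"mapred" JobClient API:

import java.io.IOException;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.TaskCompletionEvent;

public class TaskTrackerUrls {
  static void printTrackerUrls(RunningJob job) throws IOException {
    int start = 0;
    TaskCompletionEvent[] events;
    // Completion events come back in pages; keep polling until a page is empty.
    while ((events = job.getTaskCompletionEvents(start)).length > 0) {
      for (TaskCompletionEvent event : events) {
        System.out.println(event.getTaskAttemptId() + " ran on "
            + event.getTaskTrackerHttp());
      }
      start += events.length;
    }
  }
}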
Hi Amit,
The default scheduler is FIFO, and may not work for all forms of
workloads. Read about the multiple schedulers available to see if they have
features that may benefit your workload:
Capacity Scheduler: http://hadoop.apache.org/docs/stable/capacity_scheduler.html
FairScheduler: http://h
Yes, it's a fair scheduler thing - I don't think it's right to call it a
problem, it's just a change in behaviour that came as a surprise.
My reason for using the fair scheduler is because we have multiple users
running simultaneous jobs and the default scheduler doesn't allow this. My
choice of
Amit,
As Harsh says, I was referring to the
mapred.fairscheduler.assignmultiple config
property. This only shows as an issue if you're running significantly
fewer reduce tasks than available slots. We ran the terasort as a simple
smoke test of the upgrade rather than as a useful performance meas
So this is a FairScheduler problem ?
We are using the default Hadoop scheduler. Is there a reason to use the
Fair Scheduler if most of the time we don't have more than 4 jobs
running simultaneously ?
On Tue, Nov 27, 2012 at 12:00 PM, Harsh J wrote:
> Hi Amit,
>
> He means the mapred.fairschedule
Hi Amit,
He means the mapred.fairscheduler.assignmultiple FairScheduler
property. It is true by default, which works well for most workloads,
if not benchmark-style workloads. I would not usually trust that as a
baseline performance measure of everything that comes out of an upgrade.
The other JIRA, MAPREDU
Hi Jon,
I recently upgraded our cluster from Hadoop 0.20.3-append to Hadoop 1.0.4
and I haven't noticed any performance issues. By "multiple assignment
feature" do you mean speculative execution
(mapred.map.tasks.speculative.execution
and mapred.reduce.tasks.speculative.execution) ?
On Mon, Nov