Mark N wrote:
I want to show the status of M/R jobs in a user interface. Should I read the default Hadoop counters to display some kind of status for the map/reduce tasks?
I could read the status of map/reduce tasks using JobClient (Hadoop default counters). I can then have a Java web service exposing
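A minimal sketch of that polling approach, using the 0.20-era JobClient API; the JobTracker address and job id below are placeholders:

import org.apache.hadoop.mapred.*;

public class JobProgress {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        // Placeholder address: point this at your JobTracker.
        conf.set("mapred.job.tracker", "jobtracker.example.com:9001");
        JobClient client = new JobClient(conf);
        // Placeholder job id, e.g. copied from the JobTracker web UI.
        RunningJob job = client.getJob(JobID.forName("job_200911170131_0001"));
        // mapProgress()/reduceProgress() return fractions in [0, 1];
        // a web service could expose just these two numbers to the UI.
        System.out.printf("map %.0f%%, reduce %.0f%%%n",
                job.mapProgress() * 100, job.reduceProgress() * 100);
    }
}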
Johannes Zillmann wrote:
Hi there,
I directed Yair to this list because the exception made me think it could be a problem of using plain Hadoop IPC versus using Hadoop IPC in a servlet container like Tomcat. Maybe a problem with static variables.
To explain, Katta uses plain Hadoop IPC for
Johannes Zillmann wrote:
Hi Steve,
In the meantime Yair posted logs at the Hadoop debug log level.
--
09/11/17 01:31:59 DEBUG ipc.Client: IPC Client (47) connection to
qa-hadoop005.ascitest.net/10.12.2.205:2 from root: starting, having
connections 2
09/11/17 01:31:59
I see that mapred.local.dir is served up round-robin, as with the dfs.data.dir values. But there's no awareness of the possibility that the same disk partition is used for MapReduce local data and for datanode blocks.
What do people do here?
* keep their fingers crossed that if the MR job
kvorion wrote:
Hi All,
I have been trying to set up a Hadoop cluster on a number of machines, a few of which are multicore machines. I have been wondering whether Hadoop's pseudo-distributed mode is something that can help me take advantage of the multiple cores on my machines. All the tutorials
Tom Wheeler wrote:
Based on what I've seen on the list, larger installations tend to use Red Hat Enterprise Linux or one of its clones like CentOS.
One other thing to add is that a large cluster is not the place to learn Linux or Solaris or whatever - it helps to have a working knowledge of
shwitzu wrote:
Thanks for responding,
I read about HDFS and understood how it works, and I also installed Hadoop on my Windows machine using Cygwin, tried a sample driver code, and made sure it works.
But my concern is: given the problem statement, how should I proceed? Could you please give me some
Allen Wittenauer wrote:
A bit more specific:
At Yahoo!, we had every server acting as either a DNS slave or a DNS caching server.
In the case of LinkedIn, we're running Solaris, so nscd is significantly better than its Linux counterpart. However, we still seem to be blowing out the cache too much.
Edward Capriolo wrote:
I know there is a JIRA open to add lifecycle methods to each Hadoop component that can be polled for progress. I don't know the number offhand.
HDFS-326 (https://issues.apache.org/jira/browse/HDFS-326) - the code has its own branch.
This is still something I'm working on,
Brian Bockelman wrote:
Hey Alex,
In order to lower cost, you'll probably want to order the worker nodes without hard drives, then buy the drives separately. HDFS provides software-level RAID, so most of the reasoning behind buying hard drives from Dell/HP is irrelevant - you are just paying an
Smith Stan wrote:
Hey Cloudera genius guys.
Sorry, not Cloudera. I speak for myself.
I read this:
"Via Cloudera, Hadoop is currently used by most of the giants in the space including Google, Yahoo, Facebook (we wrote about Facebook's use of Cloudera here), Amazon, AOL, Baidu and more."
I
Isabel Drost wrote:
On Mon, 05 Oct 2009 10:28:58 +0100
Steve Loughran ste...@apache.org wrote:
2. Even LGPL and GPL say there is no need to contribute back if you don't distribute the code
Sorry in advance about the nitpicking: IANAL - but AFAIK even LGPL and GPL do not force you to contribute back
Stas Oskin wrote:
Hi.
Could you share the way in which it didn't quite work? Would be valuable
information for the community.
The idea is to have a Xen machine dedicated to NN, and maybe to SNN, which
would be running over DRBD, as described here:
http://www.drbd.org/users-guide/ch-xen.html
Stas Oskin wrote:
Hi.
The HA service (heartbeat) is running on Dom0, and when the primary node goes down, it basically just starts the VM on the other node. So there aren't supposed to be any timing issues.
Can you explain a bit more about your approach, how to automate it for example?
* You need
Kevin Sweeney wrote:
I really appreciate everyone's input. We've been going back and forth on the server size issue here. There are a few reasons we shot for the $1k price: one is that we wanted to be able to compare our datacenter costs vs. the cloud costs. Another is that we have spec'd out a
Ryan Smith wrote:
I have a question that I feel I should ask on this thread. Let's say you want to build a cluster where you will be doing very little map/reduce - storage and replication of data only on HDFS. What would the hardware requirements be? No quad core? Less RAM?
Servers with more
Todd Lipcon wrote:
Yep, this is a common problem. The fix that Brian outlined helps a lot, but if you are *really* strapped for random bits, you'll still block. This is because even if you've set the random source, it still uses the real /dev/random to grab a seed for the PRNG, at least on my
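A minimal sketch of the usual workaround, assuming a Sun JDK; in practice the property is passed on the daemon's command line as -Djava.security.egd=file:/dev/./urandom rather than set in code:

import java.security.SecureRandom;

public class EntropyCheck {
    public static void main(String[] args) throws Exception {
        // Assumption: set before the first SecureRandom use, this makes the
        // JVM seed from /dev/urandom instead of the blocking /dev/random.
        // The "/./" spelling works around JDKs that special-case the plain path.
        System.setProperty("java.security.egd", "file:/dev/./urandom");
        SecureRandom sr = SecureRandom.getInstance("SHA1PRNG");
        byte[] bytes = new byte[16];
        sr.nextBytes(bytes); // would stall here if seeding blocked on /dev/random
        System.out.println("got " + bytes.length + " random bytes without blocking");
    }
}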
Brian Bockelman wrote:
On Sep 30, 2009, at 4:24 AM, Steve Loughran wrote:
Todd Lipcon wrote:
Yep, this is a common problem. The fix that Brian outlined helps a lot, but if you are *really* strapped for random bits, you'll still block. This is because even if you've set the random source
Paul Smith wrote:
On 25/09/2009, at 3:57 PM, Allen Wittenauer wrote:
On 9/24/09 7:38 PM, Paul Smith psm...@aconex.com wrote:
I think this could be one of these "if we build it, they will come" issues. Most of the Hadoop committers are working in large-scale homogeneous environments (lucky
Paul Smith wrote:
On 25/09/2009, at 8:55 PM, Steve Loughran wrote:
I'd love to see more direct Log4J/Hadoop integration, such as a standardised log4j-in-Hadoop format that was easily readable, included stack traces on exceptions, etc., and came with some sample MapReduce or Pig scripts
Brian Bockelman wrote:
;) Unfortunately, I'm going to go out on a limb and guess that we don't want to add OpenGL to the dependency list for the namenode... The viz application actually doesn't depend on the namenode; it uses the datanodes.
Here's the source:
Oliver Senn wrote:
Hi,
Thanks for your answer.
I used these parameters, but they seem to limit only the number of parallel maps and parallel reduces separately. They do not prevent the scheduler from scheduling one map and one reduce on the same tasktracker in parallel.
But that's the
Jeff Zhang wrote:
My cluster has been running for several months.
Nice.
Is this a bug in Hadoop? I think Hadoop is supposed to run for a long time.
I'm doing work in HDFS-326 on making it easier to start/stop the various Hadoop services; once the lifecycle stuff is in I'll worry more about the
brien colwell wrote:
Our Cygwin/Windows nodes are picky about the machines they work on. On some they are unreliable; on some they work perfectly.
We've had two main issues with Cygwin nodes.
Hadoop resolves paths in strange ways, so for example /dir is interpreted as c:/dir not
Edward Capriolo wrote:
On a somewhat related topic, I was showing a co-worker a Hadoop setup and he asked, "What if we got a bunch of laptops on the internet, like the PlayStation 'Folding@Home'?" Of course these are widely different distributed models.
I have been thinking about this.
Chris Dyer wrote:
In my task logs I see the message:
attempt to override final parameter: mapred.child.ulimit; Ignoring.
which doesn't exactly inspire confidence that I'm on the right path.
Chances are the param has been marked final in the tasktracker's running config, which will prevent you
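As a quick check, here is a hedged sketch of printing the value a JVM on that node actually ends up with; if the tasktracker's mapred-site.xml declares the property with final set to true, per-job overrides are dropped with exactly that warning:

import org.apache.hadoop.mapred.JobConf;

public class UlimitCheck {
    public static void main(String[] args) {
        // Loads the default resources (core-site.xml, mapred-site.xml, ...)
        // on this node's classpath, the same set a tasktracker would read.
        JobConf conf = new JobConf();
        // If the cluster config marked mapred.child.ulimit final, job-level
        // overrides are ignored and the tasktracker keeps this value.
        System.out.println("effective mapred.child.ulimit = "
                + conf.get("mapred.child.ulimit"));
    }
}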
Touretsky, Gregory wrote:
Hi,
Does anyone have experience running an HDFS cluster stretched over high-latency WAN connections?
Any specific concerns/options/recommendations?
I'm trying to set up an HDFS cluster with the nodes located in the US, Israel and India - considering it as a
Arvind Sharma wrote:
Hmmm... I had seen some exceptions (don't remember which ones) on MacOS. There was a missing JSR-223 engine on my machine.
Not sure why you would see this error on a Linux distribution.
From: Ted Yu yuzhih...@gmail.com
To:
gcr44 wrote:
Thanks for the response.
I have already tried moving the JobTracker to several different ports, always with the same result.
Chandraprakash Bhagtani wrote:
You can try running the JobTracker on some other port. This port might be in use.
--
Thanks & Regards,
Chandra Prakash Bhagtani,
On
Ted Dunning wrote:
You would be entirely welcome in Mahout. Graph-based algorithms are key for lots of kinds of interesting learning and would be a fabulous thing to have in a comprehensive substrate.
I personally would also be very interested in learning more about what sorts of things
ashish pareek wrote:
Hello Bharath,
Earlier even I faced the same problem. I think you are accessing the internet through a proxy, so try using a direct broadband connection.
Hope this will solve your problem.
Or set Ant's proxy up:
http://ant.apache.org/manual/proxy.html
Ashish
Raghu Angadi wrote:
Suresh had made a spreadsheet for memory consumption... will check.
A large portion of NN memory is taken by references. I would expect the memory savings to be very substantial (the same as going from 64-bit to 32-bit), could be on the order of 40%.
The last I heard from Sun was
brien colwell wrote:
Actually Ubuntu comes out of the box with an entry in the hosts file (/etc/hosts) that maps the computer name to the loopback address (I'm not sure if this is specific to Ubuntu). The effect is that all name lookups from the machine for itself resolve to 127.0.0.1.
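A small sketch for checking whether a node is affected; on a box with the loopback hosts entry this prints a 127.x address instead of the NIC's address:

import java.net.InetAddress;

public class LoopbackCheck {
    public static void main(String[] args) throws Exception {
        InetAddress self = InetAddress.getLocalHost();
        // Resolves the local hostname through the normal lookup chain,
        // which consults /etc/hosts before DNS on most Linux setups.
        System.out.println(self.getHostName() + " -> " + self.getHostAddress());
    }
}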
Konstantin Shvachko wrote:
Steve,
There are other groups that claimed they work on an HA solution. We had discussions about it not so long ago on this list.
Is it possible that your colleagues present their design?
As you point out, the issue gets fairly complex fast, particularly because of the
Edward Capriolo wrote:
While I completely agree with you about FreeBSD, that is not the point I was driving at. Linux is the main target platform. If you choose another platform you have more work for yourself, and if you have a problem like the one I had, probably no one else has the same environment as
Konstantin Shvachko wrote:
And the only remaining step is to implement the fail-over mechanism.
:)
Colleagues of mine work on HA stuff; I try to steer clear of it as it gets complex fast. Test case: what happens when a network failure splits the datacentre in two and you now have two clusters
John Clarke wrote:
Thanks for the reply. I considered that, but I have a lot of threads in my application and it's very handy to have log4j output the thread name with the log message.
It's like the log4j.properties file in the conf/ directory is not being used, as any changes I make seem to have no
Set networkaddress.cache.ttl to something low (like 60s), and then you should be able to bring up a node with the same name but a different IP address. This is useful if you can't control the IP address of a node, but you can at least change the DNS entry.
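A hedged sketch of lowering the JVM's DNS cache from code, assuming a Sun JDK; the same property can instead be set in $JAVA_HOME/jre/lib/security/java.security:

import java.security.Security;

public class DnsCacheTtl {
    public static void main(String[] args) {
        // Must run before the first name lookup in this JVM; afterwards,
        // successful lookups are cached for only 60 seconds, so a node whose
        // DNS entry moved to a new IP address is picked up quickly.
        Security.setProperty("networkaddress.cache.ttl", "60");
        System.out.println("ttl = " + Security.getProperty("networkaddress.cache.ttl"));
    }
}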
2009/8/7 Steve Loughran ste...@apache.org
Stas Oskin
Sugandha Naolekar wrote:
I want to encrypt the data that will be placed in HDFS, so I will have to use some kind of encryption algorithm, right?
Also, this encryption is to be done on the data before placing it in HDFS. How can this be done? Are any special APIs available in Hadoop for the above
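There is no built-in encryption in HDFS of this era, so it has to happen on the client. A minimal sketch, assuming you generate and safeguard the AES key yourself, is to wrap the HDFS output stream in a javax.crypto CipherOutputStream; the path below is a placeholder:

import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class EncryptedPut {
    public static void main(String[] args) throws Exception {
        // Hypothetical key handling: a real deployment must store and
        // distribute this key somewhere safe; losing it loses the data.
        KeyGenerator kg = KeyGenerator.getInstance("AES");
        kg.init(128, new SecureRandom());
        SecretKey key = kg.generateKey();

        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);

        // Encrypt on the client as the bytes stream into HDFS.
        FileSystem fs = FileSystem.get(new Configuration());
        CipherOutputStream out = new CipherOutputStream(
                fs.create(new Path("/user/example/data.enc")), cipher);
        out.write("example plaintext".getBytes("UTF-8"));
        out.close();
    }
}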
Scott Carey wrote:
Well, the first thing to do in any performance bottleneck investigation is
to look at the machine hardware resource usage.
During your test, what is the CPU use and disk usage? What about network
utilization?
Top, vmstat, iostat, and some network usage monitoring would be
Saptarshi Guha wrote:
Hello,
Not sure if this has been asked or answered.
Suppose I have tasktrackers A1, A2, A3, each with 4 cores and 16 GB RAM.
mapred.tasktracker.map.tasks.maximum = 6
mapred.tasktracker.reduce.tasks.maximum = 4
Now suppose I have one more machine (X) with 8 cores and 32 GB RAM.
Ryan Smith wrote:
Todd, excellent info, thank you. I use Ganglia; I will set up Nagios though, good idea. Just one clarification on question 1: what if I actually lose all my master data dirs and have no backup on the secondary namenode - are the data blocks on all the slaves lost in that
Boyu Zhang wrote:
Dear All,
I have a question about HDFS and I cannot find the answer in the documents on the Apache website. I have a cluster of 4 machines: one is the namenode and the other 3 are datanodes. When I put 6 files, each 430 MB, into HDFS, the 6 files are split into 42
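(Assuming the default 64 MB block size, that count checks out: each 430 MB file needs ceil(430 / 64) = 7 blocks, six full 64 MB blocks plus a 46 MB tail, and 6 files x 7 blocks = 42 blocks.)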
Ryan Smith wrote:
but you don't want to be the one trying to write something just after your production cluster lost its namenode data.
Steve,
I wasn't planning on trying to solve something like this in production. I would assume everyone here is a professional and wouldn't even think of
Pallavi Palleti wrote:
Hi all,
I tried to track down the place where I can add some conditions to disallow any remote user with the username hadoop (root user), other than from some specific hostnames or IP addresses. I could see the call path as FsShell ->
DistributedFileSystem -> DFSClient ->
JQ Hadoop wrote:
I'm wondering where one can get the PageRank implementation for a try.
Thanks,
http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/extras/citerank/
works over the CiteSeer citation dataset
Todd Lipcon wrote:
On Sat, Jul 4, 2009 at 9:08 AM, David B. Ritch david.ri...@gmail.com wrote:
Thanks, Todd. Perhaps I was misinformed, or misunderstood. I'll make sure I close files occasionally, but it's good to know that the only real issue is with data recovery after losing a node.
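A minimal sketch of that periodic-close pattern, assuming 0.20-era semantics where data only becomes durable and visible once the writer closes the file; the path and part count are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RollingWriter {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        for (int part = 0; part < 3; part++) {
            // Roll to a new part file instead of holding one file open for
            // days; a closed file's blocks survive the loss of this writer.
            Path p = new Path("/logs/events.part-" + part);
            FSDataOutputStream out = fs.create(p);
            out.writeBytes("records for part " + part + "\n");
            out.close(); // close makes the data durable and visible to readers
        }
    }
}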
Boyu Zhang wrote:
Dear all,
Are there any other virtual machines that I can use to provide a Hadoop cluster over a physical cluster?
1. You can bring up Hadoop under VMware, VirtualBox, Xen. There are problems with CentOS 5.x/RHEL5 under VirtualBox (some clock issue generates 100% load