Cross-posting with common-user, since there has been little activity on hdfs-user
these last few days.
Evert
Hi list,
I'm having trouble starting up a DN (0.20.2) with Kerberos
authentication and SSL enabled - I'm getting a NullPointerException
during startup and the daemon exits. It's a bit
Let's play devil's advocate for a second?
Why? Snappy exists.
The only advantage is that you don't have to convert from gzip to snappy and
can process gzip files natively.
Next question is how large are the gzip files in the first place?
I don't disagree, I just want to have a solid argument
How can I set the fair scheduler such that all jobs submitted from a
particular user group go to a pool with the group name?
I have setup fair scheduler and I have two users: A and B (belonging to the
user group hadoop)
When these users submit hadoop jobs, the jobs from A go to a pool named A
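For what it's worth, the 0.20 fair scheduler takes the pool name from a configurable job property. A sketch, assuming the stock contrib fair scheduler: pointing that property at group.name in mapred-site.xml should make pools follow the submitting user's primary Unix group instead of the user name.

```xml
<!-- mapred-site.xml: name pools after the submitter's Unix group
     rather than the default user.name -->
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>group.name</value>
</property>
```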
Hi All,
I was wondering (network traffic considerations aside) is it possible to run a
TaskTracker without a DataNode. I was hoping to test this method as a means of
scaling processing power temporarily.
Are there better approaches? I don't (currently) need the additional storage
that a
Forgot to mention that I am using Hadoop 0.20.2
From: Daniel Baptista
Sent: 29 February 2012 14:44
To: common-user@hadoop.apache.org
Subject: TaskTracker without datanode
Hi All,
I was wondering (network traffic considerations aside) is it possible to run a
TaskTracker without a DataNode. I
Yes this is fine to do. TTs are not dependent on co-located DNs, but
only benefit if they are.
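On the compute-only node this amounts to starting just the one daemon. A sketch, assuming a stock 0.20 tarball layout:

```shell
# Start only the TaskTracker; no DataNode daemon on this host.
bin/hadoop-daemon.sh start tasktracker

# ...and to take the node back out of the cluster later:
bin/hadoop-daemon.sh stop tasktracker
```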
On Wed, Feb 29, 2012 at 8:14 PM, Daniel Baptista
daniel.bapti...@performgroup.com wrote:
Forgot to mention that I am using Hadoop 0.20.2
Mike,
Snappy is cool and all, but I was not overly impressed with it.
GZ zips much better than Snappy. Last time I checked, for our log files
gzip took them down from 100 MB to 40 MB, while Snappy compressed them
from 100 MB to 55 MB. That was only with sequence files. But still, that is
pretty significant.
Hi,
On Wed, Feb 29, 2012 at 13:10, Michel Segel michael_se...@hotmail.com wrote:
Let's play devil's advocate for a second?
I always like that :)
Why?
Because then data files from other systems (like the Apache HTTP web server)
can be processed more efficiently, without any preprocessing.
Snappy
Hi,
On Wed, Feb 29, 2012 at 16:52, Edward Capriolo edlinuxg...@gmail.com wrote:
...
But being able to generate split info for them and process them
would be good as well. I remember that was a hot thing to do with lzo
back in the day. The pain of once-overing the gz files to generate the
I can see a use for it, but I have two concerns about it. My biggest concern
is maintainability. We have had lots of things get thrown into contrib in the
past, very few people use them, and inevitably they start to suffer from bit
rot. I am not saying that it will happen with this, but if
I am going to try a few things today. I have a JAXBContext object that
marshals the XML; this is a static instance, but my guess at this point is
that since this is in a separate jar from the one where the job runs, and I used
DistributedCache.addClassPath, this context is being created on every call
for some
I can perform HDFS operations from the command line, like hadoop fs -ls
/. Doesn't that mean that the datanode is up?
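Not necessarily conclusive: a plain listing can be served from the NameNode's metadata alone. One way to check the DataNodes directly (a sketch, assuming the stock 0.20 CLI) is:

```shell
# Prints configured/used capacity plus a per-DataNode section with
# live and dead node counts; an empty live list means no DN is up.
hadoop dfsadmin -report
```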
If many people are going to use it then by all means put it in. If there is
only one person, or a very small handful of people that are going to use it
then I personally would prefer to see it a separate project. However, Edward,
you have convinced me that I am trying to make a logical
Hi guys, thought I should ask this before I use it ... will using C over
Hadoop give me the usual C memory management? For example, malloc(),
sizeof()? My guess is no, since this all will eventually be turned into
bytecode, but I need more control over memory, which obviously is hard for me
to do
Mark,
Both streaming and pipes allow this, perhaps more so pipes at the level of the
mapreduce task. Can you provide more details on the application?
On Feb 29, 2012, at 1:56 PM, Mark question wrote:
Hi guys, thought I should ask this before I use it ... will using C over
Hadoop give me the
Thanks Charles .. I'm running Hadoop for research to perform duplicate
detection methods. To go deeper, I need to understand what's slowing my
program, which usually starts with analyzing memory to predict best input
size for map task. So you're saying piping can help me control memory even
though
Mark,
So if I understand, it is more the memory management that you are interested
in, rather than a need to run an existing C or C++ application in MapReduce
platform?
Have you done profiling of the application?
C
On Feb 29, 2012, at 2:19 PM, Mark question wrote:
Thanks Charles .. I'm running
I've used Hadoop profiling (.prof) to show the stack trace, but it was hard
to follow. I used jConsole locally, since I couldn't find a way to set a port
number for the child processes when running them remotely. Linux commands
(top, /proc) showed me that the virtual memory is almost twice my
physical, which
The documentation on Starfish http://www.cs.duke.edu/starfish/index.html
looks promising; I have not used it. I wonder if others on the list have found
it more useful than setting mapred.task.profile.
C
On Feb 29, 2012, at 3:53 PM, Mark question wrote:
I've used hadoop profiling (.prof) to
I assume you have also just tried running locally and using the jdk performance
tools (e.g. jmap) to gain insight by configuring hadoop to run absolute minimum
number of tasks?
Perhaps the discussion
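As a concrete sketch of that local workflow (the pid is a placeholder; tool names are the stock JDK ones):

```shell
jps                           # find the child task JVM's pid
jmap -histo <pid> | head -20  # top heap consumers by class
jstack <pid>                  # thread dump, same info as kill -QUIT
```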
I think I've found the problem. There was one line of code that caused this
issue :) and that was output.collect(key, value);
I had to add more logging to the code to get to it. For some reason kill
-QUIT didn't send the stacktrace to the userLogs/job/attempt/syslog , I
searched all the logs and
Thanks for the example. I did look at the logs and also at the admin page
and all I see is the exception that I posted initially.
I am not sure why adding an extra jar to the classpath in DistributedCache
causes that exception. I tried to look at Configuration code in hadoop.util
package but it
Mohit,
I'm positive the real exception lies a few scrolls below that message
on the attempt page. Possibly a class not found issue.
The message you see on top is when something throws up an exception
while being configure()-ed. It is most likely a job config or
setup-time issue from your code or
Varun, sorry for my late response. Today I deployed a new version and I
can confirm that the patches you provided work well. I've been running some
jobs on a 5-node cluster for an hour at full load without a core dump, so now
things work as expected.
Thank you again!
I have used just your first
Thank you for your time and suggestions, I've already tried starfish, but
not jmap. I'll check it out.
Thanks again,
Mark
On Wed, Feb 29, 2012 at 1:17 PM, Charles Earl charles.ce...@gmail.com wrote:
I assume you have also just tried running locally and using the jdk
performance tools (e.g.