FW: NullPointer during startup debugging DN

2012-02-29 Thread Evert Lammerts
Cross-posting with common-user, since there's little activity on hdfs-user these last days. Evert Hi list, I'm having trouble starting up a DN (0.20.2) with Kerberos authentication and SSL enabled - I'm getting a NullPointerException during startup and the daemon exits. It's a bit

Re: Should splittable Gzip be a core hadoop feature?

2012-02-29 Thread Michel Segel
Let's play devil's advocate for a second? Why? Snappy exists. The only advantage is that you don't have to convert from gzip to snappy and can process gzip files natively. Next question is how large are the gzip files in the first place? I don't disagree, I just want to have a solid argument

Hadoop fair scheduler doubt: allocate jobs to pool

2012-02-29 Thread Austin Chungath
How can I set the fair scheduler such that all jobs submitted from a particular user group go to a pool with the group name? I have set up the fair scheduler and I have two users: A and B (belonging to the user group hadoop). When these users submit hadoop jobs, the jobs from A go to a pool named A
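
One commonly cited approach for the 0.20 fair scheduler is to point its pool-name property at the submitting user's group, so every job lands in a pool named after its group. A sketch of the mapred-site.xml entry on the JobTracker (assumes the fair scheduler contrib jar is installed):

    <property>
      <name>mapred.fairscheduler.poolnameproperty</name>
      <value>group.name</value>
    </property>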

TaskTracker without datanode

2012-02-29 Thread Daniel Baptista
Hi All, I was wondering (network traffic considerations aside) is it possible to run a TaskTracker without a DataNode. I was hoping to test this method as a means of scaling processing power temporarily. Are there better approaches, I don't (currently) need the additional storage that a

RE: TaskTracker without datanode

2012-02-29 Thread Daniel Baptista
Forgot to mention that I am using Hadoop 0.20.2 From: Daniel Baptista Sent: 29 February 2012 14:44 To: common-user@hadoop.apache.org Subject: TaskTracker without datanode Hi All, I was wondering (network traffic considerations aside) is it possible to run a TaskTracker without a DataNode. I

Re: TaskTracker without datanode

2012-02-29 Thread Harsh J
Yes this is fine to do. TTs are not dependent on co-located DNs, but only benefit if they are. On Wed, Feb 29, 2012 at 8:14 PM, Daniel Baptista daniel.bapti...@performgroup.com wrote: Forgot to mention that I am using Hadoop 0.20.2 From: Daniel Baptista Sent: 29 February 2012 14:44 To:
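
On 0.20 a compute-only node can be brought up by starting just the TaskTracker daemon on it (a sketch, assuming a standard tarball layout):

    bin/hadoop-daemon.sh start tasktracker

With the DataNode left stopped, the node contributes map/reduce slots but no storage, at the cost of all its task input being read over the network.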

Re: Should splittable Gzip be a core hadoop feature?

2012-02-29 Thread Edward Capriolo
Mike, Snappy is cool and all, but I was not overly impressed with it. GZ zips much better than Snappy. Last time I checked for our log files, gzip took them down from 100MB to 40MB, while snappy compressed them from 100MB to 55MB. That was only with sequence files. But still that is pretty significant
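
For anyone who wants to reproduce a comparison like this, a minimal driver sketch on the 0.20 mapred API that writes block-compressed gzip sequence files through the default identity mapper/reducer; swap GzipCodec for SnappyCodec and compare output sizes (the class name and paths are placeholders):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile.CompressionType;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.mapred.*;

    public class GzipSeqFileCompare {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(GzipSeqFileCompare.class);
        conf.setJobName("gzip-seqfile-compare");
        // defaults give the identity mapper/reducer, so the job just rewrites its input
        conf.setOutputFormat(SequenceFileOutputFormat.class);
        FileOutputFormat.setCompressOutput(conf, true);
        FileOutputFormat.setOutputCompressorClass(conf, GzipCodec.class);
        SequenceFileOutputFormat.setOutputCompressionType(conf, CompressionType.BLOCK);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
    }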

Re: Should splittable Gzip be a core hadoop feature?

2012-02-29 Thread Niels Basjes
Hi, On Wed, Feb 29, 2012 at 13:10, Michel Segel michael_se...@hotmail.com wrote: Let's play devil's advocate for a second? I always like that :) Why? Because then datafiles from other systems (like the Apache HTTP webserver) can be processed more efficiently, without preprocessing. Snappy

Re: Should splittable Gzip be a core hadoop feature?

2012-02-29 Thread Niels Basjes
Hi, On Wed, Feb 29, 2012 at 16:52, Edward Capriolo edlinuxg...@gmail.com wrote: ... But being able to generate split info for them and process them would be good as well. I remember that was a hot thing to do with lzo back in the day. The pain of once overing the gz files to generate the

Re: Should splittable Gzip be a core hadoop feature?

2012-02-29 Thread Robert Evans
I can see a use for it, but I have two concerns about it. My biggest concern is maintainability. We have had lots of things get thrown into contrib in the past, very few people use them, and inevitably they start to suffer from bit rot. I am not saying that it will happen with this, but if

Re: 100x slower mapreduce compared to pig

2012-02-29 Thread Mohit Anchlia
I am going to try a few things today. I have a JAXBContext object that marshals the xml; this is a static instance, but my guess at this point is that, since this is in a separate jar from the one where the job runs and I used DistributedCache.addClassPath, this context is being created on every call for some
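
If the context really is being rebuilt per call across classloaders, one JVM-wide lazy-initialization pattern worth trying (a sketch; the holder class name is hypothetical, and the bound classes are whatever the marshaller needs):

    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.JAXBException;

    public final class JaxbHolder {
      // JAXBContext is expensive to create but thread-safe: build it once per JVM
      private static volatile JAXBContext ctx;

      public static JAXBContext get(Class<?>... boundClasses) throws JAXBException {
        if (ctx == null) {
          synchronized (JaxbHolder.class) {
            if (ctx == null) {
              ctx = JAXBContext.newInstance(boundClasses);
            }
          }
        }
        return ctx;
      }

      private JaxbHolder() {}
    }

Marshaller instances, by contrast, are not thread-safe and are cheap relative to the context, so create one per call.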

Re: Browse the filesystem weblink broken after upgrade to 1.0.0: HTTP 404 Problem accessing /browseDirectory.jsp

2012-02-29 Thread W.P. McNeill
I can perform HDFS operations from the command line, like hadoop fs -ls /. Doesn't that mean that the datanode is up?

Re: Should splittable Gzip be a core hadoop feature?

2012-02-29 Thread Robert Evans
If many people are going to use it then by all means put it in. If there is only one person, or a very small handful of people that are going to use it then I personally would prefer to see it a separate project. However, Edward, you have convinced me that I am trying to make a logical

Streaming Hadoop using C

2012-02-29 Thread Mark question
Hi guys, thought I should ask this before I use it ... will using C over Hadoop give me the usual C memory management? For example, malloc(), sizeof()? My guess is no, since this all will eventually be turned into bytecode, but I need more control over memory, which obviously is hard for me to do

Re: Streaming Hadoop using C

2012-02-29 Thread Charles Earl
Mark, Both streaming and pipes allow this, perhaps more so pipes at the level of the mapreduce task. Can you provide more details on the application? On Feb 29, 2012, at 1:56 PM, Mark question wrote: Hi guys, thought I should ask this before I use it ... will using C over Hadoop give me the
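
A streaming mapper is an ordinary native process that reads stdin and writes stdout, so a compiled C binary keeps normal malloc()/sizeof() semantics; nothing is translated to bytecode. A typical 0.20 invocation looks roughly like this (the jar name varies by release, and my_mapper is a placeholder for the compiled binary):

    bin/hadoop jar contrib/streaming/hadoop-*-streaming.jar \
      -input in -output out \
      -mapper ./my_mapper -reducer NONE \
      -file my_mapper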

Re: Streaming Hadoop using C

2012-02-29 Thread Mark question
Thanks Charles .. I'm running Hadoop for research to perform duplicate detection methods. To go deeper, I need to understand what's slowing my program, which usually starts with analyzing memory to predict the best input size for a map task. So you're saying piping can help me control memory even though

Re: Streaming Hadoop using C

2012-02-29 Thread Charles Earl
Mark, So if I understand, it is more the memory management that you are interested in, rather than a need to run an existing C or C++ application on the MapReduce platform? Have you done profiling of the application? C On Feb 29, 2012, at 2:19 PM, Mark question wrote: Thanks Charles .. I'm running

Re: Streaming Hadoop using C

2012-02-29 Thread Mark question
I've used Hadoop profiling (.prof) to show the stack trace, but it was hard to follow. I used jConsole locally, since I couldn't find a way to set a port number for child processes when running them remotely. Linux commands (top, /proc) showed me that the virtual memory is almost twice my physical, which

Re: Streaming Hadoop using C

2012-02-29 Thread Charles Earl
The documentation on Starfish http://www.cs.duke.edu/starfish/index.html looks promising, though I have not used it. I wonder if others on the list have found it more useful than setting mapred.task.profile. C On Feb 29, 2012, at 3:53 PM, Mark question wrote: I've used Hadoop profiling (.prof) to
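
Setting mapred.task.profile can also be done from the job driver; a sketch on the 0.20 JobConf API, limiting profiling to the first two map attempts to contain the overhead (the wrapper class and method names are hypothetical):

    import org.apache.hadoop.mapred.JobConf;

    public class ProfilingConf {
      public static JobConf enableProfiling(JobConf conf) {
        conf.setProfileEnabled(true);           // mapred.task.profile=true
        conf.setProfileTaskRange(true, "0-1");  // only map attempts 0 and 1
        // hprof options; the framework substitutes the output file for %s
        conf.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,"
            + "depth=6,force=n,thread=y,verbose=n,file=%s");
        return conf;
      }
    }

The resulting profile.out files show up with the task attempt's logs.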

Re: Streaming Hadoop using C

2012-02-29 Thread Charles Earl
I assume you have also just tried running locally and using the jdk performance tools (e.g. jmap) to gain insight, by configuring hadoop to run the absolute minimum number of tasks? Perhaps the discussion

Re: 100x slower mapreduce compared to pig

2012-02-29 Thread Mohit Anchlia
I think I've found the problem. There was one line of code that caused this issue :) that was output.collect(key, value); I had to add more logging to the code to get to it. For some reason kill -QUIT didn't send the stacktrace to the userLogs/job/attempt/syslog; I searched all the logs and
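
A likely reason the dump never reached syslog: the JVM writes its SIGQUIT thread dump to stdout, so for a task attempt it normally lands in the attempt's stdout log instead; running jstack against the child's pid is another way to capture the same dump on a JDK.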

Re: Invocation exception

2012-02-29 Thread Mohit Anchlia
Thanks for the example. I did look at the logs and also at the admin page and all I see is the exception that I posted initially. I am not sure why adding an extra jar to the classpath in DistributedCache causes that exception. I tried to look at Configuration code in hadoop.util package but it

Re: Invocation exception

2012-02-29 Thread Harsh J
Mohit, I'm positive the real exception lies a few scrolls below that message on the attempt page. Possibly a class not found issue. The message you see on top is when something throws up an exception while being configure()-ed. It is most likely a job config or setup-time issue from your code or

Re: Does Hadoop 0.20.205 and Ganglia 3.1.7 compatible with each other ?

2012-02-29 Thread Merto Mertek
Varun sorry for my late response. Today I have deployed a new version and I can confirm that the patches you provided work well. I've been running some jobs on a 5-node cluster for an hour without a core dump at full load, so now things work as expected. Thank you again! I have used just your first

Re: Streaming Hadoop using C

2012-02-29 Thread Mark question
Thank you for your time and suggestions. I've already tried Starfish, but not jmap. I'll check it out. Thanks again, Mark On Wed, Feb 29, 2012 at 1:17 PM, Charles Earl charles.ce...@gmail.com wrote: I assume you have also just tried running locally and using the jdk performance tools (e.g.