Programmatically initializing and starting an HDFS cluster

2008-06-12 Thread Robert Krüger
Hi, for our developers I would like to write a few lines of Java code that, given a base directory, sets up an HDFS filesystem, initializes it if it does not exist yet, and then starts the service(s) in-process. This is to run on each developer's machine, probably within a Tomcat instance. I
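A minimal sketch of the first half of that task, assuming one base directory holds all HDFS storage: it derives the storage properties (hadoop.tmp.dir, dfs.name.dir, dfs.data.dir) from the base directory and decides whether the name directory still needs to be formatted. DevHdfsLayout is a hypothetical helper, and treating a missing current/VERSION file as "not formatted yet" is an assumption about the name directory's on-disk layout. Actually formatting and starting the NameNode and DataNode in-process would go through Hadoop's own classes, which are not shown here.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: derives developer-machine HDFS settings from one base directory
class DevHdfsLayout {
    private final File baseDir;

    DevHdfsLayout(File baseDir) {
        this.baseDir = baseDir;
    }

    File nameDir() {
        return new File(new File(baseDir, "dfs"), "name");
    }

    File dataDir() {
        return new File(new File(baseDir, "dfs"), "data");
    }

    // Property values to copy into the Hadoop Configuration before starting the daemons
    Map<String, String> properties() {
        Map<String, String> props = new LinkedHashMap<String, String>();
        props.put("hadoop.tmp.dir", baseDir.getAbsolutePath());
        props.put("dfs.name.dir", nameDir().getAbsolutePath());
        props.put("dfs.data.dir", dataDir().getAbsolutePath());
        return props;
    }

    // Assumption: a formatted name directory contains current/VERSION,
    // so its absence means "not initialized yet, format first"
    boolean needsFormat() {
        return !new File(new File(nameDir(), "current"), "VERSION").isFile();
    }

    public static void main(String[] args) {
        File base = args.length > 0
                ? new File(args[0])
                : new File(System.getProperty("java.io.tmpdir"), "dev-hdfs");
        DevHdfsLayout layout = new DevHdfsLayout(base);
        for (Map.Entry<String, String> e : layout.properties().entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
        System.out.println("format needed: " + layout.needsFormat());
    }
}
```

Keeping the "format only if needed" decision separate from starting the services means the same code can run on every Tomcat start without wiping an existing developer filesystem.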

join in hadoop

2008-06-12 Thread Vibhooti Verma
Hi all, has anyone tried joining two datasets on a common key? Let me know. -- cheers, Vibhooti

Re: join in hadoop

2008-06-12 Thread Yuri Kudryavcev
Does this link help? http://is.gd/vH3 On 6/12/08, Vibhooti Verma [EMAIL PROTECTED] wrote: Hi All, has anyone tried joining two datasets on some common key? Let me know. -- cheers, Vibhooti
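One common way to do this in map/reduce is a reduce-side join: the map step tags each record with the dataset it came from and emits it under the join key, the framework groups records by key, and the reduce step pairs every record from one dataset with every matching record from the other. The sketch simulates those three phases in plain Java so the logic is visible; it is not Hadoop API code, and the tab-separated "key<TAB>value" record format and the ReduceSideJoin name are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReduceSideJoin {
    // Each input record is assumed to be "key<TAB>value"
    static List<String> join(List<String> left, List<String> right) {
        // "Map" + "shuffle": tag values with their source and group them by key
        // (TreeMap keeps keys sorted, like Hadoop's sort phase)
        Map<String, List<String[]>> grouped = new TreeMap<String, List<String[]>>();
        tagAndGroup(left, "L", grouped);
        tagAndGroup(right, "R", grouped);

        // "Reduce": for each key, emit every left/right pair (inner join)
        List<String> out = new ArrayList<String>();
        for (Map.Entry<String, List<String[]>> e : grouped.entrySet()) {
            List<String> lefts = new ArrayList<String>();
            List<String> rights = new ArrayList<String>();
            for (String[] tagged : e.getValue()) {
                if (tagged[0].equals("L")) {
                    lefts.add(tagged[1]);
                } else {
                    rights.add(tagged[1]);
                }
            }
            for (String l : lefts) {
                for (String r : rights) {
                    out.add(e.getKey() + "\t" + l + "\t" + r);
                }
            }
        }
        return out;
    }

    private static void tagAndGroup(List<String> records, String tag,
                                    Map<String, List<String[]>> grouped) {
        for (String record : records) {
            String[] kv = record.split("\t", 2);
            List<String[]> values = grouped.get(kv[0]);
            if (values == null) {
                values = new ArrayList<String[]>();
                grouped.put(kv[0], values);
            }
            values.add(new String[] {tag, kv[1]});
        }
    }

    public static void main(String[] args) {
        List<String> users = Arrays.asList("1\talice", "2\tbob");
        List<String> orders = Arrays.asList("1\tbook", "1\tpen", "3\tlamp");
        for (String row : join(users, orders)) {
            System.out.println(row);
        }
    }
}
```

Keys that appear in only one dataset produce no output, which makes this an inner join. Note that this naive reduce step buffers all values for a key in memory, which can be a problem for very skewed keys.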

dfs put fails

2008-06-12 Thread H3llRid0r
Hi, I'm new to Hadoop and I'm just testing it at the moment. I set up a cluster with 2 nodes and they seem to be running normally; the log files of the namenode and the datanodes don't show errors. The firewall should be set up correctly. But when I try to upload a file to the DFS, I get the following

Issue loading a native library through the DistributedCache

2008-06-12 Thread montag
Hi, I'm a new Hadoop user, so if this question is blatantly obvious, I apologize. I'm trying to load a native shared library using the DistributedCache as outlined in https://issues.apache.org/jira/browse/HADOOP-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Re: Getting No job jar file set. User classes may not be found. error when running a map-reduce job in hadoop-0.17

2008-06-12 Thread montag
I've had this error as well when running from the command prompt. I found that passing the location of the jar file that contains your classes to conf.setJar() when setting the job configuration parameters fixes it. Basically the code would look something like this: String
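The snippet is cut off before the code, so here is a hedged sketch of one way to obtain that jar path at runtime: ask the JVM where one of your job's classes was loaded from. JarLocator and findContainingJar are hypothetical names; the returned path is what you would pass to conf.setJar(). If the class was loaded from a directory (for example an IDE's build output) there is no jar, and the helper returns null.

```java
import java.io.File;
import java.net.URISyntaxException;
import java.security.CodeSource;

class JarLocator {
    // Returns the absolute path of the jar that cls was loaded from, or null if
    // it came from a directory or from the JDK's bootstrap class loader
    static String findContainingJar(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return null;
        }
        try {
            File location = new File(source.getLocation().toURI());
            if (location.isFile() && location.getName().endsWith(".jar")) {
                return location.getAbsolutePath();
            }
            return null;
        } catch (URISyntaxException | IllegalArgumentException e) {
            // Locations that are not plain file: URIs cannot be mapped to a path
            return null;
        }
    }

    public static void main(String[] args) {
        String jar = findContainingJar(JarLocator.class);
        System.out.println(jar == null ? "not running from a jar" : "job jar: " + jar);
        // With Hadoop on the classpath and a JobConf named conf:
        // if (jar != null) conf.setJar(jar);
    }
}
```

In a job you would call it with one of your own mapper or reducer classes, so the jar that gets shipped is the one that actually contains your code.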

Re: client connect as different username?

2008-06-12 Thread Chris Collins
Thanks Nicolas, I read it yet again (OK, only the third time). Yes, it talks about whoami; I actually knew that from single-stepping through the client too, but I was still stuck. It mentions the POSIX model, which I had also guessed from the javadocs. Doug's note clearly states that No foo

Re: Issue loading a native library through the DistributedCache

2008-06-12 Thread Arun C Murthy
On Jun 12, 2008, at 6:47 AM, montag wrote: Hi, I'm a new Hadoop user, so if this question is blatantly obvious, I apologize. I'm trying to load a native shared library using the DistributedCache as outlined in https://issues.apache.org/jira/browse/HADOOP-1660?

Re: Issue loading a native library through the DistributedCache

2008-06-12 Thread montag
Hi Arun, Thanks for your reply! Yes, I'm trying to load a JNI-based library. I tried what you suggested, but I'm still getting an UnsatisfiedLinkError. I noticed that if I use System.loadLibrary("lib.so") I get the following error: java.lang.UnsatisfiedLinkError: no lib.so in
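A likely cause of that particular message: System.loadLibrary takes a bare library name and lets the JVM add the platform's prefix and suffix, so System.loadLibrary("lib.so") searches java.library.path for a file name derived from "lib.so" (on Linux, liblib.so.so) rather than for the file itself. The standard System.mapLibraryName method shows which file name the JVM will look for; System.load, by contrast, takes an absolute file path and skips the search. NativeLibNames and bareNameFromSoFile are hypothetical helpers for illustration.

```java
import java.io.File;

class NativeLibNames {
    // Hypothetical helper: turns a Linux file name such as "libfoo.so" into the
    // bare name ("foo") that System.loadLibrary expects
    static String bareNameFromSoFile(String soPath) {
        String name = new File(soPath).getName();
        if (name.startsWith("lib")) {
            name = name.substring("lib".length());
        }
        if (name.endsWith(".so")) {
            name = name.substring(0, name.length() - ".so".length());
        }
        return name;
    }

    public static void main(String[] args) {
        String soFile = args.length > 0 ? args[0] : "libfoo.so";
        String bare = bareNameFromSoFile(soFile);
        // The directories System.loadLibrary searches
        System.out.println("java.library.path = " + System.getProperty("java.library.path"));
        // The file name System.loadLibrary(bare) looks for on this platform
        System.out.println("loadLibrary(\"" + bare + "\") looks for " + System.mapLibraryName(bare));
        // What the failing call actually looked for
        System.out.println("loadLibrary(\"lib.so\") looks for " + System.mapLibraryName("lib.so"));
        // Alternative that bypasses java.library.path (requires an absolute path):
        // System.load(new File(soFile).getAbsolutePath());
    }
}
```

Because the mapped name depends on the operating system (libfoo.so on Linux, foo.dll on Windows), the shared library shipped to the nodes also has to match the OS and architecture of every node that runs the task.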

Re: Question about Hadoop

2008-06-12 Thread lohit
Ideally, you want your data to be on HDFS and to run your map/reduce jobs on that data. The Hadoop framework splits your data and feeds those splits to the map tasks. One problem with image files is that you will not be able to split them. Alternatively, people have done this,

Re: client connect as different username?

2008-06-12 Thread Doug Cutting
Chris Collins wrote: For instance, all it takes to let, say, a Mac user with the login bob access things under /bob is for me to go in as the superuser and do something like: hadoop dfs -mkdir /bob, then hadoop dfs -chown bob /bob, where bob literally doesn't
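The mkdir-then-chown approach works because HDFS permissions follow a POSIX-like model: once bob owns /bob, the owner permission bits apply to him, while other users fall through to the group or "other" bits. The following is a simplified, hypothetical model of that check, not Hadoop's API: PosixCheck, the octal mode argument, the group set, the superuser name "hadoop", and the 0755 mode in the example are all assumptions for illustration. It treats the superuser as always allowed.

```java
import java.util.Collections;
import java.util.Set;

class PosixCheck {
    static final int READ = 4;
    static final int WRITE = 2;
    static final int EXECUTE = 1;

    // mode is an octal permission such as 0755; action is a combination of READ/WRITE/EXECUTE
    static boolean permits(String user, Set<String> userGroups, String superUser,
                           String owner, String group, int mode, int action) {
        if (user.equals(superUser)) {
            return true; // the superuser bypasses permission checks
        }
        int bits;
        if (user.equals(owner)) {
            bits = (mode >> 6) & 7; // owner bits
        } else if (userGroups.contains(group)) {
            bits = (mode >> 3) & 7; // group bits
        } else {
            bits = mode & 7; // "other" bits
        }
        return (bits & action) == action;
    }

    public static void main(String[] args) {
        Set<String> noGroups = Collections.<String>emptySet();
        // /bob after mkdir by the superuser and chown to bob, assuming mode 0755
        System.out.println("bob writes /bob:   "
                + permits("bob", noGroups, "hadoop", "bob", "supergroup", 0755, WRITE));
        System.out.println("alice writes /bob: "
                + permits("alice", noGroups, "hadoop", "bob", "supergroup", 0755, WRITE));
        System.out.println("alice reads /bob:  "
                + permits("alice", noGroups, "hadoop", "bob", "supergroup", 0755, READ));
    }
}
```

Only the user name and group membership reported by the client enter the check, which is why bob does not need an account on the server side.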

Re: Issue loading a native library through the DistributedCache

2008-06-12 Thread Chang Hu
Out of curiosity: what happens if the slave nodes are running a different OS, or are just missing the libraries that the native library depends on? Is this why using native libraries in Hadoop is not recommended? - Chang On Thu, Jun 12, 2008 at 10:51 AM, montag [EMAIL PROTECTED] wrote: Never mind! The

What did I do wrong? (Too many fetch-failures)

2008-06-12 Thread Rob Collins
In a previous life, I had no problems setting up a small cluster. Now I have managed to mess it up. I see reports of similar symptoms on this mailing list, but I cannot find any solutions to the problem. I am working with 0.16.4. Any suggestions you can provide would be appreciated. Thanks, Rob

Hadoop presentations at the next Freebase user group meeting

2008-06-12 Thread Colin Evans
Hi everyone! We'll be having two talks on Hadoop usage at an upcoming Freebase user group meeting on June 17th at 6:30pm. I'll be talking about how we're using Hadoop with Jython to process Wikipedia and the Freebase data graph, and Justin Bonnar will be talking about using Hadoop to

Re: Question about Hadoop

2008-06-12 Thread Ted Dunning
Once it is in HDFS, you already have backups (due to the replicated file system). Your problems with deleting the dfs data directory are likely configuration problems combined with versioning of the data store (done to avoid confusion, but it usually causes confusion). Once you get the

Re: Question about Hadoop

2008-06-12 Thread Chanchal James
Thank you all for the responses. So in order to run a web-based application, I just need to put the part of the application that needs distributed computation in HDFS, and have the other website-related files access it via Hadoop streaming? Is that how Hadoop is used? Sorry

Re: Map Task timed out?

2008-06-12 Thread lohit
Yes, there is a timeout defined by mapred.task.timeout; the default is 600 seconds (the value is given in milliseconds, 600000). Here, "silent" means the task (either map or reduce) has not reported any status using the reporter you get with the map/reduce function. Thanks, Lohit - Original Message From: Edward J. Yoon [EMAIL PROTECTED]
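To make the "silent task" rule concrete, here is a hypothetical model of the bookkeeping: remember when a task last reported and treat it as timed out once the gap exceeds mapred.task.timeout. In a real job, the way to stay alive during long computations is to report from inside map or reduce, for example with reporter.progress() or reporter.setStatus(...) on the Reporter passed to the function. TaskTimeoutSketch is an illustrative name, not a Hadoop class, and times are passed in explicitly so the logic is easy to follow.

```java
class TaskTimeoutSketch {
    // mapred.task.timeout is expressed in milliseconds; 600000 ms = 600 s
    static final long DEFAULT_TIMEOUT_MS = 600000L;

    private final long timeoutMs;
    private long lastReportMs;

    TaskTimeoutSketch(long timeoutMs, long startMs) {
        this.timeoutMs = timeoutMs;
        this.lastReportMs = startMs; // in this model, task start counts as the first report
    }

    // Corresponds to the task reporting status or progress through its reporter
    void reportProgress(long nowMs) {
        lastReportMs = nowMs;
    }

    // True once the task has been silent for longer than the timeout
    boolean isTimedOut(long nowMs) {
        return nowMs - lastReportMs > timeoutMs;
    }

    public static void main(String[] args) {
        TaskTimeoutSketch task = new TaskTimeoutSketch(DEFAULT_TIMEOUT_MS, 0L);
        task.reportProgress(500000L); // reported after 500 s
        // 500 s since the last report: still alive
        System.out.println("silent too long at 1000 s? " + task.isTimedOut(1000000L));
        // 700 s since the last report: would be killed
        System.out.println("silent too long at 1200 s? " + task.isTimedOut(1200000L));
    }
}
```

The model shows why a single slow record is enough to get a task killed: what matters is the gap since the last report, not the total running time.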