Re: Modeling WordCount in a different way

2009-04-07 Thread Norbert Burger
Aayush, out of curiosity, why do you want to model wordcount this way? What benefit do you see? Norbert On 4/6/09, Aayush Garg aayush.g...@gmail.com wrote: Hi, I want to experiment with the wordcount example in a different way. Suppose we have very large data. Instead of splitting all the

Re: Using HDFS to serve www requests

2009-03-26 Thread Norbert Burger
Have you looked into MogileFS already? Seems like a good fit, based on your description. This question has come up more than once here, and MogileFS is an oft-recommended solution. Norbert On 3/26/09, phil cryer p...@cryer.us wrote: When you say that you have huge images, how big is huge?

Re: Problems getting Eclipse Hadoop plugin to work.

2009-02-19 Thread Norbert Burger
What platform are you running Eclipse on? If Windows, see this thread regarding Cygwin: http://www.mail-archive.com/core-user@hadoop.apache.org/msg07669.html For my case, I've never had to touch any of the plugin's advanced parameters. Usually, setting just the Map/Reduce Master and DFS Master
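For reference, the two plugin fields mirror the cluster's own configuration rather than anything Eclipse-specific: the Map/Reduce Master should match `mapred.job.tracker` and the DFS Master should match `fs.default.name` from the cluster's hadoop-site.xml. A sketch with placeholder host and ports:

```
Map/Reduce Master:  host = namenode-host   port = 9001   (matches mapred.job.tracker)
DFS Master:         host = namenode-host   port = 9000   (matches fs.default.name)
```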

Re: setting up networking and ssh on multinode cluster...

2009-02-16 Thread Norbert Burger
a...@node0:~$ The boxes are just connected with a Cat5 cable. I have not done this with the hadoop account, but af is my normal account and I figure it should work too. /etc/init.d/interfaces is empty/does not exist on the machines. (I am using Ubuntu 8.10.) Please advise. Norbert Burger

Re: setting up networking and ssh on multinode cluster...

2009-02-15 Thread Norbert Burger
I have commented out the 192. addresses and changed 127.0.1.1 for node0 and 127.0.1.2 for node1 (in /etc/hosts). With this done I can ssh from one machine to itself and to the other, but the prompt does not change when I ssh to the other machine. I don't know if there is a firewall preventing
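The usual culprit in this situation is Ubuntu's default /etc/hosts, which maps the machine's own hostname to 127.0.1.1; Hadoop daemons then bind to loopback and the nodes cannot reach each other. A minimal sketch of a working two-node /etc/hosts (the 192.168.0.x addresses below are placeholders — use the boxes' real LAN addresses, and keep the file identical on both machines):

```
127.0.0.1    localhost
# do NOT map node0/node1 to 127.0.1.x -- use the real LAN addresses
192.168.0.10 node0
192.168.0.11 node1
```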

Re: setting up networking and ssh on multinode cluster...

2009-02-15 Thread Norbert Burger
into the link that you gave. -zander Norbert Burger wrote: I have commented out the 192. addresses and changed 127.0.1.1 for node0 and 127.0.1.2 for node1 (in /etc/hosts). With this done I can ssh from one machine to itself and to the other, but the prompt does not change when I ssh

Re: Namenode not listening for remote connections to port 9000

2009-02-13 Thread Norbert Burger
On Fri, Feb 13, 2009 at 8:37 AM, Steve Loughran ste...@apache.org wrote: Michael Lynch wrote: Hi, As far as I can tell I've followed the setup instructions for a hadoop cluster to the letter, but I find that the datanodes can't connect to the namenode on port 9000 because it is only
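When the namenode listens only on loopback, the usual fix is twofold: make sure the namenode's hostname does not resolve to a 127.x address in /etc/hosts, and point `fs.default.name` at the real hostname rather than localhost. A hedged sketch of the relevant hadoop-site.xml entry (hostname is a placeholder):

```xml
<property>
  <name>fs.default.name</name>
  <!-- use the namenode's real hostname, not localhost,
       so the daemon binds a routable interface on port 9000 -->
  <value>hdfs://namenode-host:9000</value>
</property>
```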

Re: Eclipse plugin

2009-02-12 Thread Norbert Burger
Are you running Eclipse on Windows? If so, be aware that you need to spawn Eclipse from within Cygwin in order to access HDFS. It seems that the plugin uses whoami to get info about the active user. This thread has some more info:

Re: hadoop dfs -test question (with a bit o' Ruby)

2009-02-03 Thread Norbert Burger
I'm no Ruby programmer, but don't you need a call to system() instead of the backtick operator here? It appears that the backtick operator returns STDOUT instead of the return value: http://hans.fugal.net/blog/2007/11/03/backticks-2-0 Norbert On Tue, Feb 3, 2009 at 6:03 PM, S D
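The distinction can be seen without Hadoop at all — the commands below (`echo`, `true`, `false`) are stand-ins. Backticks capture a command's STDOUT as a String, while system() returns true/false derived from the exit status, which is the channel `hadoop dfs -test` actually reports through.

```ruby
# Backticks capture the command's STDOUT as a String.
captured = `echo hello`      # "hello\n"

# system() returns true/false based on the exit status --
# what an exit-code-only command like `hadoop dfs -test` communicates.
ok  = system('true')         # exit status 0  -> true
bad = system('false')        # non-zero exit  -> false
```

So a check like `system("hadoop dfs -test -e #{path}")` (path hypothetical) is itself the truth value you want; alternatively, inspect `$?.exitstatus` after either form.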

Re: To Compute or Not to Compute on Prod

2008-11-02 Thread Norbert Burger
need to be a datanode. If my production node is *not* a datanode, then how can I do hadoop dfs -put? I was under the impression that when I install HDFS on a cluster, each node in the cluster is a datanode. Shahab On Fri, Oct 31, 2008 at 1:46 PM, Norbert Burger [EMAIL PROTECTED] wrote

Re: Debugging / Logging in Hadoop?

2008-10-30 Thread Norbert Burger
Seems that the slides for each of the 3 Rapleaf talks are posted in the descriptions: The Collector - A Tool to Have Multi-Writer Appends into HDFS http://docs.google.com/Present?docid=dgz78tv5_10gpjhnvg9 Katta - Distributed Lucene Index in Production

Re: How does an offline Datanode come back up ?

2008-10-28 Thread Norbert Burger
Along these lines, I'm curious what management tools folks are using to ensure cluster availability (i.e., auto-restarting failed datanodes/namenodes). Are you using a custom cron script, or maybe something more complex (Ganglia, Nagios, Puppet, etc.)? Thanks, Norbert On 10/28/08, Steve Loughran

Re: Error when running job as a different user

2008-05-31 Thread Norbert Burger
I ran into this problem also. From your logs, it seems like you haven't set mapred.system.dir to a fixed path: http://wiki.apache.org/hadoop/FAQ#14. The impact is that your job control files are written from your submit machine into the HDFS at /tmp/hadoop-user2/mapred/system, while your
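The FAQ entry boils down to pinning `mapred.system.dir` to one absolute HDFS path in hadoop-site.xml on every machine that submits or runs jobs, so the submitter and the jobtracker agree on where job control files live. A sketch (the path is a placeholder):

```xml
<property>
  <name>mapred.system.dir</name>
  <!-- same fixed path on submit hosts and cluster nodes -->
  <value>/hadoop/mapred/system</value>
</property>
```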

Re: User accounts in Master and Slaves

2008-04-23 Thread Norbert Burger
Yes, this is the suggested configuration. Hadoop relies on password-less SSH to be able to start tasks on slave machines. You can find instructions on creating/transferring the SSH keys here: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Multi-Node_Cluster%29 On Wed, Apr

Re: Urgent

2008-04-17 Thread Norbert Burger
-42.compute-1.amazonaws.com. Please set up DNS so localhost points to . Then it asks for an enter key. The Java exception was not coming earlier. You mean I should set prerna.dyndns.org to 75.101.217.228? Thanks Prerna On Wed, Apr 16, 2008 at 8:27 AM, Norbert Burger [EMAIL PROTECTED] wrote

Re: Urgent

2008-04-17 Thread Norbert Burger
? Thanks Prerna On Wed, Apr 16, 2008 at 8:27 AM, Norbert Burger [EMAIL PROTECTED] wrote: There is no need to maintain a server and client Cygwin session on a local machine. In the typical Hadoop-on-EC2 setup, all of your nodes are EC2 hosts, spawned dynamically

Re: Urgent

2008-04-17 Thread Norbert Burger
host shouldn't some key be generated on DynDNS, since I am not able to ssh to that host? On Thu, Apr 17, 2008 at 12:17 PM, Norbert Burger [EMAIL PROTECTED] wrote: You need to create a DynDNS account and then add host records to this account. On Thu, Apr 17, 2008 at 12:03 PM, Prerna

Re: Urgent

2008-04-16 Thread Norbert Burger
and hence I can't run bin/hadoop. From here I do not know how to proceed. I basically want to implement http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873. Hence I created a host using dyndns. If you can help me, it will be great. On Tue, Apr 15, 2008 at 2:15 PM, Norbert

Re: Urgent

2008-04-15 Thread Norbert Burger
, Apr 15, 2008 at 2:15 PM, Norbert Burger [EMAIL PROTECTED] wrote: Are you trying to run Hadoop on a local cluster, or in the EC2 environment? If EC2, then your MASTER_HOST setting is wrong, because it points to a residential ISP (*.rr.com). It should instead point to your jobtracker

Re: incorrect data check

2008-04-08 Thread Norbert Burger
Colin, how about writing a streaming mapper which simply runs md5sum on each file it gets as input? Run this task along with the identity reducer, and you should be able to identify pretty quickly if there's an HDFS corruption issue. Norbert On Tue, Apr 8, 2008 at 5:50 PM, Colin Freas [EMAIL
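A sketch of such a mapper in Ruby (the thread doesn't specify a language, and file paths arriving one per line on STDIN is an assumption): emit a `path<TAB>md5` record per input file, run it with the identity reducer, and compare digests between the originals and the copies pulled back from HDFS.

```ruby
#!/usr/bin/env ruby
require 'digest/md5'

# Turn one input line (a file path) into a "path<TAB>md5" record.
# Returns nil for blank lines so the mapper can skip them.
def md5_record(line)
  path = line.strip
  return nil if path.empty?
  "#{path}\t#{Digest::MD5.file(path).hexdigest}"
end

# Streaming mapper loop: one record per file path on STDIN.
if __FILE__ == $0
  STDIN.each_line do |line|
    record = md5_record(line)
    puts record if record
  end
end
```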

Re: Hadoop streaming cacheArchive

2008-03-20 Thread Norbert Burger
, Amareshwari Sriramadasu [EMAIL PROTECTED] wrote: Norbert Burger wrote: I'm trying to use the cacheArchive command-line option with the hadoop-0.15.3-streaming.jar. I'm using the option as follows: -cacheArchive hdfs://host:50001/user/root/lib.jar#lib Unfortunately, my Perl scripts
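For context, -cacheArchive unpacks the jar on each task node and symlinks the unpacked directory under the name after `#` in the task's working directory, so a Perl mapper can pick modules up with `use lib 'lib';`. A hedged sketch of a full invocation (host, paths, and script name are placeholders from the thread, not a verified command):

```shell
hadoop jar hadoop-0.15.3-streaming.jar \
  -input  /user/root/in \
  -output /user/root/out \
  -mapper 'perl mapper.pl' \
  -file   mapper.pl \
  -cacheArchive 'hdfs://host:50001/user/root/lib.jar#lib'
```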