[Streaming] How to pass arguments to a map/reduce script

2008-08-21 Thread Gopal Gandhi
I am using Hadoop streaming and I need to pass arguments to my map/reduce script. Because the map/reduce script is launched by Hadoop, like hadoop -file MAPPER -mapper $MAPPER -file REDUCER -reducer $REDUCER ..., how can I pass arguments to MAPPER? I tried -cmdenv name=val, but it does not

Re: Know how many records remain?

2008-08-21 Thread Chris Dyer
Qin's question actually raises an issue: it seems that having a close() call that does not throw IOException and does not give the user access to the OutputCollector object makes this important piece of functionality (from a client's perspective) hard to use. Does anyone feel
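
A sketch of the workaround implied here, against the old org.apache.hadoop.mapred API: cache the OutputCollector that map() receives and reuse it in close(), which the framework calls once after the last record of the split. The class name, key names and record-count payload are illustrative, not from the thread.

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Cache the OutputCollector handed to map() so that close() can still
    // emit records after the last input record has been seen.
    public class ClosingMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

      private OutputCollector<Text, Text> cachedOutput;
      private long records = 0;

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        cachedOutput = output;   // the same collector instance on every call
        records++;
        output.collect(new Text("line"), value);
      }

      @Override
      public void close() throws IOException {
        // Called once after the last map() call for this task's split.
        if (cachedOutput != null) {
          cachedOutput.collect(new Text("record-count"),
                               new Text(Long.toString(records)));
        }
      }
    }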

AlreadyBeingCreatedException during reduce

2008-08-21 Thread Barry Haddow
Hi, I'm seeing repeated AlreadyBeingCreatedExceptions during the reduce phase of my job, which eventually cause the job to fail. Can anyone suggest what could be causing this exception? I have Hadoop configured with just one slave, running two reduces simultaneously. Regards, Barry

HDFS Vs KFS

2008-08-21 Thread Wasim Bari
Hi, can some expert compare or contrast HDFS with KFS? They appear to have very similar architectures, with only small differences, and the same objective. Thanks, Wasim

map input key values?

2008-08-21 Thread Deyaa Adranale
Hi, what can I guarantee about the values of the map input keys (using TextInputFormat)? Sometimes this could be useful, for example: if I want, in some cases, to apply map to only a certain percentage of the data. If the input keys are indexes, then I can ignore (do nothing) when
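
For reference: with TextInputFormat the map input key is a LongWritable holding the byte offset of the line within the file (the value is the line itself), so keys are not sequential record indexes. A sketch of percentage sampling that therefore ignores the key; the 10% fraction and class name are illustrative.

    import java.io.IOException;
    import java.util.Random;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // TextInputFormat gives (byte offset, line) pairs, so sampling a
    // percentage of records is easier done randomly than via the key.
    public class SamplingMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, LongWritable, Text> {

      private static final double SAMPLE_FRACTION = 0.10; // keep roughly 10%
      private final Random random = new Random();

      public void map(LongWritable offset, Text line,
                      OutputCollector<LongWritable, Text> output,
                      Reporter reporter) throws IOException {
        if (random.nextDouble() < SAMPLE_FRACTION) {
          output.collect(offset, line); // offset = position of the line in bytes
        }
      }
    }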

Re: Know how many records remain?

2008-08-21 Thread Qin Gao
I ended up using my own MapRunner so that I can control the calls to the map function; then calling close() is not necessary. However, I think it is reasonable to have close() throw IOException, but providing the OutputCollector may make the framework a little messy. My suggestion is to stay with the
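
A minimal sketch of the MapRunner approach Qin describes, against the old org.apache.hadoop.mapred API; the key names and the end-of-split record are illustrative. The runner would be plugged in with JobConf.setMapRunnerClass().

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapRunnable;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.RecordReader;
    import org.apache.hadoop.mapred.Reporter;

    // Because this class owns the read loop, it knows when the last record of
    // the split has been consumed and can do any end-of-input work itself
    // instead of relying on close().
    public class EndAwareMapRunner
        implements MapRunnable<LongWritable, Text, Text, Text> {

      public void configure(JobConf job) {
        // read any per-job settings here
      }

      public void run(RecordReader<LongWritable, Text> input,
                      OutputCollector<Text, Text> output, Reporter reporter)
          throws IOException {
        LongWritable key = input.createKey();
        Text value = input.createValue();
        long records = 0;
        while (input.next(key, value)) {
          records++;
          output.collect(new Text("line"), value); // stand-in for real map logic
        }
        // No more records in this split: emit whatever must come last.
        output.collect(new Text("records-in-split"),
                       new Text(Long.toString(records)));
      }
    }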

Re: HDFS Vs KFS

2008-08-21 Thread rae l
On Thu, Aug 21, 2008 at 9:44 PM, Wasim Bari [EMAIL PROTECTED] wrote: Hi, can some expert compare or contrast HDFS with KFS? They appear to have very similar architectures, with only small differences, and the same objective. What's KFS? Which KFS? Everyone here knows HDFS, but someone like me

Re: [Streaming] How to pass arguments to a map/reduce script

2008-08-21 Thread Rong-en Fan
On Thu, Aug 21, 2008 at 3:14 PM, Gopal Gandhi [EMAIL PROTECTED] wrote: I am using Hadoop streaming and I need to pass arguments to my map/reduce script. Because a map/reduce script is triggered by hadoop, like hadoop -file MAPPER -mapper $MAPPER -file REDUCER -reducer $REDUCER ... How

Re: Why is scaling HBase much simpler than scaling a relational db?

2008-08-21 Thread Mork0075
Thank you, but I still don't get it. I've read tons of websites and papers, but there's no clear and well-founded answer as to why one should use BigTable instead of a relational database. MySQL Cluster seems to offer the same scalability and level of abstraction, without switching to a non-relational paradigm.

Re: Why is scaling HBase much simpler than scaling a relational db?

2008-08-21 Thread Fernando Padilla
I'm no expert, but maybe I can explain it the way I see it; maybe it will resonate with other newbies like me :) Sorry if it's long-winded, or boring for those who already know all this. BigTable and Hadoop are inherently sharded and distributed. They are architected to store the data in

RE: Why is scaling HBase much simpler than scaling a relational db?

2008-08-21 Thread Jonathan Gray
A few very big differences... - HBase/BigTable don't have transactions in the same way that a relational database does. While it is possible (and was just recently implemented for HBase, see HBASE-669), it is not at the core of the design. A major bottleneck of distributed multi-master

Re: HDFS Vs KFS

2008-08-21 Thread Wasim Bari
KFS is another distributed file system, implemented in C++. You can find details here: http://kosmosfs.sourceforge.net/

Re: [Streaming] How to pass arguments to a map/reduce script

2008-08-21 Thread Steve Gao
That's interesting. Suppose your mapper script is a Perl script; how do you assign my.mapper.arg1's value to a variable $x? $x = $my.mapper.arg1? I just tried that and my Perl script does not recognize $my.mapper.arg1. --- On Thu, 8/21/08, Rong-en Fan [EMAIL PROTECTED] wrote:

Hadoop over Lustre?

2008-08-21 Thread Joel Welling
Hi folks; I'm new to Hadoop, and I'm trying to set it up on a cluster for which almost all the disk is mounted via the Lustre filesystem. That filesystem is visible to all the nodes, so I don't actually need HDFS to implement a shared filesystem. (I know the philosophical reasons why people

Re: Why is scaling HBase much simpler than scaling a relational db?

2008-08-21 Thread Mork0075
Thanks a lot for all the replies, this is really helpful. As you describe it, it's a problem of implementation. BigTable is designed to scale: there are routines to shard the data and distribute it to the pool of connected servers. Could MySQL perhaps decide tomorrow to implement something similar

Re: HDFS Vs KFS

2008-08-21 Thread rae l
On Fri, Aug 22, 2008 at 12:34 AM, Wasim Bari [EMAIL PROTECTED] wrote: KFS is also another Distributed file system implemented in C++. Here you can get details: http://kosmosfs.sourceforge.net/ Just from the basic information: http://sourceforge.net/projects/kosmosfs # Developers : 2 #

Re: HDFS Vs KFS

2008-08-21 Thread Tim Wintle
I haven't used KFS, but I believe a major difference is that you can (apparently) mount KFS as a standard device under Linux, allowing you to read and write directly to it without having to re-compile the application (as far as I know that's not possible with HDFS, although the last time I

Re: [Streaming] How to pass arguments to a map/reduce script

2008-08-21 Thread Yuri Pradkin
On Thursday 21 August 2008 00:14:56 Gopal Gandhi wrote: I am using Hadoop streaming and I need to pass arguments to my map/reduce script. Because a map/reduce script is triggered by hadoop, like hadoop -file MAPPER -mapper $MAPPER -file REDUCER -reducer $REDUCER ... How can I pass

Re: HDFS Vs KFS

2008-08-21 Thread Otis Gospodnetic
Isn't there FUSE for HDFS, as well as the WebDAV option? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Hadoop 0.17.2 configuration problems!

2008-08-21 Thread Gerardo Velez
I'm trying to install Hadoop version 0.17.2 on a Linux box (running under Xen). bin/start-all.sh works fine, but hadoop-hadoop-jobtracker-softtek-helio-dev.log shows the error below. Do you know how to fix it? Thanks in advance. 2008-08-21 11:20:28,020 INFO org.apache.hadoop.mapred.JobTracker:

Re: [Streaming] How to pass arguments to a map/reduce script

2008-08-21 Thread Steve Gao
Unfortunately this does not work. Hadoop complains: 08/08/21 18:04:46 ERROR streaming.StreamJob: Unexpected arg1 while processing

how to tell if the DFS is ready?

2008-08-21 Thread Karl Anderson
I'm getting NotReplicatedYet exceptions when I try to put a file on DFS for a newly created cluster. If I wait a while, the put works. Is there a way to tell if the DFS is ready from the master node? hadoop dfs -put isn't giving me a meaningful error status
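
One common reason right after cluster start-up is that the namenode is still in safe mode (or the datanodes have not all reported in yet); hadoop dfsadmin -safemode get reports that state and -safemode wait blocks until it clears. Below is a hedged Java sketch that polls the same flag; the package names are the 0.17/0.18-era ones (org.apache.hadoop.dfs) and moved in later releases, so treat this as an assumption.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.dfs.DistributedFileSystem;
    import org.apache.hadoop.dfs.FSConstants;
    import org.apache.hadoop.fs.FileSystem;

    // Poll the namenode's safe-mode flag before starting to load data.
    public class WaitForDfs {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        if (!(fs instanceof DistributedFileSystem)) {
          System.out.println("Not talking to HDFS: " + fs.getUri());
          return;
        }
        DistributedFileSystem dfs = (DistributedFileSystem) fs;
        while (dfs.setSafeMode(FSConstants.SafeModeAction.SAFEMODE_GET)) {
          System.out.println("Namenode still in safe mode, waiting...");
          Thread.sleep(5000);
        }
        System.out.println("Safe mode is off; DFS should accept writes now.");
      }
    }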

Re: Hadoop over Lustre?

2008-08-21 Thread Arun C Murthy
It wouldn't be too much of a stretch to use Lustre directly... although it isn't trivial either. You'd need to implement the 'FileSystem' interface for Lustre, define a URI scheme (e.g. lfs://), etc. Please take a look at the KFS / S3 implementations. Arun On Aug 21, 2008, at 9:59 AM,
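
A possible shortcut, offered as an untested assumption rather than anything from this thread: since Lustre is already POSIX-mounted on every node, one could start from Hadoop's local-filesystem code and only register a new URI scheme, instead of implementing the whole FileSystem contract from scratch the way the KFS and S3 filesystems do.

    import java.net.URI;

    import org.apache.hadoop.fs.RawLocalFileSystem;

    // Reuse the local-filesystem implementation for a shared POSIX mount and
    // give it its own scheme. A first-class port would still flesh out the
    // FileSystem contract the way the KFS/S3 code does.
    public class LustreFileSystem extends RawLocalFileSystem {
      @Override
      public URI getUri() {
        return URI.create("lfs:///");
      }
    }

If the fs.SCHEME.impl convention applies here as it does for KFS and S3, the class would be registered by pointing an fs.lfs.impl property at it and writing paths as lfs://... Alternatively, because the mount is visible everywhere, simply pointing fs.default.name at file:/// and using absolute paths on the Lustre mount may be enough for an experiment. Either way, this would need testing.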

getMaxReduceTasks()

2008-08-21 Thread Manish Shah
In the cluster stats I see a number reported as the max capacity of reduce tasks for the cluster. How is this number computed? I didn't see any info in the Javadoc for the method. Thanks. - Manish Co-Founder Rapleaf.com We're looking for a product manager, sys admin, and software
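
As far as I know, the reported max reduce capacity is the sum, over all live tasktrackers, of each tracker's mapred.tasktracker.reduce.tasks.maximum setting (2 per tracker by default), and the map capacity is computed the same way from the map slots. A sketch that reads the same numbers programmatically, using the method names of that era's JobClient/ClusterStatus API.

    import org.apache.hadoop.mapred.ClusterStatus;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    // Print the cluster capacity figures shown in the web UI's cluster stats.
    public class ShowClusterCapacity {
      public static void main(String[] args) throws Exception {
        JobClient client = new JobClient(new JobConf());
        ClusterStatus status = client.getClusterStatus();
        System.out.println("Task trackers      : " + status.getTaskTrackers());
        System.out.println("Max map capacity   : " + status.getMaxMapTasks());
        System.out.println("Max reduce capacity: " + status.getMaxReduceTasks());
      }
    }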

Re: Hadoop 0.17.2 configuration problems!

2008-08-21 Thread Gerardo Velez
Thanks for the answer! I guess safe mode turns off after a while, but I was wondering whether safe-mode problems are causing my issue. Basically, the Hadoop server starts just fine, but when I run the example it never finishes. Here is some log output: bin/hadoop jar hadoop-0.17.2-examples.jar wordcount

Re: Hadoop 0.17.2 configuration problems!

2008-08-21 Thread Gerardo Velez
Hi all!! I just looked in a secondary namenode log file and it contains this exception: java.net.NoRouteToHostException: No route to host at java.net.PlainSocketImpl.socketConnect(Native Method) at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333) at

Re: Hadoop 0.17.2 configuration problems!

2008-08-21 Thread Gerardo Velez
A more specific exception: org.apache.hadoop.mapred.ReduceTask: java.net.NoRouteToHostException: No route to host On Thu, Aug 21, 2008 at 12:03 PM, Gerardo Velez [EMAIL PROTECTED] wrote: Hi all!! I just looked in a secondary namenode log file and it contains this exception

Hadoop on Suse

2008-08-21 Thread Wasim Bari
Hi, does anyone have experience with installing Hadoop or HDFS on Suse Linux? Thanks

Re: Hadoop on Suse

2008-08-21 Thread Miles Osborne
Yes, and it works out of the box. Miles 2008/8/21 Wasim Bari [EMAIL PROTECTED]: Hi, does anyone have experience with installing Hadoop or HDFS on Suse Linux? Thanks -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

Get information of input split from MapRunner?

2008-08-21 Thread Qin Gao
Hi mailing list, I want to get information about the current input split inside the MapRunner object (or the map function); however, the only object I can get from the MapRunner is the RecordReader, and I see no method defined in RecordReader to fetch the InputSplit object. Do you have any suggestions on this?
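
One workaround, assuming a file-based input format: the old API publishes the current split's file, start offset and length to the task's JobConf as map.input.file, map.input.start and map.input.length, so they can be read in configure() of a MapRunner (or Mapper) even though RecordReader does not expose the InputSplit. The runner below is a sketch (plugged in via JobConf.setMapRunnerClass()); the logging is illustrative.

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapRunner;

    // For FileSplit-based input formats the framework sets map.input.* in the
    // task's configuration, which is visible here before run() is called.
    public class SplitAwareMapRunner
        extends MapRunner<LongWritable, Text, Text, Text> {

      @Override
      public void configure(JobConf job) {
        super.configure(job);
        String file = job.get("map.input.file");         // path of the split's file
        long start = job.getLong("map.input.start", 0);  // byte offset of the split
        long length = job.getLong("map.input.length", 0);
        System.err.println("Processing " + file + " ["
            + start + ", " + (start + length) + ")");
      }
    }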

[Announcement] HBase major release version to track Hadoop major releases.

2008-08-21 Thread Jim Kellerman
Since HBase became a subproject of Hadoop, it started its own release numbering scheme. However, because a particular release of HBase requires a specific release of Hadoop, HBase will change so that its major releases correspond to the Hadoop major release that it depends on. For example, in

Re: [Streaming] How to pass arguments to a map/reduce script

2008-08-21 Thread Rong-en Fan
On Fri, Aug 22, 2008 at 12:51 AM, Steve Gao [EMAIL PROTECTED] wrote: That's interesting. Suppose your mapper script is a Perl script, how do you assign my.mapper.arg1's value to a variable $x? $x = $my.mapper.arg1 I just tried the way and my perl script does not recognize $my.mapper.arg1.
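
For completeness: -cmdenv NAME=value puts NAME into the environment of the streaming mapper/reducer process, and, if I remember correctly, streaming also exports job configuration properties to that environment with non-alphanumeric characters replaced by underscores, so a property set as my.mapper.arg1 would be read in Perl as $ENV{'my_mapper_arg1'} rather than as $my.mapper.arg1. Below is a minimal Java streaming mapper illustrating the environment-variable read; MY_MAPPER_ARG and the launch line in the comment are illustrative, not from the thread.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    // A streaming mapper that reads a job argument from its environment.
    // Launched with something like:
    //   hadoop jar contrib/streaming/hadoop-*-streaming.jar \
    //     -cmdenv MY_MAPPER_ARG=foo -mapper 'java ArgEchoMapper' ...
    // A Perl mapper would do the equivalent with $ENV{'MY_MAPPER_ARG'}.
    public class ArgEchoMapper {
      public static void main(String[] args) throws Exception {
        String arg = System.getenv("MY_MAPPER_ARG"); // value passed via -cmdenv
        BufferedReader in =
            new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
          // Streaming expects key<TAB>value lines on stdout.
          System.out.println(arg + "\t" + line);
        }
      }
    }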

Questions about lucene index on HDFS

2008-08-21 Thread Jarvis . Guo
Hi all, first of all, I know that there is an FsDirectory class in Nutch 0.9, so we can access the index on HDFS. But after testing it, I found that we can only read the index and cannot append to or modify it. I think the reason is the one mentioned in the HDFS file-append issues, am I right?
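
For context: HDFS at this point supports neither appends nor random writes, so a Lucene index stored on HDFS can effectively only be read (e.g. through Nutch's FsDirectory). A common workaround is to build or update the index on local disk and then copy the finished index into HDFS; a sketch of that copy step, with illustrative paths.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Build/modify the Lucene index on local disk with a normal Lucene
    // directory, then publish the finished index to HDFS for read-only use.
    public class PublishIndex {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem dfs = FileSystem.get(conf);
        Path localIndex = new Path("/tmp/lucene-index");    // built locally
        Path hdfsIndex = new Path("/indexes/lucene-index"); // read-only copy
        dfs.copyFromLocalFile(localIndex, hdfsIndex);
      }
    }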