Re: Reg: Problem in Build Versions of Hadoop-0.17.0

2008-07-16 Thread Shengkai Zhu
Replace the hadoop-*-core.jar on the datanodes with the jar you compiled. On 7/16/08, chaitanya krishna [EMAIL PROTECTED] wrote: Hi, I'm using hadoop-0.17.0 and recently, when I stopped and restarted dfs, the datanodes are being created and soon they are not present. The logs of the namenode

Re: Version Mismatch when accessing hdfs through a nonhadoop java application?

2008-07-16 Thread Thibaut_
Jason Venner-2 wrote: When you compile from svn, the svn state number becomes part of the required version for hdfs; the last time I looked at this was 0.15.3, but it may still be happening. Hi Jason, Client and server are using the same library file (I checked it again,

Re: multiple Output Collectors ?

2008-07-16 Thread Alejandro Abdelnur
Multiple mappers mean multiple jobs, which means you'd have to run 2 jobs on the same data. With MultipleOutputs and MultipleOutputFormat you can do that in one pass from a single Mapper. On Wed, Jul 16, 2008 at 3:26 AM, Khanh Nguyen [EMAIL PROTECTED] wrote: Thank you very much. Someone
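Alejandro's one-pass suggestion can be sketched against the old org.apache.hadoop.mapred API. This is a minimal, hedged sketch assuming a 0.17-era classpath; note that MultipleOutputs itself only appeared in a later Hadoop release, so the sketch uses MultipleTextOutputFormat, and the class name KeyBasedOutput is made up for illustration:

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Routes each record from a single map/reduce pass into a file named
// after its key, avoiding a second job over the same data.
public class KeyBasedOutput extends MultipleTextOutputFormat<Text, Text> {
    @Override
    protected String generateFileNameForKeyValue(Text key, Text value, String name) {
        // Records sharing a key land in the same per-key output file;
        // "name" is the default part-NNNNN leaf name.
        return key.toString() + "/" + name;
    }
}
```

Wired in with `conf.setOutputFormat(KeyBasedOutput.class);` on the JobConf.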

Re: Reg: Problem in Build Versions of Hadoop-0.17.0

2008-07-16 Thread chaitanya krishna
Thanks for the reply. It worked! :) On Wed, Jul 16, 2008 at 11:45 AM, Shengkai Zhu [EMAIL PROTECTED] wrote: Replace the hadoop-*-core.jar on the datanodes with the jar you compiled. On 7/16/08, chaitanya krishna [EMAIL PROTECTED] wrote: Hi, I'm using hadoop-0.17.0 and recently,

Re: dfs.DataNode connection issues

2008-07-16 Thread brainstorm
If you refer to the other nodes: 2008-07-16 14:41:00,124 ERROR dfs.DataNode - 192.168.0.252:50010:DataXceiver: java.io.IOException: Block blk_7443738244200783289 has already been started (though not completed), and thus cannot be created. at

Is there a way to preempt the initial set of reduce tasks?

2008-07-16 Thread Murali Krishna
Hi, I have to run a small MR job while there is a bigger job already running. The first job takes around 20 hours to finish and the second 1 hour. The second job will be given a higher priority. The problem here is that the first set of reducers of job1 will be occupying all the slots and will

Re: Finished or not?

2008-07-16 Thread Amar Kamat
I have seen the opposite case where the maps are shown as 100% done while there are still some maps running. I have seen this on trunk and there were some failed/killed tasks. Amar Andreas Kostyrka wrote: On Wednesday 09 July 2008 05:56:28 Amar Kamat wrote: Andreas Kostyrka wrote:

Re: Finished or not?

2008-07-16 Thread Daniel Blaisdell
To speed up the overall map operation time, the last few map tasks are sent to multiple machines. The machine that finishes first wins and that block is passed onto the reduce phase while the other map tasks are killed and their results ignored. -Daniel On Wed, Jul 16, 2008 at 9:47 AM, Amar
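What Daniel describes is Hadoop's speculative execution. A hedged sketch of toggling it per job with the old JobConf API (assuming a 0.17-era classpath; it is on by default, and disabling it is mainly useful when duplicate task attempts cause side-effect problems):

```java
import org.apache.hadoop.mapred.JobConf;

public class SpeculationDemo {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Controls the backup-task behavior described above: slow tasks
        // get duplicate attempts, the first to finish wins, the rest
        // are killed and their output discarded. On by default.
        conf.setSpeculativeExecution(false);
    }
}
```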

Re: dfs.DataNode connection issues

2008-07-16 Thread brainstorm
Just for the record, as I have seen this same problem in previous archives, I've replaced the (cheap) 10/100 switch with a (robust?) 100/1000 one and a couple of ethernet cables... and nope, in my case it's not hardware related (at least on the switch/cable end). Any other hints? Thanks in

RE: Is there a way to preempt the initial set of reduce tasks?

2008-07-16 Thread Goel, Ankur
I presume that the initial set of reducers of job1 are taking fairly long to complete, thereby denying the reducers of job2 a chance to run. I don't see a provision in Hadoop to preempt a running task. This looks like an enhancement to task tracker scheduling, where running tasks are preempted

Re: Is there a way to preempt the initial set of reduce tasks?

2008-07-16 Thread Amar Kamat
I think the JobTracker can easily detect this case, where a high-priority job is starved because there are no free slots/resources. Preemption should probably kick in there; otherwise tasks from a low-priority job might get scheduled even though the high-priority job still has tasks to run. Amar Goel, Ankur

Re: Is there a way to preempt the initial set of reduce tasks?

2008-07-16 Thread Amar Kamat
Goel, Ankur wrote: Ok in that case bumping up the priority of job2 to a level higher than job1 before running job2 should actually fix the starvation issue. @Ankur, Preemption across jobs with different priorities is still not there in Hadoop. Hence job1 will succeed before job2 because of
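For the record, the per-job knob this thread keeps referring to is the job priority. A minimal sketch with the old mapred API (assuming JobPriority is available in your release; as the posters note, a higher priority only affects scheduling order when slots free up, it does not preempt already-running tasks):

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobPriority;

public class PriorityDemo {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        conf.setJobName("job2");
        // job2 is picked ahead of lower-priority jobs when a slot opens;
        // running tasks of job1 are NOT killed to make room.
        conf.setJobPriority(JobPriority.VERY_HIGH);
    }
}
```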

ROME 1.0 RC1 disted

2008-07-16 Thread Alejandro Abdelnur
I've updated the ROME wiki, uploaded the javadocs and dists, and am tagging CVS at the moment. Would somebody double-check everything is OK, links and downloads? Also we should get the JAR into a maven repo. Who has the necessary permissions to do that? We can do a final 1.0 release in a couple of weeks, if no

Re: Two questions about hadoop

2008-07-16 Thread chaitanya krishna
Hi, Try setting the number of map tasks in the program itself. For example, in the WordCount example, you can set the number of map tasks in the run method as conf.setNumMapTasks(<number of map tasks>). I hope this answers your first query. Regards, V.V.Chaitanya Krishna IIIT, Hyderabad On Wed, Jul 16, 2008
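Chaitanya's suggestion, spelled out as a hedged sketch against the old JobConf API (the counts 10 and 2 are arbitrary examples; note that setNumMapTasks is only a hint, since the InputFormat ultimately decides the split count):

```java
import org.apache.hadoop.mapred.JobConf;

public class MapCountDemo {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // A hint only: the InputFormat may choose a different number
        // of splits based on block size and input layout.
        conf.setNumMapTasks(10);
        // The reduce count, by contrast, is honored exactly.
        conf.setNumReduceTasks(2);
    }
}
```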

Re: How to chain multiple hadoop jobs?

2008-07-16 Thread Joman Chu
Here is some more complete sample code that is based on my own MapReduce jobs. //import lots of things public class MyMapReduceTool extends Configured implements Tool { public int run(String[] args) throws Exception { JobConf conf = new JobConf(getConf(),
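Since the quoted sample is cut off in the archive, here is a self-contained hedged sketch of the same chaining pattern with the 0.17-era API (the class name TwoPassTool and the three path arguments are made up for illustration; later Hadoop releases moved the path setters to FileInputFormat/FileOutputFormat):

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class TwoPassTool extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        Path input = new Path(args[0]);
        Path intermediate = new Path(args[1]);
        Path output = new Path(args[2]);

        JobConf first = new JobConf(getConf(), TwoPassTool.class);
        first.setJobName("pass-1");
        first.setInputPath(input);
        first.setOutputPath(intermediate);
        // runJob blocks until completion, which is what sequences the
        // chain: pass-2 only starts once pass-1's output is in place.
        JobClient.runJob(first);

        JobConf second = new JobConf(getConf(), TwoPassTool.class);
        second.setJobName("pass-2");
        second.setInputPath(intermediate);
        second.setOutputPath(output);
        JobClient.runJob(second);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new TwoPassTool(), args));
    }
}
```

The mapper/reducer classes and input/output formats for each pass would be set on the respective JobConf as in any single job.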

Re: dfs.DataNode connection issues

2008-07-16 Thread C G
You should look at https://issues.apache.org/jira/browse/HADOOP-3678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12610003#action_12610003 as well. This eliminates spurious "connection reset by peer" messages that clutter up the DataNode logs and can be

Logging and JobTracker

2008-07-16 Thread Kylie McCormick
Hello (Again): I've managed to get Map/Reduce on its feet and running, but the JobClient runs the Map() to 100% then idles. At least, I think it's idling. It's certainly not updating, and I let it run 10+ minutes. I tried to get the history of the job and/or the logs, and I seem to be running

Re: Are lines broken in dfs and/or in InputSplit

2008-07-16 Thread Kevin
I tried a bit and it looks like lines are preserved so far. However, is this property guaranteed, or what should I do to keep it working this way? Thank you. -Kevin On Tue, Jul 15, 2008 at 5:07 PM, Kevin [EMAIL PROTECTED] wrote: Hi, I was trying to parse text input with line-based

Re: Logging and JobTracker

2008-07-16 Thread Arun C Murthy
On Jul 16, 2008, at 4:09 PM, Kylie McCormick wrote: Hello (Again): I've managed to get Map/Reduce on its feet and running, but the JobClient runs the Map() to 100% then idles. At least, I think it's idling. It's certainly not updating, and I let it run 10+ minutes. I tried to get the

Volunteer recruitment for matrix library project on Hadoop.

2008-07-16 Thread Edward J. Yoon
Hello all, The Hama team, which is trying to port typical linear algebra operations to Hadoop, is looking for a couple more volunteers. This would essentially speed up development time for typical machine learning algorithms. If you are interested, contact [EMAIL PROTECTED] Thanks. -- Best

RE: Is there a way to preempt the initial set of reduce tasks?

2008-07-16 Thread Vivek Ratan
There are a few different issues at play here. - It seems like you're facing a problem only because the reducers of Job1 are long-running (somebody else pointed this out too). Once a reducer of Job1 finishes, that slot will go to a reducer of Job2 in today's Hadoop. Can you confirm that is