Re: Exception in thread main org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:

2008-11-24 Thread Sagar Naik

Include the ${HADOOP}/conf/ dir in the classpath of the Java program.
Alternatively, you can also try:
bin/hadoop jar your_jar main_class args
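
To make the difference concrete, here is a minimal sketch of a 0.18-style driver (the
class name EchoOhceDriver and the job wiring are illustrative, not the poster's actual
code). Whether the input path resolves to HDFS or to the local disk depends entirely on
the fs.default.name that new JobConf() picks up from hadoop-site.xml on the classpath;
with no conf dir on the classpath it falls back to file:///, which is exactly the
file:/opt/... path in the exception below.

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

// Hypothetical driver, 0.18 "mapred" API. Run it either with ${HADOOP}/conf on the
// classpath or via "bin/hadoop jar", which sets the classpath up for you.
public class EchoOhceDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(EchoOhceDriver.class);
    conf.setJobName("echo-ohce-sketch");
    // Resolved against fs.default.name: hdfs://... with the conf dir on the
    // classpath, file:/// without it.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);   // defaults to the identity mapper/reducer
  }
}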

-Sagar

Saju K K wrote:
This is in reference to the sample application in the JavaWorld article:
http://www.javaworld.com/javaworld/jw-09-2008/jw-09-hadoop.html?page=5



bin/hadoop dfs -mkdir /opt/www/hadoop/hadoop-0.18.2/words
bin/hadoop dfs -put word1 /opt/www/hadoop/hadoop-0.18.2/words
bin/hadoop dfs -put word2 /opt/www/hadoop/hadoop-0.18.2/words
bin/hadoop dfs -put word3 /opt/www/hadoop/hadoop-0.18.2/words
bin/hadoop dfs -put word4 /opt/www/hadoop/hadoop-0.18.2/words

When I browse through
http://serdev40.apac.nokia.com:50075/browseDirectory.jsp, I can see the
files in the directory.

The commands below also execute properly:


bin/hadoop dfs -ls /opt/www/hadoop/hadoop-0.18.2/words/
bin/hadoop dfs -ls /opt/www/hadoop/hadoop-0.18.2/words/word1
bin/hadoop dfs -cat /opt/www/hadoop/hadoop-0.18.2/words/word1

But on executing this command, I get an error:
java -Xms1024m -Xmx1024m com.nokia.tag.test.EchoOhce
/opt/www/hadoop/hadoop-0.18.2/words/ result

 java -Xms1024m -Xmx1024m com.nokia.tag.test.EchoOhce
/opt/www/hadoop/hadoop-0.18.2/words result
08/11/24 10:52:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist: file:/opt/www/hadoop/hadoop-0.18.2/words
at
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:179)
at
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:210)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1026)
at com.nokia.tag.test.EchoOhce.run(EchoOhce.java:123)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at com.nokia.tag.test.EchoOhce.main(EchoOhce.java:129)

Does anybody know why the Java application fails?

  




Re: Exception in thread main org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:

2008-11-24 Thread Saju K K

It works ...
Thanks Sagar 

--saju 

Sagar Naik-3 wrote:
 
 Include the ${HADOOP}/conf/ dir in the classpath of the Java program.
 Alternatively, you can also try:
 bin/hadoop jar your_jar main_class args
 
 -Sagar
 
-- 
View this message in context: 
http://www.nabble.com/Exception-in-thread-%22main%22-org.apache.hadoop.mapred.InvalidInputException%3A-Input-path-does-not-exist%3A-tp20655207p20656757.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.



hdfs read failure

2008-11-24 Thread Tamás Szokol
Hello!





I'm trying to perform a read test of HDFS files through libhdfs using
the hadoop-0.18.2/src/c++/libhdfs/hdfs_read.c test program. Creating
the files succeeds but reading them fails.



I create two 1MB local files with hdfs_write.c and then I put them under HDFS
using hadoop fs -put. The files go under dfs.data.dir as:

hdfs://server:port/dfs.data.dir/file1 and 


hdfs://server:port/dfs.data.dir/file2



Then I try to read them back with hdfs_read and measure the time it takes, but I
get the following exceptions:



Reading file:///home/sony/hadoop/dfs/blocks/file1 1MB

Exception in thread "main" java.lang.IllegalArgumentException: Wrong
FS: hdfs://myserver.com:23000/home/sony/hadoop/dfs/blocks/file1,
expected: file:///

    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:320)

    at 
org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:52)

Call to 
org.apache.hadoop.fs.FileSystem::open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
 failed!

hdfs_read.c: Failed to open 
hdfs://myserver.com:23000/home/sony/hadoop/dfs/blocks/file1 for writing!

..

Reading file:home/sony/hadoop/dfs/blocks/file2 1MB

Exception in thread "main" java.lang.IllegalArgumentException: Wrong
FS: hdfs://myserver.com:23000/home/sony/hadoop/dfs/blocks/file2,
expected: file:///

    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:320)

    at 
org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:52)

Call to 
org.apache.hadoop.fs.FileSystem::open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;)
 failed!

hdfs_read.c: Failed to open 
hdfs://myserver.com:23000/home/sony/hadoop/dfs/blocks/file2 for writing!



Am I using an incorrect URI? What could be the problem?
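
For reference, a minimal sketch of what the "Wrong FS" check is complaining about (the
host and port are taken from the log above; the path and class name are illustrative).
The client ended up with the local filesystem as its default (expected: file:///), so
handing it an hdfs:// path fails checkPath(). Getting the FileSystem from the full
hdfs:// URI, or putting a conf dir whose fs.default.name points at the namenode on the
classpath, avoids the mismatch; in libhdfs terms that corresponds to the host/port
arguments of hdfsConnect rather than the "default" filesystem.

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: obtain the FileSystem from the hdfs:// URI instead of the default (local) one.
public class WrongFsSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("hdfs://myserver.com:23000/"), conf);
    FSDataInputStream in = fs.open(new Path("/home/sony/hadoop/dfs/blocks/file1"));
    byte[] buf = new byte[4096];
    int n = in.read(buf);          // read a little just to prove the open worked
    System.out.println("read " + n + " bytes");
    in.close();
    fs.close();
  }
}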



Cheers,

Tamas


  

s3n exceptions

2008-11-24 Thread Alexander Aristov
Hi all

I am testing the s3n filesystem facilities and trying to copy from HDFS to S3 in
the original format.

I get the following errors:

08/11/24 05:04:49 INFO mapred.JobClient: Running job: job_200811240437_0004
08/11/24 05:04:50 INFO mapred.JobClient:  map 0% reduce 0%
08/11/24 05:05:00 INFO mapred.JobClient:  map 44% reduce 0%
08/11/24 05:05:03 INFO mapred.JobClient:  map 0% reduce 0%
08/11/24 05:05:03 INFO mapred.JobClient: Task Id :
attempt_200811240437_0004_m_00_0, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 7
at
org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:542)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
at
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)

08/11/24 05:05:15 INFO mapred.JobClient: Task Id :
attempt_200811240437_0004_m_00_1, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 7
at
org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:542)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
at
org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
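
The DistCp trace above only reports a per-mapper failure count; the per-file cause
lands in the task logs. For reference, a minimal sketch for checking that the s3n
credentials and bucket are usable at all from a client (the property names are the
s3native ones; the bucket name and key values are placeholders, and missing credentials
is only one possible cause, not a diagnosis):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: list the target bucket with explicit s3n credentials before re-running distcp.
public class S3nCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY_ID");         // placeholder
    conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_ACCESS_KEY"); // placeholder
    FileSystem s3 = new Path("s3n://your-bucket/").getFileSystem(conf);
    for (FileStatus f : s3.listStatus(new Path("/"))) {
      System.out.println(f.getPath());
    }
  }
}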


-- 
Best Regards
Alexander Aristov


Re: Hadoop Installation

2008-11-24 Thread Steve Loughran

Mithila Nagendra wrote:

I tried dropping the jar files into the lib. It still doesn't work. The
following is how the lib looks after the new files were put in:

[EMAIL PROTECTED] hadoop-0.17.2.1]$ cd bin
[EMAIL PROTECTED] bin]$ ls
hadoop            hadoop-daemon.sh   rcc        start-all.sh       start-dfs.sh
stop-all.sh       stop-dfs.sh
hadoop-config.sh  hadoop-daemons.sh  slaves.sh  start-balancer.sh  start-mapred.sh
stop-balancer.sh  stop-mapred.sh
[EMAIL PROTECTED] bin]$ cd ..
[EMAIL PROTECTED] hadoop-0.17.2.1]$ mv commons-logging-1.1.1/* lib
[EMAIL PROTECTED] hadoop-0.17.2.1]$ cd lib
[EMAIL PROTECTED] lib]$ ls
commons-cli-2.0-SNAPSHOT.jar  commons-logging-1.1.1-javadoc.jar
commons-logging-tests.jar     junit-3.8.1.jar  log4j-1.2.13.jar  site
commons-codec-1.3.jar         commons-logging-1.1.1-sources.jar
jets3t-0.5.0.jar              junit-3.8.1.LICENSE.txt  native  xmlenc-0.52.jar
commons-httpclient-3.0.1.jar  commons-logging-adapters-1.1.1.jar
jetty-5.1.4.jar               kfs-0.1.jar  NOTICE.txt
commons-logging-1.0.4.jar     commons-logging-api-1.0.4.jar
jetty-5.1.4.LICENSE.txt       kfs-0.1.LICENSE.txt  RELEASE-NOTES.txt
commons-logging-1.1.1.jar     commons-logging-api-1.1.1.jar  jetty-ext
LICENSE.txt                   servlet-api.jar


OK, you now have two copies of commons-logging in there. I would delete 
the -1.0.4 version and the -api and -sources JARs.


But I don't think that is the root cause of this problem. Are you 
running on a Linux system that has commons-logging installed as an RPM 
or .deb package? That could be making a real mess of your 
classpath. The error you are seeing implies that log4j isn't there or it 
won't load, and since log4j is there, it looks like a classloader problem 
of some sort. These are tractable, but they are hard to track down, and 
they only exist on your system(s). There's not much that can 
be done remotely.


-steve


Re: Hadoop+log4j

2008-11-24 Thread Brian Bockelman


On Nov 24, 2008, at 9:49 AM, Steve Loughran wrote:


Scott Whitecross wrote:

Thanks Brian.  So you have had luck w/ log4j?


We grab logs off machines by not using log4j and routing to our own  
logging infrastructure that can feed events to other boxes via RMI  
and queues. This stuff slots in behind commons-logging, with a  
custom commons-logging bridge specified on the command line. To get  
this into Hadoop I had to patch hadoop.jar and remove the properties  
file that bound it only to log4j. The central receiver/SPOF logs  
events by sent time and received time and can store all results into  
text files intermixed for post-processing. It's good for testing,  
but on a big production cluster you'd want something more robust and  
scalable.


Hey Steve,

Sounds like a cool setup, but it might be a little much for Scott's  
purposes (trying to debug a single Map phase...).  Scott, I have been  
able to successfully add new log4j loggers, but in Hadoop code, not in  
an M-R task.  If you try things in local mode, you'll be guaranteed to  
have the same JVM, so the configuration should be loaded the same way.
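
For the simple case, a sketch of the stock route (the mapper below is hypothetical and
just illustrates the mechanism): anything logged through commons-logging from inside a
map task ends up in that task attempt's syslog under the tasktracker's userlogs
directory, which is often enough when you only need to see what one map phase is doing.

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical mapper: per-task logging through the commons-logging/log4j setup
// that ships with Hadoop.
public class LoggingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  private static final Log LOG = LogFactory.getLog(LoggingMapper.class);

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    // Shows up in the task attempt's syslog under logs/userlogs/.
    LOG.info("processing record at offset " + key);
    output.collect(value, new LongWritable(1));
  }
}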


Then again, I might be putting words into Scott's mouth: maybe he does  
indeed want to scale this way up and turn it into a logging  
infrastructure.


Scott, did you have any luck debugging the job through the wiki  
document on debugging mapreduce?  I'd make sure to start there before  
you take too much of a detour into log4j-land.


Brian


Re: Hadoop+log4j

2008-11-24 Thread Steve Loughran

Scott Whitecross wrote:

Thanks Brian.  So you have had luck w/ log4j?



We grab logs off machines by not using log4j and routing to our own 
logging infrastructure that can feed events to other boxes via RMI and 
queues. This stuff slots in behind commons-logging, with a custom 
commons-logging bridge specified on the command line. To get this into 
Hadoop I had to patch hadoop.jar and remove the properties file that 
bound it only to log4j. The central receiver/SPOF logs events by sent 
time and received time and can store all results into text files 
intermixed for post-processing. It's good for testing, but on a big 
production cluster you'd want something more robust and scalable.


Re: Hadoop Installation

2008-11-24 Thread Mithila Nagendra
Thanks Steve! Will take a look at it..
Mithila


Re: Hadoop Installation

2008-11-24 Thread Mithila Nagendra
Hey Steve

Out of the following which one do I remove - just making sure.. I got rid
of  commons-logging-1.0.4.jar

commons-logging-api-1.0.4.jar
commons-logging-1.1.1-sources.jar commons-logging-1.1.1-sources.jar

Thanks!
Mithila





Third Hadoop Get Together @ Berlin

2008-11-24 Thread Isabel Drost

The third German Hadoop Get Together is going to take place on the 9th of 
December at the newthinking store in Berlin:

http://upcoming.yahoo.com/event/1383706/?ps=6

You can order drinks directly at the bar in the newthinking store. As this Get 
Together takes place in December - Christmas time - there will be cookies as 
well. There are quite a few good restaurants nearby, so we can go there after 
the official part.

Stefan Groschupf has offered to prepare a talk on his project katta. We are still 
looking for one or more interesting talks. We would like to invite you, the 
visitor, to tell your Hadoop story. If you like, you can bring slides - a 
projector (beamer) will be available. Please send your proposal to [EMAIL PROTECTED]

There will be 20-minute slots for talks on your Hadoop topic. After each 
talk there will be time for discussion.

A big Thanks goes to the newthinking store for again providing a room in the 
center of Berlin for us.

Looking forward to seeing you in Berlin,
Isabel Drost

-- 
QOTD: It's not an optical illusion, it just looks like one.   -- Phil White 
  |\  _,,,---,,_   Web:   http://www.isabel-drost.de
  /,`.-'`'-.  ;-;;,_   VoIP:sip://[EMAIL PROTECTED]
 |,4-  ) )-,_..;\ (  `'-'  Tel: (+49) 30 6920 6101
'---''(_/--'  `-'\_) (fL)  IM:  xmpp://[EMAIL PROTECTED]





Re: Hadoop Installation

2008-11-24 Thread Steve Loughran

Mithila Nagendra wrote:

Hey Steve

Out of the following which one do I remove - just making sure.. I got rid
of  commons-logging-1.0.4.jar

commons-logging-api-1.0.4.jar
commons-logging-1.1.1-sources.jar commons-logging-1.1.1-sources.jar


Hadoop is currently built with commons-logging-1.0.4.jar, so strictly 
speaking that should be the only one to retain; all the others can go. 
But you could also delete everything except commons-logging-1.1.1.jar 
and it should work just as well. None of the -sources JARs are needed, 
none of the -api jars are needed.

-steve


Re: Hadoop Installation

2008-11-24 Thread Mithila Nagendra
Hey Steve
I deleted whatever I needed to... still no luck.

You said that the classpath might be messed up. Is there some way I can
reset it? For the root user? What path do I set it to?

Mithila



Re: Facing issues Integrating PIG with Hadoop.

2008-11-24 Thread Alan Gates

Pig questions should be sent to [EMAIL PROTECTED]

The error you're getting usually means that you have a version of  
Hadoop that doesn't match your version of Pig.  If you downloaded the  
latest Hadoop, that will be the case, as Pig currently supports  
Hadoop 0.18, but not 0.19 or the top of trunk.  Try running your Pig  
version with a released 0.18 version of Hadoop and see if you get  
better results.


Alan.

On Nov 24, 2008, at 10:09 AM, cutepooja54321 wrote:



Hi, I am also trying to do the same thing.
Can you please help me if you managed to do it? Please - I am also  
a student.


us latha wrote:


Hi All

I am a student trying to integrate Pig and Hadoop to build a
custom application as part of my MS project.
I am trying out a simple scenario where I have set up a single-node Hadoop
cluster and am trying to execute the pig script script1-hadoop.pig
mentioned in the Pig tutorial.

I am hitting several issues like "Failed to create DataStorage" etc.
I had already posted the same to the groups:
http://www.nabble.com/Integration-of-pig-and-hadoop-fails-with-%22Failed-to-create-DataStorage%22-error.-td18931962.html

Could you please suggest the proper steps to integrate Pig and Hadoop?
Right now, I am following the ones below:

1) Downloaded the latest source for Hadoop and Pig
2) Compiled Hadoop and started a single-node cluster
3) Compiled Pig and replaced the Hadoop class files in pig.jar with the new
ones from step 2
4) Executed the pig script after setting HADOOPSITEPATH

Please let me know if the above steps are incorrect, or whether I should use
any specific Pig and Hadoop versions. We are stuck with these errors. Request
you to please help in resolving the same.

Thank you
Srilatha




--
View this message in context:
http://www.nabble.com/Facing-issues-Integrating-PIG-with-Hadoop.-tp19597351p20666428.html

Sent from the Hadoop core-user mailing list archive at Nabble.com.





Is Hudson Patch verifier stuck?

2008-11-24 Thread Abdul Qadeer
The Hudson patch verifier has been running on a patch for the last 10 hours. Is
it stuck, or is it normal for it to take so long on some patches?

Abdul Qadeer


do NOT start reduce task until all mappers are finished

2008-11-24 Thread Haijun Cao
Hi,



I am using 0.18.2 with fair scheduler hadoop-3476. 

The purpose of the fair scheduler is to prevent long-running jobs
from blocking short jobs. I gave it a try: start a long job first, then a
short one. The short job is able to grab some map slots and finishes its map
phase quickly, but it still blocks in the reduce phase, because the long job has
taken all the reduce slots (it started first and its reducers were started
shortly after).
 
The long job's reducers won't finish until all its mappers
have finished, so my short job is still blocked by the long job, making the
fair scheduler useless for my workload.
 
I am wondering if there is a way to NOT start reduce tasks
until all of a job's mappers have finished.
 
Thanks

Haijun Cao


  

[ANNOUNCE] Hadoop release 0.19.0 available

2008-11-24 Thread Nigel Daley
Release 0.19.0 contains many improvements, new features, bug fixes and  
optimizations.


For release details and downloads, visit:

 http://hadoop.apache.org/core/releases.html

Thanks to all who contributed to this release!

Nigel


Re: How to integrate hadoop framework with web application

2008-11-24 Thread 晋光峰
Thanks for your feedback.

I think I have found an initial solution. Since the Hadoop job and the web
application run as two different processes, I plan to use intermediate files
as the inter-process communication medium. It seems that it is impossible to
call Hadoop functions directly from the Java servlet class.

So my steps are: 1) start the Hadoop job, 2) get the result output
files and put them into the Tomcat web application folder, and 3) read the
job results from those files in the Java servlet class.
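
Another option, if copying to the Tomcat folder turns out to be awkward, is to read the
job's output straight from HDFS inside the servlet; the FileSystem client API works from
any JVM that has the Hadoop jars and configuration on its classpath. A minimal sketch
(the servlet class, output path and field layout here are illustrative):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical servlet that streams a completed job's output out of HDFS.
public class JobResultServlet extends HttpServlet {
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    Configuration conf = new Configuration();  // needs the Hadoop conf dir on the classpath
    FileSystem fs = FileSystem.get(conf);
    PrintWriter out = resp.getWriter();
    // The output directory is a placeholder; the part-* files hold the reducer output.
    for (FileStatus part : fs.listStatus(new Path("/user/web/job-output"))) {
      if (!part.getPath().getName().startsWith("part-")) {
        continue;   // skip _logs and similar
      }
      BufferedReader r =
          new BufferedReader(new InputStreamReader(fs.open(part.getPath())));
      String line;
      while ((line = r.readLine()) != null) {
        out.println(line);
      }
      r.close();
    }
  }
}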

regards
On Mon, Nov 24, 2008 at 3:17 PM, Alexander Aristov 
[EMAIL PROTECTED] wrote:

 Hi
 You may want to take a look at the Nutch project - a Hadoop-based search
 engine. It has a web application with Hadoop integration.

 As far as I remember, you should add the Hadoop libs and configuration files to
 the classpath and initialize Hadoop on startup.

 Alexander

 2008/11/24 柳松 [EMAIL PROTECTED]

  Dear  晋光峰:
 Glad to see another Chinese name here. It sounds possible, but could
 you
  give us a little more detail?
  Best Regards.
 
 
 
  On 2008-11-24 09:41:15, 晋光峰 [EMAIL PROTECTED] wrote:
  Dear all,
  
  Does anyone know how to integrate Hadoop into web applications? I want to
  start a Hadoop job from a Java Servlet (in the web server's servlet
  container), then get the result and send it back to the browser. Is this
  possible? How do I connect the web server with the Hadoop framework?
  
  Please give me any advice or suggestions about this.
  
  Thanks
  --
  Guangfeng Jin
  
  Software Engineer
  
  iZENEsoft (Shanghai) Co., Ltd
 



 --
 Best Regards
 Alexander Aristov




-- 
Guangfeng Jin

Software Engineer

iZENEsoft (Shanghai) Co., Ltd


Re: ls command output format

2008-11-24 Thread Tsz Wo Sze
Filed HADOOP-4719 for this.

Nicholas Sze.




- Original Message 
 From: Tsz Wo (Nicholas), Sze [EMAIL PROTECTED]
 To: core-user@hadoop.apache.org
 Sent: Friday, November 21, 2008 7:54:27 AM
 Subject: Re: ls command output format
 
 Hi Alex,
 
 Yes, the doc about ls is out-dated. Thanks for pointing this out. Would you
 mind filing a JIRA?
 
 Nicholas Sze
 
 
 
 - Original Message 
  From: Alexander Aristov 
  To: core-user@hadoop.apache.org
  Sent: Friday, November 21, 2008 6:08:08 AM
  Subject: Re: ls command output format
  
  Found out that output has been changed in 0.18
  
  see HADOOP-2865 
  
  Docs should be also then updated.
  
  Alex
  
  2008/11/21 Alexander Aristov 
  
   Hello
  
    I wonder if the hadoop shell command ls has changed its output format.
   
    Trying hadoop-0.18.2, I got the following output:
  
   [root]# hadoop fs -ls /
   Found 2 items
   drwxr-xr-x   - root supergroup  0 2008-11-21 08:08 /mnt
   drwxr-xr-x   - root supergroup  0 2008-11-21 08:19 /repos
  
  
    Though according to the docs, the file name should go first.
   http://hadoop.apache.org/core/docs/r0.18.2/hdfs_shell.html#ls
  
   Usage: hadoop fs -ls 
   For a file returns stat on the file with the following format:
   filename filesize modification_date modification_time
   permissions userid groupid
   For a directory it returns list of its direct children as in unix. A
   directory is listed as:
   dirname 
 modification_time modification_time permissions userid
   groupid
   Example:
   hadoop fs -ls /user/hadoop/file1 /user/hadoop/file2 hdfs://
   nn.example.com/user/hadoop/dir1 /nonexistentfile
   Exit Code:
Returns 0 on success and -1 on error.
  
  
    I wouldn't have noticed the issue if I didn't have scripts that rely on the
    formatting.
  
   --
   Best Regards
   Alexander Aristov
  
  
  
  
  -- 
  Best Regards
  Alexander Aristov
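
For scripts that keep breaking when the ls text format changes, one workaround (a
sketch; the path handling is illustrative) is to list through the Java API and print
the fields in whatever order the downstream scripts expect, so nothing depends on the
shell's column layout:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: list a directory through the API instead of parsing "hadoop fs -ls" output.
public class LsSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path(args.length > 0 ? args[0] : "/");
    for (FileStatus s : fs.listStatus(dir)) {
      System.out.println(s.getPath().getName() + "\t" + s.getLen() + "\t"
          + s.getModificationTime() + "\t" + s.getPermission() + "\t"
          + s.getOwner() + "\t" + s.getGroup());
    }
  }
}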



Re: ls command output format

2008-11-24 Thread Alexander Aristov
Thanks for creating it. I haven't tried Jira yet and didn't know how to do
this.
Alex




-- 
Best Regards
Alexander Aristov


Re: Block placement in HDFS

2008-11-24 Thread Owen O'Malley


On Nov 24, 2008, at 8:44 PM, Mahadev Konar wrote:


Hi Dennis,
 I don't think that is possible to do.


No, it is not possible.


 The block placement is determined
by HDFS internally (which is local, rack local and off rack).


Actually, it was changed in 0.17 or so to be node-local, off-rack, and  
a second node off rack.


-- Owen
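
While the placement itself can't be chosen, it can at least be observed. A minimal
sketch using the 0.18-era FileSystem API (the path argument is illustrative) that
prints which hosts hold each block of a file:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: report where HDFS placed the replicas of each block of one file.
public class BlockPlacementReport {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path(args[0]);
    FileStatus status = fs.getFileStatus(file);
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (int i = 0; i < blocks.length; i++) {
      System.out.println("block " + i + " at offset " + blocks[i].getOffset()
          + " on hosts " + java.util.Arrays.toString(blocks[i].getHosts()));
    }
  }
}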


RE: do NOT start reduce task until all mappers are finished

2008-11-24 Thread Haijun Cao
Amar, Thanks for the pointer.  

-Original Message-
From: Amar Kamat [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 24, 2008 8:43 PM
To: core-user@hadoop.apache.org
Subject: Re: do NOT start reduce task until all mappers are finished

https://issues.apache.org/jira/browse/HADOOP-4666 is open to address 
something similar. Starting the reducers after all the maps are done 
might result in an increased runtime for the job. The reason for starting 
the reducers along with the maps is to interleave/parallelize the map and 
shuffle (data-pulling) phases, since maps are typically CPU bound while 
the shuffle is I/O bound.
Amar
 Thanks

 Haijun Cao
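
For reference, later Hadoop releases (not 0.18.2, so treat this as an assumption about
an upgraded cluster rather than a fix for the version above) expose a job-level
threshold, mapred.reduce.slowstart.completed.maps: the fraction of a job's maps that
must complete before its reducers are scheduled. Setting it to 1.0 approximates "do not
start reducers until all maps are done", at the cost of the map/shuffle overlap
described above. A minimal sketch:

import org.apache.hadoop.mapred.JobConf;

// Sketch for a later Hadoop release: delay reducer launch until all of this
// job's maps have completed (property not present in 0.18.2).
public class SlowStartSketch {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SlowStartSketch.class);
    conf.set("mapred.reduce.slowstart.completed.maps", "1.0");
    // ... set the mapper, reducer, input and output paths as usual, then:
    // JobClient.runJob(conf);
  }
}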