Unable to run Jar file in Hadoop.

2009-06-25 Thread krishna prasanna
Hi, when I am trying to run a jar in Hadoop, it gives me the following error: had...@krishna-dev:/usr/local/hadoop$ bin/hadoop jar /user/hadoop/hadoop-0.18.0-examples.jar java.io.IOException: Error opening job jar: /user/hadoop/hadoop-0.18.0-examples.jar at

RE: Unable to run Jar file in Hadoop.

2009-06-25 Thread Shravan Mahankali
I am having a similar problem as well... there is no solution yet! Thank You, Shravan Kumar. M Catalytic Software Ltd. [SEI-CMMI Level 5 Company]

Re: Unable to run Jar file in Hadoop.

2009-06-25 Thread krishna prasanna
Oh! Thanks, Shravan. Krishna. From: Shravan Mahankali shravan.mahank...@catalytic.com To: core-user@hadoop.apache.org Sent: Thursday, 25 June, 2009 1:50:51 PM Subject: RE: Unable to run Jar file in Hadoop. I am having a similar problem as well... there is no solution yet!

Re: FYI, Large-scale graph computing at Google

2009-06-25 Thread Edward J. Yoon
What do you think about another new computation framework on HDFS? On Mon, Jun 22, 2009 at 3:50 PM, Edward J. Yoon edwardy...@apache.org wrote: http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html -- It sounds like Pregel is a computing framework based on

Re: Problem with setting up the cluster

2009-06-25 Thread Tom White
Have a look at the datanode log files on the datanode machines and see what the error is in there. Cheers, Tom On Thu, Jun 25, 2009 at 6:21 AM, .ke. sivakumar kesivaku...@gmail.com wrote: Hi all, I'm a student and I have been trying to set up the Hadoop cluster for a while but have been

Re: Unable to run Jar file in Hadoop.

2009-06-25 Thread Amareshwari Sriramadasu
Is your jar file in the local file system or HDFS? The jar file should be in the local fs. Thanks Amareshwari Shravan Mahankali wrote: I am having a similar problem as well... there is no solution yet! Thank You, Shravan Kumar. M Catalytic Software Ltd. [SEI-CMMI Level 5 Company]

Re: Unable to run Jar file in Hadoop.

2009-06-25 Thread Tom White
Hi Krishna, You get this error when the jar file cannot be found. It looks like /user/hadoop/hadoop-0.18.0-examples.jar is an HDFS path, when in fact it should be a local path. Cheers, Tom On Thu, Jun 25, 2009 at 9:43 AM, krishna prasanna svk_prasa...@yahoo.com wrote: Oh! Thanks, Shravan
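For illustration, a minimal sketch of the fix Tom describes: copy the jar out of HDFS and point bin/hadoop at a local copy. The local path and the example program name (wordcount) are assumptions, not from the thread.

    # copy the jar from HDFS to the local filesystem (illustrative path)
    bin/hadoop fs -get /user/hadoop/hadoop-0.18.0-examples.jar /tmp/hadoop-0.18.0-examples.jar
    # now "hadoop jar" can open it
    bin/hadoop jar /tmp/hadoop-0.18.0-examples.jar wordcount input output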

Re: Rebalancing Hadoop Cluster running 15.3

2009-06-25 Thread Tom White
Hi Usman, Before the rebalancer was introduced, one trick people used was to increase the replication on all the files in the system, wait for re-replication to complete, then decrease the replication to the original level. You can do this using hadoop fs -setrep. Cheers, Tom On Thu, Jun 25,
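A hedged sketch of that trick; the replication factors (3 up to 4 and back) and the -w wait flag are illustrative assumptions:

    # raise replication one notch and wait for re-replication to finish
    hadoop fs -setrep -R -w 4 /
    # then drop back to the original level
    hadoop fs -setrep -R 3 /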

Re: Unable to run Jar file in Hadoop.

2009-06-25 Thread krishna prasanna
Thanks all, the problem is resolved now. The issue was as suspected: the jar file was in HDFS, which logically is wrong. Krishna. Hi Krishna, You get this error when the jar file cannot be found. It looks like /user/hadoop/hadoop-0.18.0-examples.jar is an HDFS path, when in fact it should be a local path.

Re: Hadoop 0.20.0, xml parsing related error

2009-06-25 Thread Steve Loughran
Ram Kulbak wrote: Hi, The exception is a result of having Xerces in the classpath. To resolve, make sure you are using Java 6 and set the following system property: -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl This can also be
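One place to set that property cluster-wide (an assumption on my part; the thread does not say where to set it) is conf/hadoop-env.sh:

    # append the workaround property to the JVM options for the Hadoop daemons
    export HADOOP_OPTS="$HADOOP_OPTS -Djavax.xml.parsers.DocumentBuilderFactory=com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderFactoryImpl"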

Re: FYI, Large-scale graph computing at Google

2009-06-25 Thread Steve Loughran
Edward J. Yoon wrote: What do you think about another new computation framework on HDFS? On Mon, Jun 22, 2009 at 3:50 PM, Edward J. Yoon edwardy...@apache.org wrote: http://googleresearch.blogspot.com/2009/06/large-scale-graph-computing-at-google.html -- It sounds like Pregel is a

What is the best way to use the Hadoop output data

2009-06-25 Thread Huy Phan
Hi everybody, I'm working on a Hadoop project that processes log files. In the reduce part, as usual, I store the output to HDFS, but I also want to send that output data to a message queue using an HTTP POST request. I'm wondering if there's any performance killer in this approach; I posted the
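A hedged sketch of the approach being described, posting each reduce output record to an external endpoint; the URL and payload format are illustrative assumptions:

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class QueuePoster {
        // POST a single record to the queue's HTTP endpoint (assumed URL)
        public static void post(String record) throws Exception {
            URL url = new URL("http://queue.example.com/enqueue");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            OutputStream out = conn.getOutputStream();
            out.write(record.getBytes("UTF-8"));
            out.close();
            conn.getResponseCode(); // blocks until the server responds
            conn.disconnect();
        }
    }

One obvious performance concern with this shape: a synchronous POST per record stalls the reducer on network round-trips, so batching or an asynchronous sender would likely matter at scale.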

Re: Rebalancing Hadoop Cluster running 15.3

2009-06-25 Thread Usman Waheed
Hi Tom, Thanks for the trick :). I tried setting the replication to 3 in hadoop-default.xml, but then the namenode log file in /var/log/hadoop started filling up with messages like: 2009-06-24 14:39:06,338 INFO org.apache.hadoop.dfs.StateChange: STATE*

Re: Rebalancing Hadoop Cluster running 15.3

2009-06-25 Thread Tom White
You can change the value of hadoop.root.logger in conf/log4j.properties to change the log level globally. See also the 'Custom Logging levels' section in the same file to set levels on a per-component basis. You can also use hadoop daemonlog to set log levels on a temporary basis (they are reset on
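Two concrete forms of this, with illustrative values (the host, port, logger name, and level are my assumptions):

    # in conf/log4j.properties, globally:
    #   hadoop.root.logger=WARN,console
    # or temporarily, per daemon, until the next restart:
    hadoop daemonlog -setlevel namenode-host:50070 org.apache.hadoop.dfs.StateChange WARN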

Re: Rebalancing Hadoop Cluster running 15.3

2009-06-25 Thread Usman Waheed
Thanks much, Cheers, Usman You can change the value of hadoop.root.logger in conf/log4j.properties to change the log level globally. See also the 'Custom Logging levels' section in the same file to set levels on a per-component basis. You can also use hadoop daemonlog to set log levels on a

Re: FYI, Large-scale graph computing at Google

2009-06-25 Thread mike anderson
This would be really useful for my current projects. I'd be more than happy to help out if needed. On Thu, Jun 25, 2009 at 5:57 AM, Steve Loughran ste...@apache.org wrote: Edward J. Yoon wrote: What do you think about another new computation framework on HDFS? On Mon, Jun 22, 2009 at 3:50

HDFS Safemode and EC2 EBS?

2009-06-25 Thread Chris Curtin
Hi, I am using 0.19.0 on EC2. The Hadoop execution and HDFS directories are on EBS volumes mounted to each node in my EC2 cluster. Only the install of hadoop is in the AMI. We have 10 EBS volumes and when the cluster starts it randomly picks one for each slave. We don't always start all 10 slaves

Re: HDFS Safemode and EC2 EBS?

2009-06-25 Thread Tom White
Hi Chris, You should really start all the slave nodes to be sure that you don't lose data. If you start fewer than #nodes - #replication + 1 nodes then you are virtually guaranteed to lose blocks. Starting 6 nodes out of 10 will cause the filesystem to remain in safe mode, as you've seen. BTW
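(To make the arithmetic concrete, assuming a replication factor of 3: with 10 datanodes, starting fewer than 10 - 3 + 1 = 8 nodes means some block is likely to have every replica on a stopped node; starting only 6 therefore leaves blocks missing, and HDFS stays in safe mode.)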

Re: FYI, Large-scale graph computing at Google

2009-06-25 Thread Steve Loughran
mike anderson wrote: This would be really useful for my current projects. I'd be more than happy to help out if needed. Well, the first bit of code to play with is this: http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/extras/citerank/ The standalone.xml file is the one

Using addCacheArchive

2009-06-25 Thread akhil1988
Hi All! I want a directory to be present in the local working directory of the task, for which I am using the following statements: DistributedCache.addCacheArchive(new URI("/home/akhil1988/Config.zip"), conf); DistributedCache.createSymlink(conf); Here Config is a directory which I have zipped

Re: FYI, Large-scale graph computing at Google

2009-06-25 Thread Amandeep Khurana
I've been working on some graph stuff using MR as well. I'd be more than interested in chipping in. I remember exchanging a few mails with Paolo about having an RDF store over HBase and developing graph algorithms over it. Amandeep Khurana Computer Science Graduate Student University of

map.input.file in hadoop0.20

2009-06-25 Thread Amandeep Khurana
How do I read the map.input.file parameter in the mapper class in Hadoop 0.20? In earlier versions, this would work: public void configure(JobConf conf) { filename = conf.get("map.input.file"); } What about 0.20? Amandeep Amandeep Khurana Computer Science Graduate Student
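For reference, a hedged sketch of one route in the 0.20 (org.apache.hadoop.mapreduce) API; this is not confirmed in the thread, and the legacy parameter may not be populated under the new API, so deriving the path from the InputSplit is the safer bet:

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
        private String filename;

        @Override
        protected void setup(Context context) {
            // try the legacy parameter first...
            filename = context.getConfiguration().get("map.input.file");
            // ...and fall back to the input split, which knows its file
            if (filename == null && context.getInputSplit() instanceof FileSplit) {
                filename = ((FileSplit) context.getInputSplit()).getPath().toString();
            }
        }
    }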

Re: Using addCacheArchive

2009-06-25 Thread akhil1988
Please ask if anything above is unclear about the problem I am facing. Thanks, Akhil akhil1988 wrote: Hi All! I want a directory to be present in the local working directory of the task, for which I am using the following statements: DistributedCache.addCacheArchive(new

Re: THIS WEEK: PNW Hadoop / Apache Cloud Stack Users' Meeting, Wed Jun 24th, Seattle

2009-06-25 Thread Bradford Stephens
Hey all, Just writing a quick note of thanks, we had another solid group of people show up! As always, we learned quite a lot about interesting use cases for Hadoop, Lucene, and the rest of the Apache 'Cloud Stack'. I couldn't get it taped, but we talked about: -Scaling Lucene with Katta and

Re: FYI, Large-scale graph computing at Google

2009-06-25 Thread Edward J. Yoon
To be honest, I had been thinking of BigTable (HBase) for map/reduce-based graph/matrix operations. The main performance problems were the sequential algorithm, the cost of building an MR job per iteration, and the locality of adjacent components. As mentioned in the Pregel post, if some algorithm

grahical tool for hadoop mapreduce

2009-06-25 Thread Manhee Jo
Hi, Do you know any graphical tools that show the progress of MapReduce using the job logs under logs/history/? The web interface (namenode:50030) gives me something similar, but what I need is more specific: something that shows the total number of running map tasks and reduce tasks at given points in time,

'could not lock file' error.

2009-06-25 Thread Edward J. Yoon
Hi, I always get a 'could not lock file' error when editing/creating pages - Page could not get locked. Missing 'current' file? My ID is 'udanax'. Can someone help me? -- Best Regards, Edward J. Yoon @ NHN, corp. edwardy...@apache.org http://blog.udanax.org

Re: Using addCacheArchive

2009-06-25 Thread Amareshwari Sriramadasu
Hi Akhil, DistributedCache.addCacheArchive takes a path on HDFS. From your code, it looks like you are passing a local path. Also, if you want to create a symlink, you should pass the URI as hdfs://path#linkname, besides calling DistributedCache.createSymlink(conf); Thanks Amareshwari akhil1988
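A hedged sketch of the suggested call; the namenode host/port and the link name are illustrative assumptions:

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapred.JobConf;

    public class CacheSetup {
        public static void addConfigArchive(JobConf conf) throws Exception {
            // hdfs://host:port/path#linkname - the fragment names the
            // symlink created in the task's working directory
            DistributedCache.addCacheArchive(
                new URI("hdfs://namenode:9000/user/akhil1988/Config.zip#Config"), conf);
            DistributedCache.createSymlink(conf);
        }
    }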

Re: Using addCacheArchive

2009-06-25 Thread akhil1988
Thanks Amareshwari for your reply! The file Config.zip is in HDFS; if it were not, the error would have been reported by the jobtracker itself while executing the statement: DistributedCache.addCacheArchive(new URI("/home/akhil1988/Config.zip"), conf); But I get the error in the map

Re: Using addCacheArchive

2009-06-25 Thread Amareshwari Sriramadasu
Is your HDFS path /home/akhil1988/Config.zip? Usually an HDFS path is of the form /user/akhil1988/Config.zip. Just wondering if you are giving a wrong path in the URI! Thanks Amareshwari akhil1988 wrote: Thanks Amareshwari for your reply! The file Config.zip is in HDFS; if it were not

Re: Using addCacheArchive

2009-06-25 Thread akhil1988
Yes, my HDFS paths are of the form /home/user-name/, and I have used these in DistributedCache's addCacheFiles method successfully. Thanks, Akhil Amareshwari Sriramadasu wrote: Is your HDFS path /home/akhil1988/Config.zip? Usually an HDFS path is of the form /user/akhil1988/Config.zip. Just

Pregel

2009-06-25 Thread Mark Kerzner
Hi all, my guess, as good as anybody's, is that Pregel is to large graphs what Hadoop is to large datasets. In other words, Pregel is the next natural step for massively scalable computations after Hadoop. And, as with MapReduce, Google will talk about the technology but not give out the code

Re: hadoop lucene integration

2009-06-25 Thread Nick Cen
There is sample index code under the contrib directory; maybe you can take a look. 2009/6/26 m.harig m.ha...@gmail.com hi all I've work experience with Lucene, but am new to Hadoop. I created an index with Lucene; can anyone tell me how to use Hadoop with my Lucene index for

PIG and Hadoop

2009-06-25 Thread krishna prasanna
Hi, Here is my scenario: 1. I have a cluster of 3 machines, 2. I have a jar file which includes pig.jar. How can I run a jar (instead of a Pig script file) in Hadoop mode? For running a script file in Hadoop mode: java -cp $PIGDIR/pig.jar:$HADOOPSITEPATH org.apache.pig.Main script1-hadoop.pig any
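One way to do this (a sketch, assuming PigServer is available as the embedding API in your Pig version; the query and paths are illustrative) is to drive Pig from a plain Java main() inside your own jar:

    import org.apache.pig.ExecType;
    import org.apache.pig.PigServer;

    public class EmbeddedPig {
        public static void main(String[] args) throws Exception {
            // MAPREDUCE mode runs against the cluster, like script1-hadoop.pig
            PigServer pig = new PigServer(ExecType.MAPREDUCE);
            pig.registerQuery("A = LOAD 'input' USING PigStorage();");
            pig.store("A", "output");
        }
    }

Launched the same way as the script runner, but with your class as the entry point: java -cp $PIGDIR/pig.jar:$HADOOPSITEPATH:myjar.jar EmbeddedPig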