Re: Appending to HDFS file

2014-08-27 Thread rab ra
Hello, here is the code snippet I use to append:

def outFile = "${outputFile}.txt"
Path pt = new Path("${hdfsName}/${dir}/${outFile}")
def fs = org.apache.hadoop.fs.FileSystem.get(configuration);
FSDataOutputStream fp = fs.create(pt, true)
fp << "${key} ${value}\n"

On 27 Aug 2014 09:46, Stanley Shi

AW: Running job issues

2014-08-27 Thread Blanca Hernandez
Hi, thanks for your answers. Sorry, I forgot to add it, I couldn't run the command either:

C:\development\tools\hadoop> %HADOOP_PREFIX%\bin\hdfs dfs -format
-format: Unknown command
C:\development\tools\hadoop> echo %HADOOP_PREFIX%
C:\development\tools\hadoop

By using the -help command there is no

Re: Missing Snapshots for 2.5.0

2014-08-27 Thread Karthik Kambatla
There was an issue with the infrastructure. It is now fixed and the 2.5.0 artifacts are available. Mark, can you please retry now? Thanks, Karthik On Tue, Aug 26, 2014 at 6:54 AM, Karthik Kambatla ka...@cloudera.com wrote: Thanks for reporting this, Mark. It appears the artifacts are

Re: Running job issues

2014-08-27 Thread Susheel Kumar Gadalay
You have to use this command to format HDFS: hdfs namenode -format, not hdfs dfs -format. On 8/27/14, Blanca Hernandez blanca.hernan...@willhaben.at wrote: Hi, thanks for your answers. Sorry, I forgot to add it, I couldn't run the command either:

total number of map tasks

2014-08-27 Thread Stijn De Weirdt
hi all, we are tuning YARN (or trying to) on our environment (shared filesystem, no HDFS) using terasort, and one of the main issues we are seeing is that an avg map task takes 15 sec. Some tuning guides and websites suggest that ideally map tasks run between 40 sec and 1 or 2 minutes.

What happens when .....?

2014-08-27 Thread Kandoi, Nikhil
Hi All, I have a system where files are coming into HDFS at regular intervals, and I perform an operation every time the directory size goes above a particular point. My question is: when I submit a map reduce job, would it only work on the files present at that point? Regards, Nikhil Kandoi

NoSuchElementException while running local MapReduce-Job on FreeBSD

2014-08-27 Thread Malte Maltesmann
Hi all, I tried to run a MapReduce job on my two-node FreeBSD cluster with Hadoop 2.4.1 and HBase 0.98.4 and ran into the exception below. I then tried the example provided here http://gerrymcnicol.azurewebsites.net/index.php/2014/01/02/hadoop-and-cassandra-part-4-writing-your-first-mapreduce-job/,

Hadoop and Eclipse configuration

2014-08-27 Thread YIMEN YIMGA Gael
Hello dear hadoopers :) , I'm currently configuring Eclipse to work with Hadoop. I set up a single-node cluster (all the Hadoop services are running on the node). That server is different from the computer where Eclipse is installed. When I tried to create a new project in Eclipse, the

Re: total number of map tasks

2014-08-27 Thread Chris MacKenzie
It's my understanding that you don't get map tasks as such but containers. My experience is with version 2+. And if that's true, containers are based on memory tuning in mapred-site.xml. Otherwise I'd love to learn more. Sent from my iPhone On 27 Aug 2014, at 12:14, Stijn De Weirdt

Re: total number of map tasks

2014-08-27 Thread Stijn De Weirdt
hi all, someone PM'ed me suggesting I take a look at the input split settings, and indeed, the split size is determining the number of tasks. stijn On 08/27/2014 06:23 PM, Chris MacKenzie wrote: It's my understanding that you don't get map tasks as such but containers. My experience is
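
For readers following this thread: in Hadoop 2.x the split size (and hence the map task count for FileInputFormat-based jobs) can be bounded with the split min/max properties. A sketch of the relevant mapred-site.xml fragment; the byte values here are illustrative, not recommendations:

```xml
<!-- mapred-site.xml: bound the input split size so each map task
     receives more data and runs longer (values are illustrative) -->
<property>
  <name>mapreduce.input.fileinputformat.split.minsize</name>
  <value>268435456</value> <!-- 256 MB lower bound per split -->
</property>
<property>
  <name>mapreduce.input.fileinputformat.split.maxsize</name>
  <value>536870912</value> <!-- 512 MB upper bound per split -->
</property>
```

The same properties can also be passed per job on the command line with -D, which is usually preferable for one-off benchmarking runs like terasort.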

Need some tutorials for Mapreduce written in Python

2014-08-27 Thread Amar Singh
Hi Users, I am new to the big data world and was in the process of reading some material on writing MapReduce using Python. Any links or pointers in that direction will be really helpful.

Re: Need some tutorials for Mapreduce written in Python

2014-08-27 Thread Sebastiano Di Paola
Hi there, in order to use Python to write MapReduce jobs you need to use the Hadoop streaming API, so I suggest starting by searching for that. (Here's a link, although it is for Hadoop 1.x: http://hadoop.apache.org/docs/r1.2.1/streaming.html ) but it's a starting point. With the streaming API you can use
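
To make the streaming model concrete, here is a minimal word-count sketch in Python: the framework pipes input lines to the mapper on stdin, sorts the mapper's tab-separated key/value output by key, and pipes it to the reducer. The file name wc.py is hypothetical; the tab-separated line convention follows the streaming docs linked above.

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming word count. Mapper and reducer each read
# lines from stdin and emit tab-separated key/value pairs on stdout.
import sys

def mapper(lines):
    """Emit 'word\t1' for every word in the input lines."""
    for line in lines:
        for word in line.strip().split():
            yield "%s\t1" % word

def reducer(lines):
    """Sum counts per word; streaming delivers keys in sorted order,
    so all lines for one word arrive consecutively."""
    current, count = None, 0
    for line in lines:
        word, value = line.rstrip("\n").split("\t", 1)
        if word == current:
            count += int(value)
        else:
            if current is not None:
                yield "%s\t%d" % (current, count)
            current, count = word, int(value)
    if current is not None:
        yield "%s\t%d" % (current, count)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Select the stage by argument: 'map' or 'reduce'.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

You can simulate the framework locally with a shell pipeline, e.g. cat input.txt | python wc.py map | sort | python wc.py reduce, before submitting the same scripts with the hadoop-streaming jar's -mapper and -reducer options.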

Re: Need some tutorials for Mapreduce written in Python

2014-08-27 Thread Amund Tveit
Here is one: Parallel Machine Learning for Hadoop/Mapreduce - A Python Example http://atbrox.com/2010/02/08/parallel-machine-learning-for-hadoopmapreduce-a-python-example/ (disclaimer: I wrote it) Best regards, Amund Tveit 2014-08-27 20:13 GMT+02:00 Amar Singh amarsingh...@gmail.com: Hi

RE: Missing Snapshots for 2.5.0

2014-08-27 Thread Campbell, Mark
It finds the sources now, thank you. Just need to figure out the rest of the errors on the dist build at the end :p Many thanks everyone. Cheers, Mark From: Karthik Kambatla [mailto:ka...@cloudera.com] Sent: Wednesday, August 27, 2014 3:51 AM To: user@hadoop.apache.org;

Re: Running job issues

2014-08-27 Thread Arpit Agarwal
Susheel is right. I've fixed the typo on the wiki page. On Wed, Aug 27, 2014 at 12:28 AM, Susheel Kumar Gadalay skgada...@gmail.com wrote: You have to use this command to format hdfs namenode –format not hdfs dfs -format On 8/27/14, Blanca Hernandez blanca.hernan...@willhaben.at wrote:

Re: Local file system to access hdfs blocks

2014-08-27 Thread Demai Ni
Hi Stanley, many thanks. Your method works. For now, I have a two-step approach: 1) getFileBlockLocations() to grab the HDFS BlockLocation[]; 2) use a local file system call (like the find command) to match the block to files on the local file system. Maybe there is an existing Hadoop API to return such

Re: Local file system to access hdfs blocks

2014-08-27 Thread Yehia Elshater
Hi Demai, You can use the fsck utility like the following: hadoop fsck /path/to/your/hdfs/file -files -blocks -locations -racks This will display all the information you need about the blocks of your file. Hope it helps. Yehia

Re: Need some tutorials for Mapreduce written in Python

2014-08-27 Thread Marco Shaw
You might want to consider the Hadoop course on udacity.com. I think it provides a decent foundation to Hadoop/MapReduce with a focus on Python (using the streaming API like Sebastiano mentions). Marco On Wed, Aug 27, 2014 at 3:13 PM, Amar Singh amarsingh...@gmail.com wrote: Hi Users, I am

Re: Local file system to access hdfs blocks

2014-08-27 Thread Yehia Elshater
Hi Demai, Sorry, I missed that you had already tried this out. I think you can construct the block location on the local file system if you have the block pool id and the block id. If you are using the Cloudera distribution, the default location is under /dfs/dn (the value of dfs.data.dir,
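
The search described above can be sketched as a small script. This assumes the Hadoop 2.x datanode layout, where finalized block replicas are stored as files named blk_<blockId> under the configured data directory; the /dfs/dn path in the usage line is the CDH default mentioned in this thread, not a universal location.

```python
import os

def find_block_file(data_dir, block_id):
    """Walk a datanode data directory looking for the local file that
    backs HDFS block <block_id>. Finalized replicas are plain files
    named 'blk_<id>' (with a 'blk_<id>_<genstamp>.meta' checksum file
    alongside them). Returns the full path, or None if the replica
    does not live on this node."""
    target = "blk_%s" % block_id
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name == target:
                return os.path.join(root, name)
    return None
```

For example, find_block_file("/dfs/dn", "1073741825") would return a path somewhere under .../current/finalized/ if that replica is stored locally; running it on each datanode covers all replicas reported by fsck or getFileBlockLocations().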

Re: Local file system to access hdfs blocks

2014-08-27 Thread Demai Ni
Yehia, no problem at all. I really appreciate your willingness to help. Yeah, now I am able to get such information through two steps: the first step will be either hadoop fsck or getFileBlockLocations(), and then I search the local filesystem. My cluster is using the default from CDH, which is

Re: Need some tutorials for Mapreduce written in Python

2014-08-27 Thread thejas prasad
Are there any books for this as well? On Wed, Aug 27, 2014 at 8:30 PM, Marco Shaw marco.s...@gmail.com wrote: You might want to consider the Hadoop course on udacity.com. I think it provides a decent foundation to Hadoop/MapReduce with a focus on Python (using the streaming API like Sebastiano

Re: Need some tutorials for Mapreduce written in Python

2014-08-27 Thread Sriram Balachander
Hadoop: The Definitive Guide and Hadoop in Action are good books, and the course on edureka is also good. Regards Sriram On Wed, Aug 27, 2014 at 9:25 PM, thejas prasad thejch...@gmail.com wrote: Are there any books for this as well? On Wed, Aug 27, 2014 at 8:30 PM, Marco Shaw marco.s...@gmail.com

New to hadoop/java. How do I write map reduce programs in Cloudera VM? I can't import org.apache.* like in tutorials.

2014-08-27 Thread mani kandan
Thanks, Mani