Re: Ubuntu 12.04 - Which JDK?

2012-11-07 Thread Alexander Lorenz
Recommend JDK 1.6.0_3x; 1.7 has several issues in NIO and I wouldn't use it in production. OpenJDK sometimes shows odd behavior, and IBM's JDK - no. cheers, Alex On Nov 8, 2012, at 7:17 AM, Sanjeev Verma wrote: > AFAIK, openjdk can be used to run hadoop. Why do you want to build hadoop > f

Re: Ubuntu 12.04 - Which JDK?

2012-11-07 Thread Sanjeev Verma
AFAIK, openjdk can be used to run hadoop. Why do you want to build hadoop from source? Get precompiled binaries and you will be OK. Also, you can install the Sun/Oracle JDK on Ubuntu. Just google for instructions, you will find plenty, like here - http://www.ubuntututorials.com/install-oracle-java-jdk-7

Re: Ubuntu 12.04 - Which JDK?

2012-11-07 Thread Harsh J
I don't think OpenJDK 7 has been as extensively tested as OpenJDK 6 for Hadoop. I'd recommend staying on 1.6-based JVMs if you have a prod goal for this cluster, but otherwise, you could give OpenJDK 7 a try and let us know. On Thu, Nov 8, 2012 at 11:30 AM, a...@hsk.hk wrote: > Hi, > > I am plann

Ubuntu 12.04 - Which JDK?

2012-11-07 Thread a...@hsk.hk
Hi, I am planning to use Ubuntu 12.04, from http://wiki.apache.org/hadoop/HadoopJavaVersions, about OpenJDK "Note*: OpenJDK6 has some open bugs w.r.t handling of generics... so OpenJDK cannot be used to compile hadoop mapreduce code in branch-0.23 and beyond, please use other JDKs." Is it OK

Re: Doubt on Input and Output Mapper - Key value pairs

2012-11-07 Thread Mahesh Balija
Hi Rams, A mapper will accept a single key-value pair as input and can emit 0 or more key-value pairs based on what you want to do in the mapper function (I mean based on your business logic in the mapper function). But the framework will actually aggregate the list of values associate
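
For illustration, a minimal sketch of such a mapper using the org.apache.hadoop.mapreduce API; the word-splitting logic is just a hypothetical stand-in, not something from this thread:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // One input key-value pair comes in; any number of pairs may go out.
        for (String token : value.toString().split("\\s+")) {
          if (token.isEmpty()) {
            continue; // emitting zero pairs for an input is perfectly legal
          }
          word.set(token);
          context.write(word, ONE);
        }
      }
    }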

Re: Map-Reduce V/S Hadoop Ecosystem

2012-11-07 Thread Russell Jurney
Hourly consultants may prefer MapReduce. Everyone else should be using Pig, Hive, Cascading, etc. Russell Jurney twitter.com/rjurney On Nov 7, 2012, at 8:08 PM, yogesh dhari wrote: Thanks Bejoy Sir, I am always grateful to you for your help. Please explain these words in simple language with

RE: Sticky Bit Problem (CDH4.1)

2012-11-07 Thread Kartashov, Andy
Have you tried hadoop fs -chmod a+rwx /tmp From: Arun C Murthy [mailto:a...@hortonworks.com] Sent: Wednesday, November 07, 2012 3:11 PM To: user@hadoop.apache.org Subject: Re: Sticky Bit Problem (CDH4.1) Pls ask Cloudera lists... On Nov 7, 2012, at 9:57 AM, Brian Derickson wrote: Hey all, Wh

Re: Sticky Bit Problem (CDH4.1)

2012-11-07 Thread Arun C Murthy
Pls ask Cloudera lists... On Nov 7, 2012, at 9:57 AM, Brian Derickson wrote: > Hey all, > > When setting up the namenode, some of the commands that we run are: > hadoop fs -mkdir /tmp > hadoop fs -chmod -R 1777 /tmp > > This has worked for previous CDH releases of Hadoop. > > We recently upgra

Re: Map-Reduce V/S Hadoop Ecosystem

2012-11-07 Thread Bejoy KS
Hi Yogesh Pretty much all the requirements fit well into Hive, Pig, etc. HiveQL and Pig Latin are parsed by their respective parsers into MapReduce jobs. The MR code thus generated is generic and is based entirely on rules defined in the parser. But say your requirement has something more

Re: same edits file is loaded more than once

2012-11-07 Thread Colin McCabe
Hi, If you want to learn more about HA in HDFS, here are some slides from a talk that Aaron T. Myers and Suresh Srinivas gave: http://www.slideshare.net/hortonworks/nn-ha-hadoop-worldfinal-10173419 branch-2 and later contain HDFS HA. cheers, Colin On Sun, Nov 4, 2012 at 1:06 AM, lei liu wrot

RE: Map-Reduce V/S Hadoop Ecosystem

2012-11-07 Thread yogesh dhari
Thanks Bejoy Sir, I am always grateful to you for your help. Please explain these words in simple language with some case (if possible) " If your requirement is that complex and you need very low level control of your code mapreduce is better. If you are an expert in mapreduce your code can be

Re: problem with hadoop-snappy

2012-11-07 Thread Colin McCabe
I think what you're hitting is probably https://issues.apache.org/jira/browse/HADOOP-8756. It was fixed in branch-2 and branch-3, but not in the old releases. If you want to work around the problem, try explicitly setting LD_LIBRARY_PATH to include the directory that contains snappy. cheers, Col
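
A sketch of that workaround as a job-level setting, assuming the Hadoop 1.x property name mapred.child.env and a placeholder library path:

    import org.apache.hadoop.conf.Configuration;

    public class SnappyEnvWorkaround {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Point child tasks at the directory that actually holds libsnappy.so;
        // /usr/local/lib is a placeholder, adjust for your install.
        conf.set("mapred.child.env", "LD_LIBRARY_PATH=/usr/local/lib");
        // ... submit the job with this Configuration ...
      }
    }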

Re: Sticky Bit Problem (CDH4.1)

2012-11-07 Thread Harsh J
Hi Brian, I am not seeing this trouble on a similar version locally: [harsh@~]$ hadoop fs -ls drwxr-xr-x - harsh harsh 0 2012-09-28 04:21 outfoo [harsh@~]$ hadoop fs -chmod 1777 outfoo [harsh@~]$ hadoop fs -chmod -R 1777 outfoo [harsh@~]$ hadoop fs -chmod -R +t outfoo [harsh@~]$ What i
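
For completeness, a sketch of setting the same 1777 mode through the FileSystem API instead of the shell, assuming the FsPermission constructor that takes a sticky-bit flag:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsAction;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class StickyTmp {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // rwxrwxrwx plus the sticky bit: the programmatic equivalent of chmod 1777
        FsPermission mode1777 =
            new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.ALL, true);
        fs.setPermission(new Path("/tmp"), mode1777);
      }
    }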

Re: Please help on providing correct answers

2012-11-07 Thread Ramasubramanian Narayanan
Hi, I have given my explanation of my choice and why I am saying the given answer is wrong... You are running a job that will process a single InputSplit on a cluster which has no other jobs currently running. Each node has an equal number of open Map slots. On which node will Hadoop first attempt to r

Re: Please help on providing correct answers

2012-11-07 Thread Michael Segel
Sorry, I think I had better explain why I am curious... First, there are a couple of sites that have study questions to help pass Cloudera's certification. (I don't know if Hortonworks has cert tests, but both MapR and Cloudera do.) It's just looking first at the questions... not really good

Re: Map-Reduce V/S Hadoop Ecosystem

2012-11-07 Thread Bejoy KS
Hi Yogesh, Development time in Pig and Hive is much lower than for the equivalent mapreduce code, and for generic cases it is very efficient. If your requirement is that complex and you need very low level control of your code mapreduce is better. If you are an expert in mapreduce your

Re: Please help on providing correct answers

2012-11-07 Thread Harsh J
Hi, I'd instead like you to explain why you think someone's proposed answer (who?) is wrong and why yours is correct. You learn more that way than from us nodding/shaking our heads at things you ask. On Wed, Nov 7, 2012 at 10:51 PM, Ramasubramanian Narayanan wrote: > Hi, > >I came across the following

Sticky Bit Problem (CDH4.1)

2012-11-07 Thread Brian Derickson
Hey all, When setting up the namenode, some of the commands that we run are: hadoop fs -mkdir /tmp hadoop fs -chmod -R 1777 /tmp This has worked for previous CDH releases of Hadoop. We recently upgraded our test cluster to CDH 4.1 and the chmod no longer works. sudo -u hdfs hadoop fs -chmod -R

Re: Question related to Number of Mapper

2012-11-07 Thread Michael Segel
The larger question is how many blocks are required to store a 100MB file if the HDFS block size is 64MB. If it takes 2 blocks, then when you run your job you will have 1 mapper per block, unless the file is not splittable. (But from your example it's a simple text file, which is splittable.)
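
The arithmetic, written out as a tiny sketch:

    public class BlockMath {
      public static void main(String[] args) {
        long fileSize  = 100L * 1024 * 1024;  // 100 MB
        long blockSize =  64L * 1024 * 1024;  // 64 MB block size
        // ceil(100/64) = 2 blocks, hence 2 map tasks for a splittable file
        long numBlocks = (fileSize + blockSize - 1) / blockSize;
        System.out.println(numBlocks + " blocks -> " + numBlocks + " mappers");
      }
    }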

Re: Please help on providing correct answers

2012-11-07 Thread Ramasubramanian Narayanan
Nothing consolidated... I have been collecting them for the past month... a few as printouts, a few from mails, a few from googling, a few from sites, and a few from some of my friends... regards, Rams On Wed, Nov 7, 2012 at 10:57 PM, Michael Segel wrote: > Ok... > Where are you pulling these questions

Re: Please help on providing correct answers

2012-11-07 Thread Michael Segel
Ok... Where are you pulling these questions from? Seriously. On Nov 7, 2012, at 11:21 AM, Ramasubramanian Narayanan wrote: > Hi, > >I came across the following question in some sites and the answer that > they provided seems to be wrong according to me... I might be wrong... Can > s

Re: fsck only working on namenode

2012-11-07 Thread Harsh J
While your problem is interesting, you need not use FSCK to get the block IDs of a file, as that's not the right way to fetch them (it's a rather long, should-be-disallowed route). You can leverage the FileSystem API itself to do that. See FileSystem#getFileBlockLocations(…), i.e. http://hadoop.apache.org/
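
A sketch of that API route, with a hypothetical file path:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.BlockLocation;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListBlockLocations {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/example/data.txt"); // hypothetical path
        FileStatus stat = fs.getFileStatus(file);
        // One BlockLocation per block, with the hosts holding replicas
        for (BlockLocation loc : fs.getFileBlockLocations(stat, 0, stat.getLen())) {
          System.out.println("offset=" + loc.getOffset()
              + " length=" + loc.getLength()
              + " hosts=" + java.util.Arrays.toString(loc.getHosts()));
        }
      }
    }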

fsck only working on namenode

2012-11-07 Thread Sebastian.Lehrack
Hi, I've installed hadoop 1.0.3 on a cluster of about 25 nodes and till now it's working fine. Recently, I had to use fsck in a map process, which leads to a connection refused error. I read about this error, that I should check firewalls and proper config files etc. The command is only work

Re: Question related to Number of Mapper

2012-11-07 Thread Ramasubramanian Narayanan
Hi, Thanks! But it is given as 100 mappers... I think the number of mappers can also be the same as the number of input files... (not for this question)... If you know more detail on that, please share. Note: I forgot where I took this question from :) regards, Rams. On Wed, Nov 7, 20

Re: Regarding MapReduce Input Format

2012-11-07 Thread Harsh J
You are correct. (D) automatically does (B). On Wed, Nov 7, 2012 at 9:41 PM, Ramasubramanian Narayanan wrote: > Hi, > > I came across the below question and I feel 'D' is the correct answer but in > some site it is mentioned that 'B' is the correct answer... Can you please > tell which is the rig

Re: Question related to Number of Mapper

2012-11-07 Thread Michael Segel
0. Custer didn't run. He got surrounded and then massacred. :-P (See Custer's last stand at Little Big Horn.) Ok... plain text files, 100 files, 2 blocks each, would by default attempt to schedule 200 mappers. Is this one of those online cert questions? On Nov 7, 2012, at 10:20 AM, Ramasubraman

Re: Can we use Pentaho Job in oozie?

2012-11-07 Thread Harsh J
Hi, Oozie has its own list now. Moving to Oozie's user lists (u...@oozie.apache.org). I am not very sure what a Pentaho Job is, but Oozie does provide sufficient developer extensions (such as custom actions and custom EL functions, aside from the simple Java Action) to allow integration with other

RE: Map-Reduce V/S Hadoop Ecosystem

2012-11-07 Thread Kartashov, Andy
The way I understand it... Hadoop is a distributed file system that allows you to create folders in its own namespace and copy files to and from your local Linux FS. You set up the Hadoop configuration for a local|pseudo-distributed|fully-distributed cluster. You write your jobs using the MapReduce API and ex

Can we use Pentaho Job in oozie?

2012-11-07 Thread Ramasubramanian Narayanan
Hi, Can we use Pentaho Jobs in oozie? regards, Rams

Map-Reduce V/S Hadoop Ecosystem

2012-11-07 Thread yogesh.kumar13
Hello Hadoop Champs, Please give some suggestions. Since the Hadoop ecosystem (Hive, Pig...) internally runs Map-Reduce to process data, my questions are: 1). Where do Map-Reduce programs (written in Java, Python etc.) overtake the Hadoop ecosystem? 2). Limitations of the Hadoop ecosystem compared with writing Map-Re

Re: Spill file compression

2012-11-07 Thread Sigurd Spieckermann
When I log the calls of the combiner function and print the number of elements iterated over, it is all 1s during the spill-writing phase, and the combiner is called very often. Is this normal behavior? According to what was mentioned earlier, I would expect the combiner to combine all records with the s

Re: Regarding loading Image file into HDFS

2012-11-07 Thread Harsh J
Hi, Blocks are split at arbitrary block size boundaries. Readers can read the whole file by reading all blocks together (this is transparently handled by the underlying DFS reader classes itself, a developer does not have to care about it). HDFS does not care about what _type_ of file you store,
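
A sketch of what "transparently handled" means in practice; the reader below never mentions blocks at all (the paths are hypothetical):

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyImageToLocal {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // The DFS client stitches all blocks back together behind this stream
        FSDataInputStream in = fs.open(new Path("/photos/cat.jpg"));
        OutputStream out = new FileOutputStream("cat.jpg");
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) > 0) {
          out.write(buf, 0, n);
        }
        in.close();
        out.close();
      }
    }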

Regarding loading Image file into HDFS

2012-11-07 Thread Ramasubramanian Narayanan
Hi, I have a basic doubt... How does Hadoop split an image file into blocks and put them in HDFS? Usually an image file cannot be split, right? How does this happen in Hadoop? regards, Rams

Re: warning message

2012-11-07 Thread Visioner Sadak
Are you getting those in Hadoop community mails? I don't see any such mails. Maybe your office mail host provider blocks those links or attachments. On Wed, Nov 7, 2012 at 8:03 PM, Kartashov, Andy wrote: > Guys, > > > > Sometimes I get an occasional e-mail saying at the top: > > > > “This might be a ph

Re: config ... answer for Visioner Sadak

2012-11-07 Thread Visioner Sadak
Hey, thanks Andy. Problem solved. On Wed, Nov 7, 2012 at 8:09 PM, Kartashov, Andy wrote: > Sadak, > > > > Sorry, could not answer your original e-mail as it was blocked. > > > > Are you running SNN on a separate node? > > > > If so, it needs to communicate with NN. > > > > Add this property to

Re: Hadoop configs

2012-11-07 Thread Visioner Sadak
Got it. The culprit was the Apache website from where I copied the secondary namenode address: http://hadoop.apache.org/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html#Configuration Here secondary.http-address.ns1 should be used instead of secondaryhttp-address.ns1. dfs.namenode.secondaryhttp-address

config ... answer for Visioner Sadak

2012-11-07 Thread Kartashov, Andy
Sadak, Sorry, could not answer your original e-mail as it was blocked. Are you running SNN on a separate node? If so, it needs to communicate with NN. Add this property to your hdfs-site.xml (needed for running SNN):

  <property>
    <name>dfs.namenode.http-address</name>
    <value>:50070</value>
  </property>

The address

warning message

2012-11-07 Thread Kartashov, Andy
Guys, Sometimes I get an occasional e-mail saying at the top: "This might be a phishing e-mail and is potentially unsafe. Links and other functionality have been disabled" Is this because of the posted links? Rgds, AK NOTICE: This e-mail message and any attachments are confidential, subje

RE: Doubts on compressed file

2012-11-07 Thread Jim Neofotistos
Gzip is decently fast, but cannot take advantage of Hadoop's natural map splits because it's impossible to start decompressing a gzip stream starting at a random offset in the file.  LZO is a wonderful compression scheme to use with Hadoop because it's incredibly fast, and (with a bit of work) it'

Re: Sample questions for taking Cloudera CDH3 exam for Hadoop developer

2012-11-07 Thread Marco Shaw
Hi, I don't think there are really any legitimate/legal sites to get sample questions... Cloudera does have a promo currently: two chances to pass before the end of the year. I'm going to try to write the v4 admin exams in the next few weeks, if I fail, I will study another 2-3 weeks and try my

Re: Hadoop configs

2012-11-07 Thread Visioner Sadak
I tried configuring the secondary namenode but get this error when I start it: Exception in thread "main" java.lang.IllegalArgumentException: Target address cannot be null. Any hints? On Wed, Nov 7, 2012 at 1:47 PM, Visioner Sadak wrote: > I have configured a cluster setup of hadoop, should I creat

Re: Sample questions for taking Cloudera CDH3 exam for Hadoop developer

2012-11-07 Thread Marco Shaw
Keep in mind that v3 exams will be retired at the end of this year... http://university.cloudera.com/certification.html On Wed, Nov 7, 2012 at 8:50 AM, Ramasubramanian Narayanan < ramasubramanian.naraya...@gmail.com> wrote: > Hi, > > Can anyone suggest sample model questions for taking the Cloud

Re: Spill file compression

2012-11-07 Thread Sigurd Spieckermann
OK, I found the answer to one of my questions just now -- the location of the spill files and their sizes. So, there's a discrepancy between what I see and what you said about the compression. The total size of all spill files of a single task matches with what I estimate for them to be *without* c

RE: Sample questions for taking Cloudera CDH3 exam for Hadoop developer

2012-11-07 Thread Jim Neofotistos
www.crinlogic.com/hadooptest.html - both admin & developer. James Neofotistos, Senior Sales Consultant, Emerging Markets East. Phone: 1-781-565-1890 | Mobile: 1-603-759-7889. Email: jim.neofotis...@oracle.com From: Ramasubramanian Narayanan [mailto:ramasubramanian.naraya...@gmail.com] Sent: Wednesday, N

Re: Spill file compression

2012-11-07 Thread Sigurd Spieckermann
OK, just wanted to confirm. Maybe there is another problem then. I just looked at the task logs and there were ~200 spills recorded for a single task; only afterwards was there a merge phase. In my case, 200 spills are about 2GB (uncompressed). One map output record easily fits into the in-memory b
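
If the goal is fewer spills, the usual knobs are the map-side sort buffer settings; a sketch with Hadoop 1.x property names (the values are illustrative, not recommendations from this thread):

    import org.apache.hadoop.conf.Configuration;

    public class SpillTuning {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setInt("io.sort.mb", 512);                 // larger in-memory sort buffer
        conf.setFloat("io.sort.spill.percent", 0.90f);  // start spilling later
        conf.setInt("min.num.spills.for.combine", 3);   // also run the combiner at merge time
        // ... submit the job with this Configuration ...
      }
    }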

Re: Doubts on compressed file

2012-11-07 Thread Niels Basjes
Hi, > If a zip file (Gzip) is loaded into HDFS will it get split into blocks and > stored in HDFS? Yes. > I understand that a single mapper can work with GZip as it reads the entire > file from beginning to end... In that case if the GZip file size is larger > than 128 MB will it get split i

Re: Doubts on compressed file

2012-11-07 Thread Harsh J
Hi, Yes, all files are split into block-size chunks in HDFS. HDFS is agnostic about what the file's content is and its attributes (such as compression, etc.). This is left to the file-reader logic to handle. When a GZip reader initializes, it reads the whole file length, across all the blocks the
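
A sketch of the reader side Harsh describes, using CompressionCodecFactory to pick the codec from the .gz suffix (the path is hypothetical):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    public class ReadGzippedFile {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/data/input.txt.gz"); // hypothetical path
        // getCodec() keys off the file suffix; it returns null for unknown suffixes
        CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(file);
        // The decompressed stream spans every HDFS block of the file
        BufferedReader reader = new BufferedReader(
            new InputStreamReader(codec.createInputStream(fs.open(file))));
        String line;
        while ((line = reader.readLine()) != null) {
          System.out.println(line);
        }
        reader.close();
      }
    }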

Re: Doubt on Input and Output Mapper - Key value pairs

2012-11-07 Thread Harsh J
The answer (a) is correct, in general. On Wed, Nov 7, 2012 at 6:09 PM, Ramasubramanian Narayanan wrote: > Hi, > > Which of the following is correct w.r.t mapper. > > (a) It accepts a single key-value pair as input and can emit any number of > key-value pairs as output, including zero. > (b) It ac

Re: Spill file compression

2012-11-07 Thread Harsh J
Yes we do compress each spill output using the same codec as specified for map (intermediate) output compression. However, the counted bytes may be counting decompressed values of the records written, and not post-compressed ones. On Wed, Nov 7, 2012 at 6:02 PM, Sigurd Spieckermann wrote: > Hi gu
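
The corresponding settings, with Hadoop 1.x property names (SnappyCodec here is just one possible codec choice):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.SnappyCodec;

    public class MapOutputCompression {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Compress intermediate (map-side) output, which covers spill files too
        conf.setBoolean("mapred.compress.map.output", true);
        conf.setClass("mapred.map.output.compression.codec",
            SnappyCodec.class, CompressionCodec.class);
        // ... submit the job with this Configuration ...
      }
    }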

Doubt on Input and Output Mapper - Key value pairs

2012-11-07 Thread Ramasubramanian Narayanan
Hi, Which of the following is correct w.r.t. a mapper? (a) It accepts a single key-value pair as input and can emit any number of key-value pairs as output, including zero. (b) It accepts a single key-value pair as input and emits a single key and list of corresponding values as output regards, Ra

Re: One mapper/reducer runs on a single JVM

2012-11-07 Thread Lin Ma
Thanks Mike. 1. So I think you mean that for Hadoop, since it is a batch job, latency is not the key concern, so time spent on swap is acceptable. But for HBase, the normal use case is on-demand, semi-real-time query, so we need to avoid memory swap impacting latency? 2. Supposing I have 4 map

Doubts on compressed file

2012-11-07 Thread Ramasubramanian Narayanan
Hi, If a zip file (Gzip) is loaded into HDFS will it get split into blocks and stored in HDFS? I understand that a single mapper can work with GZip as it reads the entire file from beginning to end... In that case if the GZip file size is larger than 128 MB will it get split into blocks and s

Re: Hadoop configs

2012-11-07 Thread Visioner Sadak
Also, are the checkpoint node and the secondary namenode the same? On Wed, Nov 7, 2012 at 1:47 PM, Visioner Sadak wrote: > I have configured a cluster setup of hadoop, should I create a directory for > secondary namenode as well? If I need one, how to mention that in core-site.xml > , is tmp directory needed in co