I recommend JDK 1.6.3x; 1.7 has several issues within NIO and I wouldn't use it
in production. OpenJDK sometimes shows odd behavior, and IBM's JDK - no.
cheers,
Alex
On Nov 8, 2012, at 7:17 AM, Sanjeev Verma wrote:
> AFAIK, OpenJDK can be used to run Hadoop. Why do you want to build Hadoop
> from source? Get precompiled binaries and you will be OK.
AFAIK, OpenJDK can be used to run Hadoop. Why do you want to build Hadoop
from source? Get precompiled binaries and you will be OK.
Also, you can install the Sun/Oracle JDK on Ubuntu. Just google for
instructions, you will find plenty, like here -
http://www.ubuntututorials.com/install-oracle-java-jdk-7
I don't think OpenJDK 7 has been as extensively tested as OpenJDK 6
for Hadoop. I'd recommend staying on 1.6-based JVMs if you have a prod
goal for this cluster, but otherwise, you could give OpenJDK 7 a try
and let us know.
On Thu, Nov 8, 2012 at 11:30 AM, a...@hsk.hk wrote:
> Hi,
>
> I am planning to use Ubuntu 12.04.
Hi,
I am planning to use Ubuntu 12.04, from
http://wiki.apache.org/hadoop/HadoopJavaVersions, about OpenJDK
"Note*: OpenJDK6 has some open bugs w.r.t handling of generics... so OpenJDK
cannot be used to compile hadoop mapreduce code in branch-0.23 and beyond,
please use other JDKs."
Is it OK
Hi Rams,
A mapper will accept a single key-value pair as input and can emit
0 or more key-value pairs based on what you want to do in the mapper function
(I mean based on your business logic in the mapper function).
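A rough sketch of that contract (the class and names here are made up for
illustration, using the org.apache.hadoop.mapreduce API): one pair comes in,
and the loop decides how many pairs go out - possibly none.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// One key-value pair in (offset, line); zero or more pairs out (token, 1).
public class TokenCountMapper
    extends Mapper<LongWritable, Text, Text, LongWritable> {
  private static final LongWritable ONE = new LongWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    for (String token : line.toString().split("\\s+")) {
      if (token.isEmpty()) continue;  // an empty line emits nothing at all
      word.set(token);
      context.write(word, ONE);       // emit one pair per token
    }
  }
}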
But the framework will actually aggregate the list of values
associate
Hourly consultants may prefer MapReduce. Everyone else should be using Pig,
Hive, Cascading, etc.
Russell Jurney twitter.com/rjurney
On Nov 7, 2012, at 8:08 PM, yogesh dhari wrote:
Thanks Bejoy Sir,
I am always grateful to you for your help.
Please explain these words in simple language with some case (if possible)
Have you tried hadoop fs -chmod a+rwx /tmp
From: Arun C Murthy [mailto:a...@hortonworks.com]
Sent: Wednesday, November 07, 2012 3:11 PM
To: user@hadoop.apache.org
Subject: Re: Sticky Bit Problem (CDH4.1)
Pls ask Cloudera lists...
On Nov 7, 2012, at 9:57 AM, Brian Derickson wrote:
Hey all,
Wh
Pls ask Cloudera lists...
On Nov 7, 2012, at 9:57 AM, Brian Derickson wrote:
> Hey all,
>
> When setting up the namenode, some of the commands that we run are:
> hadoop fs -mkdir /tmp
> hadoop fs -chmod -R 1777 /tmp
>
> This has worked for previous CDH releases of Hadoop.
>
> We recently upgra
Hi Yogesh,
Pretty much all the requirements fit well into Hive, Pig, etc. HiveQL and Pig
Latin are parsed by their respective parsers into MapReduce jobs. The MR code
thus generated is generic and is totally based on some rules defined in the
parser.
But say your requirement has something more
Hi,
If you want to learn more about HA in HDFS, here are some slides from
a talk that Aaron T. Meyers and Suresh Srinivas gave:
http://www.slideshare.net/hortonworks/nn-ha-hadoop-worldfinal-10173419
branch-2 and later contain HDFS HA.
cheers,
Colin
On Sun, Nov 4, 2012 at 1:06 AM, lei liu wrot
Thanks Bejoy Sir,
I am always grateful to you for your help.
Please explain these words in simple language with some case (if possible)
" If your requirement is that complex and you need very low-level control
of your code, MapReduce is better. If you are an expert in MapReduce your
code can be
I think what you're hitting is probably
https://issues.apache.org/jira/browse/HADOOP-8756. It was fixed in
branch-2 and branch-3, but not in the old releases.
If you want to work around the problem, try explicitly setting
LD_LIBRARY_PATH to include the directory that contains snappy.
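If the failure happens inside the task JVMs, one way to pass the variable down
is the 1.x mapred.child.env property; this is only a sketch, and the library
directory below is an assumption, so point it at wherever libsnappy actually
lives on your nodes.

import org.apache.hadoop.mapred.JobConf;

// Sketch: export LD_LIBRARY_PATH into the child task environments (Hadoop 1.x).
public class SnappyEnvWorkaround {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    conf.set("mapred.child.env",
        "LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native/Linux-amd64-64");  // assumed path
  }
}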
cheers,
Col
Hi Brian,
I am not seeing this trouble on a similar version locally:
[harsh@~]$ hadoop fs -ls
drwxr-xr-x - harsh harsh 0 2012-09-28 04:21 outfoo
[harsh@~]$ hadoop fs -chmod 1777 outfoo
[harsh@~]$ hadoop fs -chmod -R 1777 outfoo
[harsh@~]$ hadoop fs -chmod -R +t outfoo
[harsh@~]$
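For reference, the API-level equivalent of that chmod looks roughly like this
(a sketch; the four-argument FsPermission constructor carries the sticky bit):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

// Equivalent of "hadoop fs -chmod 1777 outfoo": rwx for all plus the sticky bit.
public class StickyChmod {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    fs.setPermission(new Path("outfoo"),
        new FsPermission(FsAction.ALL, FsAction.ALL, FsAction.ALL, true));
  }
}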
What i
Hi,
I have given my explanation for my choice and why I am saying the given answer
is wrong...
You are running a job that will process a single InputSplit on a cluster which
has no other jobs currently running. Each node has an equal number of open Map
slots. On which node will Hadoop first attempt to r
Sorry, I think I had better explain why I am curious...
First, there are a couple of sites that have study questions to help pass
Cloudera's certification.
(I don't know if Hortonworks has cert tests, but both MapR and Cloudera do.)
It's just looking first at the questions... not really good
Hi Yogesh,
The development time in Pig and Hive is much less than that of the equivalent
MapReduce code, and for generic cases it is very efficient.
If your requirement is that complex and you need very low-level control of your
code, MapReduce is better. If you are an expert in MapReduce your
Hi,
I'd instead like you to explain why you think someone's proposed
answer (who?) is wrong and why yours is correct. You learn more that
way than from us head-nodding/shaking at things you ask.
On Wed, Nov 7, 2012 at 10:51 PM, Ramasubramanian Narayanan
wrote:
> Hi,
>
>I came across the following
Hey all,
When setting up the namenode, some of the commands that we run are:
hadoop fs -mkdir /tmp
hadoop fs -chmod -R 1777 /tmp
This has worked for previous CDH releases of Hadoop.
We recently upgraded our test cluster to CDH 4.1 and the chmod no longer
works.
sudo -u hdfs hadoop fs -chmod -R
The larger question is how many blocks are required to store a 100MB file if
the HDFS block size is 64MB.
If it takes 2 blocks then when you run your job, you will have 1 mapper per
block, unless the file is not splittable. (But from your example it's a simple
text file, which is splittable.)
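Spelled out, the arithmetic is just a ceiling division (a trivial sketch):

// ceil(100 MB / 64 MB) = 2 blocks; a splittable text file gets ~1 mapper per block.
public class BlockMath {
  public static void main(String[] args) {
    long blockSize = 64L * 1024 * 1024;
    long fileSize  = 100L * 1024 * 1024;
    long blocks = (fileSize + blockSize - 1) / blockSize;  // rounds up to 2
    System.out.println(blocks + " blocks, roughly " + blocks + " map tasks");
  }
}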
Nothing consolidated... I have been collecting for the past month... a few
as printouts, a few from mails, a few from googling, a few from sites and a
few from some of my friends...
regards,
Rams
On Wed, Nov 7, 2012 at 10:57 PM, Michael Segel wrote:
> Ok...
> Where are you pulling these questions from?
Ok...
Where are you pulling these questions from?
Seriously.
On Nov 7, 2012, at 11:21 AM, Ramasubramanian Narayanan
wrote:
> Hi,
>
>I came across the following question in some sites and the answer that
> they provided seems to be wrong according to me... I might be wrong... Can
> s
While your problem is interesting, you need not use FSCK to get block
IDs of a file, as that's not the right way to fetch it (it's a rather
long, should-be-disallowed route). You can leverage the FileSystem API
itself to do that. See FileSystem#getFileBlockLocations(…), i.e.
http://hadoop.apache.org/
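A rough sketch of doing it through the API (the path is only an example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: list a file's block locations via FileSystem instead of fsck.
public class ListBlockLocations {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/user/someuser/somefile");  // example path
    FileStatus stat = fs.getFileStatus(file);
    for (BlockLocation b : fs.getFileBlockLocations(stat, 0, stat.getLen())) {
      System.out.println("offset=" + b.getOffset() + " length=" + b.getLength()
          + " hosts=" + java.util.Arrays.toString(b.getHosts()));
    }
  }
}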
Hi,
I've installed Hadoop 1.0.3 on a cluster of about 25 nodes and, until now,
it's been working fine.
Recently, I had to use fsck in a map process, which leads to a
connection refused error.
I read that for this error I should check firewalls, proper
config files, etc.
The command is only work
Hi,
Thanks!
But it is given as 100 mappers... I think we can also use 'n' mappers, the
same as the number of input files... (not for this question)... If you know
more detail on that, please share.
Note: I forgot where I took this question from :)
regards,
Rams.
On Wed, Nov 7, 20
You are correct. (D) automatically does (B).
On Wed, Nov 7, 2012 at 9:41 PM, Ramasubramanian Narayanan
wrote:
> Hi,
>
> I came across the below question and I feel 'D' is the correct answer but in
> some site it is mentioned that 'B' is the correct answer... Can you please
> tell which is the rig
Custer didn't run. He got surrounded and then massacred. :-P (See Custer's
last stand at Little Big Horn.)
OK... plain text files, 100 files, 2 blocks each, would by default attempt to
schedule 200 mappers.
Is this one of those online cert questions?
On Nov 7, 2012, at 10:20 AM, Ramasubraman
Hi,
Oozie has its own list now. Moving to Oozie's user lists
(u...@oozie.apache.org).
I am not very sure what a Pentaho Job is, but Oozie does provide
sufficient developer-extensions (such as custom actions and custom EL
functions, aside from the simple Java Action) to allow integration with
other
The way I understand it...
Hadoop is a distributed file system that allows you to create folders in its
own namespace and copy files to and from your local Linux FS. You set up the
Hadoop configuration for a local|pseudo-distributed|fully-distributed cluster.
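In API terms that part looks roughly like this (paths are only examples):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: create a folder in HDFS, copy a local file in, and copy it back out.
public class FsBasics {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    fs.mkdirs(new Path("/user/me/input"));
    fs.copyFromLocalFile(new Path("/tmp/local.txt"),
        new Path("/user/me/input/local.txt"));
    fs.copyToLocalFile(new Path("/user/me/input/local.txt"),
        new Path("/tmp/local-copy.txt"));
  }
}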
You write your jobs using MapReduce API and ex
Hi,
Can we use Pentaho Jobs in Oozie?
regards,
Rams
Hello Hadoop Champs,
Please give some suggestions.
As the Hadoop ecosystem (Hive, Pig...) internally does MapReduce to process
data, my questions are:
1) Where do MapReduce programs (written in Java, Python, etc.) overtake the
Hadoop ecosystem?
2) Limitations of the Hadoop ecosystem compared with writing Map-Re
When I log the calls of the combiner function and print the number of
elements iterated over, it is always 1 during the spill-writing phase and the
combiner is called very often. Is this normal behavior? According to what was
mentioned earlier, I would expect the combiner to combine all records with
the s
Hi,
Blocks are split at arbitrary block size boundaries. Readers can read
the whole file by reading all blocks together (this is transparently
handled by the underlying DFS reader classes itself, a developer does
not have to care about it).
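For instance, a sketch of reading a multi-block file as one continuous stream
(the path is illustrative); block boundaries never show up in the code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Sketch: HDFS hands back one stream, however many blocks back the file.
public class ReadWholeFile {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FSDataInputStream in = fs.open(new Path("/data/images/photo.jpg"));  // example
    try {
      IOUtils.copyBytes(in, System.out, 4096, false);  // crosses block boundaries
    } finally {
      in.close();
    }
  }
}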
HDFS does not care about what _type_ of file you store,
Hi,
I have a basic doubt... How does Hadoop split an image file into blocks and put
them in HDFS? Usually an image file cannot be split, right? How does this
happen in Hadoop?
regards,
Rams
Are you getting those in Hadoop community mails? I don't see any such mails.
Maybe your office mail host provider blocks those links or attachments.
On Wed, Nov 7, 2012 at 8:03 PM, Kartashov, Andy wrote:
> Guys,
>
>
>
> Sometimes I get an occasional e-mail saying at the top:
>
>
>
> “This might be a ph
Hey, thanks Andy. Problem solved.
On Wed, Nov 7, 2012 at 8:09 PM, Kartashov, Andy wrote:
> Sadak,
>
> Sorry, could not answer your original e-mail as it was blocked.
>
> Are you running SNN on a separate node?
>
> If so, it needs to communicate with NN.
>
> Add this property to
Got it, the culprit was Apache's website from where I copied the secondary
namenode address:
http://hadoop.apache.org/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html#Configuration
Here secondary.http-address.ns1 should be used instead of
secondaryhttp-address.ns1:
dfs.namenode.secondaryhttp-address
Sadak,
Sorry, could not answer your original e-mail as it was blocked.
Are you running SNN on a separate node?
If so, it needs to communicate with NN.
Add this property to your hdfs-site.xml
<property>
  <name>dfs.namenode.http-address</name>
  <value>:50070</value>
</property>
Needed for running SNN
The address
Guys,
Sometimes I get an occasional e-mail saying at the top:
"This might be a phishing e-mail and is potentially unsafe. Links and other
functionality have been disabled"
Is this because of the posted links?
Rgds,
AK
Gzip is decently fast, but cannot take advantage of Hadoop's natural map splits
because it's impossible to start decompressing a gzip stream starting at a
random offset in the file. LZO is a wonderful compression scheme to use with
Hadoop because it's incredibly fast, and (with a bit of work) it'
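A quick, hedged sketch of seeing which codec Hadoop associates with a file name
(the names are examples): GzipCodec matches *.gz, and since a gzip stream can
only be decompressed from the start, such an input ends up as a single split.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

// Sketch: map file suffixes to the registered compression codecs.
// (The LZO codec from hadoop-lzo would show up here too, once it is
// registered via io.compression.codecs.)
public class WhichCodec {
  public static void main(String[] args) {
    CompressionCodecFactory factory =
        new CompressionCodecFactory(new Configuration());
    for (String name : new String[] {"logs.txt", "logs.txt.gz", "logs.txt.bz2"}) {
      CompressionCodec codec = factory.getCodec(new Path(name));
      System.out.println(name + " -> "
          + (codec == null ? "no codec (plain text)"
                           : codec.getClass().getSimpleName()));
    }
  }
}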
Hi,
I don't think there are really any legitimate/legal sites to get sample
questions...
Cloudera does have a promo currently: two chances to pass before the end of
the year.
I'm going to try to write the v4 admin exams in the next few weeks; if I
fail, I will study another 2-3 weeks and try my
I tried configuring the secondary namenode but am getting this error when I
start it:
Exception in thread "main" java.lang.IllegalArgumentException: Target
address cannot be null. Any hints?
On Wed, Nov 7, 2012 at 1:47 PM, Visioner Sadak wrote:
> I have configured a cluster setup of Hadoop, should I create
Keep in mind that v3 exams will be retired at the end of this year...
http://university.cloudera.com/certification.html
On Wed, Nov 7, 2012 at 8:50 AM, Ramasubramanian Narayanan <
ramasubramanian.naraya...@gmail.com> wrote:
> Hi,
>
> Can anyone suggest sample model questions for taking the Cloud
OK, I found the answer to one of my questions just now -- the location of
the spill files and their sizes. So, there's a discrepancy between what I
see and what you said about the compression. The total size of all spill
files of a single task matches with what I estimate for them to be
*without* c
www.crinlogic.com/hadooptest.html both admin & developer
James Neofotistos
Senior Sales Consultant, Emerging Markets East
Phone: 1-781-565-1890 | Mobile: 1-603-759-7889
Email: jim.neofotis...@oracle.com
From: Ramasubramanian Narayanan [mailto:ramasubramanian.naraya...@gmail.com]
Sent: Wednesday, N
OK, just wanted to confirm. Maybe there is another problem then. I just
looked at the task logs and there were ~200 spills recorded for a single
task, only afterwards there was a merge phase. In my case, 200 spills are
about 2GB (uncompressed). One map output record easily fits into the
in-memory b
Hi,
> If a zip file (Gzip) is loaded into HDFS, will it get split into blocks and
> stored in HDFS?
Yes.
> I understand that a single mapper can work with GZip as it reads the entire
> file from beginning to end... In that case if the GZip file size is larger
> than 128 MB will it get split i
Hi,
Yes all files are split into block-size chunks in HDFS. HDFS is
agnostic about what the file's content is, and its attributes (such as
compression, etc.). This is left to the file reader logic to handle.
When a GZip reader initializes, it reads the whole file length, across
all the blocks the
The answer (a) is correct, in general.
On Wed, Nov 7, 2012 at 6:09 PM, Ramasubramanian Narayanan
wrote:
> Hi,
>
> Which of the following is correct w.r.t mapper.
>
> (a) It accepts a single key-value pair as input and can emit any number of
> key-value pairs as output, including zero.
> (b) It ac
Yes, we do compress each spill output using the same codec as specified
for map (intermediate) output compression. However, the counted bytes
may reflect the decompressed size of the records written, not the
post-compression size.
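For reference, that codec is the one configured on the job for map output
compression; roughly (old-API JobConf setters, SnappyCodec just as an example):

import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapred.JobConf;

// Sketch: spill files are written through the map-output codec set here.
public class MapOutputCompression {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    conf.setCompressMapOutput(true);                      // mapred.compress.map.output
    conf.setMapOutputCompressorClass(SnappyCodec.class);  // mapred.map.output.compression.codec
    // ... set mapper/reducer/input/output and submit as usual ...
  }
}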
On Wed, Nov 7, 2012 at 6:02 PM, Sigurd Spieckermann
wrote:
> Hi gu
Hi,
Which of the following is correct w.r.t. a mapper?
(a) It accepts a single key-value pair as input and can emit any number of
key-value pairs as output, including zero.
(b) It accepts a single key-value pair as input and emits a single key and
list of corresponding values as output
regards,
Ra
Thanks Mike.
1. So I think you mean that for Hadoop, since it is a batch job, latency is not
the key concern, so time spent on swap is acceptable. But for HBase, the
normal use case is on-demand, semi-real-time query, so we need to avoid
memory swap impacting latency?
2. Supposing I have 4 map
Hi,
If a zip file (Gzip) is loaded into HDFS, will it get split into blocks
and stored in HDFS?
I understand that a single mapper can work with GZip as it reads the entire
file from beginning to end... In that case, if the GZip file size is larger
than 128 MB, will it get split into blocks and s
Also, are the checkpoint node and the secondary namenode the same?
On Wed, Nov 7, 2012 at 1:47 PM, Visioner Sadak wrote:
> I have configured a cluster setup of Hadoop. Should I create a directory for
> the secondary namenode as well? If I need one, how to mention that in
> core-site.xml? Is a tmp directory needed in co