Just wondering: what is the difference between FILE_BYTES_READ and HDFS_BYTES_READ,
which get displayed in the output of a job?
Thanks
Sai
Can someone please help:
How do I unpack a .tar.bz2 file which is in Hadoop/HDFS?
Thanks
Sai
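For anyone hitting this later, a minimal sketch of one way to do it: stream the archive out of HDFS and unpack it client-side with Apache commons-compress (that library on the classpath, and the paths, are assumptions, not something from this thread):
***
import java.io.BufferedInputStream;
import java.io.IOException;

import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveInputStream;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsTarBz2Extract {
    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        Path archive = new Path(args[0]);  // e.g. /user/sai/input.tar.bz2 (hypothetical)
        Path outDir = new Path(args[1]);   // e.g. /user/sai/extracted (hypothetical)

        // Stream the archive out of HDFS, strip the bzip2 layer,
        // then walk the tar entries one by one.
        TarArchiveInputStream tar = new TarArchiveInputStream(
                new BZip2CompressorInputStream(
                        new BufferedInputStream(fs.open(archive))));
        TarArchiveEntry entry;
        while ((entry = tar.getNextTarEntry()) != null) {
            if (entry.isDirectory()) {
                continue;
            }
            FSDataOutputStream out = fs.create(new Path(outDir, entry.getName()));
            // read() on the tar stream stops at the end of the current entry,
            // so this copies exactly one file; false = keep the tar stream open
            IOUtils.copyBytes(tar, out, 4096, false);
            out.close();
        }
        tar.close();
    }
}
***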
Can someone please help:
1. Difference between FILE_BYTES_READ and HDFS_BYTES_READ.
Thanks
Sai
Is HDInsight a C# version of Hadoop, or is it in Java?
Please let me know.
Thanks
Sai
Hi
Here is the input file for the wordcount job:
**
Hi This is a simple test.
Hi Hadoop how r u.
Hello Hello.
Hi Hi.
Hadoop Hadoop Welcome.
**
After running the wordcount successfully,
here is the counters info:
***
Job Counters SLOTS_MILLIS_MAPS 0 0
To: user@hadoop.apache.org; Sai Sai saigr...@yahoo.in
Sent: Thursday, 26 September 2013 5:09 PM
Subject: Re: 2 Map tasks running for a small input file
Hi,
The default number of map tasks is 2. You can set mapred.map.tasks to 1 to
avoid this.
Regards,
Viji
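A minimal sketch of Viji's suggestion using the old JobConf API; note that mapred.map.tasks is only a hint to the InputFormat, not a hard limit:
***
import org.apache.hadoop.mapred.JobConf;

public class SingleMapHint {
    public static JobConf configure() {
        JobConf conf = new JobConf();
        // setNumMapTasks(1) is the typed equivalent of mapred.map.tasks=1;
        // the InputFormat may still create more splits for large inputs.
        conf.setNumMapTasks(1);
        return conf;
    }
}
***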
On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai
Hi
I have a few questions I am trying to understand:
1. Is each input split the same as a record? (A record can be a single line or
multiple lines.)
2. Is each task a collection of a few computations or attempts?
For example: if I have a small file with 5 lines.
By default there will be 1 line on which
Is a counter like a static variable? If so, is it persisted on the name node or the data
node?
Any input please.
Thanks
Sai
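As far as I understand, a counter is not a static variable: each task increments its own local copy, the framework ships the values back with task status and aggregates them at the JobTracker, and nothing is persisted to the NameNode or DataNodes by the job itself. A minimal sketch of a custom counter (the counter and class names are hypothetical):
***
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CountingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Each task counts locally; the framework aggregates across all tasks.
    public enum MyCounters { RECORDS_SEEN }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.getCounter(MyCounters.RECORDS_SEEN).increment(1);
        context.write(value, new IntWritable(1));
    }
}
***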
Is it possible to define the number of mappers to run for a job?
What are the conditions we need to be aware of when defining such a thing?
Please help.
Thanks
Sai
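As far as I know, the mapper count follows the number of input splits, so the practical lever is split size rather than a direct mapper-count setting. A minimal sketch using the new-API FileInputFormat; the sizes are hypothetical:
***
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizing {
    public static void configure(Job job) {
        // Roughly one mapper per 128 MB of input: the framework creates one
        // map task per split, and these bounds shape the split size.
        FileInputFormat.setMaxInputSplitSize(job, 128L * 1024 * 1024);
        FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);
    }
}
***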
1. Can we think of a job pool as similar to a queue?
2. Is it possible to configure a slot, and if so, how?
Please help.
Thanks
Sai
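A minimal sketch of the job-side settings, assuming the 1.x-era schedulers: the Fair Scheduler groups jobs into pools and the Capacity Scheduler into queues, while slots are fixed per TaskTracker in mapred-site.xml rather than configured per job. The queue/pool name here is hypothetical and must already exist in the scheduler configuration:
***
import org.apache.hadoop.mapred.JobConf;

public class QueueAssignment {
    public static JobConf configure() {
        JobConf conf = new JobConf();
        // Capacity Scheduler: submit to a named queue.
        conf.set("mapred.job.queue.name", "research");
        // Fair Scheduler: the analogous grouping is a pool.
        conf.set("mapred.fairscheduler.pool", "research");
        // Slots are per-TaskTracker settings in mapred-site.xml,
        // e.g. mapred.tasktracker.map.tasks.maximum, not per-job ones.
        return conf;
    }
}
***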
Just wondering if anyone has any documentation or references to any articles on
how to simulate a multi-node cluster setup on one laptop, with Hadoop running on
multiple Ubuntu VMs. Any help is appreciated.
Thanks
Sai
Just wondering if anyone would have any suggestions.
We are a bunch of developers on the bench for a few months, trained on Hadoop but
without any projects to work on.
We would like to develop a Hadoop/Hive/Pig-based product for our company so we
can be of value to the company and not be scared of lay
Just wondering if someone can explain the difference between these 2 directories:
Contents of directory /home/satish/work/mapred/staging/satish/.staging
and this directory:
/hadoop/mapred/system
Thanks
Sai
Is it possible to do Hadoop development in the cloud in a secure and economical way,
without worrying about our source being taken away? We would like to have
Hadoop and Eclipse installed on a VM in the cloud, and our developers would log into
the cloud on a daily basis and work on the cloud. Like this
Just a friendly follow-up to see if anyone has any suggestions for the port issue
given below.
Any help is appreciated.
Thanks
Sai
On May 20, 2013 5:40 PM, Sai Sai saigr...@yahoo.in wrote:
Not sure if this is the right group to ask questions about Flume:
I am getting an exception about
Excellent Sanjay, really excellent input. Many thanks for this input.
I have always been thinking about some ideas but never knew what to proceed
with.
Thanks again.
Sai
From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com
To: user@hadoop.apache.org
Not sure if this is the right group to ask questions about Flume:
I am getting an exception about being unable to open a port in Flume when trying to
create a remote agent; more details below:
---
13/05/20 04:55:30 ERROR avro.AvroCLIClient: Unable to open connection to Flume.
to run the netcat flume sample?
--
Lenin.
On May 20, 2013 5:40 PM, Sai Sai saigr...@yahoo.in wrote:
Not sure if this is the right group to ask questions about Flume:
I am getting an exception about being unable to open a port in Flume when trying to
create a remote agent; more details below
Just wondering if it is right to assume that a map task is run 3 times on 3
different TaskTrackers in parallel, and whichever completes the task first has its
output picked up and written to the intermediate location.
Or is it true that a map task, even though its data is replicated 3 times, will
run
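As far as I understand, a map task normally runs as a single attempt; duplicate attempts appear only on failure or through speculative execution, where the first attempt to finish wins and the rest are killed. A minimal sketch of the old-API knobs for turning speculation off:
***
import org.apache.hadoop.mapred.JobConf;

public class SpeculationToggle {
    public static JobConf configure() {
        JobConf conf = new JobConf();
        // With speculation off, every task runs exactly one attempt
        // unless it fails; replication of the data does not by itself
        // cause the task to run more than once.
        conf.setMapSpeculativeExecution(false);
        conf.setReduceSpeculativeExecution(false);
        return conf;
    }
}
***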
In the real world, can a file be as big as 10 TB?
Will the data be put into a .txt file, or what kind of file?
If someone would like to open such a big file to look at the content, will the OS
support opening such big files?
If not, how do we handle this kind of scenario?
Any input will be
If we have a 640 MB data file and 3 DataNodes in a cluster,
the file can be split into 10 blocks, and the mappers M1, M2, M3 start first.
As each one completes its task, M4 and so on will be run.
It appears it is not necessary to run all 10 map tasks in parallel at
once.
Just
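A rough sanity check of the arithmetic in the question above, assuming the default 64 MB block size and one map slot per node (both assumptions):
***
public class WaveMath {
    public static void main(String[] args) {
        long fileMb = 640, blockMb = 64;
        long blocks = (fileMb + blockMb - 1) / blockMb;  // 10 blocks => ~10 map tasks
        int mapSlots = 3;                                // one slot on each of 3 nodes
        long waves = (blocks + mapSlots - 1) / mapSlots; // ~4 waves of mappers
        System.out.println(blocks + " map tasks in about " + waves + " waves");
    }
}
***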
A few basic questions:
Does HDFS refer to the memory of the NameNode/DataNode, or is it a separate
machine?
For the NameNode, DataNode and others, there is a process associated with each of them.
But there is no process for HDFS; wondering why? I understand that the fsimage has the
metadata of HDFS, so when
blocks, it will result in at least 300K map tasks being
performed, and this looks like overkill from a performance or just a logical
perspective.
I will appreciate any thoughts on this.
Thanks
Sai
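One common way to cut the task count for very large inputs is to pack several blocks into each split. A minimal sketch, assuming a Hadoop version that ships CombineTextInputFormat (older 1.x releases only have the abstract CombineFileInputFormat); the split size is hypothetical:
***
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.CombineTextInputFormat;

public class FewerMaps {
    public static void configure(Job job) {
        // Pack several HDFS blocks into each split so a multi-TB input does
        // not translate into hundreds of thousands of short map tasks.
        job.setInputFormatClass(CombineTextInputFormat.class);
        CombineTextInputFormat.setMaxInputSplitSize(job, 512L * 1024 * 1024);
    }
}
***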
From: Sai Sai saigr...@yahoo.in
To: user@hadoop.apache.org user
Thanks Kai for confirming it.
From: Kai Voigt k...@123.org
To: user@hadoop.apache.org; Sai Sai saigr...@yahoo.in
Sent: Saturday, 13 April 2013 7:18 AM
Subject: Re: 100K Maps scenario
No, only one copy of each block will be processed.
If a task fails
I am running the wordcount from hadoop-examples, giving a bunch
of test files as input. I have noticed in the output given below that reduce starts when the
map is at 23%. I was wondering: is it not right that reducers will start only
after the complete mapping is done, which means when map is
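As far as I understand, the early reduce percentage is just the shuffle phase copying output of already-finished maps; reduce() itself does not run until every map has completed. A minimal sketch of the knob that delays reducer launch (property name from the 1.x-era configuration):
***
import org.apache.hadoop.mapred.JobConf;

public class SlowStart {
    public static JobConf configure() {
        JobConf conf = new JobConf();
        // 1.0f = launch reducers only after 100% of maps have completed,
        // so the reduce progress bar no longer moves during the map phase.
        conf.setFloat("mapred.reduce.slowstart.completed.maps", 1.0f);
        return conf;
    }
}
***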
1. Will the fsimage maintain the data/metadata of the name node?
2. Will any input files be stored in the fsimage?
3. When a namenode goes down, will all the data on the name node go down, or just
the metadata only, and what will happen to the fsimage and edits log?
5. Is the fsimage file, which is also maintained as
Here is my understanding of putting a file into HDFS:
A client contacts the name node and gets the locations of the blocks, i.e. where it needs to
put the blocks on the data nodes.
But before this, how does the name node know how many blocks it needs to split a
file into?
Who splits the file: is it the client
Can someone give a simple analogy for a Bloom filter in SQL?
I am trying to understand it and always get confused.
Thanks
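One common SQL-flavored analogy: a Bloom filter is like a cheap pre-check before an expensive SELECT. It can answer "definitely not present" or "maybe present", and it never returns a false "not present", so you only pay for the real lookup when the filter says "maybe". A minimal sketch using the BloomFilter class that ships with Hadoop; the sizing is a hypothetical choice for a small demo set:
***
import org.apache.hadoop.util.bloom.BloomFilter;
import org.apache.hadoop.util.bloom.Key;
import org.apache.hadoop.util.hash.Hash;

public class BloomDemo {
    public static void main(String[] args) {
        // 10000 bits, 4 hash functions: sized for a small demo set.
        BloomFilter filter = new BloomFilter(10000, 4, Hash.MURMUR_HASH);
        filter.add(new Key("alice".getBytes()));
        filter.add(new Key("bob".getBytes()));

        // "maybe present": can be a false positive, tunable via bits/hashes
        System.out.println(filter.membershipTest(new Key("alice".getBytes())));
        // "definitely absent": a Bloom filter never gives false negatives
        System.out.println(filter.membershipTest(new Key("carol".getBytes())));
    }
}
***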
Just wondering if there is a list of Linux commands, or any article, covering what is
needed for learning Hadoop.
Thanks
Sai
From: Jens Scheidtmann jens.scheidtm...@gmail.com
To: user@hadoop.apache.org; Sai Sai saigr...@yahoo.in
Sent: Friday, 29 March 2013 9:26 PM
Subject: Re: Understanding Sys.output from mapper partitioner
Hello Sai,
the interesting bits are how your job
Below are my simple mapper and partitioner classes, the input file, and the output
displayed on the console at the end of the message.
My question is about the keys it prints in the console window, highlighted in
bold in the console output, which looks like this:
Here are the first few lines of the
Q1. Is it right to assume that System.out.println statements are used only in the
Eclipse environment, and
in a multi-node cluster environment we need to use counters?
Q2. I am slightly confused, as it appears that using System.out.println
statements
we are able to get detailed info at every line of
Hadoop splits large files into file blocks of size 64 MB. Are these the same as
storage blocks, or are they different?
Thanks
Sai
In some examples/articles sometimes they use:
public static class MyMapper
and sometimes they use
public class MyMapper
When/why should we use a static vs. a normal class?
Thanks
Sai
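A minimal sketch of why the nested form carries static: Hadoop instantiates the mapper reflectively with a no-argument constructor, and a non-static inner class cannot be created without an enclosing instance. A top-level public class MyMapper needs no static keyword; the class names here are hypothetical:
***
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyDriver {
    // Nested inside the driver, so it must be static: the framework
    // creates it by reflection on remote nodes and has no MyDriver
    // instance to attach it to.
    public static class MyMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws java.io.IOException, InterruptedException {
            context.write(new Text(value.toString()), ONE);
        }
    }
}
***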
I have put a breakpoint in the map/reduce method and tried looking through the context
object by using the Inspect option.
I see a lot of variables inside it, but I am wondering if it is possible to look at
its contents meaningfully;
by contents I mean only the keys and values that we add at each step.
Just wondering what the difference is between the serialized comparator and the normal
comparator given below.
The reason I am trying to understand this is: how will you verify, if you are using the
serialized comparator during debugging, whether the comparator is
working or not? As when you debug in Eclipse it
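For what it's worth, a "serialized" (raw) comparator works directly on the serialized bytes, skipping object creation, which is where the performance win comes from; a breakpoint inside the byte-level compare(...) method is one way to confirm it is actually invoked during the sort. A minimal sketch for Text keys (register it with job.setSortComparatorClass):
***
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.io.WritableUtils;

public class TextRawComparator extends WritableComparator {
    public TextRawComparator() {
        super(Text.class);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // Text serializes as a vint length followed by UTF-8 bytes;
        // skip the length prefixes and compare the payloads directly,
        // with no Text objects ever being deserialized.
        int n1 = WritableUtils.decodeVIntSize(b1[s1]);
        int n2 = WritableUtils.decodeVIntSize(b2[s2]);
        return compareBytes(b1, s1 + n1, l1 - n1, b2, s2 + n2, l2 - n2);
    }
}
***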
Thanks Harsh.
So the setup/cleanup are for the job and not the mappers, I take it.
Thanks.
From: Harsh J ha...@cloudera.com
To: user@hadoop.apache.org user@hadoop.apache.org; Sai Sai
saigr...@yahoo.in
Sent: Friday, 22 March 2013 10:05 PM
Subject: Re: Setup
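As far as I understand, setup/cleanup belong to each task attempt rather than to the job as a whole: a job with 10 map tasks runs setup 10 times, once per task. A minimal sketch of the lifecycle:
***
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LifecycleMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void setup(Context context) {
        // Runs once per map task attempt, before the first map() call.
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws java.io.IOException, InterruptedException {
        context.write(value, new IntWritable(1));
    }

    @Override
    protected void cleanup(Context context) {
        // Runs once per map task attempt, after the last map() call.
    }
}
***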
Just wondering if there is any step-by-step explanation/article of the MR output
we get when we run a job, either in Eclipse or Ubuntu. Any help is appreciated.
Thanks
Sai
Just wondering if this is the right way to understand this:
A large file is split into multiple blocks, each block is split into
multiple file splits, each file split has multiple records, and each record
has multiple lines. Each line is processed by 1 instance of the mapper.
Any help is
Just wondering if there are any commands in Hadoop which would give us the
current version that we
are using, and any command which will give us info on the cluster setup of the Hadoop we are
working on.
Thanks
Sai
Just wondering: after we put a file in Hadoop for running MR jobs, and after we are
done with it,
is it standard to delete it or just leave it there like that?
Just wondering what others do.
Any input will be appreciated.
Thanks
Sai
I have a list of the following processes given below. I am trying to kill the
process 13082 using:
kill 13082
It's not terminating RunJar.
I have done a stop-all.sh, hoping it would stop all the processes, but it only
stopped the Hadoop-related processes.
I am just wondering if it is necessary to stop
, is it right to
assume it converts each object's word1/word2/word3 to byte[] and compares
them?
If so, is it done for performance reasons?
Could you please verify.
Thanks
Sai
From: Mahesh Balija balijamahesh@gmail.com
To: user@hadoop.apache.org; Sai Sai saigr
Greetings,
Below is the program I am trying to run, and I am getting this exception:
***
Test Start.
java.net.UnknownHostException: unknown host: master
at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214)
at
Hello,
I have a question about how MapReduce sorting works internally with multiple
columns.
Below are my classes using 2 columns from the input file given below.
1st question: about the method hashCode, we are adding a 31 +; I am wondering
why this is required. What does 31 refer to?
2nd
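On the first question: the 31 is not Hadoop-specific; it is the standard Java hashCode recipe (the same one java.lang.String uses), where 31 is just a small odd prime that mixes the fields' hashes across the int range. A minimal sketch with stand-in names for the two columns:
***
import org.apache.hadoop.io.Text;

public class TwoColumnKey {
    private final Text firstColumn = new Text();
    private final Text secondColumn = new Text();

    // "31 * result + field" combines the two columns so equal keys hash
    // equally and similar keys spread out; any small odd prime works,
    // 31 is simply the Java convention.
    @Override
    public int hashCode() {
        int result = firstColumn.hashCode();
        result = 31 * result + secondColumn.hashCode();
        return result;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TwoColumnKey)) return false;
        TwoColumnKey other = (TwoColumnKey) o;
        return firstColumn.equals(other.firstColumn)
                && secondColumn.equals(other.secondColumn);
    }
}
***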
This may be a basic beginner debugging question; I will appreciate it if anyone can shed
some light.
Here is the method I have in Eclipse:
***
@Override
protected void setup(Context context) throws java.io.IOException,
        InterruptedException {
    Path[]