FILE_BYTES_READ vs HDFS_BYTES_READ

2014-03-14 Thread Sai Sai
Just wondering what the difference is between FILE_BYTES_READ and HDFS_BYTES_READ, which are displayed in the output of a job. Thanks Sai

How to unzip a .tar.bz2 file in Hadoop/HDFS

2014-03-14 Thread Sai Sai
Can someone please help: how do I unzip a .tar.bz2 file that is in Hadoop/HDFS? Thanks Sai

Difference between FILE_BYTES_READ and HDFS_BYTES_READ.

2014-03-13 Thread Sai Sai
Can someone please help: 1. What is the difference between FILE_BYTES_READ and HDFS_BYTES_READ? Thanks Sai

Is HDInsight a C# version of Hadoop or is it in Java?

2014-03-13 Thread Sai Sai
Is HDInsight a C# version of Hadoop or is it in Java? Please let me know. Thanks Sai

Re: 2 Map tasks running for a small input file

2013-09-26 Thread Sai Sai
Hi, here is the input file for the wordcount job: ** Hi This is a simple test. Hi Hadoop how r u. Hello Hello. Hi Hi. Hadoop Hadoop Welcome. ** After running the wordcount successfully, here is the counters info: *** Job Counters SLOTS_MILLIS_MAPS 0 0

Re: 2 Map tasks running for a small input file

2013-09-26 Thread Sai Sai
To: user@hadoop.apache.org; Sai Sai saigr...@yahoo.in Sent: Thursday, 26 September 2013 5:09 PM Subject: Re: 2 Map tasks running for a small input file Hi, the default number of map tasks is 2. You can set mapred.map.tasks to 1 to avoid this. Regards, Viji On Thu, Sep 26, 2013 at 4:28 PM, Sai Sai
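A minimal sketch of why a tiny file can get two map tasks, assuming the Hadoop 1.x old-API FileInputFormat logic as I understand it: mapred.map.tasks is only a hint that sets a "goal" split size (total bytes divided by the requested map count), so with the default of 2 even a very small file is cut into two splits. The class and method names are hypothetical, and the 1.1 "slop" factor and minimum split size of 1 byte are my reading of that code, not a quote of it.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of old-API split sizing
// (org.apache.hadoop.mapred.FileInputFormat); not Hadoop's source.
final class OldApiSplitModel {
    // The last split may be up to 10% larger than splitSize
    // instead of producing a tiny extra split.
    static final double SPLIT_SLOP = 1.1;

    /** Returns the lengths of the splits created for one splittable file. */
    static List<Long> splitLengths(long fileLength, int mapTasksHint,
                                   long minSplitSize, long blockSize) {
        // The hint only sets a goal size: total bytes divided by requested maps.
        long goalSize = fileLength / Math.max(1, mapTasksHint);
        // Split size never drops below 1 byte (avoids a zero-size split).
        long splitSize = Math.max(Math.max(1L, minSplitSize),
                                  Math.min(goalSize, blockSize));
        List<Long> splits = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(remaining);
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;
        long smallFile = 120; // hypothetical size of a tiny input file, in bytes
        System.out.println(splitLengths(smallFile, 2, 1, blockSize)); // [60, 60]
        System.out.println(splitLengths(smallFile, 1, 1, blockSize)); // [120]
    }
}
```

With the hint set to 1, the goal size equals the whole file, so a single split (and a single map task) results, which matches the advice in the reply.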

Re: Input Split vs Task vs attempt vs computation

2013-09-26 Thread Sai Sai
Hi, I have a few questions I am trying to understand: 1. Is each input split the same as a record (a record can be a single line or multiple lines)? 2. Is each task a collection of a few computations or attempts? For example: if I have a small file with 5 lines. By default there will be 1 line on which

Re: Is a counter a static var?

2013-06-07 Thread Sai Sai
Is a counter like a static variable? If so, is it persisted on the NameNode or a DataNode? Any input, please. Thanks Sai

Re: Is it possible to define the number of mappers to run for a job

2013-06-07 Thread Sai Sai
Is it possible to define the number of mappers to run for a job? What are the conditions we need to be aware of when defining such a thing? Please help. Thanks Sai
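Unlike reducers (job.setNumReduceTasks), the number of mappers is not set directly: it is the number of input splits. A sketch of the new-API sizing rule as I understand it, splitSize = max(minSize, min(maxSize, blockSize)), shows the lever. Raising the minimum split size (for example with FileInputFormat.setMinInputSplitSize) gives fewer, larger map tasks, and lowering the maximum (setMaxInputSplitSize) gives more. The class and helper names are hypothetical, and the 640 MB file and 64 MB block size are illustrative.

```java
// Hypothetical model of new-API split sizing
// (org.apache.hadoop.mapreduce.lib.input.FileInputFormat); not Hadoop's source.
final class NewApiSplitModel {
    // The last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Number of splits, and therefore map tasks, for one splittable file. */
    static int countSplits(long fileLength, long blockSize, long minSize, long maxSize) {
        long splitSize = Math.max(1L, computeSplitSize(blockSize, minSize, maxSize));
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long file = 640 * mb;
        long block = 64 * mb;
        System.out.println(countSplits(file, block, 1, Long.MAX_VALUE));        // 10
        System.out.println(countSplits(file, block, 128 * mb, Long.MAX_VALUE)); // 5
        System.out.println(countSplits(file, block, 1, 32 * mb));               // 20
    }
}
```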

Re: Pool slot questions

2013-06-07 Thread Sai Sai
1. Can we think of a job pool as similar to a queue? 2. Is it possible to configure a slot, and if so, how? Please help. Thanks Sai

Re: Install Hadoop on multiple VMs on 1 laptop like a cluster

2013-05-31 Thread Sai Sai
Just wondering if anyone has any documentation or references to articles on how to simulate a multi-node cluster setup on 1 laptop, with Hadoop running on multiple Ubuntu VMs. Any help is appreciated. Thanks Sai

Hadoop-based product recommendations.

2013-05-29 Thread Sai Sai
Just wondering if anyone has any suggestions. We are a bunch of developers who have been on the bench for a few months, trained on Hadoop but without any projects to work on. We would like to develop a Hadoop/Hive/Pig-based product for our company so we can be of value to the company and not be scared of lay

Re: Difference between these 2 dirs

2013-05-24 Thread Sai Sai
Just wondering if someone can explain the difference between these 2 directories: Contents of directory /home/satish/work/mapred/staging/satish/.staging and this directory: /hadoop/mapred/system Thanks Sai

Re: Hadoop Development on cloud in a secure and economical way.

2013-05-22 Thread Sai Sai
Is it possible to do Hadoop development in the cloud in a secure and economical way, without worrying about our source code being taken away? We would like to have Hadoop and Eclipse installed on a VM in the cloud, and our developers would log into the cloud daily and work there. Like this

Re: Flume port issue

2013-05-21 Thread Sai Sai
Just a friendly follow-up to see if anyone has any suggestions for the port issue given below. Any help is appreciated. Thanks Sai On May 20, 2013 5:40 PM, Sai Sai saigr...@yahoo.in wrote: Not sure if this is the right group to ask questions about Flume: I am getting an exception about

Re: Project ideas

2013-05-21 Thread Sai Sai
Excellent Sanjay, really excellent input. Many thanks for it. I have always been thinking about some ideas but never knew which one to proceed with. Thanks again. Sai From: Sanjay Subramanian sanjay.subraman...@wizecommerce.com To: user@hadoop.apache.org

Re: Flume port issue

2013-05-20 Thread Sai Sai
Not sure if this is the right group to ask questions about Flume: I am getting an exception about being unable to open a port in Flume when trying to create a remote agent; more details below: --- 13/05/20 04:55:30 ERROR avro.AvroCLIClient: Unable to open connection to Flume.

Re: Flume port issue

2013-05-20 Thread Sai Sai
to run the netcat Flume sample? -- Lenin. On May 20, 2013 5:40 PM, Sai Sai saigr...@yahoo.in wrote: Not sure if this is the right group to ask questions about Flume: I am getting an exception about being unable to open a port in Flume when trying to create a remote agent, more details below

Re: Does a Map task run 3 times on 3 TTs or just once

2013-04-12 Thread Sai Sai
Just wondering if it is right to assume that a map task is run 3 times on 3 different TTs in parallel, and whichever finishes processing the task first has its output picked up and written to the intermediate location. Or is it true that a map task, even though its data is replicated 3 times, will run

Re: A 10 TB data file.

2013-04-12 Thread Sai Sai
In the real world, can a file be as big as 10 TB? Will the data be put into a txt file, or what kind of file? If someone would like to open such a big file to look at the content, will the OS support opening such big files? If not, how do we handle this kind of scenario? Any input will be

Re: How to find the number of Mappers

2013-04-12 Thread Sai Sai
If we have a 640 MB data file and 3 DataNodes in a cluster, the file can be split into 10 blocks of 64 MB each, and the mappers M1, M2, M3 start first. As each one completes its task, M4 and so on will be run. It appears that it is not necessary to run all 10 map tasks in parallel at once. Just
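The arithmetic in the question can be sketched as below, assuming a 64 MB block size, one map task per block, and a fixed number of map slots per TaskTracker (in Hadoop 1.x, mapred.tasktracker.map.tasks.maximum, which defaults to 2). Tasks beyond the available slots simply wait for the next "wave". The class and helper names are hypothetical, not a Hadoop API.

```java
// Hypothetical back-of-the-envelope helpers: blocks per file and
// waves of map tasks for a cluster with a fixed number of map slots.
final class MapWaves {
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    /** Blocks (and, assuming one split per block, map tasks) for a file. */
    static long blocks(long fileBytes, long blockBytes) {
        return ceilDiv(fileBytes, blockBytes);
    }

    /** Rounds of map tasks needed when all slots run concurrently. */
    static long waves(long mapTasks, int nodes, int mapSlotsPerNode) {
        return ceilDiv(mapTasks, (long) nodes * mapSlotsPerNode);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        long tasks = blocks(640 * mb, 64 * mb);
        System.out.println(tasks);              // 10
        System.out.println(waves(tasks, 3, 1)); // 4 (1 map slot per node)
        System.out.println(waves(tasks, 3, 2)); // 2 (2 map slots per node)
    }
}
```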

Re: Will HDFS refer to the memory of the NameNode/DataNode or is it a separate machine

2013-04-12 Thread Sai Sai
A few basic questions: Will HDFS refer to the memory of the NameNode/DataNode, or is it a separate machine? For the NameNode, DataNode and others there is a process associated with each of them, but there is no process for HDFS; wondering why? I understand that the fsImage has the metadata of HDFS, so when

Re: 100K Maps scenario

2013-04-12 Thread Sai Sai
blocks, it will result in at least 300K map tasks being performed, and this looks like overkill from a performance or just a logical perspective. I will appreciate any thoughts on this. Thanks Sai From: Sai Sai saigr...@yahoo.in To: user@hadoop.apache.org user

Re: 100K Maps scenario

2013-04-12 Thread Sai Sai
Thanks Kai for confirming it. From: Kai Voigt k...@123.org To: user@hadoop.apache.org; Sai Sai saigr...@yahoo.in Sent: Saturday, 13 April 2013 7:18 AM Subject: Re: 100K Maps scenario No, only one copy of each block will be processed. If a task fails

Re: Reduce starts before map completes (at 23%)

2013-04-11 Thread Sai Sai
I am running the wordcount from hadoop-examples, giving a bunch of test files as input. I have noticed in the output given below that reduce starts when the map is at 23%. I was wondering whether it is not right that reducers start only after the complete mapping is done, which means when map is
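A hedged model of what is likely going on, assuming Hadoop 1.x behavior: reduce tasks are scheduled once the fraction of completed maps reaches mapred.reduce.slowstart.completed.maps (default 0.05), but at that point they only copy (shuffle) finished map output; the user's reduce() calls start only after every map has completed. The early reduce percentage mostly reflects that copy phase. The class, method names, and job size below are hypothetical.

```java
// Hypothetical model of Hadoop 1.x "slow start" for reducers.
final class ReduceSlowStart {
    // Default of mapred.reduce.slowstart.completed.maps.
    static final double DEFAULT_SLOWSTART = 0.05;

    /** Can reduce tasks be launched (to start copying map output)? */
    static boolean reducersScheduled(int completedMaps, int totalMaps, double slowstart) {
        int needed = (int) Math.ceil(slowstart * totalMaps);
        return completedMaps >= needed;
    }

    /** reduce() itself needs all map output, i.e. every map finished. */
    static boolean reduceFunctionMayRun(int completedMaps, int totalMaps) {
        return completedMaps == totalMaps;
    }

    public static void main(String[] args) {
        int totalMaps = 40; // hypothetical job size
        int completed = 9;  // roughly 23% of the maps done
        System.out.println(reducersScheduled(completed, totalMaps, DEFAULT_SLOWSTART)); // true
        System.out.println(reduceFunctionMayRun(completed, totalMaps));                 // false
    }
}
```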

Re: fsImage editsLog questions

2013-04-03 Thread Sai Sai
1. Will the fsImage maintain the data/metadata of the NameNode? 2. Will any input files be stored in the fsImage? 3. When a NameNode goes down, will all the data in the NameNode go down or just the metadata, and what will happen to the fsImage and edits log? 5. Is the fsimage file which is also maintained as

Re: Who splits the file into blocks

2013-03-31 Thread Sai Sai
Here is my understanding of putting a file into HDFS: a client contacts the NameNode and gets the locations of the blocks where it needs to put the blocks on DataNodes. But before this, how does the NameNode know how many blocks it needs to split a file into? Who splits the file, is it the client

Re: Bloom Filter analogy in SQL

2013-03-29 Thread Sai Sai
Can someone give a simple analogy of a Bloom filter in SQL? I am trying to understand it and always get confused. Thanks
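One way to picture it in SQL terms: a Bloom filter is a compact, lossy stand-in for WHERE key IN (SELECT key FROM small_table) that can be checked cheaply before an expensive join. It may answer "maybe" for a key that is not in the set (a false positive) but never "no" for a key that is. The sketch below is an illustrative implementation using double hashing; Hadoop ships its own org.apache.hadoop.util.bloom.BloomFilter, which this does not reproduce, and the sizes chosen here are arbitrary.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Minimal, illustrative Bloom filter (not Hadoop's implementation).
final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    SimpleBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /** Convenience factory: a filter pre-loaded with the given keys. */
    static SimpleBloomFilter of(int numBits, int numHashes, String... keys) {
        SimpleBloomFilter filter = new SimpleBloomFilter(numBits, numHashes);
        for (String key : keys) {
            filter.add(key);
        }
        return filter;
    }

    // FNV-1a over the UTF-8 bytes: a second hash independent of String.hashCode().
    private static int fnv1a(String key) {
        int hash = 0x811C9DC5;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }

    // Double hashing: position i = h1 + i * h2 (mod numBits). An odd step
    // keeps the k positions distinct when numBits is a power of two.
    private int position(String key, int i) {
        int h1 = key.hashCode();
        int h2 = fnv1a(key) | 1;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    void add(String key) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(key, i));
        }
    }

    /** false means "definitely absent"; true means "possibly present". */
    boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = of(1024, 3, "hadoop", "hive");
        System.out.println(filter.mightContain("hadoop")); // true: added keys are never missed
        System.out.println(filter.mightContain("pig"));    // usually false, may be a false positive
    }
}
```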

Re: List of Linux commands for Hadoop

2013-03-29 Thread Sai Sai
Just wondering if there is a list of Linux commands, or any article, covering what is needed for learning Hadoop. Thanks

Re: Understanding Sys.output from mapper partitioner

2013-03-29 Thread Sai Sai
Sai From: Jens Scheidtmann jens.scheidtm...@gmail.com To: user@hadoop.apache.org; Sai Sai saigr...@yahoo.in Sent: Friday, 29 March 2013 9:26 PM Subject: Re: Understanding Sys.output from mapper partitioner Hello Sai, the interesting bits are how your job

Understanding Sys.output from mapper partitioner

2013-03-27 Thread Sai Sai
Below are my simple mapper and partitioner classes, the input file, and the output displayed on the console at the end of the message. My question is about the keys it prints in the console window, highlighted in bold in the console output, which looks like this: Here are the first few lines of the
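For comparison with a custom partitioner, the formula used by Hadoop's default HashPartitioner is (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks; the mask clears the sign bit so negative hash codes still yield a valid reducer index. The stand-alone class around it is hypothetical. Note that Hadoop's Text computes its hashCode() from its UTF-8 bytes, so the reducer numbers printed for plain Java Strings here may differ from those in a real job with Text keys.

```java
// Hypothetical stand-alone demo of the default hash-partitioning formula.
final class HashPartitionDemo {
    static int partitionFor(Object key, int numReduceTasks) {
        // Clearing the sign bit keeps negative hash codes in [0, numReduceTasks).
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] words = {"Hi", "Hadoop", "Hello", "Welcome"};
        for (String word : words) {
            System.out.println(word + " -> reducer " + partitionFor(word, 2));
        }
    }
}
```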

System.out.println vs Counters

2013-03-27 Thread Sai Sai
Q1. Is it right to assume that System.out.println statements are used only in the Eclipse environment, and that in a multi-node cluster environment we need to use counters? Q2. I am slightly confused, as it appears that using System.out.println statements we are able to get detailed info at every line of

Storage Block vs File Block

2013-03-27 Thread Sai Sai
Hadoop splits large files into file blocks of size 64 MB. Are these the same as storage blocks, or are they different? Thanks Sai

Static class vs normal class: when to use which

2013-03-27 Thread Sai Sai
In some examples/articles they use public static class MyMapper, and sometimes they use public class MyMapper. When/why should we use a static vs a normal class? Thanks Sai
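A likely reason for the static keyword, assuming the mapper is nested inside the driver class: Hadoop creates mapper and reducer instances by reflection through a no-argument constructor, and a non-static inner class has no such constructor, because the compiler adds a hidden parameter for the enclosing instance. A top-level public class MyMapper in its own file does not have this problem. The demo below uses plain Java reflection only; the class names are hypothetical.

```java
import java.lang.reflect.Constructor;

// Plain-Java reflection demo: static nested vs. inner (non-static) class.
final class NestedClassDemo {
    static class StaticNested { } // like "public static class MyMapper" in a driver
    class Inner { }               // like "public class MyMapper" nested in a driver

    /** True if the class can be instantiated with no constructor arguments. */
    static boolean hasNoArgConstructor(Class<?> cls) {
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor();
            return ctor != null;
        } catch (NoSuchMethodException e) {
            // Inner classes only have a constructor taking the outer instance.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(StaticNested.class)); // true
        System.out.println(hasNoArgConstructor(Inner.class));        // false
    }
}
```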

Re: Inspect a context object and see what's in it

2013-03-27 Thread Sai Sai
I have put a breakpoint in the map/reduce method and tried looking through the context object using the Inspect option. I see a lot of variables inside it, but I am wondering if it is possible to look at its contents meaningfully; by contents I mean only the keys and values that we add at each step.

Re: Serialized comparator vs normal comparator

2013-03-27 Thread Sai Sai
Just wondering what the difference is between the serialized comparator and the normal comparator given below. The reason I am trying to understand this is: during debugging, how will you verify whether a serialized comparator is working or not, since when you debug in Eclipse it
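A sketch of the idea behind Hadoop's RawComparator, under simplified assumptions: a "normal" comparator works on deserialized key objects, while a serialized (raw) comparator reads the keys straight out of the byte arrays, so no key objects are created while sorting. Keys here are ints encoded as 4 big-endian bytes, the same layout DataOutput.writeInt produces (and which IntWritable writes). The class and method names are hypothetical. When debugging a local run, a breakpoint inside the byte-array compare method of the registered comparator is one way to check that it is actually being called.

```java
import java.nio.ByteBuffer;

// Hypothetical demo: object comparison vs. comparison on serialized bytes.
final class RawVsObjectCompare {
    /** 4-byte big-endian encoding, same layout as DataOutput.writeInt. */
    static byte[] serialize(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // "Normal" comparator: operates on already-deserialized key objects.
    static int compareObjects(Integer a, Integer b) {
        return Integer.compare(a, b);
    }

    // "Raw" comparator: decodes each key directly from its byte array at the
    // given start offset, without building key objects.
    static int compareRaw(byte[] b1, int s1, byte[] b2, int s2) {
        return Integer.compare(ByteBuffer.wrap(b1, s1, 4).getInt(),
                               ByteBuffer.wrap(b2, s2, 4).getInt());
    }

    public static void main(String[] args) {
        byte[] x = serialize(-3);
        byte[] y = serialize(7);
        System.out.println(compareObjects(-3, 7) < 0);  // true
        System.out.println(compareRaw(x, 0, y, 0) < 0); // true
    }
}
```

The int is decoded rather than compared byte by byte because an unsigned byte-wise comparison would sort -3 (first byte 0xFF) after 7 (first byte 0x00).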

Re: Setup/Cleanup question

2013-03-22 Thread Sai Sai
Thanks Harsh. So setup/cleanup are for the job and not the mappers, I take it. Thanks. From: Harsh J ha...@cloudera.com To: user@hadoop.apache.org user@hadoop.apache.org; Sai Sai saigr...@yahoo.in Sent: Friday, 22 March 2013 10:05 PM Subject: Re: Setup
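For reference, in the new API (org.apache.hadoop.mapreduce.Mapper) run() calls setup() once at the start of each map task, map() once per record in that task's split, and cleanup() once at the end; Reducer follows the same pattern per reduce task. The plain-Java imitation below shows that call order. It is a hypothetical class, not Hadoop's Mapper, and the splits are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical imitation of the call order inside one map task.
final class MapperLifecycleDemo {
    private final List<String> calls = new ArrayList<>();

    void setup() { calls.add("setup"); }
    void map(String record) { calls.add("map:" + record); }
    void cleanup() { calls.add("cleanup"); }

    // Same shape as Mapper.run(): setup once, map per record, cleanup once.
    void run(List<String> split) {
        setup();
        for (String record : split) {
            map(record);
        }
        cleanup();
    }

    /** Runs one simulated map task over a split and returns the call order. */
    static List<String> runTask(List<String> split) {
        MapperLifecycleDemo task = new MapperLifecycleDemo();
        task.run(split);
        return task.calls;
    }

    public static void main(String[] args) {
        // Two splits -> two map tasks -> setup/cleanup run once in each.
        System.out.println(runTask(List.of("a", "b"))); // [setup, map:a, map:b, cleanup]
        System.out.println(runTask(List.of("c")));      // [setup, map:c, cleanup]
    }
}
```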

Re: Dissecting MR output article

2013-03-22 Thread Sai Sai
Just wondering if there is any step-by-step explanation/article of the MR output we get when we run a job, either in Eclipse or on Ubuntu. Any help is appreciated. Thanks Sai

Re: Block vs FileSplit vs record vs line

2013-03-14 Thread Sai Sai
Just wondering if this is the right way to understand this: a large file is split into multiple blocks, each block is split into multiple file splits, each file split has multiple records, and each record has multiple lines. Each line is processed by 1 instance of the mapper. Any help is

Re: Find current version and cluster info of Hadoop

2013-03-07 Thread Sai Sai
Just wondering if there are any commands in Hadoop which would give us the current version we are using, and any command which will give us the info about the cluster setup of the Hadoop we are working on. Thanks Sai

Files in Hadoop.

2013-03-04 Thread Sai Sai
Just wondering: after we put a file in Hadoop for running MR jobs and we are done with it, is it standard to delete it or just leave it there? Just wondering what others do. Any input will be appreciated. Thanks Sai

Re: Unknown processes unable to terminate

2013-03-04 Thread Sai Sai
I have the list of processes given below. I am trying to kill process 13082 using: kill 13082 It's not terminating RunJar. I have done a stop-all.sh hoping it would stop all the processes, but it only stopped the Hadoop-related processes. I am just wondering if it is necessary to stop

Re: WordPairCount Mapreduce question.

2013-02-24 Thread Sai Sai
, is it right to assume it converts each object's word1/word2/word3 to byte[] and compares them? If so, is it done for performance reasons? Could you please verify. Thanks Sai From: Mahesh Balija balijamahesh@gmail.com To: user@hadoop.apache.org; Sai Sai saigr

Re: Trying to copy file to Hadoop file system from a program

2013-02-24 Thread Sai Sai
Greetings, below is the program I am trying to run, and I am getting this exception: *** Test Start. java.net.UnknownHostException: unknown host: master     at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214)     at

Re: WordPairCount Mapreduce question.

2013-02-23 Thread Sai Sai
Hello, I have a question about how MapReduce sorting works internally with multiple columns. Below are my classes using 2 columns in the input file given below. 1st question: about the hashCode method, we are adding 31 +; I am wondering why this is required. What does 31 refer to? 2nd
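A sketch of the usual "31 * result + field" idiom for a two-column key. 31 is an odd prime: multiplying by it mixes the fields' hash codes and makes the result depend on field order, so ("a","b") and ("b","a") usually hash differently, whereas a plain sum would collide. In MapReduce, hashCode() matters mainly for partitioning (which reducer receives a key); the sort order itself comes from compareTo or the registered comparator. WordPair is a hypothetical class, and the Writable methods (write, readFields, compareTo) of a real key are omitted.

```java
// Hypothetical two-field key showing the "31 * result + field" hashCode idiom.
final class WordPair {
    final String first;
    final String second;

    WordPair(String first, String second) {
        this.first = first;
        this.second = second;
    }

    @Override
    public int hashCode() {
        int result = 17;                         // arbitrary non-zero seed
        result = 31 * result + first.hashCode();
        result = 31 * result + second.hashCode();
        return result;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof WordPair)) return false;
        WordPair other = (WordPair) o;
        return first.equals(other.first) && second.equals(other.second);
    }

    public static void main(String[] args) {
        System.out.println(new WordPair("a", "b").hashCode()); // 19442
        System.out.println(new WordPair("b", "a").hashCode()); // 19472
    }
}
```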

Re: Newbie Debugging Question

2013-02-21 Thread Sai Sai
This may be a basic beginner debugging question; I would appreciate it if anyone can shed some light. Here is the method I have in Eclipse: *** @Override     protected void setup(Context context) throws java.io.IOException,             InterruptedException {         Path[]