Re: Non utf-8 chars in input

2012-09-11 Thread Joshi, Rekha
Hi Ajay, Try SequenceFileAsBinaryInputFormat ? Thanks Rekha On 11/09/12 11:24 AM, Ajay Srivastava ajay.srivast...@guavus.com wrote: Hi, I am using default inputFormat class for reading input from text files but the input file has some non utf-8 characters. I guess that TextInputFormat class

what happens when a datanode rejoins?

2012-09-11 Thread Mehul Choube
Hi, What happens when an existing (not new) datanode rejoins a cluster for following scenarios: 1. Some of the blocks it was managing are deleted/modified? 2. The size of the blocks are now modified say from 64MB to 128MB? 3. What if the block replication factor was one

Re: what happens when a datanode rejoins?

2012-09-11 Thread George Datskos
Hi Mehul Some of the blocks it was managing are deleted/modified? The namenode will asynchronously replicate the blocks to other datanodes in order to maintain the replication factor after a datanode has not been in contact for 10 minutes. The size of the blocks are now modified say

Re: what happens when a datanode rejoins?

2012-09-11 Thread George Datskos
Mehul, Let me make an addition. Some of the blocks it was managing are deleted/modified? Blocks that are deleted in the interim will be deleted on the rejoining node as well, after it rejoins. Regarding the modified, I'd advise against modifying blocks after they have been fully written.

Re: what happens when a datanode rejoins?

2012-09-11 Thread Harsh J
George has answered most of these. I'll just add on: On Tue, Sep 11, 2012 at 12:44 PM, Mehul Choube mehul_cho...@symantec.com wrote: 1. Some of the blocks it was managing are deleted/modified? A DN runs a block report upon start, and sends the list of blocks to the NN. NN validates them
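
The reconciliation described in this thread (a rejoining DataNode reports its blocks, and blocks deleted in the interim are removed from it) can be pictured as a set difference. This is only a toy sketch under that assumption, not the NameNode's actual code; the class and method names (`BlockReportSketch`, `blocksToDelete`) are hypothetical, and blocks are reduced to plain numeric IDs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: real block reports carry more than IDs (generation stamps, lengths).
final class BlockReportSketch {

    // Blocks the DataNode reports that the NameNode no longer tracks;
    // in this simplified picture these are the ones scheduled for deletion.
    static Set<Long> blocksToDelete(Set<Long> reported, Set<Long> knownToNameNode) {
        Set<Long> stale = new TreeSet<>(reported);
        stale.removeAll(knownToNameNode);
        return stale;
    }

    public static void main(String[] args) {
        Set<Long> reported = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> known = new HashSet<>(Arrays.asList(1L, 3L)); // block 2 was deleted meanwhile
        System.out.println(blocksToDelete(reported, known));
    }
}
```

The sketch leaves out the over-replication case raised later in the thread, where a block is still valid but now has more replicas than the replication factor requires.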

Re: how to make different mappers execute different processing on same data ?

2012-09-11 Thread Narasingu Ramesh
Hi Jason, What Mehmet said is exactly correct: without reducers we cannot increase performance. You can add mappers and reducers to any data processing job to get the output with good performance. Thanks Regards, Ramesh.Narasingu On Tue, Sep 11, 2012 at 9:31 AM, Mehmet

Re: Non utf-8 chars in input

2012-09-11 Thread Ajay Srivastava
Rekha, I guess that problem is that Text class uses utf-8 encoding and one can not set other encoding for this class. I have not seen any other Text like class which supports other encoding otherwise I have written my custom input format class. Thanks for your inputs. Regards, Ajay
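
One workaround that fits the problem described here is to treat the record as raw bytes and decode it with an explicit charset instead of relying on UTF-8. The sketch below uses only the Java standard library. The choice of ISO-8859-1 is an assumption about the input, and `CharsetDecodeSketch` and `decode` are hypothetical names. In a mapper the buffer and length would presumably come from `Text.getBytes()` and `Text.getLength()`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetDecodeSketch {

    // Decode only the first `length` bytes: a reused buffer may be longer
    // than the valid data it currently holds.
    static String decode(byte[] buffer, int length, Charset charset) {
        return new String(buffer, 0, length, charset);
    }

    public static void main(String[] args) {
        // "caf" followed by 0xE9, which is e-acute in ISO-8859-1 but malformed as UTF-8.
        byte[] raw = {0x63, 0x61, 0x66, (byte) 0xE9};
        System.out.println(decode(raw, raw.length, StandardCharsets.ISO_8859_1));
        // Decoding the same bytes as UTF-8 substitutes U+FFFD for the bad byte.
        System.out.println(decode(raw, raw.length, StandardCharsets.UTF_8));
    }
}
```

Whether this is preferable to a custom input format depends on the job. It only helps when the real encoding of the input is known.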

Re: configure hadoop-0.22 fairscheduler

2012-09-11 Thread Jameson Li
Hi Harsh, Thanks for your reply. And I am sorry for my unclear description. As I mentioned previously, I think I configured the fairscheduler correctly in hadoop-0.22.0. But when I commit lots of jobs: many big jobs (map number and reduce number are bigger than the map/reduce slots) commit

what happens when a datanode rejoins?

2012-09-11 Thread mehul choube
Hi, What happens when an existing (not new) datanode rejoins a cluster for following scenarios: a) Some of the blocks it was managing are deleted/modified? b) The size of the blocks are now modified say from 64MB to 128MB? c) What if the block replication factor was one (yea not in most

RE: build failure - trying to build hadoop trunk checkout

2012-09-11 Thread Tony Burton
Changing the hostname to lowercase fixed this particular problem - thanks for your replies. The build is failing elsewhere now, I'll post a new thread for that. Tony From: Tony Burton [mailto:tbur...@sportingindex.com] Sent: 10 September 2012 10:44 To: user@hadoop.apache.org Subject: RE:

Re: Can't run PI example on hadoop 0.23.1

2012-09-11 Thread Narasingu Ramesh
Hi Vinod, Please check whether the input file location and the output file location don't match. Please put your input file into HDFS first and then run the MR job; it works fine. Thanks Regards, Ramesh.Narasingu On Tue, Sep 11, 2012 at 4:23 AM, Vinod Kumar Vavilapalli

Re: hadoop trunk build failure - yarn, surefire related?

2012-09-11 Thread Narasingu Ramesh
Hi, Please find i think one command is there then only build the all applications. Thanks Regards, Ramesh.Narasingu On Tue, Sep 11, 2012 at 2:28 PM, Tony Burton tbur...@sportingindex.com wrote: Hi, I’ve checked out the hadoop trunk, and I’m running “mvn test” on the

RE: what happens when a datanode rejoins?

2012-09-11 Thread Mehul Choube
The namenode will asynchronously replicate the blocks to other datanodes in order to maintain the replication factor after a datanode has not been in contact for 10 minutes. What happens when the datanode rejoins after namenode has already re-replicated the blocks it was managing? Will

Re: what happens when a datanode rejoins?

2012-09-11 Thread Harsh J
Hi, Inline. On Tue, Sep 11, 2012 at 2:36 PM, Mehul Choube mehul_cho...@symantec.com wrote: The namenode will asynchronously replicate the blocks to other datanodes in order to maintain the replication factor after a datanode has not been in contact for 10 minutes. What happens when the

RE: what happens when a datanode rejoins?

2012-09-11 Thread Mehul Choube
DataNode rejoins take care of only NameNode. Sorry didn't get this From: Narasingu Ramesh [mailto:ramesh.narasi...@gmail.com] Sent: Tuesday, September 11, 2012 2:38 PM To: user@hadoop.apache.org Subject: Re: what happens when a datanode rejoins? Hi Mehul, DataNode rejoins take care

Re: Undeliverable messages

2012-09-11 Thread Harsh J
Ha, good sleuthing. I just moved it to INFRA, as no one from our side has gotten to this yet. I guess we can only moderate, not administrate. So the ticket now awaits action from INFRA on ejecting it out. On Tue, Sep 11, 2012 at 2:34 PM, Tony Burton tbur...@sportingindex.com wrote: Thanks

RE: hadoop trunk build failure - yarn, surefire related?

2012-09-11 Thread Tony Burton
Hi Ramesh Thanks for the quick reply, but I'm having trouble following your English. Are you saying that there is one command to build everything? If so, can you tell me what it is? Tony From: Narasingu Ramesh [mailto:ramesh.narasi...@gmail.com] Sent: 11 September 2012 10:06 To:

Re: Undeliverable messages

2012-09-11 Thread Harsh J
And done. We shouldn't get this anymore. Thanks for bumping on this issue Tony! On Tue, Sep 11, 2012 at 2:44 PM, Harsh J ha...@cloudera.com wrote: Ha, good sleuthing. I just moved it to INFRA, as no one from our side has gotten to this yet. I guess we can only moderate, not administrate. So

Re: hadoop trunk build failure - yarn, surefire related?

2012-09-11 Thread Steve Loughran
It's probably some maven thing -in particular Maven's habit of grabbing the online nightly snapshots off apache rather than local, try mvn clean install -DskipTests -offline to force in all the artifacts, then run the MR tests Tony -why not get on the mapreduce-dev mailing list, as this is the

RE: hadoop trunk build failure - yarn, surefire related?

2012-09-11 Thread Tony Burton
Thanks Steve, I’ll try the mvn command you suggest. All the snapshots I can see came from repository.apache.org though. How do I run the MR tests only? Thanks for the mapreduce-dev mailing list suggestion, I thought all lists had merged into one though – did I get the wrong end of the stick?

RE: Undeliverable messages

2012-09-11 Thread Tony Burton
No problem! I'll remove that Outlook filter now... :) -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: 11 September 2012 10:34 To: user@hadoop.apache.org Subject: Re: Undeliverable messages And done. We shouldn't get this anymore. Thanks for bumping on this issue Tony!

RE: hadoop trunk build failure - yarn, surefire related?

2012-09-11 Thread Tony Burton
Good suggestions Harsh and Hemanth. When I was asked to submit a patch for hadoop 1.0.3, I thought it a good exercise to work through the build process to become familiar even though the patch is documentation-only. Maybe the requests for patches could come with a list of suggested reading as

Re: what's the default reducer number?

2012-09-11 Thread Bejoy Ks
Hi Lin The default value for number of reducers is 1 <name>mapred.reduce.tasks</name> <value>1</value> It is not determined by data volume. You need to specify the number of reducers for your mapreduce jobs as per your data volume. Regards Bejoy KS On Tue, Sep 11, 2012 at 4:53 PM, Jason Yang

Re: FW: Doubts Reg

2012-09-11 Thread sudha sadhasivam
Dear Madam, I am attaching the relevant screenshots of running hive. Profile.png shows the contents of /etc/profile. hive_common_lib.png shows h-ve_common*.jar is already in $HIVE_HOME/lib; here $HIVE_HOME is /home/yahoo/hive/build/dist, as evident from classpath_err.png Yours Truly G

Re: what's the default reducer number?

2012-09-11 Thread Bejoy Ks
Hi Lin The default values for all the properties are in core-default.xml, hdfs-default.xml and mapred-default.xml Regards Bejoy KS On Tue, Sep 11, 2012 at 5:06 PM, Jason Yang lin.yang.ja...@gmail.com wrote: Hi, Bejoy Thanks for your reply. where could I find the default value of

Re: Some general questions about DBInputFormat

2012-09-11 Thread Bejoy KS
Hi Yaron Sqoop uses a similar implementation. You can get some details there. Replies inline • (more general question) Are there many use-cases for using DBInputFormat? Do most Hadoop jobs take their input from files or DBs? From my small experience Most MR jobs have data in hdfs. It is

Question about the task assignment strategy

2012-09-11 Thread Hiroyuki Yamada
Hi, I want to check whether my understanding of task assignment in hadoop is correct. When scanning a file with multiple tasktrackers, I am wondering how a task is assigned to each tasktracker. Is it based on the block sequence or data locality? Let me explain my question by example.

Re: Question about the task assignment strategy

2012-09-11 Thread Hemanth Yamijala
Hi, Task assignment takes data locality into account first and not block sequence. In hadoop, tasktrackers ask the jobtracker to be assigned tasks. When such a request comes to the jobtracker, it will try to look for an unassigned task which needs data that is close to the tasktracker and will
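
The pull model described in this reply (a tasktracker asks for work, and the jobtracker prefers an unassigned task whose data is close to that tasktracker) can be sketched roughly as below. This is a deliberately simplified toy, not the JobTracker's scheduling code: it knows only "replica on this host" versus "anything else" and ignores rack locality. `LocalityFirstSketch`, `Task` and `pickTask` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class LocalityFirstSketch {

    // A pending task together with the hosts holding a replica of its input block.
    static final class Task {
        final String id;
        final Set<String> replicaHosts;

        Task(String id, String... hosts) {
            this.id = id;
            this.replicaHosts = new HashSet<>(Arrays.asList(hosts));
        }
    }

    // Called when `host` asks for work: prefer a task with local input,
    // otherwise fall back to the first pending task. Returns null when idle.
    static Task pickTask(List<Task> pending, String host) {
        for (Task t : pending) {
            if (t.replicaHosts.contains(host)) {
                pending.remove(t); // safe: we return immediately after removing
                return t;
            }
        }
        return pending.isEmpty() ? null : pending.remove(0);
    }

    public static void main(String[] args) {
        List<Task> pending = new ArrayList<>();
        pending.add(new Task("block-0", "hostA"));
        pending.add(new Task("block-1", "hostB"));
        // hostB skips block-0 because block-1 is local to it.
        System.out.println(pickTask(pending, "hostB").id);
    }
}
```

In this model the order in which blocks appear in the file matters only as a tie-breaker, which matches the point made above that locality is considered before block sequence.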

RE: hadoop trunk build failure - yarn, surefire related?

2012-09-11 Thread Tony Burton
Another mvn test caused the build to fail slightly further down the road. As my Jira issue is documentation-only, I've submitted the patch anyway. Is this multiple-failure scenario typical for trying to build hadoop from the trunk? It's sure putting me off submitting code in future. Is there

Re: Error in : hadoop fsck /

2012-09-11 Thread Hemanth Yamijala
Could you please review your configuration to see if you are pointing to the right namenode address ? (This will be in core-site.xml) Please paste it here so we can look for clues. Thanks hemanth On Tue, Sep 11, 2012 at 9:25 PM, yogesh dhari yogeshdh...@live.com wrote: Hi all, I am running

Re: Error in : hadoop fsck /

2012-09-11 Thread Arpit Gupta
Yogesh try this hadoop fsck -Ddfs.http.address=localhost:50070 / 50070 is the default http port that the namenode runs on. The property dfs.http.address should be set in your hdfs-site.xml -- Arpit Gupta Hortonworks Inc. http://hortonworks.com/ On Sep 11, 2012, at 9:03 AM, yogesh dhari

how to specify the root directory of hadoop on slave node?

2012-09-11 Thread Richard Tang
Hi, All I need to set up a hadoop/hdfs cluster with one namenode on a machine and two datanodes on two other machines. But after setting the datanode machines in the conf/slaves file, running bin/start-dfs.sh cannot start hdfs normally. I am aware that I have not specified the root directory hadoop is

security-in-HADOOP

2012-09-11 Thread nisha
How is security maintained in hadoop? Is it maintained by giving folder/file permissions in hadoop? How can I make sure that somebody else doesn't write into my hdfs file system ...

removing datanodes from clustes.

2012-09-11 Thread yogesh dhari
Hello all, I cannot find a clear way to remove a datanode from the cluster. Please explain the decommissioning steps with an example, such as how to create the exclude files and the other steps involved. Thanks regards Yogesh Kumar

Re: security-in-HADOOP

2012-09-11 Thread Bertrand Dechoux
By reading the documentation, like the following http://hadoop.apache.org/docs/r1.0.3/hdfs_permissions_guide.html On Tue, Sep 11, 2012 at 8:14 PM, nisha nishakulkarn...@gmail.com wrote: How security is maintained in hadoop, is it maintained by giving folder/file permissions in hadoop how can

Re: How to remove datanode from cluster..

2012-09-11 Thread Bejoy Ks
Hi Yogesh The detailed steps are available in the hadoop wiki on the FAQ page http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F Regards Bejoy KS On Wed, Sep 12, 2012 at 12:14 AM, yogesh dhari

Re: Question about the task assignment strategy

2012-09-11 Thread Hiroyuki Yamada
I figured out the cause. HDFS block size is 128MB, but I specify mapred.min.split.size as 512MB, and data local I/O processing goes wrong for some reason. When I remove the mapred.min.split.size configuration, tasktrackers pick data-local tasks. Why does it happen ? It seems like a bug. Split is
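
The effect reported here is at least consistent with how `FileInputFormat` sizes its splits. In the new API the split size is computed as max(minSize, min(maxSize, blockSize)), and the older API uses a goal size in place of maxSize. A minimum of 512MB over 128MB blocks therefore yields splits that each span four blocks, so at most part of a split can be local to any one node unless the blocks happen to be co-located. The sketch below only reproduces that arithmetic; `SplitSizeSketch` is a hypothetical name and the unbounded maxSize is an assumption.

```java
final class SplitSizeSketch {

    static final long MB = 1024L * 1024L;

    // Same arithmetic as the split-size rule: clamp the block size
    // between the configured minimum and maximum split sizes.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long blockSize = 128 * MB;
        long minSize = 512 * MB;        // mapred.min.split.size from the post
        long maxSize = Long.MAX_VALUE;  // assumed: no upper bound configured

        long split = computeSplitSize(blockSize, minSize, maxSize);
        System.out.println("split size in MB: " + split / MB);
        System.out.println("blocks per split: " + split / blockSize);
    }
}
```

With the minimum split size left at its default, the same function returns the block size, which fits the observation that removing the setting restored data-local tasks.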

Re: Accessing image files from hadoop to jsp

2012-09-11 Thread Michael Segel
Here's one... Write a Java program which can be accessed on the server side to pull the picture from HDFS and display it on your JSP. On Sep 11, 2012, at 3:48 PM, Visioner Sadak visioner.sa...@gmail.com wrote: any hints, experts? at least whether I'm on the right track, or can't we use hftp at all

How to split a sequence file

2012-09-11 Thread Jason Yang
Hi, I have a sequence file written by SequenceFileOutputFormat with key/value type of Text, BytesWritable, like below: Text BytesWritable - id_A_01 7F2B3C687F2B3C687F2B3C68 id_A_02

Re: How to not output the key

2012-09-11 Thread Manoj Babu
Hi, You have to specify the reducer key out type as NullWritable. Cheers! Manoj. On Wed, Sep 12, 2012 at 7:43 AM, Nataraj Rashmi - rnatar rashmi.nata...@acxiom.com wrote: Hello, I have a simple map/reduce program to merge input files into one big output file. My question is,

RE: Issue in access static object in MapReduce

2012-09-11 Thread Stuti Awasthi
Thanks Bejoy, I will try to implement it and if I face any issues I will let you know. Thanks Stuti From: Bejoy Ks [mailto:bejoy.had...@gmail.com] Sent: Tuesday, September 11, 2012 8:39 PM To: user@hadoop.apache.org Subject: Re: Issue in access static object in MapReduce Hi Stuti You can pass the json

RE: removing datanodes from clustes.

2012-09-11 Thread Brahma Reddy Battula
Hi Yogesh.. FYI. Please go through following.. http://tech.zhenhua.info/2011/04/how-to-decommission-nodesblacklist.html http://hadoop-karma.blogspot.in/2011/01/hadoop-cookbook-how-to-decommission.html From: yogesh dhari [yogeshdh...@live.com] Sent: Wednesday,

Re: Issue in access static object in MapReduce

2012-09-11 Thread Kunaal
Have you looked at Terracotta or any other distributed caching system? Kunal -- Sent while mobile -- On Sep 11, 2012, at 9:30 PM, Stuti Awasthi stutiawas...@hcl.com wrote: Thanks Bejoy, I will try to implement it and if I face any issues I will let you know. Thanks Stuti From: Bejoy Ks

Re: How to split a sequence file

2012-09-11 Thread Robert Dyer
If the file is pre-sorted, why not just make multiple sequence files - 1 for each split? Then you don't have to compute InputSplits because the physical files are already split. On Tue, Sep 11, 2012 at 11:00 PM, Harsh J ha...@cloudera.com wrote: Hey Jason, Is the file pre-sorted? You could

Re: Question about the task assignment strategy

2012-09-11 Thread Hemanth Yamijala
Hi, I tried a similar experiment as yours but couldn't replicate the issue. I generated 64 MB files and added them to my DFS - one file from every machine, with a replication factor of 1, like you did. My block size was 64MB. I verified the blocks were located on the same machine as where I