reply: a question about dfs.replication

2013-07-01 Thread Francis.Hu
YouPeng Yang, you said that may be the answer. Thank you. From: YouPeng Yang [mailto:yypvsxf19870...@gmail.com] Sent: Tuesday, July 02, 2013 12:52 To: user@hadoop.apache.org Subject: Re: reply: a question about dfs.replication Hi Hu and Yu, Agreed: dfs.replication is a client si

reply: a question about dfs.replication

2013-07-01 Thread Francis.Hu
Actually, my client side is already set to "2". From: Azuryy Yu [mailto:azury...@gmail.com] Sent: Tuesday, July 02, 2013 12:40 To: user@hadoop.apache.org Subject: Re: reply: a question about dfs.replication It's not an HDFS issue. dfs.replication is a client side configuration, not server side.

Re: temporary folders for YARN tasks

2013-07-01 Thread Sandy Ryza
LocalDirAllocator should help with this. You can look through MapReduce code to see how it's used. -Sandy On Mon, Jul 1, 2013 at 11:01 PM, Devaraj k wrote: > You can make use of this configuration to do the same. > > List of directories to store *localized* files
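
A minimal Java sketch of the pattern Sandy describes, assuming the allocator is pointed at yarn.nodemanager.local-dirs and using a hypothetical relative file name:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.LocalDirAllocator;
    import org.apache.hadoop.fs.Path;

    public class ScratchFileExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Round-robins allocations across the directories listed
            // under the given configuration key.
            LocalDirAllocator allocator =
                new LocalDirAllocator("yarn.nodemanager.local-dirs");
            // Picks a directory with enough free space and returns a
            // local path for the hypothetical relative name below.
            Path scratch = allocator.getLocalPathForWrite("mytask/tmp.dat", conf);
            System.out.println("Writing temp data to " + scratch);
        }
    }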

Re: Hang when add/remove a datanode into/from a 2 datanode cluster

2013-07-01 Thread sam liu
Yes, the default replication factor is 3. However, in my case it's strange: while the decommission hangs, I found some blocks' expected replica count is 3, but the 'dfs.replication' value in hdfs-site.xml on every cluster node has been 2 from the beginning of the cluster setup. Below are my steps: 1. Install

RE: temporary folders for YARN tasks

2013-07-01 Thread Devaraj k
You can make use of this configuration to do the same. List of directories to store localized files in. An application's localized file directory will be found in: ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}. Individual containers' work di
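
For reference, a hedged yarn-site.xml sketch of that property; the mount points are hypothetical, with one entry per physical disk so containers can spread I/O in parallel:

    <property>
      <name>yarn.nodemanager.local-dirs</name>
      <!-- Hypothetical mount points; one directory per disk. -->
      <value>/disk1/yarn/local,/disk2/yarn/local,/disk3/yarn/local</value>
    </property>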

RE: intermediate results files

2013-07-01 Thread Devaraj k
If you are 100% sure that all the data nodes are available and healthy for that period of time, you can choose a replication factor of 1 or <3. Thanks, Devaraj k From: John Lilley [mailto:john.lil...@redpoint.net] Sent: 02 July 2013 04:40 To: user@hadoop.apache.org Subject: RE: intermediat

RE: YARN tasks and child processes

2013-07-01 Thread Devaraj k
It is possible for a YARN task to persist data; you can choose whichever place you want to persist it. If you choose to persist in HDFS, you need to take care of deleting the data after using it. If you choose to write to a local dir, you may write the data into the NM local dirs (i.e. 'yarn.nodemanage
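
A minimal sketch of the HDFS variant under the caveat above: the task owns the data's lifecycle, so it deletes the (hypothetical) temp path itself once the data has been consumed:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PersistAndCleanup {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path tmp = new Path("/tmp/myapp/task-output.dat"); // hypothetical path
            // ... write the data here, let the consumer read it ...
            // Nothing cleans this up automatically, so the task must
            // delete the data once it is no longer needed.
            fs.delete(tmp, true);
        }
    }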

Re: Job level parameters

2013-07-01 Thread Felix.徐
Hi Azuryy, thanks for that, but I'm still confused about which ones can be specified for each job; some parameters are set for the framework and cannot be changed by an individual MapReduce job. 2013/7/2 Azuryy Yu > They are all listed in mapred-default.xml, and there are detailed > descriptions.

Re: reply: a question about dfs.replication

2013-07-01 Thread YouPeng Yang
Hi Hu and Yu, Agreed: dfs.replication is a client side configuration, not server side. That makes the point in my last mail make sense. And the cmd 'hdfs dfs -setrep -R -w 2 /' solves the problem that I could not change an existing file's replication value. 2013/7/2 Azuryy Yu > It's not HDF
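
The shell command above also has a Java equivalent; a minimal sketch using FileSystem.setReplication on a hypothetical path (unlike -R, it applies to a single file):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SetReplication {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Changes the replication factor of an already-written file;
            // the NameNode then adds or removes replicas to match.
            boolean accepted =
                fs.setReplication(new Path("/data/existing-file.dat"), (short) 2);
            System.out.println("setReplication accepted: " + accepted);
        }
    }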

Re: Reply: Reply: a question about dfs.replication

2013-07-01 Thread YouPeng Yang
Hi Hu, A point came to my mind, and I tested it. Before you set dfs.replication to 2, the block may have already existed with the original replication value of 3. After you changed the value, the replication of the former block was still 3. So the file that you created after the ch

Re: reply: a question about dfs.replication

2013-07-01 Thread Azuryy Yu
It's not an HDFS issue. dfs.replication is a client side configuration, not server side, so you need to set it to '2' on your client side (where your application runs), then execute a command such as 'hdfs dfs -put' or call the HDFS API from your Java application. On Tue, Jul 2, 2013 at 12:25 PM, Francis.Hu
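
A minimal sketch of what "set it on the client side" means in Java; the file path is hypothetical, and the replication factor is taken from the writing client's Configuration at create time:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ClientSideReplication {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The client's setting wins: the NameNode records whatever
            // replication factor the writing client asks for.
            conf.setInt("dfs.replication", 2);
            FileSystem fs = FileSystem.get(conf);
            FSDataOutputStream out = fs.create(new Path("/tmp/two-replicas.txt"));
            out.writeUTF("written with replication = 2");
            out.close();
        }
    }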

Re: WebHDFS - Jetty

2013-07-01 Thread Alejandro Abdelnur
John, you should look at the AuthenticationHandler interface in the hadoop-auth module; there are 2 implementations, Pseudo and Kerberos. Hope this helps On Mon, Jul 1, 2013 at 9:15 PM, John Strange wrote: > We have an existing java module that integrates our authentication > system that works
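
A hedged sketch of how a custom handler is usually wired in via core-site.xml, assuming the handler class (hypothetical name below) implements the AuthenticationHandler interface; the property also accepts the built-in "simple" and "kerberos" aliases:

    <property>
      <name>hadoop.http.authentication.type</name>
      <!-- Hypothetical class implementing AuthenticationHandler
           from the hadoop-auth module. -->
      <value>com.example.auth.CustomAuthenticationHandler</value>
    </property>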

reply: a question about dfs.replication

2013-07-01 Thread Francis.Hu
Thanks, all of you. I just got the problem fixed through the command: hdfs dfs -setrep -R -w 2 / Is that an issue of HDFS? Why do I need to manually execute a command to tell Hadoop the replication factor even though it is set in hdfs-site.xml? Thanks, Francis.Hu From: Francis.Hu [mail

WebHDFS - Jetty

2013-07-01 Thread John Strange
We have an existing Java module that integrates our authentication system and works with Jetty, but I can't find the configuration files for Jetty, so I'm assuming it's being passed and built on the fly by the data/name nodes when they start. The goal is to provide a way to authenticate users v

Re: data loss after cluster wide power loss

2013-07-01 Thread Dave Latham
Much appreciated, Suresh. Let me know if I can provide any more information or if you'd like me to open a JIRA. Dave On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas wrote: > Dave, > > Thanks for the detailed email. Sorry I did not read all the details you > had sent earlier completely (on my p

Re: data loss after cluster wide power loss

2013-07-01 Thread Suresh Srinivas
Dave, Thanks for the detailed email. Sorry I did not read all the details you had sent earlier completely (on my phone). As you said, this is not the data-loss issue related to the HBase log and hsync. I think you are right; the rename operation itself might not have hit the disk. I think we should e

RE: Could we use the same identity store for user groups mapping in MIT Kerberos + OpenLDAP setup

2013-07-01 Thread Zheng, Kai
Azuryy, thanks for your info. I will take time to learn about whosso. Any more comments or thoughts here? Thanks. Regards, Kai From: Azuryy Yu [mailto:azury...@gmail.com] Sent: Saturday, June 29, 2013 8:37 AM To: user@hadoop.apache.org Subject: Re: Could we use the same identity store for user gro

Reply: Reply: a question about dfs.replication

2013-07-01 Thread Francis.Hu
Yes, it returns 2 correctly after "hdfs getconf -confkey dfs.replication", but in the web page it is 3, as below: From: yypvsxf19870706 [mailto:yypvsxf19870...@gmail.com] Sent: Monday, July 01, 2013 23:24 To: user@hadoop.apache.org Subject: Re: Reply: a question about dfs.replication Hi

Re: Job level parameters

2013-07-01 Thread Azuryy Yu
They are all listed in mapred-default.xml, and there are detailed descriptions. On Tue, Jul 2, 2013 at 11:14 AM, Felix.徐 wrote: > Hi all, > > Is there a detailed list or document about the job specific parameters of > mapreduce ? > > Thanks! >
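
A minimal sketch of a per-job override, assuming the Hadoop 2.x parameter names; the setting below affects only this submission, while server-side framework settings cannot be changed this way:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class JobLevelParams {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Job-level parameter: only this job runs with 10 reducers.
            conf.setInt("mapreduce.job.reduces", 10);
            Job job = Job.getInstance(conf, "per-job override example");
            // ... set mapper/reducer/input/output, then submit ...
        }
    }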

Job level parameters

2013-07-01 Thread Felix.徐
Hi all, Is there a detailed list or document about the job-specific parameters of MapReduce? Thanks!

Re: How to make a Servlet execute a Hadoop MapReduce job and get results back

2013-07-01 Thread Dhaval Shah
You can submit a MapReduce job from Tomcat itself in blocking mode using the Java API, and read directly from HDFS using the Java API as well. No need for exec. Sent from Yahoo! Mail on Android
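
A minimal sketch of that blocking pattern, with the job setup elided and a hypothetical output path; waitForCompletion(true) blocks the calling thread until the job finishes, after which the servlet can stream the result from HDFS:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;

    public class BlockingSubmit {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "servlet-triggered job");
            // ... set mapper/reducer/input/output here ...
            // Blocks until the job completes; 'true' also prints progress.
            if (job.waitForCompletion(true)) {
                FileSystem fs = FileSystem.get(conf);
                // Hypothetical single reducer output file.
                BufferedReader in = new BufferedReader(new InputStreamReader(
                    fs.open(new Path("/output/part-r-00000"))));
                for (String line; (line = in.readLine()) != null; ) {
                    System.out.println(line); // a servlet would write to the response
                }
                in.close();
            }
        }
    }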

How to make a Servlet execute a Hadoop MapReduce job and get results back

2013-07-01 Thread Huy Pham
I have a Tomcat server with several servlets, a MapReduce job (written using Hadoop), and Pig installed, all sitting in the same cluster as Hadoop. Now I need my servlet to be able to execute a MapReduce program (or a Pig script) and display the results returned by the MapReduce pr

Re: data loss after cluster wide power loss

2013-07-01 Thread Dave Latham
(Removing hbase list and adding hdfs-dev list as this is pretty internal stuff). Reading through the code a bit: FSDataOutputStream.close calls DFSOutputStream.close calls DFSOutputStream.closeInternal - sets currentPacket.lastPacketInBlock = true - then calls DFSOutputStream.flushInternal - e

Re: intermediate results files

2013-07-01 Thread Mohammad Tariq
I see. This difference is because the next block of data will not be written to HDFS until the previous block has been successfully written to 'all' the DNs selected for replication. This implies that a higher RF means more time for the completion of a block write. Warm Regards, Tariq cl

Re: data loss after cluster wide power loss

2013-07-01 Thread Dave Latham
On Mon, Jul 1, 2013 at 4:52 PM, Azuryy Yu wrote: > how to enable "sync on block close" in HDFS? Set dfs.datanode.synconclose to true. See https://issues.apache.org/jira/browse/HDFS-1539
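
For reference, a hedged hdfs-site.xml sketch of that setting; DataNodes read their config at startup, so a restart would presumably be needed for it to take effect:

    <property>
      <name>dfs.datanode.synconclose</name>
      <value>true</value>
      <!-- Have the DataNode fsync block files to disk when a block is
           closed, trading some write latency for durability. -->
    </property>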

Re: data loss after cluster wide power loss

2013-07-01 Thread Azuryy Yu
How to enable "sync on block close" in HDFS? --Sent from my Sony mobile. On Jul 2, 2013 6:47 AM, "Lars Hofhansl" wrote: > HBase is interesting here, because it rewrites old data into new files. So > a power outage by default would not just lose new data but potentially old > data as well. > You

YARN tasks and child processes

2013-07-01 Thread John Lilley
Is it possible for a child process of a YARN task to persist after the task is complete? I am looking at an alternative to a YARN auxiliary process that may be simpler to implement, if I can have a task spawn a process that persists for some time after the task finishes. Thanks, John

RE: intermediate results files

2013-07-01 Thread John Lilley
I've seen some benchmarks where replication=1 runs at about 50MB/sec and replication=3 runs at about 33MB/sec, but I can't seem to find that now. John From: Mohammad Tariq [mailto:donta...@gmail.com] Sent: Monday, July 01, 2013 5:03 PM To: user@hadoop.apache.org Subject: Re: intermediate results

RE: Assignment of data splits to mappers

2013-07-01 Thread John Lilley
Bertrand, Ah yes, I can see the wisdom of smaller tasks in (1). Given that, does MR attempt to assign multiple blocks per task when the #blocks >> #nodes? Regarding (2) we can run a simple thought experiment. It seems likely that every block will have one dangling record, requiring a small re

Re: intermediate results files

2013-07-01 Thread Mohammad Tariq
Hello John, IMHO, it doesn't matter. Your job will write the result just once. Replica creation is handled at the HDFS layer, so it has nothing to do with your job. Your job will still be writing at the same speed. Warm Regards, Tariq cloudfront.blogspot.com On Tue, Jul 2, 2013 at 4:16 AM, Jo

Re: data loss after cluster wide power loss

2013-07-01 Thread Lars Hofhansl
HBase is interesting here, because it rewrites old data into new files. So a power outage by default would not just lose new data but potentially old data as well. You can enable "sync on block close" in HDFS, and then at least be sure that closed blocks (and thus files) are synced to disk physi

intermediate results files

2013-07-01 Thread John Lilley
If my reducers are going to create results that are temporary in nature (consumed by the next processing stage), is it recommended to use a replication factor <3 to improve performance? Thanks, John

Re: data loss after cluster wide power loss

2013-07-01 Thread Dave Latham
Thanks for the response, Suresh. I'm not sure that I understand the details properly. From my reading of HDFS-744, the hsync API would allow a client to make sure that at any point in time its writes so far have hit the disk. For example, for HBase it could apply an fsync after adding some edits to it

Re: new to hadoop

2013-07-01 Thread Mohammad Tariq
Hello Gaurav, What kind of resources do you need? You can go to the official web site to start with. Warm Regards, Tariq cloudfront.blogspot.com On Mon, Jul 1, 2013 at 1:36 PM, gaurav jain wrote: > hello everyone, > I am new to hadoop. I have to give a project p

Re: data loss after cluster wide power loss

2013-07-01 Thread Suresh Srinivas
Yes, this is a known issue. The HDFS part of this was addressed in https://issues.apache.org/jira/browse/HDFS-744 for 2.0.2-alpha and is not available in the 1.x releases. I think HBase does not use this API yet. On Mon, Jul 1, 2013 at 3:00 PM, Dave Latham wrote: > We're running HBase over HDFS 1.0

data loss after cluster wide power loss

2013-07-01 Thread Dave Latham
We're running HBase over HDFS 1.0.2 on about 1000 nodes. On Saturday the data center we were in had a total power failure and the cluster went down hard. When we brought it back up, HDFS reported 4 files as CORRUPT. We recovered the data in question from our secondary datacenter, but I'm trying

temporary folders for YARN tasks

2013-07-01 Thread John Lilley
When a YARN app and its tasks wants to write temporary files, how does it know where to write the files? I am assuming that each task has some temporary space available, and I hope it is available across multiple disk volumes for parallel performance. Are those files cleaned up automatically afte

Re: Yarn HDFS and Yarn Exceptions when processing "larger" datasets.

2013-07-01 Thread Omkar Joshi
Also, do you see any exceptions in the RM / NM logs? Thanks, Omkar Joshi *Hortonworks Inc.* On Mon, Jul 1, 2013 at 11:19 AM, Omkar Joshi wrote: > Hi, > > As I don't know your complete AM code and how your containers are > communicating with each other...Certain things w

Re: Yarn HDFS and Yarn Exceptions when processing "larger" datasets.

2013-07-01 Thread Omkar Joshi
Hi, As I don't know your complete AM code and how your containers are communicating with each other... Certain things might help you in debugging: where are you starting your RM (is it really running on 8030? Are you sure there is no previously started RM still running there?)? Also, in y

Re: Reply: a question about dfs.replication

2013-07-01 Thread yypvsxf19870706
Hi, Could you please get the property value by using: hdfs getconf -confkey dfs.replication Sent from my iPhone On 2013-7-1, 15:51, Francis.Hu wrote: > > Actually, my Java client is running with the same configuration as > Hadoop's. The dfs.replication is already set as 2 in my Hadoop > co

Re: hiveserver2 Thrift Interface With Perl

2013-07-01 Thread Dave Cardwell
I just wanted to update the list to let future searchers know that I found a solution: David Morel of booking.com kindly shared some code with me to get this working, which he has now released as a Perl library that has been working well: https://github.com/dmorel/Thrift-API-HiveClient2 https://met

Getting hadoop counters from oozie id

2013-07-01 Thread souri datta
Hi, If I know the workflow id of an Oozie job, is it possible to get the Hadoop counters associated with the M/R job? Thanks, Souri

new to hadoop

2013-07-01 Thread gaurav jain
Hello everyone, I am new to Hadoop. I have to give a project proposal as part of the ICFOSS-ASF mentoring program. I want to start with Hadoop. For that purpose, could I get some initial help resources, so that I am in a position to understand and give a proposal within 7-8 days? Thank you, Gaurav

Reply: a question about dfs.replication

2013-07-01 Thread Francis.Hu
Actually, my Java client is running with the same configuration as Hadoop's. The dfs.replication is already set as 2 in my Hadoop configuration, so I think the dfs.replication is already overridden by my configuration in hdfs-site.xml, but it seems it doesn't work even though I overrode the para

Re: a question about dfs.replication

2013-07-01 Thread Емельянов Борис
On 01.07.2013 10:19, Francis.Hu wrote: Hi, all. I am installing a cluster with Hadoop 2.0.5-alpha. I have one namenode and two datanodes. The dfs.replication is set as 2 in hdfs-site.xml. After all configuration work is done, I started all nodes. Then I saved a file into HDFS through a Java cli