Re: [help]how to stop HDFS

2011-11-30 Thread Steve Loughran
On 30/11/11 04:29, Nitin Khandelwal wrote: Thanks, I missed the sbin directory, was using the normal bin directory. Thanks, Nitin On 30 November 2011 09:54, Harsh J ha...@cloudera.com wrote: Like I wrote earlier, it's in the $HADOOP_HOME/sbin directory. Not the regular bin/ directory. On Wed,

Re: How is network distance for nodes calculated

2011-11-23 Thread Steve Loughran
On 22/11/11 21:04, Edmon Begoli wrote: I am reading Hadoop Definitive Guide 2nd Edition and I am struggling to figure out Hadoop's exact formula for network distance calculation (page 64/65). (I have my guesses, but I would like to know the exact formula) It's implemented in
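The rule in question (distance to the closest common ancestor over the node/rack/datacenter tree) can be sketched in a few lines. This is an illustrative reimplementation, not Hadoop's actual NetworkTopology code:

```python
def distance(a, b):
    # Locations are written /datacenter/rack/node; the distance is the
    # number of tree edges from each node up to their closest common
    # ancestor (same node -> 0, same rack -> 2, same DC -> 4, else 6).
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)

print(distance("/d1/r1/n1", "/d1/r1/n2"))  # 2: same rack, different node
```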

Re: Adding a new platform support to Hadoop

2011-11-17 Thread Steve Loughran
On 17/11/11 15:02, Amir Sanjar wrote: Is there any specific development, build, and packaging guidelines to add support for a new hardware platform, in this case PPC64, to hadoop? Best Regards Amir Sanjar Linux System Management Architect and Lead IBM Senior Software Engineer Phone#

Re: Cannot access JobTracker GUI (port 50030) via web browser while running on Amazon EC2

2011-10-25 Thread Steve Loughran
On 24/10/11 23:46, Mark question wrote: Thank you, I'll try it. Mark On Mon, Oct 24, 2011 at 1:50 PM, Sameer Farooqui cassandral...@gmail.com wrote: Mark, We figured it out. It's an issue with RedHat's IPTables. You have to open up those ports: vim /etc/sysconfig/iptables Of course, if you

Re: execute hadoop job from remote web application

2011-10-20 Thread Steve Loughran
On 18/10/11 17:56, Harsh J wrote: Oleg, It will pack up the jar that contains the class specified by setJarByClass into its submission jar and send it up. That's the function of that particular API method. So, your deduction is almost right there :) On Tue, Oct 18, 2011 at 10:20 PM, Oleg

Re: automatic node discovery

2011-10-18 Thread Steve Loughran
On 18/10/11 10:48, Petru Dimulescu wrote: Hello, I wonder how do you guys see the problem of automatic node discovery: having, for instance, a couple of hadoops, with no configuration explicitly set whatsoever, simply discover each other and work together, like Gridgain does: just fire up two

Re: execute hadoop job from remote web application

2011-10-18 Thread Steve Loughran
On 18/10/11 11:40, Oleg Ruchovets wrote: Hi, what is the way to execute a hadoop job on a remote cluster? I want to execute my hadoop job from a remote web application, but I didn't find any hadoop client (remote API) to do it. Please advise. Oleg the Job class lets you build up and submit jobs

Re: hadoop knowledge gaining

2011-10-10 Thread Steve Loughran
On 07/10/11 15:25, Jignesh Patel wrote: Guys, I am able to deploy the first program, word count, using hadoop. I am interested in exploring more about Hadoop and HBase and don't know the best way to grasp both of them. I have Hadoop in Action but it has the older API. Actually the API

Re: FileSystem closed

2011-09-30 Thread Steve Loughran
On 29/09/2011 18:02, Joey Echeverria wrote: Do you close your FileSystem instances at all? IIRC, the FileSystem instance you use is a singleton and if you close it once, it's closed for everybody. My guess is you close it in your cleanup method and you have JVM reuse turned on. I've hit this
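The cache behaviour Joey describes is easy to reproduce in miniature. Here is a toy Python analog (not Hadoop's actual implementation) of a cached, shared filesystem handle, showing why closing one reference closes it for everybody:

```python
class FileSystem:
    _cache = {}

    def __init__(self):
        self.closed = False

    @classmethod
    def get(cls, uri):
        # Like Hadoop's FileSystem.get(): instances are cached per URI,
        # so every caller gets the same shared object back.
        if uri not in cls._cache:
            cls._cache[uri] = cls()
        return cls._cache[uri]

    def close(self):
        self.closed = True

fs1 = FileSystem.get("hdfs://nn:8020")
fs2 = FileSystem.get("hdfs://nn:8020")
fs1.close()
print(fs2.closed)  # True: closing one handle closed the shared instance
```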

Re: Hadoop performance benchmarking with TestDFSIO

2011-09-29 Thread Steve Loughran
On 28/09/11 22:45, Sameer Farooqui wrote: Hi everyone, I'm looking for some recommendations for how to get our Hadoop cluster to do faster I/O. Currently, our lab cluster is 8 worker nodes and 1 master node (with NameNode and JobTracker). Each worker node has: - 48 GB RAM - 16 processors

Re: Is SAN storage is a good option for Hadoop ?

2011-09-29 Thread Steve Loughran
On 29/09/11 13:28, Brian Bockelman wrote: On Sep 29, 2011, at 1:50 AM, praveenesh kumar wrote: Hi, I want to know can we use SAN storage for a Hadoop cluster setup? If yes, what should be the best practices? Is it a good way to do considering the fact the underlying power of Hadoop is

Re: difference between development and production platform???

2011-09-28 Thread Steve Loughran
On 28/09/11 04:19, Hamedani, Masoud wrote: Special thanks for your help Arko, You mean in Hadoop, NameNode, DataNodes, JobTracker, TaskTrackers and all the clusters should be deployed on Linux machines??? We have lots of data (on Windows OS) and code (written in C#) for data mining, we want to use

Re: hadoop question using VMWARE

2011-09-28 Thread Steve Loughran
On 28/09/11 08:37, N Keywal wrote: For example: - It's adding two layers (Windows + Linux) that can both fail, especially under heavy workload (and hadoop is built to use all the resources available). They will need to be managed as well (software upgrades, hardware support...), it's an extra

Re: Environment consideration for a research on scheduling

2011-09-26 Thread Steve Loughran
On 23/09/11 16:09, GOEKE, MATTHEW (AG/1000) wrote: If you are starting from scratch with no prior Hadoop install experience I would configure stand-alone, migrate to pseudo distributed and then to fully distributed verifying functionality at each step by doing a simple word count run. Also,

Re: Can we replace namenode machine with some other machine ?

2011-09-22 Thread Steve Loughran
On 22/09/11 05:42, praveenesh kumar wrote: Hi all, Can we replace our namenode machine later with some other machine? Actually I got a new server machine in my cluster and now I want to make this machine my new namenode and jobtracker node. Also, does the Namenode/JobTracker machine's

Re: Can we replace namenode machine with some other machine ?

2011-09-22 Thread Steve Loughran
On 22/09/11 17:13, Michael Segel wrote: I agree w Steve except on one thing... RAID 5 Bad. RAID 10 (1+0) good. Sorry this goes back to my RDBMs days where RAID 5 will kill your performance and worse... sorry, I should have said RAID =5. The main thing is you don't want the NN data lost.

Re: risks of using Hadoop

2011-09-21 Thread Steve Loughran
On 20/09/11 22:52, Michael Segel wrote: PS... There's this junction box in your machine room that has this very large on/off switch. If pulled down, it will cut power to your cluster and you will lose everything. Now would you consider this a risk? Sure. But is it something you should really

Re: risks of using Hadoop

2011-09-21 Thread Steve Loughran
On 21/09/11 11:30, Dieter Plaetinck wrote: On Wed, 21 Sep 2011 11:21:01 +0100 Steve Loughran ste...@apache.org wrote: On 20/09/11 22:52, Michael Segel wrote: PS... There's this junction box in your machine room that has this very large on/off switch. If pulled down, it will cut power to your

Re: risks of using Hadoop

2011-09-19 Thread Steve Loughran
On 18/09/11 02:32, Tom Deutsch wrote: Not trying to give you a hard time Brian - we just have different users/customers/expectations on us. Tom, I suggest you read "Apache Hadoop goes realtime at Facebook" and consider how you could adopt those features -and how to contribute them back to the ASF.

Re: risks of using Hadoop

2011-09-19 Thread Steve Loughran
On 18/09/11 03:37, Michael Segel wrote: 2) Data Loss. You can mitigate this as well. Do I need to go through all of the options and DR/BCP planning? Sure there's always a chance that you have some Luser who does something brain dead. This is true of all databases and systems. (I know I can

Re: Hadoop with Netapp

2011-09-01 Thread Steve Loughran
On 25/08/11 08:20, Sagar Shukla wrote: Hi Hakan, Please find my comments inline in blue : -Original Message- From: Hakan İlter [mailto:hakanil...@gmail.com] Sent: Thursday, August 25, 2011 12:28 PM To: common-user@hadoop.apache.org Subject: Hadoop with Netapp Hi

Re: Turn off all Hadoop logs?

2011-09-01 Thread Steve Loughran
On 29/08/11 20:31, Frank Astier wrote: Is it possible to turn off all the Hadoop logs simultaneously? In my unit tests, I don’t want to see the myriad “INFO” logs spewed out by various Hadoop components. I’m using: ((Log4JLogger) DataNode.LOG).getLogger().setLevel(Level.OFF);
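Besides switching loggers off one by one in code, a log4j.properties file on the test classpath can silence everything at once. A sketch only; exact logger names vary by Hadoop version:

```properties
# log4j.properties: turn off all logging for unit tests
log4j.rootLogger=OFF
# Or, less drastically, keep only warnings from Hadoop classes:
# log4j.logger.org.apache.hadoop=WARN
```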

Re: Namenode Scalability

2011-08-17 Thread Steve Loughran
On 17/08/11 08:48, Dieter Plaetinck wrote: Hi, On Wed, 10 Aug 2011 13:26:18 -0500 Michel Segel michael_se...@hotmail.com wrote: This sounds more like a homework assignment than a real world problem. Why? just wondering. The question proposed a data rate comparable with Yahoo, Google and

Re: hadoop cluster mode not starting up

2011-08-16 Thread Steve Loughran
On 16/08/11 11:02, A Df wrote: Hello All: I used a combination of tutorials to set up hadoop, but most seem to use either an old version of hadoop or only 2 machines for the cluster, which isn't really a cluster. Does anyone know of a good tutorial which sets up multiple nodes for a

Re: Help on DFSClient

2011-08-13 Thread Steve Loughran
On 06/08/2011 20:41, jagaran das wrote: I am keeping a Stream Open and writing through it using a multithreaded application. The application is in a different box and I am connecting to NN remotely. I was using FileSystem and getting same error and now I am trying DFSClient and getting the

Re: Namenode Scalability

2011-08-13 Thread Steve Loughran
On 10/08/2011 08:58, jagaran das wrote: In my current project we are planning to stream data to the Namenode (20 node cluster). Data volume would be around 1 PB per day. But there are applications which can publish data at 1GBPS. That's Gigabyte/s or Gigabit/s? Few queries: 1. Can a
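The Gigabyte-vs-Gigabit question matters because the two differ by a factor of 8, and 1 PB/day is already a demanding sustained rate. A quick back-of-envelope check (decimal units assumed):

```python
# Sustained rate implied by 1 PB/day (assumption: 1 PB = 10**15 bytes).
pb_per_day = 10**15
bytes_per_sec = pb_per_day / 86400          # seconds in a day
print(f"{bytes_per_sec / 1e9:.1f} GB/s sustained")        # ~11.6 GB/s
print(f"{bytes_per_sec * 8 / 1e9:.1f} Gbit/s sustained")  # ~92.6 Gbit/s

# A publisher at "1GBPS" delivers either 1.0 GB/s or only 0.125 GB/s:
print(10**9 / 8 / 1e9, "GB/s if it meant gigabits")  # 0.125
```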

Re: Invalid link http://repo2.maven.org/maven2/org/apache/ivy/ivy/2.1.0/ivy-2.1.0.jar during ivy download whiling mumak build.

2011-08-02 Thread Steve Loughran
On 30/07/11 06:30, arun k wrote: Hi all ! I have added the following code to build.xml and tried to build: $ant package. I have also tried removing the entire ivy2 (~/.ivy2/*) directory and rebuilding, but couldn't succeed. setproxy proxyhost=192.168.0.90 proxyport=8080

Re: The best architecture for EC2/Hadoop interface?

2011-08-02 Thread Steve Loughran
On 02/08/11 05:09, Mark Kerzner wrote: Hi, I want to give my users a GUI that would allow them to start Hadoop clusters and run applications that I will provide on the AMIs. What would be a good approach to make it simple for the user? Should I write a Java Swing app that will wrap around the

Re: Submitting and running hadoop jobs Programmatically

2011-07-27 Thread Steve Loughran
On 27/07/11 05:55, madhu phatak wrote: Hi I am submitting the job as follows java -cp Nectar-analytics-0.0.1-SNAPSHOT.jar:/home/hadoop/hadoop-for-nectar/hadoop-0.21.0/conf/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_COMMON_HOME/* com.zinnia.nectar.regression.hadoop.primitive.jobs.SigmaJob

Re: Job progress not showing in Hadoop Tasktracker web interface

2011-07-21 Thread Steve Loughran
On 20/07/11 06:11, Teng, James wrote: You can't run a hadoop job in eclipse, you have to set up an environment on a linux system. Maybe you can try to install it on a VMware linux system and run the job in pseudo-distributed mode. Actually you can bring up a MiniMRCluster in your JUnit test

Re: error of loading logging class

2011-07-21 Thread Steve Loughran
On 20/07/11 07:16, Juwei Shi wrote: Hi, We faced a problem of loading logging class when start the name node. It seems that hadoop can not find commons-logging-*.jar We have tried other commons-logging-1.0.4.jar and commons-logging-api-1.0.4.jar. It does not work! The following are error

Re: Which release to use?

2011-07-19 Thread Steve Loughran
On 19/07/11 12:44, Rita wrote: Arun, I second Joe's comment. Thanks for giving us a heads up. I will wait patiently until 0.23 is considered stable. API-wise, 0.21 is better. I know that as I'm working with 0.20.203 right now, and it is a step backwards. Regarding future releases, the best

Re: Which release to use?

2011-07-17 Thread Steve Loughran
On 16/07/2011 16:53, Rita wrote: I am curious about the IBM product BigInsights. Where can we download it? It seems we have to register to download it? I think you have to pay to use it

Re: Cluster Tuning

2011-07-15 Thread Steve Loughran
On 08/07/2011 16:25, Juan P. wrote: Here's another thought. I realized that the reduce operation in my map/reduce jobs is a flash. But it goes really slow until the mappers end. Is there a way to configure the cluster to make the reduce wait for the map operations to complete? Specially

Re: Which release to use?

2011-07-15 Thread Steve Loughran
On 15/07/2011 15:58, Michael Segel wrote: Unfortunately the picture is a bit more confusing. Yahoo! is now HortonWorks. Their stated goal is to not have their own derivative release but to sell commercial support for the official Apache release. So those selling commercial support are:

Re: Which release to use?

2011-07-15 Thread Steve Loughran
On 15/07/2011 18:06, Arun C Murthy wrote: Apache Hadoop is a volunteer driven, open-source project. The contributors to Apache Hadoop, both individuals and folks across a diverse set of organizations, are committed to driving the project forward and making timely releases - see discussion on

Re: Hadoop cluster hardware details for big data

2011-07-06 Thread Steve Loughran
On 06/07/11 11:43, Karthik Kumar wrote: Hi, Has anyone here used hadoop to process more than 3TB of data? If so we would like to know how many machines you used in your cluster and about the hardware configuration. The objective is to know how to handle huge data in Hadoop cluster. This is

Re: Hadoop cluster hardware details for big data

2011-07-06 Thread Steve Loughran
On 06/07/11 11:43, Karthik Kumar wrote: Hi, Has anyone here used hadoop to process more than 3TB of data? If so we would like to know how many machines you used in your cluster and about the hardware configuration. The objective is to know how to handle huge data in Hadoop cluster. Actually,

Re: Hadoop cluster hardware details for big data

2011-07-06 Thread Steve Loughran
On 06/07/11 13:18, Michel Segel wrote: Wasn't the answer 42? ;-P 42 = 40 + NN + 2ary NN, assuming the JT runs on the 2ary or on one of the worker nodes. Looking at your calc... You forgot to factor in the number of slots per node. So the number is only a fraction. Assume 10 slots per node. (10
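The slots-per-node point can be made concrete with a toy sizing calculation. Every number below is an assumption for illustration, not a figure from the thread:

```python
import math

# How many "waves" of map tasks a job needs: total map tasks are set by
# data size / block size, and each wave runs (nodes * slots) tasks at once.
data = 3 * 1024**4            # 3 TB of input (binary units, assumed)
block = 128 * 1024**2         # 128 MB block size (assumed)
map_tasks = math.ceil(data / block)

nodes, slots_per_node = 40, 10   # assumed cluster shape
waves = math.ceil(map_tasks / (nodes * slots_per_node))
print(map_tasks, "map tasks in", waves, "waves")  # 24576 map tasks in 62 waves
```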

Re: error in reduce task

2011-06-27 Thread Steve Loughran
On 24/06/11 18:16, Niels Boldt wrote: Hi, I'm running nutch in pseudo cluster, eg all daemons are running on the same server. I'm writing to the hadoop list, as it looks like a problem related to hadoop Some of my jobs partially fails and in the error log I get output like 2011-06-24

Re: Upgrading namenode/secondary node hardware

2011-06-17 Thread Steve Loughran
On 16/06/11 14:19, MilleBii wrote: But if my Filesystem is up running fine... do I have to worry at all or will the copy (ftp transfer) of hdfs will be enough. I'm not going to make any predictions there as if/when things go wrong -you do need to shut down the FS before the move -you

Re: Upgrading namenode/secondary node hardware

2011-06-16 Thread Steve Loughran
On 15/06/11 15:54, MilleBii wrote: Thx. #1 don't understand the edit logs remark. well, that's something you need to work on as it's the key to keeping your cluster working. The edit log is the journal of changes made to a namenode, which gets streamed to HDD and your secondary Namenode.

Re: Upgrading namenode/secondary node hardware

2011-06-15 Thread Steve Loughran
On 14/06/11 22:01, MilleBii wrote: I want/need to upgrade my namenode/secondary node hardware. Actually also acts as one of the datanodes. Could not find any how-to guides. So what is the process to switch from one hardware to the next. 1. For HDFS data : is it just a matter of copying all the

Re: Hadoop on windows with bat and ant scripts

2011-06-14 Thread Steve Loughran
On 13/06/11 15:27, Bible, Landy wrote: On 06/13/2011 07:52 AM, Loughran, Steve wrote: On 06/10/2011 03:23 PM, Bible, Landy wrote: I'm currently running HDFS on Windows 7 desktops. I had to create a hadoop.bat that provided the same functionality of the shell scripts, and some Java Service

Re: Hadoop on windows with bat and ant scripts

2011-06-13 Thread Steve Loughran
On 06/10/2011 03:23 PM, Bible, Landy wrote: Hi Raja, I'm currently running HDFS on Windows 7 desktops. I had to create a hadoop.bat that provided the same functionality of the shell scripts, and some Java Service Wrapper configs to run the DataNodes and NameNode as windows services. Once I

Re: NameNode heapsize

2011-06-13 Thread Steve Loughran
On 06/10/2011 05:31 PM, si...@ugcv.com wrote: I would add more RAM for sure but there's hardware limitation. How if the motherboard couldn't support more than ... say 128GB ? seems I can't keep adding RAM to resolve it. compressed pointers, do u mean turning on jvm compressed reference ? I

Re: Hadoop on windows with bat and ant scripts

2011-06-13 Thread Steve Loughran
On 06/12/2011 03:01 AM, Raja Nagendra Kumar wrote: Hi, I see hadoop would need unix (on windows with Cygwin) to run. It would be much nice if Hadoop gets away from the shell scripts though appropriate ant scripts or with java Admin Console kind of model. Then it becomes lighter for

Re: Why inter-rack communication in mapreduce slow?

2011-06-07 Thread Steve Loughran
On 06/06/2011 02:40 PM, John Armstrong wrote: On Mon, 06 Jun 2011 09:34:56 -0400,dar...@ontrenet.com wrote: Yeah, that's a good point. In fact, it almost makes me wonder if an ideal setup is not only to have each of the main control daemons on their own nodes, but to put THOSE nodes on

Re: Hadoop Cluster Multi-datacenter

2011-06-07 Thread Steve Loughran
On 06/07/2011 06:07 AM, sanjeev.ta...@us.pwc.com wrote: Hello, I wanted to know if anyone has any tips or tutorials on how to install a hadoop cluster on multiple datacenters. Nobody has come out and said they've built a single HDFS filesystem from multiple sites, primarily because the

Re: NameNode is starting with exceptions whenever its trying to start datanodes

2011-06-07 Thread Steve Loughran
On 06/07/2011 10:50 AM, praveenesh kumar wrote: The logs say The ratio of reported blocks 0.9091 has not reached the threshold 0.9990. Safe mode will be turned off automatically. not enough datanodes reported in, or they are missing data

Re: Why inter-rack communication in mapreduce slow?

2011-06-06 Thread Steve Loughran
On 06/06/11 08:22, elton sky wrote: hello everyone, As I don't have experience with big scale clusters, I cannot figure out why the inter-rack communication in a mapreduce job is significantly slower than intra-rack. I saw the cisco catalyst 4900 series switch can reach up to 320Gbps forwarding
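A common reason inter-rack traffic is slower is uplink oversubscription: the nodes in a rack can collectively demand far more bandwidth than the rack switch's uplink to the core provides. A toy calculation with assumed numbers:

```python
# Oversubscription ratio of a rack uplink.
# All figures are illustrative assumptions, not from the thread.
nodes_per_rack = 40
node_nic_gbps = 1        # each node has a 1 Gbit/s NIC
rack_uplink_gbps = 10    # rack switch uplink to the core

demand = nodes_per_rack * node_nic_gbps
ratio = demand / rack_uplink_gbps
print(f"oversubscription {ratio:.0f}:1")  # 4:1 - inter-rack flows share 1/4 of a NIC's worth
```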

Re: Starting a Hadoop job outside the cluster

2011-06-06 Thread Steve Loughran
My Job submit code is http://smartfrog.svn.sourceforge.net/viewvc/smartfrog/trunk/core/hadoop-components/hadoop-ops/src/org/smartfrog/services/hadoop/operations/components/submitter/ something to run tool classes

Re: Identifying why a task is taking long on a given hadoop node

2011-06-05 Thread Steve Loughran
On 03/06/2011 12:24, Mayuresh wrote: Hi, I am really having a hard time debugging this. I have a hadoop cluster and one of the maps is taking time. I checked the datanode logs and can see no activity for around 10 minutes! The usual cause here is imminent disk failure, as reads start to take

Re: java.lang.NoClassDefFoundError: com.sun.security.auth.UnixPrincipal

2011-05-27 Thread Steve Loughran
On 05/26/2011 07:45 PM, subhransu wrote: Hello Geeks, I am a newbie to hadoop and I currently installed hadoop-0.20.203.0. I am running the sample programs that are part of this package but getting this error. Any pointer to fix this??? ~/Hadoop/hadoop-0.20.203.0 788 bin/hadoop jar

Re: Hadoop and WikiLeaks

2011-05-23 Thread Steve Loughran
On 23/05/11 01:10, Edward Capriolo wrote: Correct. But it is a place to discuss changing the content of http://hadoop.apache.org which is what I am advocating. Todd's going to fix it. I just copied and pasted in the newspaper quote: it's not that I wanted to make any statement whatsoever,

Re: Exception in thread AWT-EventQueue-0 java.lang.NullPointerException

2011-05-17 Thread Steve Loughran
On 16/05/11 21:12, Lạc Trung wrote: I'm using Hadoop-0.21. --- hut.edu.vn At the top, it's your code, so you get to fix it. The good thing about open source is you can go all the way in. This is what I would do in the same situation -Grab the 0.21 source JAR -add it your IDE -have a look

Re: Cluster hard drive ratios

2011-05-06 Thread Steve Loughran
On 05/05/11 19:14, Matthew Foley wrote: a node (or rack) is going down, don't replicate == DataNode Decommissioning. This feature is available. The current usage is to add the hosts to be decommissioned to the exclusion file named in dfs.hosts.exclude, then use DFSAdmin to invoke
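The decommissioning steps described above look roughly like this. A sketch for the 0.20-era configuration; the exclusion-file path is a placeholder:

```xml
<!-- hdfs-site.xml: point the NameNode at an exclusion file -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
```

Then add the hostnames to be decommissioned to that file, one per line, and run `hadoop dfsadmin -refreshNodes`; the NameNode re-replicates the nodes' blocks elsewhere before marking them decommissioned.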

Re: Cluster hard drive ratios

2011-05-05 Thread Steve Loughran
On 04/05/11 19:59, Matt Goeke wrote: Mike, Thanks for the response. It looks like this discussion forked on the CDH list so I have two different conversations now. Also, you're dead on that one of the presentations I was referencing was Ravi's. With your setup I agree that it would have made

Re: Applications creates bigger output than input?

2011-05-02 Thread Steve Loughran
On 30/04/2011 05:31, elton sky wrote: Thank you for suggestions: Weblog analysis, market basket analysis and generating search index. I guess for these applications we need more reduces than maps, for handling large intermediate output, isn't it. Besides, the input split for map should be

Re: Execution time.

2011-04-27 Thread Steve Loughran
On 26/04/11 14:16, real great.. wrote: Thanks a lot. I have managed to do it. And my final year project is on power aware Hadoop. I do realise it's against ethics to get the code that way..:) Good. What do you mean by power aware -awareness of the topology of UPS sources inside a datacentre

Re: Cluster hardware question

2011-04-27 Thread Steve Loughran
On 26/04/11 14:55, Xiaobo Gu wrote: Hi, People say a balanced server configuration is as follows: 2 × 4-core CPUs, 24GB RAM, 4 × 1TB SATA disks. But we have been used to using storage servers with 24 × 1TB SATA disks; we are wondering whether Hadoop will be CPU-bound if this kind of server is used.

Re: Unsplittable files on HDFS

2011-04-27 Thread Steve Loughran
On 27/04/11 10:48, Niels Basjes wrote: Hi, I did the following with a 1.6GB file hadoop fs -Ddfs.block.size=2147483648 -put /home/nbasjes/access-2010-11-29.log.gz /user/nbasjes and I got Total number of blocks: 1 4189183682512190568:10.10.138.61:50010
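The single block is expected: HDFS stores a file in ceil(size / block size) blocks, and dfs.block.size here was set above the file's size. A quick check:

```python
import math

# Why a 1.6GB file with a 2 GiB block size lands in exactly one block.
size = int(1.6 * 1024**3)     # ~1.6 GiB file
block_size = 2147483648       # dfs.block.size from the command above (2 GiB)
print(math.ceil(size / block_size))  # 1
```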

Re: Seeking Advice on Upgrading a Cluster

2011-04-26 Thread Steve Loughran
On 21/04/11 18:33, Geoffry Roberts wrote: What will give me the most bang for my buck? - Should I bring all machines up to 8G of memory? or is 4G good enough? (8 is the max.) depends on whether your code is running out of memory - Should I double up the NICs and use LACP? I

Re: Fixing a bad HD

2011-04-26 Thread Steve Loughran
On 26/04/11 05:20, Bharath Mundlapudi wrote: Right, if you have a hardware which supports hot-swappable disk, this might be easiest one. But still you will need to restart the datanode to detect this new disk. There is an open Jira on this. -Bharath That'll be HDFS-664

Re: Fixing a bad HD

2011-04-26 Thread Steve Loughran
On 26/04/11 05:20, Bharath Mundlapudi wrote: Right, if you have a hardware which supports hot-swappable disk, this might be easiest one. But still you will need to restart the datanode to detect this new disk. There is an open Jira on this. -Bharath Correction, there is a patch up there

Re: HOD exception: java.io.IOException: No valid local directories in property: mapred.local.dir

2011-04-12 Thread Steve Loughran
On 11/04/2011 16:48, Boyu Zhang wrote: Exception in thread main org.apache.hadoop.ipc.RemoteException: java.io.IOException: No valid local directories in property: mapred.local.dir The job tracker can't find any of the local filesystem directories listed in the mapred.local.dir property,

Re: Reg HDFS checksum

2011-04-12 Thread Steve Loughran
On 12/04/2011 07:06, Josh Patterson wrote: If you take a look at: https://github.com/jpatanooga/IvoryMonkey/blob/master/src/tv/floe/IvoryMonkey/hadoop/fs/ExternalHDFSChecksumGenerator.java you'll see a single process version of what HDFS does under the hood, albeit in a highly distributed

Re: Hadoop Pipes Error

2011-03-31 Thread Steve Loughran
On 31/03/11 07:53, Adarsh Sharma wrote: Thanks Amareshwari, here is the posting: The nopipe example needs more documentation. It assumes that it is run with the InputFormat from src/test/org/apache/hadoop/mapred/pipes/WordCountInputFormat.java, which has a very specific input split

Re: does counters go the performance down seriously?

2011-03-29 Thread Steve Loughran
On 28/03/11 23:34, JunYoung Kim wrote: hi, this link is about good practices for hadoop usage. http://developer.yahoo.com/blogs/hadoop/posts/2010/08/apache_hadoop_best_practices_a/ by Arun C Murthy if I want to use about 50,000 counters for a job, does it cause serious performance

Re: ant version problem

2011-03-28 Thread Steve Loughran
On 27/03/11 21:02, Daniel McEnnis wrote: Steve, Here it is: user@ubuntu:~/src/trunk$ ant -diagnostics --- Ant diagnostics report --- Apache Ant version 1.8.0 compiled on May 9 2010 --- Implementation Version

Re: observe the effect of changes to Hadoop

2011-03-27 Thread Steve Loughran
On 25/03/2011 14:10, bikash sharma wrote: Hi, For my research project, I need to add a couple of functions in JobTracker.java source file to include additional information about TaskTrackers resource usage through heartbeat messages. I made those changes to JobTracker.java file. However, I am

Re: ant version problem

2011-03-27 Thread Steve Loughran
On 27/03/2011 02:01, Daniel McEnnis wrote: Dear Hadoop, Which version of ant do I need to keep the hadoop build from failing? NetBeans' ant works, as does Eclipse's ant. However, ant 1.8.2 does not, nor does the default ant from Ubuntu 10.10. Snippet from failure to follow: 1.8.2 will

Re: CDH and Hadoop

2011-03-24 Thread Steve Loughran
On 23/03/11 15:32, Michael Segel wrote: Rita, It sounds like you're only using Hadoop and have no intentions to really get into the internals. I'm like most admins/developers/IT guys and I'm pretty lazy. I find it easier to set up the yum repository and then issue the yum install hadoop

Re: Creating bundled jar files for running under hadoop

2011-03-23 Thread Steve Loughran
On 22/03/11 13:34, Andy Doddington wrote: I am trying to create a bundled jar file for running using the hadoop ‘jar’ command. However, when I try to do this it fails to find the jar files and other resources that I have placed into the jar (pointed at by the Class-Path property in the

Re: decommissioning node woes

2011-03-21 Thread Steve Loughran
On 19/03/11 16:00, Ted Dunning wrote: Unfortunately this doesn't help much because it is hard to get the ports to balance the load. On Fri, Mar 18, 2011 at 8:30 PM, Michael Segel michael_se...@hotmail.com wrote: With a 1GBe port, you could go 100Mbs for the bandwidth limit. If you bond your
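The bandwidth limit being discussed maps to the balancer bandwidth property of that era. A sketch of an hdfs-site.xml entry; the value is in bytes per second, and 100 MB/s is an assumed interpretation of the quoted "100Mbs":

```xml
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>104857600</value> <!-- 100 MB/s, in bytes per second (assumed figure) -->
</property>
```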

Re: Installing Hadoop on Debian Squeeze

2011-03-21 Thread Steve Loughran
On 21/03/11 09:00, Dieter Plaetinck wrote: On Thu, 17 Mar 2011 19:33:02 +0100 Thomas Koch tho...@koch.ro wrote: Currently my advice is to use the Debian packages from cloudera. That's the problem, it appears there are none. Like I said in my earlier mail, Debian is not in Cloudera's list of

Public Talks from Yahoo! and LinkedIn in Bristol, England, Friday Mar 25

2011-03-18 Thread Steve Loughran
This isn't relevant for people who don't live in or near South England or Wales, but for those that do, I'm pleased to announce that Owen O'Malley and Sanjay Radia of Yahoo! and Jakob Homan of LinkedIn will all be giving public talks on Hadoop on Friday March 25 at HP Laboratories, in

Re: Hadoop code base splits

2011-03-17 Thread Steve Loughran
On 17/03/11 07:05, Matthew John wrote: Hi, Can someone provide me some pointers on the following details of Hadoop code base: 1) breakdown of HDFS code base (approximate lines of code) into following modules: - HDFS at the Datanodes - Namenode - Zookeeper

Re: Load testing in hadoop

2011-03-15 Thread Steve Loughran
On 15/03/11 04:59, Kannu wrote: Please tell me how to use synthetic load generator in hadoop or suggest me any other way of load testing in hadoop cluster. thanks, kannu terasort is the one most people use, as it generates its own datasets. Otherwise you need a few TB of data and some

Re: Speculative execution

2011-03-03 Thread Steve Loughran
On 02/03/11 21:01, Keith Wiley wrote: I realize that the intended purpose of speculative execution is to overcome individual slow tasks...and I have read that it explicitly is *not* intended to start copies of a task simultaneously and to then race them, but rather to start copies of tasks

Re: recommendation on HDDs

2011-02-14 Thread Steve Loughran
On 10/02/11 22:25, Michael Segel wrote: Shrinivas, Assuming you're in the US, I'd recommend the following: Go with 2TB 7200 SATA hard drives. (Not sure what type of hardware you have) What we've found is that in the data nodes, there's an optimal configuration that balances price versus

Re: recommendation on HDDs

2011-02-14 Thread Steve Loughran
On 12/02/11 16:26, Michael Segel wrote: All, I'd like to clarify somethings... First the concept is to build out a cluster of commodity hardware. So when you do your shopping you want to get the most bang for your buck. That is the 'sweet spot' that I'm talking about. When you look at your

Re: CUDA on Hadoop

2011-02-10 Thread Steve Loughran
On 09/02/11 17:31, He Chen wrote: Hi sharma I shared our slides about CUDA performance on Hadoop clusters. Feel free to modified it, please mention the copyright! This is nice. If you stick it up online you should link to it from the Hadoop wiki pages -maybe start a hadoop+cuda page and

Re: hadoop infrastructure questions (production environment)

2011-02-09 Thread Steve Loughran
On 08/02/11 15:45, Oleg Ruchovets wrote: Hi, we are going to production and have some questions to ask: We are using the 0.20_append version (as I understand it is an hbase 0.90 requirement). 1) Currently we have to process 50GB of text files per day; it can grow to 150GB -- what

Re: CUDA on Hadoop

2011-02-09 Thread Steve Loughran
On 09/02/11 13:58, Harsh J wrote: You can check-out this project which did some work for Hama+CUDA: http://code.google.com/p/mrcl/ Amazon let you bring up a Hadoop cluster on machines with GPUs you can code against, but I haven't heard of anyone using it. The big issue is bandwidth; it just

Re: How to speed up of Map/Reduce job?

2011-02-01 Thread Steve Loughran
On 01/02/11 08:19, Igor Bubkin wrote: Hello everybody I have a problem. I installed Hadoop on a 2-node cluster and ran the Wordcount example. It takes about 20 sec to process a 1.5MB text file. We want to use Map/Reduce in real time (interactive: by users' requests). A user can't wait for his

Re: Hadoop is for whom? Data architect or Java Architect or All

2011-01-27 Thread Steve Loughran
On 27/01/11 07:28, Manuel Meßner wrote: Hi, you may want to take a look into the streaming api, which allows users to write their map-reduce jobs in any language which is capable of writing to stdout and reading from stdin. http://hadoop.apache.org/mapreduce/docs/current/streaming.html

Re: Best way to limit the number of concurrent tasks per job on hadoop 0.20.2

2011-01-27 Thread Steve Loughran
On 27/01/11 10:51, Renaud Delbru wrote: Hi Koji, thanks for sharing the information, Is the 0.20-security branch planned to be a official release at some point ? Cheers If you can play with the beta you can see that it works for you and if not, get bugs fixed during the beta cycle

Re: Why Hadoop is slow in Cloud

2011-01-21 Thread Steve Loughran
On 20/01/11 23:24, Marc Farnum Rendino wrote: On Wed, Jan 19, 2011 at 2:50 PM, Edward Capriolo edlinuxg...@gmail.com wrote: As for virtualization, paravirtualization, emulation. (whatever ulization) Wow; that's a really big category. There are always a lot of variables, but the net result

Re: Why Hadoop is slow in Cloud

2011-01-21 Thread Steve Loughran
On 21/01/11 09:20, Evert Lammerts wrote: Even with the performance hit, there are still benefits to running Hadoop this way: -as you only consume/pay for the CPU time you use, if you are only running batch jobs, it's lower cost than having a Hadoop cluster that is under-used. -if your data is stored

Re: namenode format error during setting up hadoop using eclipse in windows 7

2011-01-20 Thread Steve Loughran
On 20/01/11 10:26, arunk786 wrote: Arun K@sairam ~/hadoop-0.19.2 $ bin/hadoop namenode -format cygwin warning: MS-DOS style path detected: C:\cygwin\home\ARUNK~1\HADOOP~1.2\/build/native Preferred POSIX equivalent is: /home/ARUNK~1/HADOOP~1.2/build/native CYGWIN environment variable

Re: How to replace Jetty-6.1.14 with Jetty 7 in Hadoop?

2011-01-19 Thread Steve Loughran
On 18/01/11 19:58, Koji Noguchi wrote: Try moving up to v 6.1.25, which should be more straightforward. FYI, when we tried 6.1.25, we got hit by a deadlock. http://jira.codehaus.org/browse/JETTY-1264 Koji Interesting. Given that there is now 6.1.26 out, that would be the one to play with.

Re: Why Hadoop is slow in Cloud

2011-01-17 Thread Steve Loughran
On 17/01/11 04:11, Adarsh Sharma wrote: Dear all, Yesterday I performed a kind of test between *Hadoop on Standalone Servers* and *Hadoop in the Cloud*. I established a Hadoop cluster of 4 nodes (standalone machines) in which one node acts as Master (Namenode, Jobtracker) and the remaining nodes

Re: How to replace Jetty-6.1.14 with Jetty 7 in Hadoop?

2011-01-17 Thread Steve Loughran
On 16/01/11 09:41, xiufeng liu wrote: Hi, In my cluster, Hadoop somehow cannot work, and I found that it was due to Jetty-6.1.14, which is not able to start up. However, Jetty 7 can work in my cluster. Does anybody know how to replace Jetty 6.1.14 with Jetty 7? Thanks, afancy The switch

Re: TeraSort question.

2011-01-13 Thread Steve Loughran
On 11/01/11 16:40, Raj V wrote: Ted, thanks. I have all the graphs I need, including the map/reduce timeline and system activity for all the nodes while the sort was running. I will publish them once I have them in some presentable format. For legal reasons, I really don't want to send the

Re: Why Hadoop uses HTTP for file transmission between Map and Reduce?

2011-01-13 Thread Steve Loughran
On 13/01/11 08:34, li ping wrote: That is also my concern: is it efficient for data transmission? It's long-lived TCP connections, reasonably efficient for bulk data transfer, has all the throttling of TCP built in, and comes with some excellently debugged client and server code in the form of
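The long-lived-connection point can be seen in a short, self-contained Python sketch (nothing Hadoop-specific; the toy server and client are my own illustration). Several HTTP/1.1 requests travel over a single TCP connection, which is what makes HTTP a reasonable carrier for repeated bulk transfers like the map-to-reduce shuffle:

```python
# Demonstrates HTTP/1.1 keep-alive: several requests are served over
# one TCP connection instead of a fresh connection per request.
import http.client
import http.server
import threading

class Handler(http.server.BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"   # enables keep-alive
    def do_GET(self):
        body = ("payload for %s" % self.path).encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):   # keep the demo output quiet
        pass

def fetch_many(paths):
    # Bind to an ephemeral port and serve from a background thread.
    server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
    bodies = []
    for path in paths:              # every request reuses conn's socket
        conn.request("GET", path)
        bodies.append(conn.getresponse().read().decode())
    conn.close()
    server.shutdown()
    return bodies

print(fetch_many(["/a", "/b", "/c"]))
```

Each response must be fully read before the next request is issued on the same connection, which is exactly the discipline `http.client` enforces here.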

Re: Hadoop Certification Progamme

2010-12-15 Thread Steve Loughran
On 09/12/10 03:40, Matthew John wrote: Hi all. Is there any valid Hadoop certification available? Something which adds credibility to your Hadoop expertise. Well, there's always providing enough patches to the code to get commit rights :)

Re: Hadoop/Elastic MR on AWS

2010-12-15 Thread Steve Loughran
On 10/12/10 06:14, Amandeep Khurana wrote: Mark, Using EMR makes it very easy to start a cluster and add/reduce capacity as and when required. There are certain optimizations that make EMR an attractive choice compared to building out your own cluster. Using EMR also ensures you are using a

Re: Question from a Desperate Java Newbie

2010-12-15 Thread Steve Loughran
On 10/12/10 09:08, Edward Choi wrote: I was wrong. It wasn't because of the read-once-free policy. I tried again with Java first, and this time it didn't work. I looked it up on Google and found the HttpClient you mentioned. It is the one provided by Apache, right? I guess I will have to try
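A common reason a bare URL fetch fails while a full-featured client such as Apache HttpClient succeeds is that many sites refuse requests lacking browser-like headers. As a rough Python analogue of that idea (not the thread's Java code; the User-Agent string and URL below are placeholders of my own), set the headers explicitly and handle HTTP errors instead of letting them propagate:

```python
# Sketch: fetch a page with an explicit User-Agent and graceful
# handling of HTTP error responses (e.g. 403 from an unhappy server).
import urllib.request
import urllib.error

def build_request(url, user_agent="Mozilla/5.0 (example)"):
    # Sites often reject clients with no (or a default) User-Agent.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

def fetch(url):
    try:
        with urllib.request.urlopen(build_request(url), timeout=10) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        return err.code, b""        # surface the status instead of crashing
```

Libraries like HttpClient additionally manage cookies, redirects, and connection reuse for you, which is why they often "just work" where a hand-rolled fetch does not.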

Re: Hadoop/Elastic MR on AWS

2010-12-15 Thread Steve Loughran
On 09/12/10 18:57, Aaron Eng wrote: Pros: - Easier to build out and tear down clusters vs. using physical machines in a lab - Easier to scale up and scale down a cluster as needed Cons: - Reliability. In my experience I've had machines die, had machines fail to start up, had network outages
