Re: Which hardware to choose

2012-10-02 Thread hadoopman
Had to ask :D On 10/02/2012 07:19 PM, Russell Jurney wrote: I believe he means per node. Russell Jurney http://datasyndrome.com On Oct 2, 2012, at 6:15 PM, hadoopman wrote: Only 24 map and 8 reduce tasks for 38 data nodes? Are you sure that's right? Sounds VERY low for a cluster

Re: Which hardware to choose

2012-10-02 Thread hadoopman
Only 24 map and 8 reduce tasks for 38 data nodes? Are you sure that's right? It sounds VERY low for a cluster that size. We have only 10 C2100s and are running, I believe, 140 map and 70 reduce slots so far with pretty decent performance. On 10/02/2012 12:55 PM, Alexander Pivovarov wrote: 38
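For reference, in Hadoop 1.x-era clusters like the ones discussed here, per-node slot counts come from mapred-site.xml on each tasktracker. A minimal sketch; the values are illustrative, not a recommendation:

```xml
<!-- mapred-site.xml: per-node task slot counts (values are examples only).
     14 maps x 10 nodes = 140 map slots, 7 reduces x 10 nodes = 70 reduce slots. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>14</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>7</value>
</property>
```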

Re: Java Heap space error

2012-03-08 Thread hadoopman
I'm curious if you have been able to track down the cause of the error? We've seen similar problems with loading data and I've discovered if I presort my data before the load that things go a LOT smoother. When running queries against our data sometimes we've seen it where the jobtracker just
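The presort mentioned above can be sketched with a plain Unix sort on the grouping key before the Hive load; the file names, tab delimiter, and single-column key are assumptions for illustration:

```shell
# Hypothetical presort: order a tab-delimited extract on its first column
# before handing it to the Hive load, so rows with the same key arrive grouped.
printf 'b\t2\na\t1\nc\t3\n' > /tmp/raw.tsv
LC_ALL=C sort -t "$(printf '\t')" -k1,1 /tmp/raw.tsv > /tmp/sorted.tsv
cat /tmp/sorted.tsv
```

`LC_ALL=C` keeps the sort order byte-wise and reproducible across machines.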

Hadoop and Ubuntu / Java

2011-12-20 Thread hadoopman
http://www.omgubuntu.co.uk/2011/12/java-to-be-removed-from-ubuntu-uninstalled-from-user-machines/ I'm curious what this will mean for Hadoop on Ubuntu systems moving forward. I've tried openJDK nearly two years ago with Hadoop. Needless to say it was a real problem. Hopefully we can still

Re: Senthil wants to chat

2011-08-27 Thread hadoopman
SPAM 2.0 :D On 08/27/2011 10:06 AM, Shahnawaz Saifi wrote: Whats' this? On Sat, Aug 27, 2011 at 9:35 AM, Senthil wrote:

Hive dynamic partition error?

2011-07-11 Thread hadoopman
So we're seeing the following error during some of our hive loads: 2011-07-05 12:26:52,927 Stage-2 map = 100%, reduce = 100% Ended Job = job_201106302113_3864 Loading data to table default.merged_weblogs partition (day=null) Failed with exception Number of dynamic partitions created is 1013, wh
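The failure reads like the dynamic-partition cap being exceeded: Hive defaults to 1000 dynamic partitions per job (100 per node), and this load created 1013. A minimal sketch of raising the caps before the insert; the staging table and non-partition column names are assumptions:

```sql
-- Raise the dynamic-partition limits above the 1013 partitions the load creates
-- (defaults: hive.exec.max.dynamic.partitions=1000, .pernode=100).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=2000;
SET hive.exec.max.dynamic.partitions.pernode=2000;

-- Hypothetical reload of the partitioned table; the partition column must come last.
INSERT OVERWRITE TABLE merged_weblogs PARTITION (day)
SELECT host, request, day
FROM staging_weblogs;
```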

Re: OutOfMemoryError: GC overhead limit exceeded

2011-06-22 Thread hadoopman
I've run into similar problems in my hive jobs and will look at the 'mapred.child.ulimit' option. One thing that we've found is when loading data with insert overwrite into our hive tables we've needed to include a 'CLUSTER BY' or 'DISTRIBUTE BY' option. Generally that's fixed our memory issu
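The CLUSTER BY / DISTRIBUTE BY fix mentioned above can be sketched like this; table and column names are hypothetical. Distributing on the partition key sends each partition's rows to one reducer instead of letting a single writer buffer files for every partition at once:

```sql
-- Hypothetical insert that spreads the write across reducers by partition key.
INSERT OVERWRITE TABLE weblogs PARTITION (day)
SELECT host, request, day
FROM staging_weblogs
DISTRIBUTE BY day;
```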

Re: Poor IO performance on a 10 node cluster.

2011-06-01 Thread hadoopman
Some things which helped us include setting your vm.swappiness to 0 and mounting your disks with the noatime,nodiratime options. Also make sure your disks aren't set up with RAID (JBOD is recommended). You might want to run terasort as you tweak your environment. It's very helpful when checking if
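A sketch of those host-level settings as config-file entries; the device name, mount point, and ext4 file system are assumptions:

```shell
# /etc/sysctl.conf -- keep the kernel from swapping out Hadoop daemons
vm.swappiness = 0

# /etc/fstab -- one JBOD data disk per entry, mounted without atime updates
# (device, mount point, and file-system type are illustrative)
/dev/sdb1  /data/1  ext4  defaults,noatime,nodiratime  0 0

# After each change, benchmark with the bundled terasort job, e.g.:
#   hadoop jar hadoop-examples.jar teragen  10000000 /tera/in
#   hadoop jar hadoop-examples.jar terasort /tera/in /tera/out
```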

hadoop/hive data loading

2011-05-10 Thread hadoopman
When we load data into hive sometimes we've run into situations where the load fails and the logs show a heap out of memory error. If I load just a few days (or months) of data then no problem. But then if I try to load two years (for example) of data then I've seen it fail. Not with every f

Re: Hadoop demand

2011-04-29 Thread hadoopman
My guess is it's like back in the days when Linux was considered a 'bad' option for running a production system and people would freak out when they found out about it. It was so new and people were just learning what it's all about. Today it's very mainstream but it took people a while to fi

Re: fair scheduler issue

2011-04-26 Thread hadoopman
r...@gmail.com On Tue, Apr 26, 2011 at 5:59 AM, hadoopman wrote: Has anyone had problems with the latest version of hadoop and the fair scheduler not placing jobs into pools correctly? We're digging into it currently. An older version of hadoop (using our config file) is worki

Re: fair scheduler issue

2011-04-26 Thread hadoopman
Thanks & Regards, Saurabh Bhutyani Call : 9820083104 Gtalk: s4saur...@gmail.com On Tue, Apr 26, 2011 at 5:59 AM, hadoopman wrote: Has anyone had problems with the latest version of hadoop and the fair scheduler not placing jobs into pools correctly? We're digging into it curren

fair scheduler issue

2011-04-25 Thread hadoopman
Has anyone had problems with the latest version of hadoop and the fair scheduler not placing jobs into pools correctly? We're digging into it currently. An older version of hadoop (using our config file) is working fine however the latest version seems to be putting everything into the defaul
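For comparison, a minimal fair-scheduler allocation file from the Hadoop 0.20/1.x line; the pool names and numbers are hypothetical. Jobs fall into the default pool when the property named by mapred.fairscheduler.poolnameproperty doesn't match a configured pool, which is one thing worth checking against the new version's config:

```xml
<!-- Hypothetical fair-scheduler.xml, referenced from mapred-site.xml via
     mapred.fairscheduler.allocation.file. -->
<allocations>
  <pool name="etl">
    <minMaps>20</minMaps>
    <minReduces>10</minReduces>
    <weight>2.0</weight>
  </pool>
  <pool name="adhoc">
    <minMaps>5</minMaps>
    <minReduces>5</minReduces>
  </pool>
</allocations>
```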

Re: Hive regex SerDe issue?

2011-04-13 Thread hadoopman
here. If you use a 32-bit Java this would be a problem. On Wed, Apr 13, 2011 at 3:16 PM, hadoopman wrote: Is there an issue with using the regex SerDe with loading into Hive text files above 2 gigs in size? I've been experiencing out of memory errors with a select group of logs when ru

Hive regex SerDe issue?

2011-04-13 Thread hadoopman
Is there an issue with using the regex SerDe with loading into Hive text files above 2 gigs in size? I've been experiencing out of memory errors with a select group of logs when running a hive job. I have been able to load the data if I use split to cut it in half or thirds. No problem. Goo
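For context, a sketch of the kind of RegexSerDe table in question, using the contrib SerDe class from that era; the table name, columns, and pattern are assumptions:

```sql
-- Hypothetical log table parsed by the contrib RegexSerDe.
CREATE TABLE weblogs_raw (
  host STRING,
  ts STRING,
  request STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "(\\S+) \\[([^\\]]+)\\] \"([^\"]+)\"",
  "output.format.string" = "%1$s %2$s %3$s"
)
STORED AS TEXTFILE;
```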

Re: Setting input paths

2011-04-06 Thread hadoopman
I have a process which is loading data into hive hourly. Loading data hourly isn't a problem, however when I load historical data, say 24-48 hours, I receive the error message below. In googling I've come across some suggestions that JVM memory needs to be increased. Are there any other options or
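The JVM-memory suggestion usually means raising the heap of the per-task child JVMs. A minimal sketch for the Hadoop 1.x mapred-site.xml; the 2 GB value is illustrative:

```xml
<!-- mapred-site.xml: heap for each map/reduce child JVM (value is an example).
     Can also be set per-job, e.g. with -Dmapred.child.java.opts=-Xmx2048m. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
```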

Re: Using space as field separator fails. How do I fix this?

2011-04-04 Thread hadoopman
Great tip. I'll give it a try. Thanks! On 04/04/2011 10:17 PM, Alex Kozlov wrote: Try using octal, i.e. '\040'. On Apr 4, 2011, at 8:21 PM, hadoopman wrote: I had a similar problem though my logs were terminated with carriage return. Many of the fields in my logs
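The octal tip above ('\040' is octal for the ASCII space, 32 decimal) can be sketched in the table DDL; the table and column names are hypothetical:

```sql
-- Hypothetical space-delimited table using the octal escape for the separator.
CREATE TABLE spaced_logs (a STRING, b STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\040'
STORED AS TEXTFILE;
```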

Re: Using space as field separator fails. How do I fix this?

2011-04-04 Thread hadoopman
I had a similar problem though my logs were terminated with carriage return. Many of the fields in my logs are delimited with a space. We tried using \s but that basically removed every instance of the letter s (yeah I thought that was amusing too). In some cases we were able to do a \\t b

Re: HDFS and distcp issue??

2010-12-06 Thread hadoopman
On 12/06/2010 07:48 PM, Dmitriy Ryaboy wrote: Do you have the failing task's log? -Dmitriy On Sat, Dec 4, 2010 at 12:47 PM, hadoopman wrote: I'll have to look for it. This is my first full blown installation of Hadoop. Still a LOT to learn. Is that the name it's t

HDFS and distcp issue??

2010-12-04 Thread hadoopman
I've run into an interesting problem with syncing a couple of clusters using distcp. We've validated that it works to a local installation from our remote cluster. I suspect our firewalls 'may' be responsible for the problem we're experiencing. We're using ports 9000, 9001 and 50010. I've ver
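A sketch of the distcp invocation in question; hostnames and paths are assumptions. Port 9000 here is the namenode RPC port, but the copy itself streams blocks from datanodes on 50010, so that port must be open through the firewall in both directions as well:

```shell
# Hypothetical cluster-to-cluster copy (run on the destination cluster):
hadoop distcp hdfs://remote-nn:9000/data/weblogs hdfs://local-nn:9000/data/weblogs
```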

Re: HDFS Rsync process??

2010-11-30 Thread hadoopman
On 11/30/2010 03:51 AM, Steve Loughran wrote: On 30/11/10 03:59, hadoopman wrote: you don't need all the files in the cluster in sync as a lot of them are intermediate and transient files. Instead use dfscopy to copy source files to the two clusters, this runs across the machines i

HDFS Rsync process??

2010-11-29 Thread hadoopman
We have two Hadoop clusters in two separate buildings. Both clusters are loading the same data from the same sources (the second cluster is for DR). We're looking at how we can recover the primary cluster and catch it back up again as new data will continue to feed into the DR cluster. It's