Re: How 'commodity' is 'commodity'

2009-09-30 Thread Steve Loughran
Edward Capriolo wrote: There are many ways to look at it, but (1) if you are just running TaskTrackers and not a DataNode on your workstations, you have zero data locality. That is a bad thing because, after all, Hadoop wants to move the processing close to the data. If the disks backing the DataNode a

Re: Running Hadoop on cluster with NFS booted systems

2009-09-30 Thread Steve Loughran
Todd Lipcon wrote: Yep, this is a common problem. The fix that Brian outlined helps a lot, but if you are *really* strapped for random bits, you'll still block. This is because even if you've set the random source, it still uses the real /dev/random to grab a seed for the PRNG, at least on my sys
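
A minimal sketch of the usual workaround, assuming a Sun-style JDK on Linux (class name illustrative): point the JVM's entropy source at /dev/urandom through the java.security.egd property. The "file:/dev/./urandom" spelling matters because the JDK special-cases the literal string "file:/dev/urandom" and will still seed SHA1PRNG from /dev/random.

import java.security.SecureRandom;

public class UrandomCheck {
    // Normally the property is passed as a JVM flag,
    // -Djava.security.egd=file:/dev/./urandom, so it is set before any
    // security classes initialize; setting it first thing in main works on
    // Sun JDKs of this era because the seed source is read lazily.
    public static void main(String[] args) throws Exception {
        System.setProperty("java.security.egd", "file:/dev/./urandom");
        SecureRandom rng = SecureRandom.getInstance("SHA1PRNG");
        byte[] bytes = new byte[16];
        rng.nextBytes(bytes); // would block here if seeded from a drained /dev/random
        System.out.println("drew " + bytes.length + " bytes without blocking");
    }
}

The same effect can be had cluster-wide by editing securerandom.source in $JAVA_HOME/jre/lib/security/java.security.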

Re: ask help for hsql conflict problem.

2009-09-30 Thread Steve Loughran
On Tue, Sep 29, 2009 at 4:58 PM, Jianwu Wang wrote: Hi there, When I have Hadoop running (version 0.20.0, Pseudo-Distributed Mode), I cannot start my own Java application. The exception complains that 'java.sql.SQLException: failed to connect to url "jdbc:hsqldb:hsql://localhost/hsqldb".

Re: Running Hadoop on cluster with NFS booted systems

2009-09-30 Thread Brian Bockelman
On Sep 30, 2009, at 4:24 AM, Steve Loughran wrote: Todd Lipcon wrote: Yep, this is a common problem. The fix that Brian outlined helps a lot, but if you are *really* strapped for random bits, you'll still block. This is because even if you've set the random source, it still uses the real /

Storing contents of a file in a java object

2009-09-30 Thread Rakhi Khatwani
Hi, I'm writing a map-reduce program which reads a file from HDFS and stores the contents in a static map (declared and initialized before executing map-reduce). However, after executing the map-reduce program, my map returns 0 elements. Is there any way I can make the data persistent in

Re: Running Hadoop on cluster with NFS booted systems

2009-09-30 Thread Steve Loughran
Brian Bockelman wrote: On Sep 30, 2009, at 4:24 AM, Steve Loughran wrote: Todd Lipcon wrote: Yep, this is a common problem. The fix that Brian outlined helps a lot, but if you are *really* strapped for random bits, you'll still block. This is because even if you've set the random source, it

Re: Running Hadoop on cluster with NFS booted systems

2009-09-30 Thread Brian Bockelman
On Sep 30, 2009, at 8:33 AM, Steve Loughran wrote: Brian Bockelman wrote: On Sep 30, 2009, at 4:24 AM, Steve Loughran wrote: Todd Lipcon wrote: Yep, this is a common problem. The fix that Brian outlined helps a lot, but if you are *really* strapped for random bits, you'll still block. Thi

Re: Advice on new Datacenter Hadoop Cluster?

2009-09-30 Thread stephen mulcahy
Todd Lipcon wrote: Most people building new clusters at this point seem to be leaning towards dual quad core Nehalem with 4x1TB 7200RPM SATA and at least 8G RAM. We went with a similar configuration for a recently purchased cluster but opted for dual quad core Opterons (Shanghai) rather than N

Re: Advice on new Datacenter Hadoop Cluster?

2009-09-30 Thread Patrick Angeles
We went with 2 x Nehalems, 4 x 1TB drives and 24GB RAM. The RAM might be overkill... but it's DDR3, so you get either 12 or 24GB. Each box has 16 virtual cores, so 12GB might not have been enough. These boxes are around $4k each, but can easily outperform any $1K box dollar for dollar (and performanc

Pig and Hive on the same data?

2009-09-30 Thread dumbfounder
We would like to use the same data for Pig and Hive queries for flexibility; has anyone done this without keeping 2 copies of the data? Hive seems to only want to work with Ctrl-A delimited data, and I don't see a way to specify Ctrl-A as a delimiter for Pig. Is there another efficient regex that p

Re: Storing contents of a file in a java object

2009-09-30 Thread Guilherme Germoglio
Hello Raakhi, Since MapReduce tasks are executed in separate Java virtual machines across the cluster, you won't be able to share a static map between them. However, you could use a different approach. For example, you could use HBase (hbase.org) to hold your objects during the execution of your progra
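
A minimal sketch of that approach against the 0.20-era HBase client API, assuming a table "shared_objects" with column family "data" has already been created; unlike a static map, every task in the cluster can read and write it.

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class SharedMapViaHBase {
    public static void main(String[] args) throws Exception {
        HTable table = new HTable(new HBaseConfiguration(), "shared_objects");
        // write key -> value; visible to every task JVM in the cluster
        Put put = new Put(Bytes.toBytes("someKey"));
        put.add(Bytes.toBytes("data"), Bytes.toBytes("value"), Bytes.toBytes("someValue"));
        table.put(put);
        // read it back, possibly from a different node entirely
        Result r = table.get(new Get(Bytes.toBytes("someKey")));
        System.out.println(Bytes.toString(r.getValue(Bytes.toBytes("data"), Bytes.toBytes("value"))));
    }
}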

Error: Unexpected -D while processing...

2009-09-30 Thread Gabriel Moraru
Hi. I get a strange error in Hadoop 0.18.3 (Cloudera) while I am trying to run a streaming job. The error is: "ERROR streaming.StreamJob: Unexpected -D while processing -input|-output|-mapper|-combiner|-reducer| etc etc etc" If I remove all -D parameters, the job is accepted by Hadoop. However, I

Re: Pig and Hive on the same data?

2009-09-30 Thread Ashutosh Chauhan
Hi Chris, Pig doesn't mandate Ctrl-A or any other character as the field delimiter. You can tell Pig which delimiter to use. For example, you can specify Ctrl-A as the field delimiter as follows: A = load 'mydata' using PigStorage('\u0001'); If you don't specify any delimiter, e.g. A = l
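
On the writing side, a minimal sketch with the old mapred API (class name illustrative; the config key is the one TextOutputFormat reads, defaulting to "\t"): emit Ctrl-A between key and value so the same files satisfy both Hive's default delimiter and PigStorage('\u0001').

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextOutputFormat;

public class CtrlAOutput {
    // call on the JobConf before submitting the job
    public static void useCtrlA(JobConf conf) {
        conf.setOutputFormat(TextOutputFormat.class);
        // TextOutputFormat writes key<separator>value lines
        conf.set("mapred.textoutputformat.separator", "\u0001");
    }
}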

Re: Error: Unexpected -D while processing...

2009-09-30 Thread Matt Massie
Gabriel- Removing the -D is the right thing to do. The warning re: GenericOptionsParser can be safely ignored. If you want more info about the warning to put your mind at ease, take a look at src/mapred/org/apache/hadoop/mapred/JobClient.java (line 1550 for context on the warning) src/core/org/a
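
The mechanism behind the warning, as a minimal sketch (class name and the property printed are illustrative): implement Tool and launch through ToolRunner, so GenericOptionsParser folds -D options into the Configuration before your own code ever sees the argument list.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {
    public int run(String[] args) throws Exception {
        // args now holds only the non-generic arguments; any -D key=value
        // pairs have already been applied to getConf()
        Configuration conf = getConf();
        System.out.println("mapred.reduce.tasks = " + conf.get("mapred.reduce.tasks"));
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyJob(), args));
    }
}

Note that generic options such as -D must still appear before any application-specific options on the command line.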

Re: Using Ganglia with hadoop 0.19.0 on Amazon EC2

2009-09-30 Thread Matt Massie
On Mon, Sep 14, 2009 at 11:25 AM, Samprita Hegde wrote: > @ Matt, >   I have done all the steps that you have mentioned. Can you please tell > what are the next steps that I should do? I just noticed this was directed at me. Sorry for the delayed response. I'm assuming at this point that you

Re: Pig and Hive on the same data?

2009-09-30 Thread Edward Capriolo
On Wed, Sep 30, 2009 at 11:45 AM, Ashutosh Chauhan wrote: > Hi Chris, > > Pig doesn't mandate a Ctrl-A or any other character to be used as field > delimiter. You can tell Pig which delimiter to use. For example, you can > specify Ctrl-A as field delimiter  as following: > > A = load 'mydata' usin

Re: Advice on new Datacenter Hadoop Cluster?

2009-09-30 Thread Ted Dunning
2TB drives are just now dropping to parity with 1TB on a $/GB basis. If you want space rather than speed, this is a good option. If you want speed rather than space, more spindles and smaller disks are better. Ironically, 500GB drives now often cost more than 1TB drives (that is $, not $/GB). On

Re: Seattle / PNW Hadoop/Lucene/HBase Meetup, Wed Sep 30th

2009-09-30 Thread Nick Dimiduk
As Bradford is out of town this evening, I will take up the mantle of Person-on-Point. Contact me with questions re: tonight's gathering. See you tonight! -Nick 614.657.0267 On Mon, Sep 28, 2009 at 4:33 PM, Bradford Stephens < bradfordsteph...@gmail.com> wrote: > Hello everyone! > Don't forget

Re: ask help for hsql conflict problem.

2009-09-30 Thread Jianwu Wang
Hi Steve, Thanks for your info. I have to use hsql in my own Java application. For now I have worked around it by using another port for the hsql server in my own Java application. But the hsqldb guide says one hsql server should be able to serve multiple databases (http://hsqldb.org/doc/guide/ch01.html#N1
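
For reference, a minimal JDBC sketch of that workaround, assuming the HSQLDB 1.8 driver on the classpath (the port 9002 and database alias "mydb" are illustrative; hsql's default port is 9001):

import java.sql.Connection;
import java.sql.DriverManager;

public class HsqlAltPort {
    public static void main(String[] args) throws Exception {
        Class.forName("org.hsqldb.jdbcDriver");
        // connect to an hsql server started on the non-default port 9002
        Connection c = DriverManager.getConnection(
                "jdbc:hsqldb:hsql://localhost:9002/mydb", "sa", "");
        System.out.println("connected: " + !c.isClosed());
        c.close();
    }
}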

Re: Advice on new Datacenter Hadoop Cluster?

2009-09-30 Thread Kevin Sweeney
I really appreciate everyone's input. We've been going back and forth on the server size issue here. There are a few reasons we shot for the $1k price: one is that we wanted to be able to compare our datacenter costs vs. the cloud costs. Another is that we have spec'd out a fast Intel node with ove

Re: Advice on new Datacenter Hadoop Cluster?

2009-09-30 Thread Ted Dunning
Depending on your needs and the size of your cluster, the out-of-band management can be of significant interest. It is a pretty simple cost/benefit analysis that trades your sysops time (which is probably about the equivalent of $50-150 per hour fully loaded and accounting for opportunity cost) ve

Re: Storing contents of a file in a java object

2009-09-30 Thread Jakob Homan
Raakhi- Guilherme is correct. Each mapper (and reducer) runs independently, and communication between them is neither provided for nor encouraged. You may wish to look into the DistributedCache (http://wiki.apache.org/hadoop/FAQ#A8, http://hadoop.apache.org/common/docs/current/mapred_tutorial.
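
A minimal sketch of the DistributedCache pattern with the old mapred API (the HDFS path and the tab-separated format are illustrative): the file is shipped to every node at job setup, and each task rebuilds its own lookup map in configure().

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class CachedLookupMapper extends MapReduceBase {
    private final Map<String, String> lookup = new HashMap<String, String>();

    // at submission time, register the HDFS file once:
    //   DistributedCache.addCacheFile(new URI("/user/me/lookup.txt"), jobConf);
    public void configure(JobConf conf) {
        try {
            Path[] local = DistributedCache.getLocalCacheFiles(conf);
            BufferedReader in = new BufferedReader(new FileReader(local[0].toString()));
            String line;
            while ((line = in.readLine()) != null) {
                String[] kv = line.split("\t", 2);
                if (kv.length == 2) {
                    lookup.put(kv[0], kv[1]); // per-task copy, not shared state
                }
            }
            in.close();
        } catch (Exception e) {
            throw new RuntimeException("could not load cached lookup file", e);
        }
    }
}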