Edward Capriolo wrote:
There are many ways to look at it, but (1) if you are just running
TaskTrackers and not a DataNode on your workstations, you have zero data
locality. That is a bad thing because, after all, Hadoop wants to move
the processing close to the data. If the disks backing the DataNode
a
Todd Lipcon wrote:
Yep, this is a common problem. The fix that Brian outlined helps a lot, but
if you are *really* strapped for random bits, you'll still block. This is
because even if you've set the random source, it still uses the real
/dev/random to grab a seed for the PRNG, at least on my system
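A minimal sketch of the usual workaround, assuming a Sun-era JDK on Linux:
point the JVM's SecureRandom at the non-blocking /dev/urandom. The
EntropyCheck class name is just for illustration:

  // Usually passed on the java command line:
  //   java -Djava.security.egd=file:/dev/./urandom MyApp
  // The odd "/./" spelling matters: some JDK builds silently map a plain
  // file:/dev/urandom back to the blocking /dev/random.
  import java.security.SecureRandom;

  public class EntropyCheck {
      public static void main(String[] args) {
          // Can also be set programmatically, before SecureRandom is first used:
          System.setProperty("java.security.egd", "file:/dev/./urandom");

          long start = System.currentTimeMillis();
          SecureRandom rng = new SecureRandom();
          byte[] seed = rng.generateSeed(16);  // the call that can block
          System.out.println("Got " + seed.length + " seed bytes in "
                  + (System.currentTimeMillis() - start) + " ms");
      }
  }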
On Tue, Sep 29, 2009 at 4:58 PM, Jianwu Wang wrote:
Hi there,
When I have Hadoop running (version 0.20.0, Pseudo-Distributed Mode), I
cannot start my own Java application. The exception complains:
java.sql.SQLException: failed to connect to url
"jdbc:hsqldb:hsql://localhost/hsqldb".
On Sep 30, 2009, at 4:24 AM, Steve Loughran wrote:
Todd Lipcon wrote:
Yep, this is a common problem. The fix that Brian outlined helps a lot, but
if you are *really* strapped for random bits, you'll still block. This is
because even if you've set the random source, it still uses the real
/dev/random
Hi,
I'm writing a MapReduce program which reads a file from HDFS and
stores the contents in a static map (declared and initialized before executing
MapReduce). However, after executing the MapReduce program, my map
returns 0 elements. Is there any way I can make the data persistent in
Brian Bockelman wrote:
On Sep 30, 2009, at 4:24 AM, Steve Loughran wrote:
Todd Lipcon wrote:
Yep, this is a common problem. The fix that Brian outlined helps a lot, but
if you are *really* strapped for random bits, you'll still block. This is
because even if you've set the random source, it still uses the real
/dev/random
On Sep 30, 2009, at 8:33 AM, Steve Loughran wrote:
Brian Bockelman wrote:
On Sep 30, 2009, at 4:24 AM, Steve Loughran wrote:
Todd Lipcon wrote:
Yep, this is a common problem. The fix that Brian outlined helps a lot, but
if you are *really* strapped for random bits, you'll still block. This is
because even if you've set the random source, it still uses the real
/dev/random
Todd Lipcon wrote:
Most people building new clusters at this point seem to be leaning towards
dual quad core Nehalem with 4x1TB 7200RPM SATA and at least 8G RAM.
We went with a similar configuration for a recently purchased cluster
but opted for dual quad core Opterons (Shanghai) rather than Nehalems
We went with 2 x Nehalems, 4 x 1TB drives and 24GB RAM. The RAM might be
overkill... but it's DDR3, so you get either 12 or 24GB. Each box has 16
virtual cores, so 12GB might not have been enough. These boxes are around $4k
each, but can easily outperform any $1k box dollar for dollar (and
performance
We would like to use the same data for Pig and Hive queries for flexibility.
Has anyone done this without keeping 2 copies of the data? Hive seems to only
want to work with Ctrl-A-delimited data, and I don't see a way to specify
Ctrl-A as a delimiter for Pig. Is there another efficient regex that p
Hello Raakhi,
Since MapReduce tasks are executed in different Java virtual machines across the
cluster, you won't be able to share a static map between them. However, you
could use a different approach. For example, you could use HBase (hbase.org)
to hold your objects during the execution of your program
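If it helps, a minimal sketch of that idea against the HBase 0.20-era client
API; the "shared" table and "f" column family are hypothetical names you would
create beforehand in the HBase shell:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SharedState {
      public static void main(String[] args) throws Exception {
          HTable table = new HTable(new HBaseConfiguration(), "shared");

          // Any task can write a key/value pair...
          Put put = new Put(Bytes.toBytes("someKey"));
          put.add(Bytes.toBytes("f"), Bytes.toBytes("v"), Bytes.toBytes("someValue"));
          table.put(put);

          // ...and any other task, on any node, can read it back.
          Result result = table.get(new Get(Bytes.toBytes("someKey")));
          System.out.println(Bytes.toString(
                  result.getValue(Bytes.toBytes("f"), Bytes.toBytes("v"))));
      }
  }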
Hi.
I get a strange error in Hadoop 0.18.3 (Cloudera) while I am trying to run a
streaming job.
The error is: "ERROR streaming.StreamJob: Unexpected -D while processing
-input|-output|-mapper|-combiner|-reducer| etc etc etc"
If I remove all -D parameters, the job is accepted by Hadoop. However, I
Hi Chris,
Pig doesn't mandate Ctrl-A or any other particular character as the field
delimiter. You can tell Pig which delimiter to use. For example, you can
specify Ctrl-A as the field delimiter as follows:
A = load 'mydata' using PigStorage('\u0001');
If you don't specify any delimiter, e.g. A = load 'mydata'; then tab is used as the default
Gabriel-
Removing the -D is the right thing to do. The warning re:
GenericOptionsParser can be safely ignored.
If you want more info about the warning to put your mind at ease, take a look at
src/mapred/org/apache/hadoop/mapred/JobClient.java (line 1550 for
context on the warning)
src/core/org/a
On Mon, Sep 14, 2009 at 11:25 AM, Samprita Hegde wrote:
> @ Matt,
> I have done all the steps that you have mentioned. Can you please tell me
> what the next steps are that I should do?
I just noticed this was directed at me. Sorry for the delayed response.
I'm assuming at this point that you
On Wed, Sep 30, 2009 at 11:45 AM, Ashutosh Chauhan
wrote:
> Hi Chris,
>
> Pig doesn't mandate Ctrl-A or any other particular character as the field
> delimiter. You can tell Pig which delimiter to use. For example, you can
> specify Ctrl-A as the field delimiter as follows:
>
> A = load 'mydata' usin
2TB drives are just now dropping to parity with 1TB on a $/GB basis.
If you want space rather than speed, this is a good option. If you want
speed rather than space, more spindles and smaller disks are better.
Ironically, 500GB drives now often cost more than 1TB drives (that is $, not
$/GB).
On
As Bradford is out of town this evening, I will take up the mantle of
Person-on-Point. Contact me with questions re: tonight's gathering.
See you tonight!
-Nick
614.657.0267
On Mon, Sep 28, 2009 at 4:33 PM, Bradford Stephens <
bradfordsteph...@gmail.com> wrote:
> Hello everyone!
> Don't forget
Hi Steve,
Thanks for your info. I have to use HSQL in my own Java application.
I worked it out by running the HSQL server for my application on
another port. But the HSQLDB docs say one HSQL server should be able to
work with multiple databases
(http://hsqldb.org/doc/guide/ch01.html#N1
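A rough sketch of what the JDBC side of that looks like, assuming the HSQL
server was started on a non-default port; the port 9002 and the "hsqldb"
database alias are illustrative:

  import java.sql.Connection;
  import java.sql.DriverManager;

  public class HsqlClient {
      public static void main(String[] args) throws Exception {
          // HSQLDB 1.8-era driver class; the default server port is 9001,
          // so the :9002 below assumes the server was started with a
          // matching server.port setting.
          Class.forName("org.hsqldb.jdbcDriver");
          // "sa" with an empty password is HSQLDB's out-of-the-box account.
          Connection conn = DriverManager.getConnection(
                  "jdbc:hsqldb:hsql://localhost:9002/hsqldb", "sa", "");
          System.out.println("Connected: " + !conn.isClosed());
          conn.close();
      }
  }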
I really appreciate everyone's input. We've been going back and forth on the
server size issue here. There are a few reasons we shot for the $1k price.
One is that we wanted to be able to compare our datacenter costs vs. the
cloud costs. Another is that we have spec'd out a fast Intel node with
ove
Depending on your needs and the size of your cluster, out-of-band
management can be of significant interest. It is a pretty simple
cost/benefit analysis that trades your sysops' time (which is probably about
the equivalent of $50-150 per hour fully loaded, accounting for
opportunity cost) ve
Raakhi-
Guilherme is correct. Each mapper (and reducer) runs independently,
and communication between them is neither provided for nor encouraged. You
may wish to look into the DistributedCache
(http://wiki.apache.org/hadoop/FAQ#A8,
http://hadoop.apache.org/common/docs/current/mapred_tutorial.
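For the archives, a minimal sketch of the DistributedCache approach using the
old (0.18/0.20) mapred API; the file path and tab-separated format are
illustrative assumptions:

  import java.io.BufferedReader;
  import java.io.FileReader;
  import java.util.HashMap;
  import java.util.Map;
  import org.apache.hadoop.filecache.DistributedCache;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.MapReduceBase;

  // In the driver, before submitting the job:
  //   DistributedCache.addCacheFile(new URI("/data/lookup.txt"), jobConf);
  // A real mapper would also implement Mapper<K1,V1,K2,V2>.
  public abstract class LookupMapper extends MapReduceBase {
      private final Map<String, String> lookup = new HashMap<String, String>();

      @Override
      public void configure(JobConf job) {
          try {
              // The file shipped via addCacheFile shows up on local disk:
              Path[] files = DistributedCache.getLocalCacheFiles(job);
              BufferedReader in =
                      new BufferedReader(new FileReader(files[0].toString()));
              String line;
              while ((line = in.readLine()) != null) {
                  String[] kv = line.split("\t", 2);  // assumes "key<TAB>value"
                  if (kv.length == 2) {
                      lookup.put(kv[0], kv[1]);
                  }
              }
              in.close();
          } catch (Exception e) {
              throw new RuntimeException("Failed to load cache file", e);
          }
      }
  }

Note the map here is per-task state loaded in configure(), not a static map
shared across tasks; sharing across tasks is the part MapReduce does not provide.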