Re: Thoughts about Hadoop cluster hardware

2010-07-17 Thread U235Sentinel
Awesome!  I appreciate it.  I'm off on training right now, so I'm just
starting to catch up.  I'll check out those servers and see how they compare.

Thanks a bunch!

On Tue, Jul 13, 2010 at 8:36 PM, Allen Wittenauer wrote:

>
> On Jul 13, 2010, at 5:00 PM, u235sentinel wrote:
>
> > So we're talking to Dell about their new PowerEdge C2100 servers for a
> Hadoop cluster, but I'm wondering: isn't this still a little overboard for
> nodes in a cluster?  I'm wondering about buying, say, 100 PowerEdge 2750s
> instead of just 50 C2100s.  The price would be about the same for the
> configuration we're talking about, and we would get twice as many nodes.
>
> Ultimately, it depends upon your job flow and how much data you have.
>
> FWIW, we're currently using a Sun equivalent of the C2100s with 8 of the 12
> drive slots filled.  You need a *LOT* of IOPS to make it worthwhile.  [From
> what I've seen, even people who think they have a lot of IOPS generally have
> other problems with their code/tuning that are causing the IOPS.  So even
> if you think you have a lot, you may not.]
>
> > I'm curious if any others are running Dell PowerEdge servers with
> Hadoop.
> >
> > We've also been kicking the idea around of going with blade servers (Dell
> and/or HP).
>
> If you are thinking traditional blades where storage comes mainly from
> NAS or SAN, you are going to be very, very unhappy unless your data set is
> very, very tiny.
>
> Check out the PoweredBy page on the wiki.  Quite a few folks list their
> gear. FWIW, we're currently evaluating HP SLs and should be getting some
> Dell C6100s in soon, assuming Dell can deliver the eval unit on time.


Thoughts about Hadoop cluster hardware

2010-07-13 Thread u235sentinel
So we're talking to Dell about their new PowerEdge C2100 servers for a
Hadoop cluster, but I'm wondering: isn't this still a little overboard
for nodes in a cluster?  I'm wondering about buying, say, 100 PowerEdge
2750s instead of just 50 C2100s.  The price would be about the same
for the configuration we're talking about, and we would get twice as many
nodes.


I'm curious if any others are running Dell PowerEdge servers with Hadoop.

We've also been kicking the idea around of going with blade servers 
(Dell and/or HP).


Just curious.

Thanks!!


Sensage to Hadoop conversion?

2010-04-29 Thread u235sentinel

Is there a way to convert Sensage systems over to Hadoop storage (HDFS)?

We're miles away from switching, but there is growing interest, and
I'm looking into this on my own for now :=)


Re: Does Hadoop compress files?

2010-04-04 Thread u235sentinel
OK, that's what I was thinking.  I was wondering if Hadoop did on-the-fly
compression as it stored files in HDFS, like Sensage does.  But it sounds
like Hadoop will take a compressed file and store it as compressed, which
is fine by me.  Sensage will do the same.


I believe this answers the question.  Sonal's link suggests there is 
support for compression using zlib, gzip and bzip2. 
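A minimal Python sketch of the point above, using the standard library's `zlib`, `gzip` and `bz2` modules as stand-ins for the three formats mentioned (this is not Hadoop's codec API). The compression happens on the client before the file is stored, so whatever lands in HDFS is already compressed. The sample log lines and the helper name `compress_before_upload` are made up for illustration.

```python
import bz2
import gzip
import zlib

# Hypothetical sample log data; real logs will compress differently.
SAMPLE_LOG = b"".join(
    b"2010-04-04 12:00:%02d host%d sshd[%d]: Accepted password for user\n"
    % (i % 60, i % 8, 1000 + i)
    for i in range(5000)
)

# Each entry is (compress, decompress) for one of the three formats.
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
}


def compress_before_upload(data: bytes) -> dict:
    """Compress `data` with each codec and return {name: compressed bytes}.

    Whichever variant you upload is stored byte-for-byte; HDFS itself
    does not compress or decompress it.
    """
    return {name: comp(data) for name, (comp, _) in CODECS.items()}


if __name__ == "__main__":
    for name, blob in compress_before_upload(SAMPLE_LOG).items():
        ratio = len(SAMPLE_LOG) / len(blob)
        print(f"{name:6s} {len(blob):8d} bytes  ({ratio:.1f}x smaller)")
```

Running it on a sample of your own logs is a quick way to see which format gives the best ratio before deciding how to write data into the cluster.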

One more question, though: if we store files in compressed format, are
there any issues with searching that data?  I'm curious whether there is a
disadvantage to doing this.  I could build bigger and badder servers, but
was hoping compression would help.


Thanks



Eric Sammer wrote:

> To clarify, there is no implicit compression in HDFS. In other words,
> if you want your data to be compressed, you have to write it that way.
> If you plan on writing MapReduce jobs to process the compressed data,
> you'll want to use a splittable compression format. This generally
> means LZO or block-compressed SequenceFiles, which others have
> mentioned.
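To illustrate why splittability matters for searching, here is a toy Python sketch (standard library only; it is not the real SequenceFile or LZO format, and zlib stands in for LZO, which Python doesn't ship). A single gzip stream over a whole file can only be decoded from the beginning, so one reader has to scan all of it. If records are instead compressed in independent blocks, which is roughly the idea behind block-compressed SequenceFiles, each block can be handed to a separate reader and searched on its own. `RECORDS_PER_BLOCK` and the helper names are hypothetical.

```python
import gzip
import zlib

# Hypothetical records-per-block; real formats size blocks in bytes.
RECORDS_PER_BLOCK = 1000


def compress_whole(records):
    """One gzip stream for everything: decodable only from the start."""
    return gzip.compress(b"".join(records))


def compress_in_blocks(records, per_block=RECORDS_PER_BLOCK):
    """Compress each group of records independently.

    Each block is a self-contained zlib stream, so any block can be
    decompressed without touching the ones before it.
    """
    return [
        zlib.compress(b"".join(records[i:i + per_block]))
        for i in range(0, len(records), per_block)
    ]


def search_block(block, needle):
    """Decompress a single block and return the lines containing `needle`."""
    return [line for line in zlib.decompress(block).splitlines() if needle in line]


if __name__ == "__main__":
    records = [b"event %d status=%s\n" % (i, b"FAIL" if i % 997 == 0 else b"ok")
               for i in range(10000)]
    blocks = compress_in_blocks(records)
    # Each block could go to a different map task; here we just loop.
    hits = [hit for block in blocks for hit in search_block(block, b"FAIL")]
    whole = len(compress_whole(records))
    split = sum(len(b) for b in blocks)
    print(f"{len(blocks)} blocks, {len(hits)} matching lines")
    print(f"one stream: {whole} bytes, independent blocks: {split} bytes")
```

Independent blocks usually compress somewhat worse than one long stream, since each block starts without the previous block's context; that is the price of being able to split the work.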
  

Does Hadoop compress files?

2010-04-03 Thread u235sentinel
I'm starting to evaluate Hadoop.  We are currently running Sensage and
store a lot of log files in our current environment.  I've been looking
at the Hadoop forums and googling (of course) but haven't found out
whether HDFS compresses the files we store.


On average we're storing about 600 GB a week of log files (more or
less).  Generally we need to keep about 1.5 to 2 years of logs.  With
Sensage compression we can store about 200+ TB of logs in our current
environment.
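A back-of-the-envelope sizing sketch for the numbers above, in Python. It assumes 52 weeks per year, decimal units (1 TB = 1000 GB) and HDFS's default replication factor of 3; `compression_ratio` is a placeholder to fill in after testing with real logs, not a measured value, and the function name is hypothetical.

```python
WEEKS_PER_YEAR = 52


def hdfs_raw_capacity_tb(gb_per_week, years, replication=3, compression_ratio=1.0):
    """Estimate raw disk (TB) needed to retain `years` of logs in HDFS.

    compression_ratio is original size / compressed size (1.0 = uncompressed).
    Ignores temporary MapReduce space and filesystem overhead.
    """
    logical_gb = gb_per_week * WEEKS_PER_YEAR * years
    return logical_gb / compression_ratio * replication / 1000


if __name__ == "__main__":
    for years in (1.5, 2):
        raw = hdfs_raw_capacity_tb(600, years)
        print(f"{years} years: {600 * WEEKS_PER_YEAR * years / 1000:.1f} TB of logs, "
              f"{raw:.1f} TB of disk at 3x replication, uncompressed")
```

Before any compression, 1.5 to 2 years at 600 GB/week works out to roughly 47 to 62 TB of logs, or about 140 to 187 TB of raw disk with three replicas.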


As I said, we're starting to evaluate whether Hadoop would be a good
replacement for our Sensage environment (or could at least augment it).


Thanks a bunch!!