Re: RAID vs. JBOD

2009-01-15 Thread Steve Loughran

Runping Qi wrote:

Hi,

We at Yahoo did some Hadoop benchmarking experiments on clusters with JBOD
and RAID0. We found that under heavy loads (such as gridmix), the JBOD cluster
performed better.

Gridmix tests:

Load: gridmix2
Cluster size: 190 nodes
Test results:

RAID0: 75 minutes
JBOD:  67 minutes
Difference: 10%

Tests on HDFS write performance

We ran map-only jobs writing data to DFS concurrently on different clusters.
The overall DFS write throughput on the JBOD cluster was 30% (with a 58-node
cluster) and 50% (with an 18-node cluster) better than on the RAID0 cluster,
respectively.

To understand why, we did some file-level benchmarking on both clusters.
We found that the file write throughput on a JBOD machine is 30% higher than
that on a comparable machine with RAID0. This performance difference may be
explained by the fact that the throughputs of different disks can vary by 30%
to 50%. With such variations, the overall throughput of a RAID0 system may
be bottlenecked by its slowest disk.
 


-- Runping



This is really interesting. Thank you for sharing these results!

Presumably the servers were all set up with nominally homogeneous 
hardware? And yet the variations still existed. That would be something 
to experiment with on new versus old clusters, to see whether it gets worse 
over time.


Here we have a batch of desktop workstations, all bought at the same 
time to the same spec, but one of them, "lucky", is more prone to race 
conditions than any of the others. We don't know why, and assume it's to do 
with the (multiple) Xeon CPU chips being at different ends of the bell 
curve or something. All we know is: test on that box before shipping to 
find race conditions early.


-steve


Re: Indexed Hashtables

2009-01-15 Thread Steve Loughran

Sean Shanny wrote:

Delip,

So far we have had pretty good luck with memcached.  We are building a 
Hadoop-based solution for data warehouse ETL on XML-based log files that 
represent click-stream data on steroids.


We process about 34 million records, or about 70 GB of data, a day.  We have 
to process dimensional data in our warehouse and then load the surrogate 
key/value pairs into memcached so we can traverse the XML files once 
again to perform the substitutions.  We are using the memcached solution 
because it scales out just like Hadoop.  We will have code that allows 
us to fall back to the DB if the memcached lookup fails, but that should 
not happen too often.
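
For anyone curious about the fallback piece, the rough shape is below (a sketch 
only -- CacheClient, DimensionDao and SurrogateKeyResolver are hypothetical 
stand-ins, not any particular client library's API):

// Hypothetical stand-ins for the real memcached client and the warehouse DB
// layer; the point is just the cache-lookup-then-DB-fallback shape.
interface CacheClient {
    String get(String key);               // returns null on a cache miss
    void set(String key, String value);
}

interface DimensionDao {
    String lookupSurrogateKey(String naturalKey);  // hits the database
}

class SurrogateKeyResolver {
    private final CacheClient cache;
    private final DimensionDao dao;

    SurrogateKeyResolver(CacheClient cache, DimensionDao dao) {
        this.cache = cache;
        this.dao = dao;
    }

    /** Resolve a surrogate key, falling back to the DB when memcached misses. */
    String resolve(String naturalKey) {
        String surrogate = cache.get(naturalKey);
        if (surrogate == null) {
            surrogate = dao.lookupSurrogateKey(naturalKey);  // should be rare
            cache.set(naturalKey, surrogate);                // repopulate the cache
        }
        return surrogate;
    }
}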




LinkedIn have just opened up something they run internally, Project 
Voldemort:


http://highscalability.com/product-project-voldemort-distributed-database
http://project-voldemort.com/

It's a Java-based DHT. I haven't played with it yet, but it looks like 
a good part of the portfolio.




Is it possible to submit job to JobClient and exit immediately?

2009-01-15 Thread Andrew
For now, I use a code block like this in all my MR jobs:

try {
    JobClient.runJob(job);
} catch (IOException exc) {
    LOG.info("Job failed", exc);
}
System.exit(0);

But this code waits until the MR job completes. Thus, I have to run it on a 
machine that stays connected to the jobtracker for the whole run.

My goal is to write code that submits the job to the jobtracker and exits without 
waiting for the job to complete. I've tried:

try {
    JobID jobid = new JobClient(job).submitJob(job).getID();
} catch (IOException exc) {
    LOG.info("Job failed", exc);
}
System.exit(0);

This code fails with java.lang.IllegalStateException: Shutdown in progress. 
It seems that JobClient creates non-daemon threads on the submitJob() 
invocation.


So, is there a way to submit a job and exit immediately?

-- 
Andrew Gudkov
PGP key id: CB9F07D8 (cryptonomicon.mit.edu)
Jabber: gu...@jabber.ru


Re: Is it possible to submit job to JobClient and exit immediately?

2009-01-15 Thread Amar Kamat

Andrew wrote:

For now, I use a code block like this in all my MR jobs:

try {
    JobClient.runJob(job);

    // instead of runJob() above, use:
    JobClient jc = new JobClient(job);
    jc.submitJob(job); // submits the job and comes out

} catch (IOException exc) {
    LOG.info("Job failed", exc);
}
System.exit(0);

But this code waits until the MR job completes. Thus, I have to run it on a 
machine that stays connected to the jobtracker for the whole run.


My goal is to write code that submits the job to the jobtracker and exits without 
waiting for the job to complete. I've tried:


try {
    JobID jobid = new JobClient(job).submitJob(job).getID();
} catch (IOException exc) {
    LOG.info("Job failed", exc);
}
System.exit(0);

This code fails with java.lang.IllegalStateException: Shutdown in progress.

Can you check what state the jobtracker is in via its web UI? Can you see what 
is happening to the jobtracker by checking its logs?
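
For reference, a fleshed-out version of the snippet above (a sketch only -- the 
job setup is elided, and whether letting main() return on its own, instead of 
calling System.exit(0) straight away, avoids the shutdown race is an assumption 
to verify, not something I have tested):

import java.io.IOException;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;

public class SubmitAndExit {
    public static void main(String[] args) throws IOException {
        JobConf job = new JobConf(SubmitAndExit.class);
        // ... set mapper/reducer classes and input/output paths here ...

        JobClient jc = new JobClient(job);
        RunningJob running = jc.submitJob(job);  // returns as soon as the job is accepted
        System.out.println("Submitted job " + running.getID());

        // Return from main() rather than calling System.exit(0) immediately,
        // so the client-side threads get a chance to shut down on their own.
    }
}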

Amar
It seems that JobClient creates non-daemon threads on the submitJob() 
invocation.



So, is there a way to submit a job and exit immediately?

  




Re: RAID vs. JBOD

2009-01-15 Thread Runping Qi

Yes, all the machines in the tests are new, with the same spec.
The 30% to 50% throughput variations were observed across the disks within
the same machine.
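
To make the arithmetic concrete (the numbers below are invented, purely for
illustration): if four disks in one box sustain 80, 75, 70 and 55 MB/s, JBOD can
in principle aggregate roughly 80 + 75 + 70 + 55 = 280 MB/s, while a RAID0 stripe
across the same disks tends to run at roughly 4 x 55 = 220 MB/s, since every
stripe operation waits for the slowest member. A gap of that size is in line with
the ~30% file-level difference we measured.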

Runping



On 1/15/09 2:41 AM, Steve Loughran ste...@apache.org wrote:

 Runping Qi wrote:
 Hi,
 
 We at Yahoo did some Hadoop benchmarking experiments on clusters with JBOD
 and RAID0. We found that under heavy loads (such as gridmix), the JBOD cluster
 performed better.
 
 Gridmix tests:
 
 Load: gridmix2
 Cluster size: 190 nodes
 Test results:
 
 RAID0: 75 minutes
 JBOD:  67 minutes
 Difference: 10%
 
 Tests on HDFS write performance
 
 We ran map-only jobs writing data to DFS concurrently on different clusters.
 The overall DFS write throughput on the JBOD cluster was 30% (with a 58-node
 cluster) and 50% (with an 18-node cluster) better than on the RAID0 cluster,
 respectively.
 
 To understand why, we did some file-level benchmarking on both clusters.
 We found that the file write throughput on a JBOD machine is 30% higher than
 that on a comparable machine with RAID0. This performance difference may be
 explained by the fact that the throughputs of different disks can vary by 30%
 to 50%. With such variations, the overall throughput of a RAID0 system may
 be bottlenecked by its slowest disk.
  
 
 -- Runping
 
 
 This is really interesting. Thank you for sharing these results!
 
 Presumably the servers were all set up with nominally homogeneous
 hardware? And yet the variations still existed. That would be something
 to experiment with on new versus old clusters, to see whether it gets worse
 over time.
 
 Here we have a batch of desktop workstations, all bought at the same
 time to the same spec, but one of them, "lucky", is more prone to race
 conditions than any of the others. We don't know why, and assume it's to do
 with the (multiple) Xeon CPU chips being at different ends of the bell
 curve or something. All we know is: test on that box before shipping to
 find race conditions early.
 
 -steve



Re: Indexed Hashtables

2009-01-15 Thread pr-hadoop

Delip,

what about Hadoop MapFile?

http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/MapFile.html
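
For what it's worth, a rough sketch of how it can be used for lookups (the
directory path and the use of Text for both key and value are just for
illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileLookupExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        String dir = "/tmp/lookup-mapfile";   // illustrative path

        // Build the index once; MapFile requires keys to be appended in sorted order.
        MapFile.Writer writer = new MapFile.Writer(conf, fs, dir, Text.class, Text.class);
        writer.append(new Text("key1"), new Text("value1"));
        writer.append(new Text("key2"), new Text("value2"));
        writer.close();

        // Later (e.g. in a mapper's configure()/map()), open a reader and look keys up.
        MapFile.Reader reader = new MapFile.Reader(fs, dir, conf);
        Text value = new Text();
        if (reader.get(new Text("key2"), value) != null) {
            System.out.println("key2 -> " + value);
        }
        reader.close();
    }
}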

Regards,

Peter



Re: Indexed Hashtables

2009-01-15 Thread Jim Twensky
Delip,

Why do you think HBase would be overkill? I do something similar to what
you're trying to do with HBase and I haven't encountered any significant
problems so far. Can you give some more info on the size of the data you
have?

Jim

On Wed, Jan 14, 2009 at 8:47 PM, Delip Rao delip...@gmail.com wrote:

 Hi,

 I need to look up a large number of key/value pairs in my map(). Is
 there any indexed hashtable available as part of the Hadoop I/O API?
 I find HBase overkill for my application; something along the lines of
 HashStore (www.cellspark.com/hashstore.html) should be fine.

 Thanks,
 Delip



Cascading 1.0.0 Released

2009-01-15 Thread Chris K Wensel

Hi all

Just a quick note to let everyone know that Cascading 1.0.0 is out.
http://www.cascading.org/

Cascading is an API for defining and executing data processing flows  
without needing to think in MapReduce.


This release supports only Hadoop 0.19.x. Minor releases will be  
available to track Hadoop's progress.


A list of some of the high-level features can be found here:
http://www.cascading.org/documentation/features/

Or you can peruse the User Guide:
http://www.cascading.org/documentation/userguide.html

Developer Support and OEM/Commercial Licensing is available through  
Concurrent, Inc.

http://www.concurrentinc.com/

And finally, Advanced Hadoop and Cascading training (and consulting)  
is available through Scale Unlimited:

http://www.scaleunlimited.com/

cheers,
chris

--
Chris K Wensel
ch...@wensel.net
http://www.cascading.org/
http://www.scaleunlimited.com/



Re: Hadoop 0.17.1 = EOFException reading FSEdits file, what causes this? how to prevent?

2009-01-15 Thread Aaron Kimball
Does the file exist or maybe was it deleted? Also, are the permissions on
that directory set correctly, or could they have been changed out from under
you by accident?

- Aaron

On Tue, Jan 13, 2009 at 9:53 AM, Joe Montanez jmonta...@veoh.com wrote:

 Hi:



 I'm using Hadoop 0.17.1 and I'm encountering an EOFException reading the
 FSEdits file.  I don't have a clear understanding of what is causing this
 or how to prevent it.  Has anyone seen this and can advise?



 Thanks in advance,

 Joe



 2009-01-12 22:51:45,573 ERROR org.apache.hadoop.dfs.NameNode:
 java.io.EOFException

at java.io.DataInputStream.readFully(DataInputStream.java:180)

at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)

at
 org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)

at
 org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:599)

at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:766)

at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:640)

at
 org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:223)

at
 org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)

at
 org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:274)

at
 org.apache.hadoop.dfs.FSNamesystem.init(FSNamesystem.java:255)

at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:133)

at org.apache.hadoop.dfs.NameNode.init(NameNode.java:178)

at org.apache.hadoop.dfs.NameNode.init(NameNode.java:164)

at
 org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)

at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)



 2009-01-12 22:51:45,574 INFO org.apache.hadoop.dfs.NameNode:
 SHUTDOWN_MSG:






Re: Hadoop 0.17.1 = EOFException reading FSEdits file, what causes this? how to prevent?

2009-01-15 Thread Konstantin Shvachko

Joe,

It looks like your edits file is corrupted or truncated.
Most probably the last modification was not written out
when the name-node was shut down. This may happen if the
node crashes, depending on the underlying local file system, I guess.

Here are some options for you to consider:
- Try an alternative replica of the image directory, if you had one.
- Try to edit the edits file by hand, if you know the internal format.
- Try to modify your local copy of the name-node code so that it
catches the EOFException and ignores it.
- Use a checkpointed image if you can afford to lose the latest modifications to 
the fs.
- Formatting, of course, is the last resort, since you lose everything.

Thanks,
--Konstantin

Joe Montanez wrote:

Hi:

 


I'm using Hadoop 0.17.1 and I'm encountering an EOFException reading the
FSEdits file.  I don't have a clear understanding of what is causing this
or how to prevent it.  Has anyone seen this and can advise?

 


Thanks in advance,

Joe

 


2009-01-12 22:51:45,573 ERROR org.apache.hadoop.dfs.NameNode:
java.io.EOFException

at java.io.DataInputStream.readFully(DataInputStream.java:180)

at org.apache.hadoop.io.UTF8.readFields(UTF8.java:106)

at
org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:90)

at
org.apache.hadoop.dfs.FSEditLog.loadFSEdits(FSEditLog.java:599)

at org.apache.hadoop.dfs.FSImage.loadFSEdits(FSImage.java:766)

at org.apache.hadoop.dfs.FSImage.loadFSImage(FSImage.java:640)

at
org.apache.hadoop.dfs.FSImage.recoverTransitionRead(FSImage.java:223)

at
org.apache.hadoop.dfs.FSDirectory.loadFSImage(FSDirectory.java:80)

at
org.apache.hadoop.dfs.FSNamesystem.initialize(FSNamesystem.java:274)

at
org.apache.hadoop.dfs.FSNamesystem.init(FSNamesystem.java:255)

at org.apache.hadoop.dfs.NameNode.initialize(NameNode.java:133)

at org.apache.hadoop.dfs.NameNode.init(NameNode.java:178)

at org.apache.hadoop.dfs.NameNode.init(NameNode.java:164)

at
org.apache.hadoop.dfs.NameNode.createNameNode(NameNode.java:848)

at org.apache.hadoop.dfs.NameNode.main(NameNode.java:857)

 


2009-01-12 22:51:45,574 INFO org.apache.hadoop.dfs.NameNode:
SHUTDOWN_MSG:

 





Locks in hadoop

2009-01-15 Thread Sagar Naik

I would like to implement a locking mechanism across the HDFS cluster.
I assume there is no inherent support for it.

I was going to do it with files. To my knowledge, file creation is an 
atomic operation, so a file-based lock should work.
I still need to think through all the corner cases, but if someone has a better 
idea/solution, please share.
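
To make the idea concrete, here is a minimal sketch of what I have in mind
(untested -- the class name and lock path are placeholders, and treating any
IOException from create() as "lock already held" is my own assumption):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileLock {
    private final FileSystem fs;
    private final Path lockPath;

    public HdfsFileLock(Configuration conf, Path lockPath) throws IOException {
        this.fs = FileSystem.get(conf);
        this.lockPath = lockPath;
    }

    /** Try to take the lock by creating the lock file with overwrite == false. */
    public boolean tryLock() throws IOException {
        try {
            // create(path, false) fails if the file already exists, which is
            // the atomic check-and-create this scheme relies on.
            fs.create(lockPath, false).close();
            return true;
        } catch (IOException lockHeld) {
            return false;  // someone else holds the lock (or a stale lock file is left over)
        }
    }

    /** Release the lock by deleting the lock file. */
    public void unlock() throws IOException {
        fs.delete(lockPath, false);
    }
}

The conditions I still need to think through are stale locks (the holder dies
without deleting the file) and whether the create-if-absent semantics hold under
every failure mode.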


Thanks
-Sagar



Re: Locks in hadoop

2009-01-15 Thread Konstantin Shvachko

Did you look at Zookeeper?
Thanks,
--Konstantin

Sagar Naik wrote:

I would like to implement a locking mechanism across the HDFS cluster.
I assume there is no inherent support for it.

I was going to do it with files. To my knowledge, file creation is an 
atomic operation, so a file-based lock should work.
I still need to think through all the corner cases, but if someone has a better 
idea/solution, please share.


Thanks
-Sagar




Re: hadoop job -history

2009-01-15 Thread Amareshwari Sriramadasu

jobOutputDir is the location specified by the configuration property 
hadoop.job.history.user.location. If you don't specify anything for that property, 
the job history logs are created in the job's output directory. So, to view your history, 
pass your job's output directory as jobOutputDir if you haven't specified any location.
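
For example (the path below is only illustrative), if the job wrote its output to 
/user/bill/wordcount-out and you haven't set the property, you would run:

hadoop job -history /user/bill/wordcount-out
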
Hope this helps.

Thanks
Amareshwari


Bill Au wrote:

I am having trouble getting the hadoop command job -history to work.  What
am I supposed to use for jobOutputDir?  I can see the job history from the
JobTracker web UI.  I tried specifying the history directory on the
JobTracker but it didn't work:

$ hadoop job -history logs/history/
Exception in thread "main" java.io.IOException: Not able to initialize
History viewer
at
org.apache.hadoop.mapred.HistoryViewer.init(HistoryViewer.java:88)
at
org.apache.hadoop.mapred.JobClient.viewHistory(JobClient.java:1596)
at org.apache.hadoop.mapred.JobClient.run(JobClient.java:1560)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1727)
Caused by: java.io.IOException: History directory
logs/history/_logs/history does not exist
at
org.apache.hadoop.mapred.HistoryViewer.init(HistoryViewer.java:70)
... 5 more

  




hadoop 0.19.0 and data node failure

2009-01-15 Thread Kumar Pandey
To test Hadoop's fault tolerance I tried the following:
node A -- namenode and secondary namenode
node B -- datanode
node C -- datanode

Replication was set to 2.
When A, B, and C are all running, I'm able to make a round trip with a wav file.

Now, to test fault tolerance, I brought node B down and tried to write a file.
The write failed, even though node C was up and running, with the following message.
More interestingly, the file was still listed in the namenode.
I would have expected Hadoop to write the file to node C.

##error msg###
[had...@cancunvm1 testfiles]$ hadoop fs -copyFromLocal
9979_D4FE01E0-DD119BDE-3000CB83-EB857348.wav
jukebox/9979_D4FE01E0-DD119BDE-3000CB83-EB857348_21.wav

09/01/16 01:47:09 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.SocketTimeoutException
09/01/16 01:47:09 INFO hdfs.DFSClient: Abandoning block
blk_4025795281260753088_1216
09/01/16 01:47:09 INFO hdfs.DFSClient: Waiting to find target node:
10.0.3.136:50010
09/01/16 01:47:18 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.NoRouteToHostException: No route to host
09/01/16 01:47:18 INFO hdfs.DFSClient: Abandoning block
blk_-2076345051085316536_1216
09/01/16 01:47:27 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.NoRouteToHostException: No route to host
09/01/16 01:47:27 INFO hdfs.DFSClient: Abandoning block
blk_2666380449580768625_1216
09/01/16 01:47:36 INFO hdfs.DFSClient: Exception in createBlockOutputStream
java.net.NoRouteToHostException: No route to host
09/01/16 01:47:36 INFO hdfs.DFSClient: Abandoning block
blk_742770163755453348_1216
09/01/16 01:47:42 WARN hdfs.DFSClient: DataStreamer Exception:
java.io.IOException: Unable to create new block.
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2723)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:1997)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2183)

09/01/16 01:47:42 WARN hdfs.DFSClient: Error Recovery for block
blk_742770163755453348_1216 bad datanode[0] nodes == null
09/01/16 01:47:42 WARN hdfs.DFSClient: Could not get block locations.
Aborting...
copyFromLocal: No route to host
Exception closing file
/user/hadoop/jukebox/9979_D4FE01E0-DD119BDE-3000CB83-EB857348_21.wav
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:198)
at org.apache.hadoop.hdfs.DFSClient.access$600(DFSClient.java:65)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3084)
at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3053)
at
org.apache.hadoop.hdfs.DFSClient$LeaseChecker.close(DFSClient.java:942)
at org.apache.hadoop.hdfs.DFSClient.close(DFSClient.java:210)