Hi Rita,

Re your suggestions: I do not own this new cluster at PNNL. It costs $4 million or so, and a lot of different lab groups will be using it. I want to try it because I can get a fair amount of time, nodes, and Lustre disk space at a nominal cost as an internal researcher in this division, as part of the institutional base capability. It does already have a 1 TB hard drive on each of the 480-odd nodes - but the policy (which I cannot change) is that the local disk space gets reclaimed for other jobs when your job is done. The only permanent storage (with yearly renewal) is on the Lustre file system, which is on RAID6.
I am not that concerned about losing data. This is not a production install where I would have users screaming at me to retrieve their lost data. For now, this is a research project with a relatively small number of users, and I know what data is going in, and where the data comes from. Source files will stay available. HBase files on Lustre can also be backed up to another file archive cluster in a separate building - which itself gets backed up to tape. If a node goes down and I lose some data, I'll know the tables affected because I'll know what was running. I've worked with relational databases (and logic-based databases, and object-oriented DBs, and, gnu help me, XML-based DBs) for many years - and I've gone through rebuilds. I already have a script that recreates almost all of the tables from scratch, importing the data sets. (Takes a heck of a long time to run, of course.)

So - this is not a 24/7 transactional DB. It is an early-stage warehouse for researchers that will have periodic updates. I am more concerned about getting a nice, long-term environment for properly designing and trying out a data warehouse for large-scale biological data integration than about max speed, or minor, recoverable data loss due to a node drop. (Though that would change, of course, if a larger number of users comes to pass.)

I just read that David Haussler at UC Santa Cruz is creating a five-petabyte repository of 20,000 genomes for cancer research. (Note to self: send him an email and find out what he's using - Hadoop? HBase or some other NoSQL db?) Think along those lines for what the PNNL data warehouse would be for - except we are more interested in next-gen sequencing RNA-seq transcriptomics data and mass spec proteomics data (combined with annotated genomes, of course, and other background public data on the organisms, proteins, etc.).
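For what it's worth, the periodic backup to the archive cluster could be as simple as copying the HBase root directory on Lustre into a dated snapshot directory. A minimal sketch - the paths below are made up for illustration, and it assumes the tables are static (or HBase is quiesced) during the copy:

```shell
#!/usr/bin/env bash
# Hypothetical backup sketch: copy the HBase root dir on Lustre to a
# dated snapshot directory on the archive filesystem. Both paths are
# illustrative defaults, not the real cluster paths.
set -euo pipefail

HBASE_ROOT="${HBASE_ROOT:-/tmp/hbase_backup_demo/hbase}"
ARCHIVE="${ARCHIVE:-/tmp/hbase_backup_demo/archive}"
STAMP="$(date +%Y%m%d)"

# Demo setup so the sketch is self-contained; on the real cluster
# HBASE_ROOT would already hold the table files.
mkdir -p "$HBASE_ROOT" "$ARCHIVE"
echo "region-data" > "$HBASE_ROOT/example_table_file"

# cp -a preserves the directory tree and timestamps; on the real
# cluster rsync -a would be the natural tool, since it skips files
# that have not changed between runs.
mkdir -p "$ARCHIVE/hbase-$STAMP"
cp -a "$HBASE_ROOT/." "$ARCHIVE/hbase-$STAMP/"

echo "backed up to $ARCHIVE/hbase-$STAMP"
```

A truly consistent backup of a live HBase instance would need more care (flushing or disabling tables first), but for periodic copies of a mostly static warehouse this kind of snapshot is probably enough.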
So I want a nice environment in which I can do analytics, and where I can look into laying the foundation for possible automated reasoning capabilities (parallelized Java rule engines, anyone? Possible Mahout use?). And possible ontology use. I want to be able to run analysis programs that simply were not possible otherwise - and then publish the new biological knowledge, since I'm in research and want to keep my job. So I want something that uses HBase for scaling, and which I don't have to hugely alter later on with a shift to another platform. I want to choose the right tools (Hadoop, HBase, the Hadoop ecosystem - that's the plan) that will leverage my prior work best - and allow wide flexibility in revising tables as I learn what works and what does not in supporting our analyses. Programmer effort (to wit, mine) is the most important criterion here.

If I have to move from this shared cluster later to a dedicated (and vastly smaller) Hadoop cluster which I have to pay for, no real damage done - the HBase db layout and Hadoop programs stay the same. But if this does work - on this very large ~100 teraflop + Lustre cluster - then for heavy-duty Big Data analysis programs I might be able to put in requests to use a large fraction of the cores at times (approaching 15,000 cores next year). Nice flexibility, if this all works out. And PNNL institutional support will be maintaining the cluster - I would have no hardware worries, no worries about doing OS updates, handling emergencies, and so on. I and my data warehouse would glide along, undisturbed by those mundane details. Hopefully a big difference from running my current cluster, let me tell you.

Heard back from a couple of other folks, who suggested I could at least give HBase a try on the Lustre system. We will see. I'll let the list know whether this works - or if I fall flat on my face.
But I'm fairly optimistic right now - the cluster's support staff here are superb and, as I said, they already have Cloudera CDH3-U2 up and running, working on terabytes of data (in HDFS, temporarily placed on local disks), invoking such on subsets of nodes using custom scripts. Just need to add in HBase, pointing it at Lustre. Fingers crossed.

Ron

Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
Richland, WA 99352
phone: (509) 372-6568
email: [email protected]

From: Rita [mailto:[email protected]]
Sent: Tuesday, December 06, 2011 3:53 PM
To: [email protected]
Cc: [email protected]; Taylor, Ronald C; Carlson, Timothy S; Regimbal, Kevin; Wiley, Steven
Subject: Re: want to try HBase on a large cluster running Lustre - any advice?

How would you handle a node failure? Do you have shared storage which exports LUNs to the datanodes? The beauty of hbase+hdfs is you can afford nodes going down (depending on your replication policy). Lustre is a great high-performance scratch filesystem, but using it as a backend for hbase, I think, isn't a good idea. I suggest you buy a disk for each node and then run hdfs and hbase on top of it. That's what I do.

On Mon, Dec 5, 2011 at 5:04 PM, Taylor, Ronald C <[email protected]> wrote:

Hello Lars,

Thanks for your previous help. Got a new question for you. I now have the opportunity to try using Hadoop and HBase on a newly installed cluster here, at a nominal cost. A lot of compute power (480+ nodes, 16 cores per node going up to 32 by the end of FY12, 64 GB RAM per node, with a few fat nodes with 256 GB). One local drive of 1 TB per node, and a four-petabyte Lustre file system. Hadoop jobs are already running on this new cluster, on terabyte-size data sets. Here's the drawback: I cannot permanently store HBase tables on local disk. After a job finishes, the disks are reclaimed.
So - if I want to build a continuously available data warehouse (basically for analytics runs, not for real-time web access by a large community at present - just me and other internal bioinformatics folk here at PNNL), I need to put the HBase tables on the Lustre file system.

Now, all the nodes in this cluster have a very fast InfiniBand QDR network interconnect. I think it's something like 40 gigabits/sec, as compared to the 1 gigabit/sec that you might see in a run-of-the-mill Hadoop cluster. And I just read a couple of white papers that say that if the network interconnect is good enough, the loss of data locality when you use Lustre with Hadoop is not such a bad thing. That is, I Googled and found several papers on HDFS vs Lustre. The latest one I found (2011) is a white paper from a company called Xyratex. Here's a quote from it:

"The use of clustered file systems as a backend for Hadoop storage has been studied previously. The performance of distributed file systems such as Lustre, Ceph, PVFS, and GPFS with Hadoop has been compared to that of HDFS. Most of these investigations have shown that non-HDFS file systems perform more poorly than HDFS, although with various optimizations and tuning efforts, a clustered file system can reach parity with HDFS. However, a consistent limitation in the studies of HDFS and non-HDFS performance with Hadoop is that they used the network infrastructure to which Hadoop is limited, TCP/IP, typically over 1 GigE. In HPC environments, where much faster network interconnects are available, significantly better clustered file system performance with Hadoop is possible."

Anyway, I am not principally worried about speed or efficiency right now - this cluster is big enough that even if I do not use it most efficiently, I'll still be doing better than with my very small current cluster, which has very limited RAM and antique processors. My question is: will HBase work at all on Lustre?

That is, on pp. 52-54 of your O'Reilly HBase book, you say that "... you are not locked into HDFS because the FileSystem used by HBase has a pluggable architecture and can be used to replace HDFS with any other supported system. The possibilities are endless and waiting for the brave at heart." ... "You can select a different filesystem implementation by using a URI pattern, where the scheme (the part before the first ":", i.e., the colon) part of the URI identifies the driver to be used."

We use HDFS by setting the URI to

hdfs://<namenode>:<port>/<path>

And you say that to simply use the local file system on a desktop Linux box (which would not replicate data or maintain copies of the files - no fault tolerance), one uses

file:///<path>

So - can I simply change this one param, and point HBase to a location in the Lustre file system? That is, use

<property>
  <name>hbase.rootdir</name>
  <value>file:///pic/scratch/rtaylor/hbase</value>
</property>

where "/pic" points to the root of the Lustre system. Or use something similar?

I am told that all of the Lustre OSTs are backed by RAID6, so my HBase tables would be fairly safe from hardware failure. If you put a file into the Lustre file system, chances are very slim you are going to lose it from a hardware failure. Also, I can make copies periodically to our gigantic file storage cluster in a separate building. This does not need to be a production HBase system (at least, for now). This is more of a data warehouse / analytics / data integration environment for several bioinformatics scientists, a system that we can afford to have go down from time to time, in a research environment.

Note that when I use Hadoop by itself, or Hadoop with HBase tables as sources and sinks, only the HBase accesses would be from the Lustre file system.
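For concreteness, here is a minimal sketch of what the relevant hbase-site.xml entries might look like under this plan - assuming /pic is the Lustre mount point and that the stock local-filesystem driver behaves sanely on Lustre (untested assumptions on my part):

```xml
<!-- Hypothetical hbase-site.xml fragment: point the HBase root at a
     directory on the Lustre mount instead of HDFS. The path below is
     illustrative, not a confirmed working value. -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- The file:// scheme selects the local/POSIX filesystem driver;
         since Lustre is mounted at /pic, this lands on Lustre. Note
         this driver does no replication - durability would come from
         the RAID6 behind the Lustre OSTs. -->
    <value>file:///pic/scratch/rtaylor/hbase</value>
  </property>
  <property>
    <!-- Still run region servers across nodes, even though the
         backing store is a shared filesystem. -->
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
```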
The Hadoop programs would still be able to use HDFS on local disks on the subset of nodes allotted to them on the cluster - as the Hadoop programs now running on this new cluster are doing. My problem is just that I don't want to have to rebuild the HBase tables every time I want to do something, since the local disk space is reclaimed for other possible users after a job finishes. But I can get permanent (well, yearly renewal) disk space on the Lustre system.

So - any advice, before I give this a try? Will changing this one HBase config parameter suffice to get me started? Or are there other things involved?

- Ron

Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
Richland, WA 99352
phone: (509) 372-6568
email: [email protected]

-----Original Message-----
From: Lars George [mailto:[email protected]]
Sent: Wednesday, November 30, 2011 6:34 AM
To: [email protected]
Subject: Re: getting HBase up after an unexpected power failure - need some advice

Hey,

Looks like you have a corrupted ZK. Try and stop ZK (after stopping HBase, of course) and restart it. If that also fails, then wipe the data dir ZK uses (check the config, for example the zoo.cfg for standalone ZK nodes). ZK is going to recreate the data files and it should be able to move forward.

Cheers,
Lars

--
--- Get your facts first, then you can distort them as you please. ---
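(For my own notes, Lars's recovery procedure boils down to something like the sketch below - assuming a standalone ZK with its dataDir set in zoo.cfg. The paths here are demo stand-ins, and the actual daemon stop/start commands are left as comments:)

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the ZK recovery steps: locate the ZK data dir
# from zoo.cfg and wipe it so ZK recreates its files on restart. A demo
# zoo.cfg under /tmp stands in for the real config; the real stop/start
# of HBase and ZK is elided as comments.
set -euo pipefail

ZK_CONF="${ZK_CONF:-/tmp/zk_demo/zoo.cfg}"
mkdir -p /tmp/zk_demo/data/version-2
printf 'tickTime=2000\ndataDir=/tmp/zk_demo/data\nclientPort=2181\n' > "$ZK_CONF"
echo corrupt > /tmp/zk_demo/data/version-2/log.1   # simulated bad state

# 1. Stop HBase first, then ZK (elided): stop-hbase.sh ; zkServer.sh stop
# 2. Read dataDir out of zoo.cfg and wipe its contents:
DATA_DIR="$(sed -n 's/^dataDir=//p' "$ZK_CONF")"
rm -rf "${DATA_DIR:?}"/*
# 3. Restart ZK, then HBase (elided): zkServer.sh start ; start-hbase.sh

echo "wiped $DATA_DIR"
```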
