Glad to hear that your stability is getting better. 

DRBD works best on dedicated hardware RAIDs with dedicated 1GbE replication 
links. 

Some might disagree with me, but I really recommend dedicated external storage 
for OCFS2. You can use any old server with lots of drive space... something as 
old as a Dell PE2950 will work if it meets your needs. Throw in some drives, 
build a RAID, install CentOS, set up an iSCSI target (iet), and you're good to 
go. 
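
With IET, exporting the RAID volume can be as small as the sketch below. The 
IQN, backing device, and init script name are placeholders I made up, so adjust 
them for your box:

    # /etc/ietd.conf on the storage server -- export the RAID volume as LUN 0
    Target iqn.2012-05.local.storage:ocfs2.lun0
        Lun 0 Path=/dev/sdb,Type=blockio

    # then start the target (the init script name depends on how iet was packaged)
    service iscsi-target start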

Then, if you want to improve your availability, add DRBD and heartbeat to the 
mix.
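
Roughly, that means replicating the exported volume between two storage boxes 
with DRBD and letting heartbeat float the service IP and the iSCSI target to 
whichever box is active. A minimal sketch, with made-up hostnames, IPs, and 
resource name (and note that ietd's Lun Path would then point at /dev/drbd0 
instead of the raw disk):

    # /etc/drbd.conf (or a .res file under /etc/drbd.d/, depending on drbd version)
    resource r0 {
      protocol C;                  # synchronous replication
      on storage1 {
        device    /dev/drbd0;
        disk      /dev/sdb;        # the RAID volume being exported
        address   10.0.0.1:7789;   # dedicated replication link
        meta-disk internal;
      }
      on storage2 {
        device    /dev/drbd0;
        disk      /dev/sdb;
        address   10.0.0.2:7789;
        meta-disk internal;
      }
    }

    # /etc/ha.d/haresources (heartbeat v1 style): float the service IP, promote
    # drbd, and start the iSCSI target init script on the active box
    storage1 IPaddr::10.0.0.100/24 drbddisk::r0 iscsi-target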


Best,
Michael

-----Original Message-----
From: Nathan Patwardhan [mailto:npatward...@llbean.com] 
Sent: Wednesday, May 02, 2012 3:30 PM
To: Kushnir, Michael (NIH/NLM/LHC) [C]; ocfs2-users@oss.oracle.com
Subject: RE: RHEL 5.8, ocfs2 v1.4.4, stability issues

>  -----Original Message-----
>  From: Kushnir, Michael (NIH/NLM/LHC) [C] [mailto:michael.kush...@nih.gov]
>  Sent: Friday, April 27, 2012 11:59 AM
>  To: Nathan Patwardhan; ocfs2-users@oss.oracle.com
>  Subject: RE: RHEL 5.8, ocfs2 v1.4.4, stability issues
>  
>  If memory serves me correctly, shared VMDK for clustering only works if
>  both cluster nodes sit on the same ESX server. This pretty much defeats
>  the purpose of clustering because your ESX server becomes a single point
>  of failure. ESX servers crash and fail sometimes... Not frequently, but
>  they do.
>
>  When setting up RDM, make sure you configure it for physical mode (pass
>  thru). If you use virtual mode, your nodes will once again be limited to
>  a single ESX server and your max RDM size will be limited to ~2TB.

We haven't gotten around to implementing RDM yet, but we will. We're discussing 
it internally right now, so I don't have anything to report on that front yet.

As a transitional step, I decided to see if I could stabilize ocfs2 by 
introducing drbd and REMOVING the shared vmdks from both ESX guests. I gave 
each ESX guest a single additional 10GB vmdk, added each vmdk (/dev/sdb1 on 
each guest) to drbd, and brought drbd up. I then ran mkfs.ocfs2, got everything 
mounted as it should be, and started splunk.
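
For anyone following along, the sequence was roughly the one below. The 
resource name r0, the filesystem label, and the mount point are illustrative 
rather than copied from our config, and the o2cb cluster was already configured 
and online from the earlier shared-vmdk setup:

    # on both guests: initialize and bring up the drbd resource backed by /dev/sdb1
    drbdadm create-md r0
    drbdadm up r0

    # on one guest only: force the initial sync (drbd 8.3 syntax)
    drbdadm -- --overwrite-data-of-peer primary r0

    # with allow-two-primaries set in the resource, promote the other guest too
    drbdadm primary r0

    # on one guest: put ocfs2 on the drbd device, with two node slots
    mkfs.ocfs2 -N 2 -L splunkdata /dev/drbd0

    # on both guests: mount it, then start splunk (default install path assumed)
    mount -t ocfs2 /dev/drbd0 /srv/splunkdata
    /opt/splunk/bin/splunk start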

We've been stable for almost 24 hours, but most importantly we have NOT seen 
any of the errors in syslog, nor have we experienced any of the ocfs2 outages 
or system crashes that we saw when the systems were using a shared vmdk. drbd 
performance isn't the greatest and is actually pretty poor in the context of 
ESX, but at the same time the case for ocfs2 reliability is definitely taking 
shape. I am looking forward to seeing how we do performance-wise once we 
introduce RDM.

--
Nathan Patwardhan, Sr. System Engineer
npatward...@llbean.com
x26662


_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users