Damon Miller wrote:
> The two servers are actually connected to the same switch.  We are using 
> iptables for basic packet filtering on all of our hosts, but TCP/7777 is open 
> on all machines participating in the cluster.  iSCSI is also enabled on 
> TCP/3260.  Here are the relevant excerpts from 'iptables -L -n' on the iSCSI 
> target:
>
> ACCEPT     tcp  --  10.10.89.0/24        0.0.0.0/0           state NEW tcp dpt:3260
> ACCEPT     tcp  --  10.10.89.0/24        0.0.0.0/0           state NEW tcp dpt:7777

If this happens again, I would make sure that iptables is not malfunctioning.
You could log the REJECTs to verify.
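For example (the rule position N below is a placeholder; adjust it to wherever
your final REJECT/DROP sits):

    # find the position of the catch-all REJECT/DROP rule in the chain
    iptables -L INPUT -n --line-numbers
    # insert a LOG rule just above it, replacing N with that position
    iptables -I INPUT N -p tcp -m multiport --dports 3260,7777 \
        -j LOG --log-prefix "ocfs2/iscsi reject: "
    # matches will then show up in the kernel log (dmesg / /var/log/messages)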

> Excellent point.  In terms of the filesystem layout, we assign one directory 
> to each of our customers (with a very few exceptions).  Within each of these 
> directories is another set of directories representing each extension or DID 
> (phone number) that receives voicemail.  Finally, each extension has its own 
> directory hierarchy responsible for storing messages and greetings.  Here's a 
> quick example showing one mailbox for a single customer (actual number and 
> name obscured for privacy purposes):
>
> cust
> |-- 502___1667
> |   |-- INBOX
> |   |   |-- msg0000.txt
> |   |   |-- msg0000.wav
> |   |   |-- msg0001.txt
> |   |   |-- msg0001.wav
> |   |   |-- msg0002.txt
> |   |   |-- msg0002.wav
> |   |   |-- msg0003.txt
> |   |   |-- msg0003.wav
> |   |   |-- msg0004.txt
> |   |   `-- msg0004.wav
> |   |-- Old
> |   |-- busy.wav
> |   |-- greet
> |   |-- greet.wav
> |   |-- temp
> |   |-- tmp
> |   |-- unavail
> |   `-- unavail.wav
>
> [etc.]
>
> The whole tree currently houses 194,713 files and 52,768 directories 
> consuming just under 60 GB of storage.
>
> In terms of utilization, I just scanned the tree for all messages created 
> within the past 24 hours and came up with a total of 87,207 messages 
> consuming nearly 50 GB (>80% of the total usage).  That's spread across at 
> least seven 
> different servers.  Usage varies considerably during that 24-hour period but 
> we've done some basic estimation along these lines.
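
(For anyone repeating that kind of measurement, a GNU find one-liner along
these lines does the trick; the path below is only a placeholder.)

    # files written in the last 24 hours, plus their combined size
    find /srv/voicemail/cust -type f -mtime -1 -printf '%s\n' \
        | awk '{n++; b+=$1} END {printf "%d files, %.1f GB\n", n, b/1024/1024/1024}'
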
>
> The average write I/O over the 24-hour period is 600 KB/s (again, spread 
> across seven servers).  Peak utilization is on the order of 5-6 times this, 
> or roughly 3.5 MB/s.  This number is a little misleading in the current 
> configuration, as messages are first stored locally and then propagated via 
> Unison, but it's certainly relevant for the shared storage approach.
>
> We're also using OCFS2 for call recordings, though we first store them on a 
> local ramdisk to ensure sufficient throughput and then copy them to 
> persistent storage.  We are able to guarantee globally unique filenames to 
> eliminate conflicts.
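
As a rough sketch of that record-to-ramdisk-then-copy pattern (the mount
points, size and $UNIQUEID name below are only placeholders):

    # small tmpfs to hold in-progress recordings
    mkdir -p /var/spool/recordings-ram
    mount -t tmpfs -o size=512m tmpfs /var/spool/recordings-ram

    # once a call finishes, move the file (under its globally unique name)
    # onto the shared OCFS2 volume
    mv /var/spool/recordings-ram/"$UNIQUEID".wav /mnt/ocfs2/recordings/
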
>
> While I'm providing too much information, I might as well describe the 
> transition and remote replication plans.  :)
>
> Assuming we can reach the necessary comfort level with OCFS2, the transition 
> plan is to incrementally migrate each of our customer servers away from 
> file-based replication by establishing an initial replica on the iSCSI target 
> which would be mounted by members of the OCFS2 cluster.  Doing so would 
> obviate the need for maintaining a file-based replica on these members, thus 
> allowing us to slowly move away from the file-based solution.  However, until 
> all customer servers are using shared storage we would need to continue 
> sync'ing the iSCSI target to our current repository with Unison.  In practice 
> this means mounting the OCFS2 volume through the iSCSI target's loopback 
> address and running Unison against the current repository.
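
Presumably that interim step looks something like this on the target box (the
IQN, device and paths below are placeholders):

    # log in to our own iSCSI export over the loopback and mount the volume
    iscsiadm -m node -T iqn.2009-01.com.example:voicemail -p 127.0.0.1 --login
    mount -t ocfs2 /dev/sdc1 /mnt/vm-shared

    # reconcile the shared volume with the existing repository
    unison /mnt/vm-shared /srv/voicemail -batch -times
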
>
> Lastly, we would deploy a replica of this OCFS2/iSCSI-based solution in our 
> other primary datacenter and migrate its servers as described above.  Once 
> completed, the plan is to use file-based replication between the two iSCSI 
> targets in order to propagate changes across the WAN.  Conflicts could occur 
> here if 
> customers fail over to their tertiary server but these are infrequent and 
> we're comfortable resolving them manually.
>
> Hopefully this provides some context.

Nod.

> Agreed, and this is an area I frankly do not fully understand.
>
> One problem we're hoping to solve with OCFS2 is the frequent conflicts we see 
> as a result of the current file-based approach.  Asterisk makes no attempt to 
> ensure unique filenames for messages, thus each server effectively operates 
> independently.  If a secondary server is used prior to file synchronization, 
> it's quite possible that a new message will introduce a conflict.  Asterisk 
> will simply increment the message number based on its local filesystem (e.g. 
> "msg0001" -> "msg0002") and store the recording.  This will generate a 
> conflict if these files exist on the primary server as a result of earlier, 
> unsynchronized messages.  The hope is that OCFS2 will provide filesystem 
> consistency across all member nodes such that filename-based conflicts as 
> described above will be avoided.

Sure. You could also add a node number to the message filename.
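
Purely to illustrate the naming idea (the -n1 suffix and the path below are
made up, and Asterisk itself would have to be taught to cope with the renamed
files), a per-node rename run before the sync pass would keep the two sides
from colliding:

    # on node 1: tag this node's not-yet-tagged message files before syncing
    find /var/spool/asterisk/voicemail -type f -name 'msg[0-9]*' ! -name '*-n1.*' |
    while read -r f; do
        mv "$f" "${f%.*}-n1.${f##*.}"
    done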

> In terms of actual I/O contention, we've established basic operating 
> characteristics governing the number of customers we can place on a single 
> server.  This is dictated by a combination of CPU and I/O capacity, the 
> latter of these being impacted significantly by the file-based replication 
> approach.  What I do not yet have is an understanding of how OCFS2 affects 
> this recipe (ideally for the better!).

A couple of points.  Ensure that a single directory does not hold more than
roughly 10-20K files.  The current version of the fs does not support indexed
directories.  (The feature has been added in mainline and will be available
with the next release of the fs.)  You can have an unlimited number of files
in a directory, but without indexing, creating a file gets slower because the
name has to be matched sequentially against every existing entry to rule out
a clash.  Also related: the fs currently limits the number of sub-directories
in a single directory to 32,000.  This is a hard limit, for the same reason,
and it too will be relaxed with indexed directories.
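
A quick way to spot directories that are already getting close to those
numbers (the path below is a placeholder):

    # print entry counts for directories holding more than ~10K entries
    find /mnt/vm-shared/cust -type d | while read -r d; do
        n=$(ls -A "$d" | wc -l)
        [ "$n" -gt 10000 ] && echo "$n $d"
    done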

If the .txt files are small (<3K), you will benefit from the inline-data
feature we will be enabling in ocfs2 1.4.2.  (The feature stores the data of
small files in the inode itself.)
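
Counting how many of the .txt files come in under that threshold shows how
much of the tree would benefit (the path is a placeholder; 3072 bytes = 3K):

    # .txt files small enough for inline-data vs. the total
    find /mnt/vm-shared/cust -name 'msg*.txt' -size -3072c | wc -l
    find /mnt/vm-shared/cust -name 'msg*.txt' | wc -l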


