Re: [Lustre-discuss] New Test Framework Development - Requirements Capture
Just a reminder that we have a meeting on Tuesday to discuss the new test framework development, and that the wiki page on the OpenSFS site is ready for capturing everybody's thoughts. The plan for the next meeting is to discuss the requirements that have been captured.

Thanks
Chris

Call info: Tuesday 11th December, 16:00 UTC
Bridge info: 916-356-2663, Bridge: 1, Passcode: 1146033
Join online meeting: https://meet.intel.com/chris.gearing/M6HD8CYF

> Hi,
>
> During the last meeting we decided to set up a wiki page to allow everyone to capture their thoughts, requirements and ideas for possible inclusion in the new framework environment. This page is now available on the OpenSFS wiki and we would welcome your input before the 7th December. Please provide input, however big or small.
>
> http://wiki.opensfs.org/New_test_framework
>
> Many thanks
> Chris

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss
[Lustre-discuss] Applications of Lustre - streaming?
Can Lustre be used to store data like streaming audio / video?

I’ve been scolded about considering it for DB storage but I’m looking at the relative merits of Lustre vs HDFS. I’m moving to a clustered DB setup and wondering about Cassandra / Lustre vs Hadoop (i.e. HBase / HDFS). One offers flexibility in terms of mixing hardware components while the other is a ‘one stop shop’.

Not trying to elicit a religious war – and yes, I’ve been reading as much as I can find about this. Just hoping for the opinion(s) of this side of the table.
Re: [Lustre-discuss] Applications of Lustre - streaming?
Hi,

On 12/07/2012 10:26 AM, Jon Yeargers wrote:
> Can Lustre be used to store data like streaming audio / video?

Yes.

> I’ve been scolded about considering it for DB storage but I’m looking at the relative merits of Lustre vs HDFS.

DB reads and writes tend to produce small I/O, which Lustre does not handle as well as large I/O.

> I’m moving to a clustered DB setup and wondering about Cassandra / Lustre vs Hadoop (IE HBase / HDFS). One offers flexibility in terms of mixing hardware components while the other is a ‘one stop shop’.

Honestly not sure. If you do perform some benchmarking between the two, I, and I'm sure others, would be greatly interested in seeing how the various FS technologies stack up!

> Not trying to elicit a religious war – and yes, I’ve been reading as much as I can find about this. Just hoping for the opinion(s) of this side of the table.

I don't think you'll find that here =)

-cf
Re: [Lustre-discuss] Applications of Lustre - streaming?
On 2012-12-07, at 10:26, Jon Yeargers yearg...@ohsu.edu wrote:
> Can Lustre be used to store data like streaming audio / video? I’ve been scolded about considering it for DB storage but I’m looking at the relative merits of Lustre vs HDFS.

I've been using Lustre for years with my home MythTV (Linux PVR) setup. The only major change I made was to reduce the readahead window size, so that videos do not lag when they first start playing while the large readahead window is being filled.

Of course, the suitability for a given workload depends on the hardware being used. Lustre will definitely give you better performance than HDFS on the same hardware, but if you need highly available data, the storage needs to be able to fail over between servers.

Cheers, Andreas
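As a rough illustration of the readahead point above: if playback cannot start until the client has filled its readahead window, the startup delay is roughly the window size divided by the throughput the client can sustain. The sketch below is only that back-of-the-envelope model. The window sizes and the 50 MB/s throughput are made-up numbers, not measurements from this thread, and the message does not say which setting was changed (on Lustre clients the window is usually governed by the `llite.*.max_read_ahead_mb` tunable).

```python
def readahead_fill_seconds(window_mb: float, throughput_mb_s: float) -> float:
    """Seconds needed to fill a readahead window of window_mb at throughput_mb_s.

    Simplified model (an assumption): the first read waits for the whole window.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    return window_mb / throughput_mb_s


# Hypothetical values, for illustration only.
THROUGHPUT_MB_S = 50.0

for window_mb in (40.0, 8.0, 2.0):
    delay = readahead_fill_seconds(window_mb, THROUGHPUT_MB_S)
    print(f"window {window_mb:5.1f} MB -> about {delay:.2f} s before playback can start")
```

Under this model the delay scales linearly with the window size, which is consistent with a smaller window removing the startup lag.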
Re: [Lustre-discuss] Applications of Lustre - streaming?
The redundancy of HDFS is very appealing. I've been weighing the merits of this vs RAID-6 per server on Lustre. HDFS recommends avoiding RAID for the very reason that the data is (typically) saved in several locations.

-----Original Message-----
From: Dilger, Andreas [mailto:andreas.dil...@intel.com]
Sent: Friday, December 07, 2012 9:35 AM
To: Jon Yeargers
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] Applications of Lustre - streaming?

> Lustre will definitely give you better performance for the same hardware than HDFS, but if you need highly available data, the storage needs to be able to failover between servers.
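One way to put numbers on the replication-vs-RAID-6 trade-off is raw capacity overhead. The sketch below compares the usable fraction of raw disk space under N-way replication (HDFS defaults to 3 replicas) with a single RAID-6 group, which gives up two disks' worth of space to parity. The 10-disk group size is an assumption for illustration, and the comparison deliberately ignores performance and server failover.

```python
def usable_fraction_replication(replicas: int) -> float:
    """Fraction of raw capacity that holds unique data under N-way replication."""
    if replicas < 1:
        raise ValueError("need at least one replica")
    return 1.0 / replicas


def usable_fraction_raid6(disks: int) -> float:
    """Fraction of raw capacity that holds data in one RAID-6 group.

    RAID-6 stores two disks' worth of parity and needs at least four disks.
    """
    if disks < 4:
        raise ValueError("RAID-6 needs at least 4 disks")
    return (disks - 2) / disks


# 3 is the HDFS default replication factor; the 10-disk group is hypothetical.
print(f"3x replication : {usable_fraction_replication(3):.0%} of raw capacity usable")
print(f"10-disk RAID-6 : {usable_fraction_raid6(10):.0%} of raw capacity usable")
```

Replication buys redundancy across hosts at a high capacity cost, while RAID-6 protects against disk failures within one server only, which is why failover between servers still matters on the Lustre side.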
Re: [Lustre-discuss] Applications of Lustre - streaming?
Hello,

The question of HDFS storage via Lustre has been in the foreground of my thinking. The Hadoop HDFS processes are not aware of block devices: they only know of a filesystem mount point in which to store their data. THUS…

If we provide a filesystem interface (say, a Lustre mount point) whose latency and throughput approach those of local disk storage (say, via InfiniBand), could we not have the various Hadoop nodes store their data in the Lustre filesystem? Would Hadoop even care?

I realize that this may not be a good place to bring it up. But there you go…

One of these days (with all of my ample spare time), I will benchmark it, and report back of course…

--jason

From: Jon Yeargers yearg...@ohsu.edu
Date: Friday, December 7, 2012 9:26 AM
To: lustre-discuss@lists.lustre.org
Subject: [Lustre-discuss] Applications of Lustre - streaming?

> Can Lustre be used to store data like streaming audio / video? I’ve been scolded about considering it for DB storage but I’m looking at the relative merits of Lustre vs HDFS.
Re: [Lustre-discuss] Applications of Lustre - streaming?
If it weren’t for the positive aspects of HDFS I wouldn’t really be considering HBase (over Cassandra).

Any notion of the merits of Lustre’s kernel-based mounts vs a FUSE-based mount (HDFS)? Whichever filesystem I go with, I will need to store ‘flat files’ in it.

From: Jason Brooks
Sent: Friday, December 07, 2012 9:38 AM
To: Jon Yeargers; lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] Applications of Lustre - streaming?

> If we provide a filesystem interface (say a lustre mount point) whose latencies and throughput approach that of local disk storage (say, via infiniband), could we not have the various hadoop nodes store their data in the lustre filesystem? would hadoop even care?
Re: [Lustre-discuss] Applications of Lustre - streaming?
I have used FUSE for other filesystems: it's great if all you need is access to the data, but the performance is HORRIBLE.

--jason

From: Jon Yeargers yearg...@ohsu.edu
Date: Friday, December 7, 2012 9:42 AM
To: Jason Brooks brook...@ohsu.edu, lustre-discuss@lists.lustre.org
Subject: RE: [Lustre-discuss] Applications of Lustre - streaming?

> Any notion of the merits of Lustre’s kernel-based mounts vs a FUSE-based mount (HDFS)? Whichever filesystem I go with I will need to store ‘flat files’ in.
Re: [Lustre-discuss] Applications of Lustre - streaming?
It is the question of how to handle redundancy that stops me from immediately testing this idea of mine. Well, that and time, wherewithal, etc...

Hadoop is great because it uses the speed and latency of local disks to work with data, and does not require the systems to be homogeneous. With the data replicated x times, it has not only storage redundancy but also x-1 other hosts that can work with the data if a host goes down. The downside appears to be that Hadoop can't really handle many nodes going down ungracefully.

My personal goal with my idea is to make a host a dumb compute node that I can shut down with impunity.

On 12/7/12 9:37 AM, Jon Yeargers yearg...@ohsu.edu wrote:
> The redundancy of HDFS is very appealing. I've been weighing the merits of this vs a RAID-6 / server on Lustre. HDFS recommends avoiding RAID for the very reason that the data is (typically) saved in several locations.
Re: [Lustre-discuss] Applications of Lustre - streaming?
On 12/7/12 9:34 AM, Dilger, Andreas wrote:
> I've been using Lustre for years with my home MythTV (Linux PVR) setup.

Nerd. :)

--
Jeff Johnson
Co-Founder
Aeon Computing
jeff.john...@aeoncomputing.com
www.aeoncomputing.com
Re: [Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?
On Dec 6, 2012, at 2:58 PM, Grigory Shamov wrote:
> So, on one of our OSS servers the load is now 160. According to collectl, only one OST does most of the job. (We don't do striping on this FS, unless users do it manually on their subdirectories.)

This sounds similar to situations we see every now and then. The load on the OSS server climbs until it is roughly equal to the number of OSS threads (which sounds like your case, with load = oss_threads = 160), but only a single OST is performing any significant I/O. This seems to arise when parallel jobs access the same file and that file has stripe_count=1. The OSS is bombarded with so many requests to a single OST that they backlog and tie up all the OSS threads. At that point, all I/O to the OSS slows to a crawl, no matter which OST on the OSS is being used. This becomes problematic because even a modest-sized job can effectively DoS an OSS server.

When you encounter these problems, is the I/O to the affected OST primarily one-way (i.e. mostly reads or mostly writes)? In our cases, we tend to see this when parallel jobs are reading from a common file.

There are a couple of things that I have found that help:

1) Increase the file striping a lot. This helps spread the load over more OSTs. We have had success with striping even relatively small files (~10 GB) over 100+ OSTs. Not only does it reduce load on the OSS, but it usually speeds up the application significantly.

2) Make sure caching is enabled on the OSS. For us, this seems to help mostly when lots of processes are reading the same file.

Not sure if your situation is exactly like what I have seen, but maybe some of that info can help a bit.

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
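To see why a higher stripe count relieves a single hot OST, here is a small sketch of the round-robin (RAID-0 style) mapping Lustre uses to place a file's stripes. It is a simplified model, not Lustre code: the function names are made up for illustration, the 1 MiB stripe size and the 160 readers each fetching a different 1 MiB chunk are assumptions, and the indices are positions within the file's layout rather than real OST numbers. On a real system the stripe count would be set with `lfs setstripe -c`.

```python
from collections import Counter

MIB = 1 << 20


def stripe_index(offset: int, stripe_size: int, stripe_count: int) -> int:
    """Index (within the file's layout) of the stripe object holding this byte offset."""
    return (offset // stripe_size) % stripe_count


def requests_per_ost(offsets, stripe_size: int, stripe_count: int) -> Counter:
    """Count how many of the given read offsets land on each stripe object."""
    return Counter(stripe_index(o, stripe_size, stripe_count) for o in offsets)


# Hypothetical workload: 160 readers, each requesting a different 1 MiB chunk
# of the same file at the same moment.
offsets = [rank * MIB for rank in range(160)]

for stripe_count in (1, 16):
    load = requests_per_ost(offsets, MIB, stripe_count)
    print(f"stripe_count={stripe_count:2d}: busiest OST serves {max(load.values())} "
          f"of {len(offsets)} requests across {len(load)} OST(s)")
```

With stripe_count=1 every request queues on the same OST; with 16 stripes the same requests spread evenly, so no single OST ties up all the OSS threads. Real access patterns are less tidy (readers whose offsets are all multiples of stripe_size × stripe_count would still collide), so treat this as intuition only.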
Re: [Lustre-discuss] noatime or atime_diff for Lustre 1.8.7?
On Saturday, 8 December, 2012 5:19:31 AM, Mohr Jr, Richard Frank (Rick Mohr) rm...@utk.edu wrote:
> 2) Make sure caching is enabled on the oss.

How do you check for this or enable it? Is it not enabled by default?

Cheers, Mark