[gentoo-user] Re: File system testing
J. Roeleveld joost at antarean.org writes: Out of curiosity, what do you want to simulate? Subsurface flows in porous media, AKA carbon sequestration by injection wells. You know, provide proof that those that remove hydrocarbons can actually put the CO2 back and significantly mitigate the effects of their ventures. It's like this. I have been struggling with my 17 year old genius son, who is a year away from entering medical school, with learning responsibility. So I got him a hyperactive, highly intelligent (mix-doberman) puppy to nurture, raise, train, love and be responsible for. It's one genius pup teaching another pup about being responsible. So goes the earl_bidness...imho. Many folks are recommending to skip Hadoop/HDFS altogether I agree, Hadoop/HDFS is for data analysis. Like building a profile about people based on the information companies like Facebook, Google, NSA, Walmart, Governments, Banks, collect about their customers/users/citizens/slaves/ and go straight to mesos/spark. RDD (in-memory) cluster calculations are at the heart of my needs. The opposite end of the spectrum, loads of small files and small apps; I dunno about, but, I'm all ears. In the end, my (3) node scientific cluster will morph and support the typical myriad of networked applications, but I can take a few years to figure that out, or just copy what smart guys like you and joost do. Nope, I'm simply following what you do and provide suggestions where I can. Most of the clusters and distributed computing stuff I do is based on adding machines to distribute the load. But the mechanisms for these are implemented in the applications I work with, not what I design underneath. The filesystems I am interested in are different to the ones you want. Maybe. I do not know what I want yet. My vision is very lightweight workstations running lxqt (small memory footprint) or such, and a bad_arse cluster for the heavy lifting running on whatever heterogeneous resources I have.
From what I've read, the cluster and the file systems are all redundant at the cluster level (mesos/spark anyway), regardless of what any given processor/system is doing. All of Alan's fantasies (needs) can be realized once the cluster stuff is mastered. (chronos, ansible etc etc). I need to provide access to software installation files to a VM server and access to documentation which is created by the users. The VM server is physically next to what I already mentioned as server A. Access to the VM from the remote site will be using remote desktop connections. But to allow faster and easier access to the documentation, I need a server B at the remote site which functions as described. AFS might be suitable, but I need to be able to layer Samba on top of that to allow a seamless operation. I don't want the laptops to have their own cache and then having to figure out how to solve the multiple different changes to documents containing layouts. (MS Word and OpenDocument files). Ok, so your customers (hyperactive problem users) interface to your cluster to do their work. When finished you write things out to other servers with all of the VM servers. Lots of really cool tools are emerging in the cluster space. I think these folks have mesos + spark + samba + nfs all in one box. [1] Build rather than purchase? We have to figure out what you and Alan need, on a cluster, because it is what most folks need/want. It's the admin_advantage part of the cluster. (There are also the Big Science (me) and Web-centric needs. Right now they are related projects, but things will coalesce, imho.) There is even Spark_sql for postgres admins [2]. [1] http://www.quantaqct.com/en/01_product/02_detail.php?mid=29sid=162id=163qs=102 [2] https://spark.apache.org/sql/ We use Lustre for our high performance general storage. I don't have any numbers, but I'm pretty sure it is *really* fast (10Gbit/s over IB sounds familiar, but don't quote me on that). At Umich, you guys should test the FhGFS/btrfs combo.
The folks at UCI swear by it, although they are only publishing a wee bit. (you know, water cooler gossip).. Surely the Wolverines do not want those Californians getting up on them? Are you guys planning a mesos/spark test? Personally, I would read up on these and see how they work. Then, based on that, decide if they are likely to assist in the specific situation you are interested in. It's a ton of reading. It's not an apples-to-apple_cider type of reading. My head hurts. Take a walk outside. Clear air should help you with the headaches :P Basketball, Boobs and Bourbon used to work quite well. Now it's mostly basketball, but I'm working on someone very cute.. I'm leaning to DFS/LFS (2) Lustre/btrfs and FhGFS/btrfs I have insufficient knowledge to advise on either of these. One question, why BTRFS instead of ZFS? I think btrfs has
Re: [gentoo-user] Re: File system testing
On Fri, Sep 19, 2014 at 9:41 AM, James wirel...@tampabay.rr.com wrote: I think btrfs has tremendous potential. I tried ZFS a few times, but the installs are not part of gentoo, so they got borked uEFI, grubs to uuids, etc etc also were in the mix. That was almost a year ago. For what ever reason the clustering folks I have read and communicated with are using ext4, xfs and btrfs. Prolly mostly because those are mostly used in their (systemd) inspired) distros? I do think that btrfs in the long-term is more likely to be mainstream on linux, but I wouldn't be surprised if getting zfs working on Gentoo is much easier now. Richard Yao is both a Gentoo dev and significant zfs on linux contributor, so I suspect he is doing much of the latter on the former. Yep. the license issue with ZFS is a real killer for me. Besides, as an old state-machine, C hack, anything with B-tree is fabulous. Prejudices? Yep, but here, I'm sticking with my gut. Multi port ram can do mavelous things with Btree data structures. The rest will become available/stable. Simply, I just trust btrfs, in my gut. I don't know enough about zfs to compare them, but the design of btrfs has a certain amount of beauty/symmetry/etc to it IMHO. I only have studied it enough to be dangerous and give some intro talks to my LUG, but just about everything is stored in b-trees, the design allows both fixed and non-fixed length nodes within the trees, and just about everything about the filesystem is dynamic other than the superblocks, which do little more than ID the filesystem and point to the current tree roots. The important stuff is all replicated and versioned. I wouldn't be surprised if it shared many of these design features with other modern filesystems, and I do not profess to be an expert on modern filesystem design, so I won't make any claims about btrfs being better/worse than other filesystems in this regard. However, I would say that anybody interested in data structures would do well to study it. 
-- Rich
Re: [gentoo-user] Re: File system testing
On Friday, September 19, 2014 01:41:26 PM James wrote: J. Roeleveld joost at antarean.org writes: Out of curiosity, what do you want to simulate? subsurface flows in porous medium. AKA carbon sequestration by injection wells. You know, provide proof that those that remove hydrocarbons and actuall put the CO2 back and significantly mitigate the effects of their ventures. Interesting topic. Can't provide advice on that topic. It's like this. I have been stuggling with my 17 year old genius son who is a year away from entering medical school, with learning responsibility. So I got him a hyperactive, highly intelligent (mix-doberman) puppy to nurture, raise, train, love and be resonsible for. It's one genious pup, teaching another pup about being responsible. Overactive kids, always fun. I try to keep mine busy without computers and TVs for now. (She's going to be 3 in November) So goes the earl_bidness...imho. Many folks are recommending to skip Hadoop/HDFS all together I agree, Hadoop/HDFS is for data analysis. Like building a profile about people based on the information companies like Facebook, Google, NSA, Walmart, Governments, Banks, collect about their customers/users/citizens/slaves/ and go straight to mesos/spark. RDD (in-memory) cluster calculations are at the heart of my needs. The opposite end of the spectrum, loads of small files and small apps; I dunno about, but, I'm all ears. In the end, my (3) node scientific cluster will morph and support the typical myriad of networked applications, but I can take a few years to figure that out, or just copy what smart guys like you and joost do. Nope, I'm simply following what you do and provide suggestions where I can. Most of the clusters and distributed computing stuff I do is based on adding machines to distribute the load. But the mechanisms for these are implemented in the applications I work with, not what I design underneath. The filesystems I am interested in are different to the ones you want. Maybe. 
I do not know what I want yet. My vision is very light weight workstations running lxqt (small memory footprint) or such, and a bad_arse cluster for the heavy lifting running on whatever heterogenous resoruces I have. From what I've read, the cluster and the file systems are all redundant that the cluster level (mesos/spark anyway) regardless of one any give processor/system is doing. All of Alans fantasies (needs) can be realized once the cluster stuff is master. (chronos, ansible etc etc). Alan = your son? or? I would, from the workstation point of view, keep the cluster as a single entity, to keep things easier. A cluster FS for workstation/desktop use is generally not suitable for a High Performance Cluster (HPC) (or vice-versa) I need to provided access to software installation files to a VM server and access to documentation which is created by the users. The VM server is physically next to what I already mentioned as server A. Access to the VM from the remote site will be using remote desktop connections. But to allow faster and easier access to the documentation, I need a server B at the remote site which functions as described. AFS might be suitable, but I need to be able to layer Samba on top of that to allow a seamless operation. I don't want the laptops to have their own cache and then having to figure out how to solve the multiple different changes to documents containing layouts. (MS Word and OpenDocument files). Ok so your customers (hperactive problem users) inteface to your cluster to do their work. When finished you write things out to other servers with all of the VM servers. Lots of really cool tools are emerging in the cluster space. Actually, slightly different scenario. Most work is done at customers systems. Occasionally we need to test software versions prior to implementing these at customers. For that, we use VMs. The VM-server we have is currently sufficient for this. When it isn't, we'll need to add a 2nd VMserver. 
On the NAS, we store: - Documentation about customers + Howto documents on how to best install the software. - Installation files downloaded from vendors (We also deal with older versions that are no longer available. We need to have our own collection to handle that) As we are looking into also working from a different location, we need: - Access to the VM-server (easy, using VPN and Remote Desktops) - Access to the files (I prefer to have a local 'cache' at the remote location) It's the access to files part where I need to have some sort of distributed filesystem. I think these folks have mesos + spark + samba + nfs all in one box. [1] [1] http://www.quantaqct.com/en/01_product/02_detail.php?mid=29sid=162id=163qs=102 Had a quick look, these use MS Windows Storage 2012, this is only failover on the storage side. I don't see anything related to
Re: [gentoo-user] Re: File system testing
On Friday, September 19, 2014 10:56:59 AM Rich Freeman wrote: On Fri, Sep 19, 2014 at 9:41 AM, James wirel...@tampabay.rr.com wrote: I think btrfs has tremendous potential. I tried ZFS a few times, but the installs are not part of gentoo, so they got borked uEFI, grubs to uuids, etc etc also were in the mix. That was almost a year ago. For what ever reason the clustering folks I have read and communicated with are using ext4, xfs and btrfs. Prolly mostly because those are mostly used in their (systemd) inspired) distros? I do think that btrfs in the long-term is more likely to be mainstream on linux, but I wouldn't be surprised if getting zfs working on Gentoo is much easier now. Richard Yao is both a Gentoo dev and significant zfs on linux contributor, so I suspect he is doing much of the latter on the former. Don't have the link handy, but there is an howto about it that, when followed, will give a ZFS pool running on Gentoo in a very short time. (emerge zfs is the longest part of the whole thing) Not even needed to reboot. Yep. the license issue with ZFS is a real killer for me. Besides, as an old state-machine, C hack, anything with B-tree is fabulous. Prejudices? Yep, but here, I'm sticking with my gut. Multi port ram can do mavelous things with Btree data structures. The rest will become available/stable. Simply, I just trust btrfs, in my gut. I don't know enough about zfs to compare them, but the design of btrfs has a certain amount of beauty/symmetry/etc to it IMHO. I only have studied it enough to be dangerous and give some intro talks to my LUG, but just about everything is stored in b-trees, the design allows both fixed and non-fixed length nodes within the trees, and just about everything about the filesystem is dynamic other than the superblocks, which do little more than ID the filesystem and point to the current tree roots. The important stuff is all replicated and versioned. 
I wouldn't be surprised if it shared many of these design features with other modern filesystems, and I do not profess to be an expert on modern filesystem design, so I won't make any claims about btrfs being better/worse than other filesystems in this regard. However, I would say that anybody interested in data structures would do well to study it. I like the idea of both and hope BTRFS will also come with the raid-6-like features and good support for larger drive counts (I've got 16 available for the filestorage) to make it, for me, a viable alternative to ZFS. -- Joost
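To put rough numbers on the raid-6-like requirement for a 16-drive box, here is a minimal sketch comparing usable capacity for a two-copy mirror (what btrfs raid1 does) and a double-parity layout. The 4 TB drive size is a made-up example, the function names are hypothetical, and filesystem metadata overhead is ignored, so treat the figures as back-of-envelope only.

```python
def mirror_usable_tb(drives, drive_size_tb):
    """Usable space when every block is stored twice (equal-size drives assumed)."""
    if drives < 2:
        raise ValueError("a two-copy mirror needs at least 2 drives")
    return drives * drive_size_tb / 2


def double_parity_usable_tb(drives, drive_size_tb):
    """Usable space for a single raid-6-like group: two drives' worth goes to parity."""
    if drives <= 2:
        raise ValueError("double parity needs more than 2 drives")
    return (drives - 2) * drive_size_tb


if __name__ == "__main__":
    drives, size_tb = 16, 4  # 16 drives as mentioned above; 4 TB is an assumption
    print("two-copy mirror:", mirror_usable_tb(drives, size_tb), "TB usable")
    print("double parity:  ", double_parity_usable_tb(drives, size_tb), "TB usable")
```

With these assumed numbers the mirror yields half the raw capacity, while the double-parity group gives up only two drives, which is why the parity layout matters more as the drive count grows.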
Re: [gentoo-user] Re: File system testing
On 18/09/2014 14:12, Alec Ten Harmsel wrote: On 09/18/2014 05:17 AM, Kerin Millar wrote: On 17/09/2014 21:20, Alec Ten Harmsel wrote: As far as HDFS goes, I would only set that up if you will use it for Hadoop or related tools. It's highly specific, and the performance is not good unless you're doing a massively parallel read (what it was designed for). I can elaborate why if anyone is actually interested. I, for one, am very interested. --Kerin Alright, here goes: Rich Freeman wrote: FYI - one very big limitation of hdfs is its minimum filesize is something huge like 1MB or something like that. Hadoop was designed to take a REALLY big input file and chunk it up. If you use hdfs to store something like /usr/portage it will turn into the sort of monstrosity that you'd actually need a cluster to store. This is exactly correct, except we run with a block size of 128MB, and a large cluster will typically have a block size of 256MB or even 512MB. HDFS has two main components: a NameNode, which keeps track of which blocks are a part of which file (in memory), and the DataNodes that actually store the blocks. No data ever flows through the NameNode; it negotiates transfers between the client and DataNodes and negotiates transfers for jobs. Since the NameNode stores metadata in-memory, small files are bad because RAM gets wasted. What exactly is Hadoop/HDFS used for? The most common uses are generating search indices on data (which is a batch job) and doing non-realtime processing of log streams and/or data streams (another batch job) and allowing a large number of analysts run disparate queries on the same large dataset (another batch job). Batch processing - processing the entire dataset - is really where Hadoop shines. When you put a file into HDFS, it gets split based on the block size. This is done so that a parallel read will be really fast - each map task reads in a single block and processes it. 
Ergo, if you put in a 1GB file with a 128MB block size and run a MapReduce job, 8 map tasks will be launched. If you put in a 1TB file, 8192 tasks would be launched. Tuning the block size is important to optimize the overhead of launching tasks vs. potentially under-utilizing a cluster. Typically, a cluster with a lot of data has a bigger block size. The downsides of HDFS: * Seeked reads are not supported afaik because no one needs that for batch processing * Seeked writes into an existing file are not supported because either blocks would be added in the middle of a file and wouldn't be 128MB, or existing blocks would be edited, resulting in blocks larger than 128MB. Both of these scenarios are bad. Since HDFS users typically do not need seeked reads or seeked writes, these downsides aren't really a big deal. If something's not clear, let me know. Thank you for taking the time to explain. --Kerin
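The point about small files wasting NameNode RAM can be sketched by counting the metadata objects the NameNode has to track. This is an illustration only: it assumes one object per file plus one per block, and the 150 bytes per object figure is a commonly quoted rule of thumb, not something measured or stated in this thread. The function names are hypothetical.

```python
import math

ASSUMED_BYTES_PER_OBJECT = 150  # rule-of-thumb assumption, not a measured value


def namenode_objects(num_files, file_size_mb, block_size_mb=128):
    """Count file + block objects, assuming every file needs at least one block."""
    blocks_per_file = max(1, math.ceil(file_size_mb / block_size_mb))
    return num_files * (1 + blocks_per_file)


def namenode_bytes(num_files, file_size_mb, block_size_mb=128):
    """Very rough in-memory metadata estimate under the assumption above."""
    return namenode_objects(num_files, file_size_mb, block_size_mb) * ASSUMED_BYTES_PER_OBJECT


if __name__ == "__main__":
    # The same ~195 GB of data, stored two ways.
    many_small = namenode_objects(num_files=100_000, file_size_mb=2)
    one_large = namenode_objects(num_files=1, file_size_mb=200_000)
    print("100,000 files of 2MB:", many_small, "objects")
    print("one 200,000MB file:  ", one_large, "objects")
```

The absolute memory figure matters less than the ratio: the same amount of data costs the NameNode over a hundred times more metadata when it arrives as small files.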
Re: [gentoo-user] Re: File system testing
On Wednesday, September 17, 2014 09:05:09 PM James wrote: J. Roeleveld joost at antarean.org writes: AFS has caching and can survive temporary disappearance of the server. Excellent for low bandwidth connections. Most DFS have mechanisms to deal with transient failures, but not as generous on the time-scale as AFS. I believe, if I recall correctly, that these high-latency, low-bandwidth recovery mechanisms were key design parameters for AFS, at least those baked in during the CMU development cycles? While attractive for your situation, these features might actually be detrimental to a hi_performance distributed cluster's needs for a DFS? I tend to agree. I'm not sure how up-to-date AFS is, but from re-reading the wikipedia pages, it sounds like what I need. Provided I can get it to work together with Samba. I need to allow MS Windows laptops access to the files on the remote location. For me, I need to be able to provide Samba filesharing on top of that layer on 2 different locations as I don't see the network bandwidth to be sufficient for normal operations. (ADSL uplinks tend to be dead slow) Yea, I'm not going to be testing OpenAFS for my needs, unless I read some compelling published data on its applicability to high-end clusters as the best choice of DFS. I wouldn't either. It's probably great for SETI etc etc. Doubtful :) Did you see the following wikipedia page: http://en.wikipedia.org/wiki/List_of_file_systems It contains a nice long list of various distributed, clustered, filesystems. I just miss an indication of how well these are still supported and on which OSs these (can) work. -- Joost
Re: [gentoo-user] Re: File system testing
On Wednesday, September 17, 2014 04:20:24 PM Alec Ten Harmsel wrote: As far as HDFS goes, I would only set that up if you will use it for Hadoop or related tools. It's highly specific, and the performance is not good unless you're doing a massively parallel read (what it was designed for). I can elaborate why if anyone is actually interested. We use Lustre for our high performance general storage. I don't have any numbers, but I'm pretty sure it is *really* fast (10Gbit/s over IB sounds familiar, but don't quote me on that). I think any shared filesystem will be fast if you have a lot of bandwidth :) When comparing network filesystems it makes sense to keep the hardware identical and reduce the overhead to a percentage. E.g. what is the theoretical maximum speed of the network used (10Gbit/s), and what is the actual maximum speed you get with: 1) a single really large file (200GB) 2) a lot (100,000) of smaller files (2MB) Then you can make an estimate of what to expect when using a 1Gbit/s network. I somehow don't expect James to have InfiniBand available for his research? Personally, when choosing between InfiniBand and Ethernet, I'm tempted to go with dedicated bonded 10Gbit/s links because of the price difference. (A quick search shows me that InfiniBand is about 3x as expensive for the same throughput) Personally, I would read up on these and see how they work. Then, based on that, decide if they are likely to assist in the specific situation you are interested in. Always good advice. It saves time to do some simple research (the reading type) before actually doing tests. -- Joost
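The "reduce the overhead to a percentage" idea can be sketched as follows. The measured values below are invented placeholders, not benchmark results, and the function names are hypothetical. The estimate also assumes efficiency carries over linearly to a slower link, which may not hold when latency or metadata handling dominates, as it often does with many small files.

```python
def efficiency(measured_gbit_s, link_gbit_s):
    """Fraction of the theoretical link speed that was actually achieved."""
    if link_gbit_s <= 0:
        raise ValueError("link speed must be positive")
    return measured_gbit_s / link_gbit_s


def estimate_on_other_link(measured_gbit_s, measured_link_gbit_s, target_link_gbit_s):
    """Scale the observed efficiency to a different link speed (linear assumption)."""
    return efficiency(measured_gbit_s, measured_link_gbit_s) * target_link_gbit_s


if __name__ == "__main__":
    # Hypothetical measurements on a 10Gbit/s network.
    tests = {
        "single 200GB file": 8.0,
        "100,000 files of 2MB": 3.0,
    }
    for name, measured in tests.items():
        pct = efficiency(measured, 10.0) * 100
        est = estimate_on_other_link(measured, 10.0, 1.0)
        print(f"{name}: {pct:.0f}% of link speed, ~{est:.2f} Gbit/s expected on 1Gbit/s")
```

Running the same two tests against each candidate filesystem on identical hardware gives percentages that can be compared directly.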
Re: [gentoo-user] Re: File system testing
On Wednesday, September 17, 2014 08:56:28 PM James wrote: Alec Ten Harmsel alec at alectenharmsel.com writes: As far as HDFS goes, I would only set that up if you will use it for Hadoop or related tools. It's highly specific, and the performance is not good unless you're doing a massively parallel read (what it was designed for). I can elaborate why if anyone is actually interested. Acutally, from my research and my goal (one really big scientific simulation running constantly). Out of curiosity, what do you want to simulate? Many folks are recommending to skip Hadoop/HDFS all together I agree, Hadoop/HDFS is for data analysis. Like building a profile about people based on the information companies like Facebook, Google, NSA, Walmart, Governments, Banks, collect about their customers/users/citizens/slaves/ and go straight to mesos/spark. RDD (in-memory) cluster calculations are at the heart of my needs. The opposite end of the spectrum, loads of small files and small apps; I dunno about, but, I'm all ears. In the end, my (3) node scientific cluster will morph and support the typical myriad of networked applications, but I can take a few years to figure that out, or just copy what smart guys like you and joost do. Nope, I'm simply following what you do and provide suggestions where I can. Most of the clusters and distributed computing stuff I do is based on adding machines to distribute the load. But the mechanisms for these are implemented in the applications I work with, not what I design underneath. The filesystems I am interested in are different to the ones you want. I need to provided access to software installation files to a VM server and access to documentation which is created by the users. The VM server is physically next to what I already mentioned as server A. Access to the VM from the remote site will be using remote desktop connections. 
But to allow faster and easier access to the documentation, I need a server B at the remote site which functions as described. AFS might be suitable, but I need to be able to layer Samba on top of that to allow a seamless operation. I don't want the laptops to have their own cache and then having to figure out how to solve the multiple different changes to documents containing layouts. (MS Word and OpenDocument files) We use Lustre for our high performance general storage. I don't have any numbers, but I'm pretty sure it is *really* fast (10Gbit/s over IB sounds familiar, but don't quote me on that). AT Umich, you guys should test the FhGFS/btrfs combo. The folks at UCI swear about it, although they are only publishing a wee bit. (you know, water cooler gossip).. Surely the Wolverines do not want those californians getting up on them? Are you guys planning a mesos/spark test? Personally, I would read up on these and see how they work. Then, based on that, decide if they are likely to assist in the specific situation you are interested in. It's a ton of reading. It's not apples-to-apple_cider type of reading. My head hurts. Take a walk outside. Clear air should help you with the headaches :P I'm leaning to DFS/LFS (2) Luster/btrfs and FhGFS/btrfs Thoughts/comments? I have insufficient knowledge to advise on either of these. One question, why BTRFS instead of ZFS? My current understanding is: - ZFS is production ready, but due to licensing issues, not included in the kernel - BTRFS is included, but not yet production ready with all planned features For me, Raid6-like functionality is an absolute requirement and latest I know is that that isn't implemented in BTRFS yet. Does anyone know when that will be implemented and reliable? Eg. what time-frame are we talking about? -- Joost
Re: [gentoo-user] Re: File system testing
On 17/09/2014 21:20, Alec Ten Harmsel wrote: As far as HDFS goes, I would only set that up if you will use it for Hadoop or related tools. It's highly specific, and the performance is not good unless you're doing a massively parallel read (what it was designed for). I can elaborate why if anyone is actually interested. I, for one, am very interested. --Kerin
Re: [gentoo-user] Re: File system testing
The HTML...it hurts my eyes... :) On Thu, Sep 18, 2014 at 4:24 AM, J. Roeleveld jo...@antarean.org wrote: On Wednesday, September 17, 2014 08:56:28 PM James wrote: Alec Ten Harmsel alec at alectenharmsel.com writes: As far as HDFS goes, I would only set that up if you will use it for Hadoop or related tools. It's highly specific, and the performance is not good unless you're doing a massively parallel read (what it was designed for). I can elaborate why if anyone is actually interested. FYI - one very big limitation of hdfs is its minimum filesize is something huge like 1MB or something like that. Hadoop was designed to take a REALLY big input file and chunk it up. If you use hdfs to store something like /usr/portage it will turn into the sort of monstrosity that you'd actually need a cluster to store. My current understanding is: - ZFS is production ready, but due to licensing issues, not included in the kernel - BTRFS is included, but not yet production ready with all planned features Your understanding of their maturity is fairly accurate. They also aren't 100% moving in the same direction - btrfs aims more to be a general-purpose filesystem replacement especially for smaller systems, and zfs is more focused on the enterprise, so it lacks features like raid reshaping (who needs to add 1 disk to a raid5 when you can just add 5 more disks to your 30 disk storage system). I think btrfs has a bit more hope of being an ext4 replacement some day for both this reason and the licensing issue. That in no way detracts from the usefulness of zfs, especially for larger deployments where the few areas where btrfs is more flexible would probably be looked at as gimmicks (kind of like being able to build your whole OS from source :) ). For me, Raid6-like functionality is an absolute requirement and latest I know is that that isn't implemented in BTRFS yet. Does anyone know when that will be implemented and reliable? Eg. what time-frame are we talking about? 
I suspect we're talking months before it is really implemented, and much longer before it is reliable. Right now btrfs can write raid6, but it can't really read it. That is, it operates just fine until you actually lose a disk containing something other than parity, and then it loses access to the data. This code is only in the kernel for development purposes and nobody advocates using it for production. Most of the code in btrfs which is reliable has been around for years, like raid1 support, and obviously it will be years until the raid5/6 code reaches that point. I am using btrfs mainly because once that day comes it will be much easier to migrate to it from btrfs raid1 than from zfs (which has no mechanism for migrating raid levels in-place (that is, within an existing vdev) - you would need to add new drives to the pool, migrate the data, and remove the old drives from the pool, which is nice if you have a big stack of drives and spare sata ports lying around like you would in a SAN). -- Rich
Re: [gentoo-user] Re: File system testing
On Thursday, September 18, 2014 05:48:58 AM Rich Freeman wrote: The HTML...it hurts my eyes... :) Apologies. My current understanding is: - ZFS is production ready, but due to licensing issues, not included in the kernel - BTRFS is included, but not yet production ready with all planned features Your understanding of their maturity is fairly accurate. They also aren't 100% moving in the same direction - btrfs aims more to be a general-purpose filesystem replacement especially for smaller systems, and zfs is more focused on the enterprise, so it lacks features like raid reshaping (who needs to add 1 disk to a raid5 when you can just add 5 more disks to your 30 disk storage system). Thank you for this info. I wasn't aware of this difference in 'design'. Sounds like ZFS will be more suited for me then. I think btrfs has a bit more hope of being an ext4 replacement some day for both this reason and the licensing issue. That in no way detracts from the usefulness of zfs, especially for larger deployments where the few areas where btrfs is more flexible would probably be looked at as gimmicks (kind of like being able to build your whole OS from source :) ). Next time I am rebuilding the desktops, I will likely switch them to BTRFS. Sounds like BTRFS will be more suited there. For me, Raid6-like functionality is an absolute requirement and latest I know is that that isn't implemented in BTRFS yet. Does anyone know when that will be implemented and reliable? Eg. what time-frame are we talking about? I suspect we're talking months before it is really implemented, and much longer before it is reliable. Right now btrfs can write raid6, but it can't really read it. That is, it operates just fine until you actually lose a disk containing something other than parity, and then it loses access to the data. This code is only in the kernel for development purposes and nobody advocates using it for production. 
Most of the code in btrfs which is reliable has been around for years, like raid1 support, and obviously it will be years until the raid5/6 code reaches that point. I am using btrfs mainly because once that day comes it will be much easier to migrate to it from btrfs raid1 than from zfs (which has no mechanism for migrating raid levels in-place (that is, within an existing vdev) - you would need to add new drives to the pool, migrate the data, and remove the old drives from the pool, which is nice if you have a big stack of drives and spare sata ports lying around like you would in a SAN). Exactly, although I prefer not to change the filesystem on a live system anytime soon. When it comes to redoing the filesystem like that, restoring from backups will be the fastest solution. -- Joost
Re: [gentoo-user] Re: File system testing
On 09/18/2014 05:17 AM, Kerin Millar wrote:

On 17/09/2014 21:20, Alec Ten Harmsel wrote:

As far as HDFS goes, I would only set that up if you will use it for Hadoop or related tools. It's highly specific, and the performance is not good unless you're doing a massively parallel read (what it was designed for). I can elaborate why if anyone is actually interested.

I, for one, am very interested.

--Kerin

Alright, here goes:

Rich Freeman wrote:

FYI - one very big limitation of hdfs is its minimum filesize is something huge like 1MB or something like that. Hadoop was designed to take a REALLY big input file and chunk it up. If you use hdfs to store something like /usr/portage it will turn into the sort of monstrosity that you'd actually need a cluster to store.

This is exactly correct, except we run with a block size of 128MB, and a large cluster will typically have a block size of 256MB or even 512MB.

HDFS has two main components: a NameNode, which keeps track (in memory) of which blocks are a part of which file, and the DataNodes that actually store the blocks. No data ever flows through the NameNode; it negotiates transfers between the client and DataNodes and negotiates transfers for jobs. Since the NameNode stores metadata in-memory, small files are bad: every file costs NameNode RAM no matter how little data it holds.

What exactly is Hadoop/HDFS used for? The most common uses are generating search indices on data (which is a batch job), doing non-realtime processing of log streams and/or data streams (another batch job), and allowing a large number of analysts to run disparate queries on the same large dataset (another batch job). Batch processing - processing the entire dataset - is really where Hadoop shines.

When you put a file into HDFS, it gets split based on the block size. This is done so that a parallel read will be really fast - each map task reads in a single block and processes it. Ergo, if you put in a 1GB file with a 128MB block size and run a MapReduce job, 8 map tasks will be launched. If you put in a 1TB file, 8192 tasks would be launched. Tuning the block size is important to balance the overhead of launching tasks against potentially under-utilizing a cluster. Typically, a cluster with a lot of data has a bigger block size.

The downsides of HDFS:

* Seeked reads are not supported afaik because no one needs that for batch processing
* Seeked writes into an existing file are not supported because either blocks would be added in the middle of a file and wouldn't be 128MB, or existing blocks would be edited, resulting in blocks larger than 128MB. Both of these scenarios are bad.

Since HDFS users typically do not need seeked reads or seeked writes, these downsides aren't really a big deal.

If something's not clear, let me know.

Alec
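The block arithmetic in the message above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, sizes are taken as binary units (1GB = 1024MB), and the "one map task per block" rule is the simplification used in the message rather than a statement about every Hadoop configuration.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3
TiB = 1024 ** 4


def block_count(file_size: int, block_size: int = 128 * MiB) -> int:
    """Blocks needed for a file of file_size bytes (hypothetical helper).

    The last block may be only partly filled, so this is a ceiling division.
    An empty file needs no blocks.
    """
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    return (file_size + block_size - 1) // block_size


def tracked_blocks(file_sizes, block_size: int = 128 * MiB) -> int:
    """Total blocks the NameNode would have to track for a set of files."""
    return sum(block_count(size, block_size) for size in file_sizes)


if __name__ == "__main__":
    # One map task per block, as described in the message.
    print("1GB file, 128MB blocks:", block_count(GiB), "map tasks")
    print("1TB file, 128MB blocks:", block_count(TiB), "map tasks")
    print("1TB file, 256MB blocks:", block_count(TiB, 256 * MiB), "map tasks")

    # The same 1GB of data stored as 1024 files of 1MB each: every small
    # file still needs its own block entry in NameNode memory.
    print("1GB as 1024 small files:", tracked_blocks([MiB] * 1024), "blocks")
```

Doubling the block size halves the number of tasks for the same file, which is the tuning trade-off mentioned above. Splitting the same data into many small files multiplies the number of blocks the NameNode has to hold in RAM.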
[gentoo-user] Re: File system testing
Hervé Guillemet herve at guillemet.org writes:

On 16/09/2014 21:07, James wrote:

By now many are familiar with my keen interest in clustering gentoo systems. So, what most cluster technologies use is a distributed file system on top of the local (HD/SSD) file system.

Have you found this document: http://hal.inria.fr/hal-00789086/PDF/a_survey_of_dfs.pdf

Hello Herve,

Yes, I read the document and it is a good introduction to some of my issues on which file system(s) to use for clustering. But it's more of a survey than a comparison/benchmark study, which would be really beneficial. DFS are moving so fast now, and their setups and features are rarely a one-to-one match. For example, (currently) the best load balancing you find is actually in the apps that run above the cluster software. [1] Some of the performance/resource utilization of the file systems/resources is determined by real-time analytics with graphical displays. I'm not sure that load balancing even belongs in a DFS, yet in the paper you reference, it was prominently discussed.

Things are moving so fast in the distributed-*/cluster/cluster-tools/cluster-apps space that one really needs a system set up to apply almost daily patches for testing. I never realized just how much reading is necessary just to understand the current landscape in clustering.

I'm trying to figure out an ecosystem where gentoo folks can experiment with mesos clustering for scientific applications. After that, the more general case should be mature enough for general purpose applications. I'm avoiding the clustered web arena, as that is just too much for me to digest; so somebody else could champion that part of all of those Apache-cluster technologies.

Thanks for the document link!

James

[1]
[gentoo-user] Re: File system testing
J. Roeleveld joost at antarean.org writes:

Distributed File Systems (DFS): Local (Device) File Systems LFS:

Is my understanding correct that the top list all require one of the bottom list? Eg. the clustering FSs only ensure the files on the LFSs are duplicated/spread over the various nodes? I would normally expect the clustering FS to be either the full layer or a clustered block-device where an FS can be placed on top.

I have not performed these installations yet. My research indicates that first you put the Local FS on the drive, just like any installation of Linux. Then you put the distributed FS on top of this. Some DFS might not require a LFS, but FhGFS does, and so does HDFS. I will not actually be able to accurately answer your questions until I start to build up the 3 system cluster. (A week or 2 away is my best guess.)

Otherwise it seems more like a network filesystem with caching options (See AFS).

OK, I'll add AFS. You may be correct on this one or AFS might be both.

I am also interested in these filesystems, but for a slightly different scenario:

OK, so I'm the test-dummy crash victim. I'd be honored to have you, Alan, Neil, Mic etc. etc. back-seat drive on this adventure! (The more I read the more it's time for bourbon, bash, and a bit of cursing to get started...)

- 2 servers in remote locations (different offices)
- 1 of these has all the files stored (server A) at the main office
- The other (server B - remote office) needs to offer all files from serverA

When server B needs to supply a file, it needs to check if the local copy is still the valid version. If yes, supply the local copy, otherwise download from server A. When a file is changed, server A needs to be updated. While server B is sharing a file, the file needs to be locked on server A preventing simultaneous updates.

Ouch, file locking (precious tells me that is always tricky). (Psst, systemd is causing fits for the clustering geniuses; some are espousing a variety of cgroup gymnastics for phantom kills.) Spark is fault tolerant regardless of node/memory/drive failures, over and above the fault tolerance that a file system configuration may support. In fact, lost files can be 'regenerated', but it is computationally expensive.

You have to get your file system(s) set up. Then install mesos-0.20.0 and then spark. I have mesos mostly ready. I should have spark in alpha-beta this weekend. I'm fairly clueless on the DFS/LFS issue, so a DFS that needs no LFS might be a good first choice for testing the (3) system cluster.

I prefer not to supply the same amount of storage at server B as server A has. The remote location generally only needs access to 5% of the total amount of files stored on server A. But not always the same 5%. Does anyone know of a filesystem that can handle this?

So in clustering, from what I have read, there are all kinds of files passed around between the nodes and the master(s). Many are critical files not part of the application or scientific calculations. So in time, I think in a clustering environment, all you seek is very possible, but it's a hunch, a gut feeling, not fact. I'd put raid mirrors underneath that system, if it makes sense, for now, or just dd the stuff with a script or something kludgy (Alan is the king of kludge).

On gentoo planet one of the devs has Consul in his overlays. Read up on that for ideas that may be relevant to what you need.

Joost

James
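The two-office scenario quoted above (server B serves its local copy only if it is still the valid version, otherwise downloads from server A, and a file is locked on server A while it is being changed via server B) can be sketched as a toy in-memory model. Everything here is an assumption made for illustration: the class and method names are invented, "valid" is modelled as a simple version counter, and nothing is said about how a real filesystem (AFS or otherwise) implements this. Cache eviction for the "only 5% of the files" requirement is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class ServerA:
    """Main office: holds the authoritative copy of every file."""
    files: dict = field(default_factory=dict)   # name -> (version, data)
    locks: set = field(default_factory=set)     # names currently write-locked

    def version(self, name):
        return self.files[name][0]

    def fetch(self, name):
        return self.files[name]

    def lock(self, name):
        # Refuse a second lock so two modifications cannot compete.
        if name in self.locks:
            raise RuntimeError(f"{name} is already locked")
        self.locks.add(name)

    def store_and_unlock(self, name, data):
        new_version = self.version(name) + 1
        self.files[name] = (new_version, data)
        self.locks.discard(name)
        return new_version


@dataclass
class ServerB:
    """Remote office: caches only the files that were actually requested."""
    origin: ServerA
    cache: dict = field(default_factory=dict)   # name -> (version, data)
    downloads: int = 0

    def read(self, name):
        cached = self.cache.get(name)
        # Download only if there is no local copy or it is out of date.
        if cached is None or cached[0] != self.origin.version(name):
            self.cache[name] = self.origin.fetch(name)
            self.downloads += 1
        return self.cache[name][1]

    def open_for_write(self, name):
        self.origin.lock(name)          # lock on server A first
        return self.read(name)          # then make sure we edit the latest copy

    def save(self, name, data):
        new_version = self.origin.store_and_unlock(name, data)
        self.cache[name] = (new_version, data)


if __name__ == "__main__":
    a = ServerA(files={"report.txt": (1, b"v1")})
    b = ServerB(origin=a)
    b.read("report.txt")                # first read downloads from server A
    b.read("report.txt")                # second read is served locally
    print("downloads so far:", b.downloads)
```

In this sketch the version check still needs a round trip to server A on every read, which is cheap compared with transferring the file but does mean server B cannot serve anything while the link is down.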
Re: [gentoo-user] Re: File system testing
On Wednesday, September 17, 2014 03:55:56 PM James wrote:

J. Roeleveld joost at antarean.org writes:

Distributed File Systems (DFS): Local (Device) File Systems LFS:

Is my understanding correct that the top list all require one of the bottom list? Eg. the clustering FSs only ensure the files on the LFSs are duplicated/spread over the various nodes? I would normally expect the clustering FS to be either the full layer or a clustered block-device where an FS can be placed on top.

I have not performed these installations yet. My research indicates that first you put the Local FS on the drive, just like any installation of Linux. Then you put the distributed FS on top of this. Some DFS might not require a LFS, but FhGFS does, and so does HDFS. I will not actually be able to accurately answer your questions until I start to build up the 3 system cluster. (A week or 2 away is my best guess.)

Playing around with clusters is on my list, but due to other activities having a higher priority, I haven't had much time yet.

Otherwise it seems more like a network filesystem with caching options (See AFS).

OK, I'll add AFS. You may be correct on this one or AFS might be both.

Personally, I would read up on these and see how they work. Then, based on that, decide if they are likely to assist in the specific situation you are interested in. AFS, NFS, CIFS,... can be used for clusters, but, apart from NFS, I wouldn't expect much performance out of them. If you need it to be fault-tolerant and not overly reliant on a single point of failure, I wouldn't be using any of these. Only AFS, from my original investigation, showed some fault-tolerance, but needed too many resources (disk-space) on the clients.

I am also interested in these filesystems, but for a slightly different scenario:

OK, so I'm the test-dummy crash victim. I'd be honored to have you, Alan, Neil, Mic etc. etc. back-seat drive on this adventure! (The more I read the more it's time for bourbon, bash, and a bit of cursing to get started...)

Good luck, and even though I'd love to join in with the testing, I simply do not have the time to keep up. I would probably just slow you down.

- 2 servers in remote locations (different offices)
- 1 of these has all the files stored (server A) at the main office
- The other (server B - remote office) needs to offer all files from serverA

When server B needs to supply a file, it needs to check if the local copy is still the valid version. If yes, supply the local copy, otherwise download from server A. When a file is changed, server A needs to be updated. While server B is sharing a file, the file needs to be locked on server A preventing simultaneous updates.

Ouch, file locking (precious tells me that is always tricky).

I need it to be locked on server A while server B holds a proper write-lock, so that 2 modifications cannot compete with each other.

(Psst, systemd is causing fits for the clustering geniuses; some are espousing a variety of cgroup gymnastics for phantom kills.)

phantom kills?

Spark is fault tolerant regardless of node/memory/drive failures, over and above the fault tolerance that a file system configuration may support. In fact, lost files can be 'regenerated', but it is computationally expensive.

Too much for me.

You have to get your file system(s) set up. Then install mesos-0.20.0 and then spark. I have mesos mostly ready. I should have spark in alpha-beta this weekend. I'm fairly clueless on the DFS/LFS issue, so a DFS that needs no LFS might be a good first choice for testing the (3) system cluster.

That, or a 4th node acting like a NAS sharing the filesystem over NFS.

I prefer not to supply the same amount of storage at server B as server A has. The remote location generally only needs access to 5% of the total amount of files stored on server A. But not always the same 5%. Does anyone know of a filesystem that can handle this?

So in clustering, from what I have read, there are all kinds of files passed around between the nodes and the master(s). Many are critical files not part of the application or scientific calculations. So in time, I think in a clustering environment, all you seek is very possible, but it's a hunch, a gut feeling, not fact. I'd put raid mirrors underneath that system, if it makes sense, for now, or just dd the stuff with a script or something kludgy (Alan is the king of kludge).

Hmm... mirroring between servers. Always an option, except it will not work for me in this case:

1) Remote location will have a domestic ADSL line. I'll be lucky if it has a 500kbps uplink
2) Server A, currently, has around 7TB of current data that also needs to be available on the remote site.

With an 8mbps downlink, waiting for a file to be copied to the remote site is acceptable. After modifications, the new version can be copied back to serverA slowly during network-idle-time or when server A
Re: [gentoo-user] Re: File system testing
As far as HDFS goes, I would only set that up if you will use it for Hadoop or related tools. It's highly specific, and the performance is not good unless you're doing a massively parallel read (what it was designed for). I can elaborate why if anyone is actually interested.

We use Lustre for our high performance general storage. I don't have any numbers, but I'm pretty sure it is *really* fast (10Gbit/s over IB sounds familiar, but don't quote me on that).

Personally, I would read up on these and see how they work. Then, based on that, decide if they are likely to assist in the specific situation you are interested in.

Always good advice.

Alec
[gentoo-user] Re: File system testing
Alec Ten Harmsel alec at alectenharmsel.com writes:

As far as HDFS goes, I would only set that up if you will use it for Hadoop or related tools. It's highly specific, and the performance is not good unless you're doing a massively parallel read (what it was designed for). I can elaborate why if anyone is actually interested.

Actually, from my research and given my goal (one really big scientific simulation running constantly), many folks are recommending to skip Hadoop/HDFS altogether and go straight to mesos/spark. RDD (in-memory) cluster calculations are at the heart of my needs. The opposite end of the spectrum, loads of small files and small apps, I dunno about, but I'm all ears. In the end, my (3) node scientific cluster will morph and support the typical myriad of networked applications, but I can take a few years to figure that out, or just copy what smart guys like you and joost do.

We use Lustre for our high performance general storage. I don't have any numbers, but I'm pretty sure it is *really* fast (10Gbit/s over IB sounds familiar, but don't quote me on that).

At UMich, you guys should test the FhGFS/btrfs combo. The folks at UCI swear by it, although they are only publishing a wee bit. (You know, water cooler gossip.) Surely the Wolverines do not want those Californians getting one up on them?

Are you guys planning a mesos/spark test?

Personally, I would read up on these and see how they work. Then, based on that, decide if they are likely to assist in the specific situation you are interested in.

It's a ton of reading. It's not apples-to-apple_cider type of reading. My head hurts.

I'm leaning to DFS/LFS (2): Lustre/btrfs and FhGFS/btrfs.

Thoughts/comments?

James
[gentoo-user] Re: File system testing
J. Roeleveld joost at antarean.org writes:

AFS has caching and can survive temporary disappearance of the server.

Excellent for low bandwidth connections. Most DFS have mechanisms to deal with transient failures, but they are not as generous on the time-scale as AFS. I believe, if I recall correctly, that these high-latency, low-bandwidth recovery mechanisms were key design parameters for AFS, baked in at least during the CMU development cycles? While attractive for your situation, these features might actually be detrimental to a hi_performance distributed cluster's needs for a DFS?

For me, I need to be able to provide Samba filesharing on top of that layer on 2 different locations as I don't see the network bandwidth to be sufficient for normal operations. (ADSL uplinks tend to be dead slow)

Yea, I'm not going to be testing OpenAFS for my needs, unless I read some compelling published data on its applicability to high-end clusters as the best choice of DFS. It's probably great for SETI etc. etc.

James