Re: [gentoo-user] crontab - 'and' condition
On 19/09/2014 06:21, Joseph wrote:
> On 09/18/14 19:14, Alan McKinnon wrote:
>> On 18/09/2014 18:44, Joseph wrote:
>>> I want to run a cron job only once a month. The problem is the computer
>>> is only on on weekdays, Mon-Fri (1-5). A crontab entry like the one below
>>> is an "or" condition, as it has entries in both Day of Month and Day of
>>> Week:
>>> 5 18 1 * 2 rsync -av ...
>>> so it will run on day 1 or on Tuesdays of each month. Is it possible to
>>> create an "and" condition, e.g. run it on the Tuesday that falls between
>>> days 1 and 7, depending on which day Tuesday falls on?
>> Not in one line. Split it into two crontab entries.
> Interesting. How do you split a cron job? I couldn't find any examples.

No wait, that won't work. What you want to accomplish cannot be done with a single crontab job. Use periodic/monthly like the other poster said, or use anacron so the job will run when the machine is next powered on.

--
Alan McKinnon
alan.mckin...@gmail.com
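For the anacron route Alan mentions, this is a sketch of what an /etc/anacrontab entry might look like; the job identifier and rsync paths are placeholders, not from the thread. anacron records when each job last ran, so a monthly job missed while the machine was powered off runs shortly after the next boot:

```shell
# /etc/anacrontab fragment (sketch; fields are: period in days,
# delay in minutes after startup, job identifier, command)
30   10   monthly-rsync   rsync -av /src/ /backup/
```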
Re: [gentoo-user] crontab - 'and' condition
On 18.09.2014 at 18:44, Joseph wrote:
> I want to run a cron job only once a month. The problem is the computer is
> only on on weekdays, Mon-Fri (1-5). A crontab entry like the one below is
> an "or" condition, as it has entries in both Day of Month and Day of Week:
> 5 18 1 * 2 rsync -av ...
> so it will run on day 1 or on Tuesdays of each month. Is it possible to
> create an "and" condition, e.g. run it on the Tuesday that falls between
> days 1 and 7, depending on which day Tuesday falls on?

You can run it every Tuesday and check for the day of month externally:

5 18 * * 2 test $(date +\%d) -le 7 && rsync -av ...

(Note that in a crontab command, % must be escaped as \%, or cron treats everything after it as stdin for the command.) Or run it on 5 18 1-7 * * and test for Tuesdays, but the former gives fewer useless invocations.

~frukto
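frukto's date test can be tried outside cron. A minimal sketch, where the `first_tuesday` helper and its messages are my own illustration (it takes the day values as arguments so it can be exercised without waiting for a real first Tuesday):

```shell
#!/bin/sh
# The "and" condition cron syntax can't express: succeed only when today
# is a Tuesday (day-of-week 2) AND within the first 7 days of the month.
first_tuesday() {
    dow=$1   # day of week, 1-7, as printed by `date +%u`
    dom=$2   # day of month, as printed by `date +%d`
    [ "$dow" -eq 2 ] && [ "$dom" -le 7 ]
}

# In the actual cron job you would call it with today's values:
#   first_tuesday "$(date +\%u)" "$(date +\%d)" && rsync -av ...
first_tuesday 2 3 && echo "Tuesday the 3rd: job runs"
first_tuesday 2 9 || echo "Tuesday the 9th: job skipped"
```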
[gentoo-user] Re: File system testing
J. Roeleveld joost at antarean.org writes:

Out of curiosity, what do you want to simulate?

Subsurface flows in porous medium, AKA carbon sequestration by injection wells. You know, providing proof that those that remove hydrocarbons actually put the CO2 back and significantly mitigate the effects of their ventures.

It's like this. I have been struggling with my 17 year old genius son, who is a year away from entering medical school, over learning responsibility. So I got him a hyperactive, highly intelligent (mix-doberman) puppy to nurture, raise, train, love and be responsible for. It's one genius pup teaching another pup about being responsible. So goes the earl_bidness... imho.

Many folks are recommending to skip Hadoop/HDFS altogether (I agree; Hadoop/HDFS is for data analysis, like building a profile about people based on the information companies like Facebook, Google, NSA, Walmart, governments and banks collect about their customers/users/citizens/slaves) and go straight to mesos/spark. RDD (in-memory) cluster calculations are at the heart of my needs. The opposite end of the spectrum, loads of small files and small apps, I dunno about, but I'm all ears. In the end, my (3) node scientific cluster will morph and support the typical myriad of networked applications, but I can take a few years to figure that out, or just copy what smart guys like you and joost do.

Nope, I'm simply following what you do and providing suggestions where I can. Most of the clusters and distributed computing stuff I do is based on adding machines to distribute the load, but the mechanisms for these are implemented in the applications I work with, not in what I design underneath. The filesystems I am interested in are different to the ones you want.

Maybe. I do not know what I want yet. My vision is very lightweight workstations running lxqt (small memory footprint) or such, and a bad_arse cluster for the heavy lifting, running on whatever heterogeneous resources I have.
From what I've read, the cluster and the file systems are all redundant at the cluster level (mesos/spark anyway), regardless of what any given processor/system is doing. All of Alan's fantasies (needs) can be realized once the cluster stuff is mastered (chronos, ansible, etc. etc.).

I need to provide access to software installation files to a VM server, and access to documentation which is created by the users. The VM server is physically next to what I already mentioned as server A. Access to the VM from the remote site will be using remote desktop connections. But to allow faster and easier access to the documentation, I need a server B at the remote site which functions as described. AFS might be suitable, but I need to be able to layer Samba on top of that to allow seamless operation. I don't want the laptops to have their own cache and then have to figure out how to resolve multiple different changes to documents containing layouts (MS Word and OpenDocument files).

Ok, so your customers (hyperactive problem users) interface to your cluster to do their work. When finished, you write things out to other servers with all of the VM servers. Lots of really cool tools are emerging in the cluster space. I think these folks have mesos + spark + samba + nfs all in one box. [1]

Build rather than purchase?

We have to figure out what you and Alan need on a cluster, because it is what most folks need/want. It's the admin_advantage part of clusters. (There are also the Big Science (me) and web-centric needs. Right now they are related projects, but things will coalesce, imho.) There is even Spark_sql for postgres admins [2].

[1] http://www.quantaqct.com/en/01_product/02_detail.php?mid=29sid=162id=163qs=102
[2] https://spark.apache.org/sql/

We use Lustre for our high performance general storage. I don't have any numbers, but I'm pretty sure it is *really* fast (10Gbit/s over IB sounds familiar, but don't quote me on that).

At UMich, you guys should test the FhGFS/btrfs combo.
The folks at UCI swear by it, although they are only publishing a wee bit (you know, water cooler gossip). Surely the Wolverines do not want those Californians getting up on them? Are you guys planning a mesos/spark test?

Personally, I would read up on these and see how they work. Then, based on that, decide if they are likely to assist in the specific situation you are interested in.

It's a ton of reading. It's not apples-to-apple_cider type of reading. My head hurts.

Take a walk outside. Clear air should help you with the headaches :P

Basketball, Boobs and Bourbon used to work quite well. Now it's mostly basketball, but I'm working on someone very cute..

I'm leaning to DFS/LFS (2): Lustre/btrfs and FhGFS/btrfs.

I have insufficient knowledge to advise on either of these. One question: why BTRFS instead of ZFS?

I think btrfs has tremendous potential.
Re: [gentoo-user] Re: File system testing
On Fri, Sep 19, 2014 at 9:41 AM, James wirel...@tampabay.rr.com wrote:
> I think btrfs has tremendous potential. I tried ZFS a few times, but the
> installs are not part of gentoo, so they got borked; uEFI, grub2, uuids,
> etc. etc. were also in the mix. That was almost a year ago. For whatever
> reason, the clustering folks I have read and communicated with are using
> ext4, xfs and btrfs. Prolly mostly because those are mostly used in their
> (systemd-inspired) distros?

I do think that btrfs in the long term is more likely to be mainstream on linux, but I wouldn't be surprised if getting zfs working on Gentoo is much easier now. Richard Yao is both a Gentoo dev and a significant zfs-on-linux contributor, so I suspect he is doing much of the latter on the former.

> Yep, the license issue with ZFS is a real killer for me. Besides, as an
> old state-machine C hack, anything with B-trees is fabulous. Prejudices?
> Yep, but here I'm sticking with my gut. Multi-port RAM can do marvelous
> things with B-tree data structures. The rest will become available/stable.
> Simply, I just trust btrfs, in my gut.

I don't know enough about zfs to compare them, but the design of btrfs has a certain amount of beauty/symmetry/etc to it IMHO. I have only studied it enough to be dangerous and give some intro talks to my LUG, but just about everything is stored in b-trees, the design allows both fixed- and non-fixed-length nodes within the trees, and just about everything about the filesystem is dynamic other than the superblocks, which do little more than ID the filesystem and point to the current tree roots. The important stuff is all replicated and versioned. I wouldn't be surprised if it shared many of these design features with other modern filesystems, and I do not profess to be an expert on modern filesystem design, so I won't make any claims about btrfs being better/worse than other filesystems in this regard. However, I would say that anybody interested in data structures would do well to study it.
-- Rich
Re: [gentoo-user] Re: File system testing
On Friday, September 19, 2014 01:41:26 PM James wrote:
> J. Roeleveld joost at antarean.org writes:
>> Out of curiosity, what do you want to simulate?
> Subsurface flows in porous medium, AKA carbon sequestration by injection
> wells. You know, providing proof that those that remove hydrocarbons
> actually put the CO2 back and significantly mitigate the effects of their
> ventures.

Interesting topic. Can't provide advice on that topic, though.

> It's like this. I have been struggling with my 17 year old genius son, who
> is a year away from entering medical school, over learning responsibility.
> So I got him a hyperactive, highly intelligent (mix-doberman) puppy to
> nurture, raise, train, love and be responsible for. It's one genius pup
> teaching another pup about being responsible.

Overactive kids, always fun. I try to keep mine busy without computers and TVs for now. (She's going to be 3 in November.)

> So goes the earl_bidness... imho. Many folks are recommending to skip
> Hadoop/HDFS altogether (I agree; Hadoop/HDFS is for data analysis, like
> building a profile about people based on the information companies like
> Facebook, Google, NSA, Walmart, governments and banks collect about their
> customers/users/citizens/slaves) and go straight to mesos/spark. RDD
> (in-memory) cluster calculations are at the heart of my needs. The
> opposite end of the spectrum, loads of small files and small apps, I dunno
> about, but I'm all ears. In the end, my (3) node scientific cluster will
> morph and support the typical myriad of networked applications, but I can
> take a few years to figure that out, or just copy what smart guys like you
> and joost do.
>> Nope, I'm simply following what you do and providing suggestions where I
>> can. Most of the clusters and distributed computing stuff I do is based
>> on adding machines to distribute the load, but the mechanisms for these
>> are implemented in the applications I work with, not in what I design
>> underneath. The filesystems I am interested in are different to the ones
>> you want.
> Maybe.
> I do not know what I want yet. My vision is very lightweight workstations
> running lxqt (small memory footprint) or such, and a bad_arse cluster for
> the heavy lifting, running on whatever heterogeneous resources I have.
> From what I've read, the cluster and the file systems are all redundant at
> the cluster level (mesos/spark anyway), regardless of what any given
> processor/system is doing. All of Alan's fantasies (needs) can be realized
> once the cluster stuff is mastered (chronos, ansible, etc. etc.).

Alan = your son? Or?

I would, from the workstation point of view, keep the cluster as a single entity, to keep things easier. A cluster FS for workstation/desktop use is generally not suitable for a High Performance Cluster (HPC), or vice versa.

>> I need to provide access to software installation files to a VM server,
>> and access to documentation which is created by the users. The VM server
>> is physically next to what I already mentioned as server A. Access to the
>> VM from the remote site will be using remote desktop connections. But to
>> allow faster and easier access to the documentation, I need a server B at
>> the remote site which functions as described. AFS might be suitable, but
>> I need to be able to layer Samba on top of that to allow seamless
>> operation. I don't want the laptops to have their own cache and then have
>> to figure out how to resolve multiple different changes to documents
>> containing layouts (MS Word and OpenDocument files).
> Ok, so your customers (hyperactive problem users) interface to your
> cluster to do their work. When finished, you write things out to other
> servers with all of the VM servers. Lots of really cool tools are emerging
> in the cluster space.

Actually, slightly different scenario. Most work is done at customers' systems. Occasionally we need to test software versions prior to implementing these at customers. For that, we use VMs. The VM server we have is currently sufficient for this. When it isn't, we'll need to add a 2nd VM server.
On the NAS, we store:
- Documentation about customers, plus howto documents on how to best install the software.
- Installation files downloaded from vendors. (We also deal with older versions that are no longer available; we need to have our own collection to handle that.)

As we are looking into also working from a different location, we need:
- Access to the VM server (easy, using VPN and remote desktops)
- Access to the files (I prefer to have a local 'cache' at the remote location)

It's the "access to the files" part where I need to have some sort of distributed filesystem.

> I think these folks have mesos + spark + samba + nfs all in one box. [1]
> [1] http://www.quantaqct.com/en/01_product/02_detail.php?mid=29sid=162id=163qs=102

Had a quick look. These use MS Windows Storage 2012; this is only failover on the storage side. I don't see anything related to
Re: [gentoo-user] Re: File system testing
On Friday, September 19, 2014 10:56:59 AM Rich Freeman wrote:
> On Fri, Sep 19, 2014 at 9:41 AM, James wirel...@tampabay.rr.com wrote:
>> I think btrfs has tremendous potential. I tried ZFS a few times, but the
>> installs are not part of gentoo, so they got borked; uEFI, grub2, uuids,
>> etc. etc. were also in the mix. That was almost a year ago. For whatever
>> reason, the clustering folks I have read and communicated with are using
>> ext4, xfs and btrfs. Prolly mostly because those are mostly used in their
>> (systemd-inspired) distros?
> I do think that btrfs in the long term is more likely to be mainstream on
> linux, but I wouldn't be surprised if getting zfs working on Gentoo is
> much easier now. Richard Yao is both a Gentoo dev and a significant
> zfs-on-linux contributor, so I suspect he is doing much of the latter on
> the former.

Don't have the link handy, but there is a howto about it that, when followed, will give you a ZFS pool running on Gentoo in a very short time (emerge zfs is the longest part of the whole thing). Not even a reboot needed.

>> Yep, the license issue with ZFS is a real killer for me. Besides, as an
>> old state-machine C hack, anything with B-trees is fabulous. Prejudices?
>> Yep, but here I'm sticking with my gut. Multi-port RAM can do marvelous
>> things with B-tree data structures. The rest will become
>> available/stable. Simply, I just trust btrfs, in my gut.
> I don't know enough about zfs to compare them, but the design of btrfs has
> a certain amount of beauty/symmetry/etc to it IMHO. I have only studied it
> enough to be dangerous and give some intro talks to my LUG, but just about
> everything is stored in b-trees, the design allows both fixed- and
> non-fixed-length nodes within the trees, and just about everything about
> the filesystem is dynamic other than the superblocks, which do little more
> than ID the filesystem and point to the current tree roots. The important
> stuff is all replicated and versioned.
> I wouldn't be surprised if it shared many of these design features with
> other modern filesystems, and I do not profess to be an expert on modern
> filesystem design, so I won't make any claims about btrfs being
> better/worse than other filesystems in this regard. However, I would say
> that anybody interested in data structures would do well to study it.

I like the idea of both, and hope BTRFS will also come with the raid-6-like features and good support for larger drive counts (I've got 16 available for the file storage) to make it, for me, a viable alternative to ZFS.

--
Joost
Re: [gentoo-user] Re: File system testing
On 18/09/2014 14:12, Alec Ten Harmsel wrote:
> On 09/18/2014 05:17 AM, Kerin Millar wrote:
>> On 17/09/2014 21:20, Alec Ten Harmsel wrote:
>>> As far as HDFS goes, I would only set that up if you will use it for
>>> Hadoop or related tools. It's highly specific, and the performance is
>>> not good unless you're doing a massively parallel read (what it was
>>> designed for). I can elaborate why if anyone is actually interested.
>> I, for one, am very interested.
>>
>> --Kerin
>
> Alright, here goes:
>
> Rich Freeman wrote:
>> FYI - one very big limitation of hdfs is that its minimum filesize is
>> something huge like 1MB or something like that. Hadoop was designed to
>> take a REALLY big input file and chunk it up. If you use hdfs to store
>> something like /usr/portage it will turn into the sort of monstrosity
>> that you'd actually need a cluster to store.
>
> This is exactly correct, except we run with a block size of 128MB, and a
> large cluster will typically have a block size of 256MB or even 512MB.
>
> HDFS has two main components: a NameNode, which keeps track of which
> blocks are a part of which file (in memory), and the DataNodes that
> actually store the blocks. No data ever flows through the NameNode; it
> negotiates transfers between the client and DataNodes and negotiates
> transfers for jobs. Since the NameNode stores metadata in memory, small
> files are bad because RAM gets wasted.
>
> What exactly is Hadoop/HDFS used for? The most common uses are generating
> search indices on data (which is a batch job), doing non-realtime
> processing of log streams and/or data streams (another batch job), and
> allowing a large number of analysts to run disparate queries on the same
> large dataset (another batch job). Batch processing - processing the
> entire dataset - is really where Hadoop shines.
>
> When you put a file into HDFS, it gets split based on the block size. This
> is done so that a parallel read will be really fast - each map task reads
> in a single block and processes it.
> Ergo, if you put in a 1GB file with a 128MB block size and run a MapReduce
> job, 8 map tasks will be launched. If you put in a 1TB file, 8192 tasks
> would be launched. Tuning the block size is important to optimize the
> overhead of launching tasks vs. potentially under-utilizing a cluster.
> Typically, a cluster with a lot of data has a bigger block size.
>
> The downsides of HDFS:
>
> * Seeked reads are not supported afaik, because no one needs that for
>   batch processing
> * Seeked writes into an existing file are not supported, because either
>   blocks would be added in the middle of a file and wouldn't be 128MB, or
>   existing blocks would be edited, resulting in blocks larger than 128MB.
>   Both of these scenarios are bad.
>
> Since HDFS users typically do not need seeked reads or seeked writes,
> these downsides aren't really a big deal. If something's not clear, let me
> know.

Thank you for taking the time to explain.

--Kerin
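Alec's task counts follow from ceiling division of the file size by the block size. A quick sketch (the `blocks` helper and the example sizes are my own, not from the thread), assuming one map task per HDFS block:

```shell
#!/bin/sh
# Number of HDFS blocks (and hence map tasks) for a file of a given size,
# using ceiling division over a 128 MiB block size as in Alec's examples.
block=$((128 * 1024 * 1024))
blocks() {
    size=$1
    echo $(( (size + block - 1) / block ))
}

blocks $((1024 * 1024 * 1024))         # 1 GiB -> 8 map tasks
blocks $((1024 * 1024 * 1024 * 1024))  # 1 TiB -> 8192 map tasks
```

This also shows why small files waste NameNode RAM: a 1-byte file still occupies one block entry of metadata.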