[gentoo-user] Re: File system testing

2014-09-19 Thread James
J. Roeleveld joost at antarean.org writes:


 Out of curiosity, what do you want to simulate?

subsurface flows in a porous medium, AKA carbon sequestration
by injection wells. You know, provide proof that those
who remove hydrocarbons actually put the CO2 back
and significantly mitigate the effects of their ventures.

It's like this. I have been struggling with my 17 year old genius
son, who is a year away from entering medical school, with
learning responsibility. So I got him a hyperactive, highly
intelligent (mix-doberman) puppy to nurture, raise, train, love
and be responsible for. It's one genius pup teaching another
pup about being responsible.

So goes the earl_bidness...imho.



 
  Many folks are recommending skipping Hadoop/HDFS altogether

 I agree, Hadoop/HDFS is for data analysis. Like building a profile
 about people based on the information that companies like Facebook,
 Google, NSA, Walmart, Governments, Banks collect about their
 customers/users/citizens/slaves/

  and go straight to mesos/spark. RDD (in-memory)  cluster
  calculations are at the heart of my needs. The opposite end of the
  spectrum, loads of small files and small apps; I dunno about, but, I'm all
  ears.
  In the end, my (3) node scientific cluster will morph and support
  the typical myriad  of networked applications, but I can take
  a few years to figure that out, or just copy what smart guys like
  you and joost do.
  
 Nope, I'm simply following what you do and provide suggestions where I can.
 Most of the clusters and distributed computing stuff I do is based on 
 adding machines to distribute the load. But the mechanisms for these are 
implemented in the applications I work with, not what I design underneath.

 The filesystems I am interested in are different to the ones you want.

Maybe. I do not know what I want yet. My vision is very lightweight
workstations running lxqt (small memory footprint) or such, and a bad_arse
cluster for the heavy lifting running on whatever heterogeneous resources I
have. From what I've read, the cluster and the file systems are all
redundant at the cluster level (mesos/spark anyway) regardless of what any
given processor/system is doing. All of Alan's fantasies (needs) can be
realized once the cluster stuff is mastered (chronos, ansible, etc.).

 I need to provide access to software installation files to a VM server 
 and access to documentation which is created by the users. The 
 VM server is physically next to what I already mentioned as server A.  
 Access to the VM from the remote site will be using remote desktop   
 connections.  But to allow faster and easier access to the 
 documentation, I need a server B at the remote site which functions as 
 described.  AFS might be suitable, but I need to be able to layer Samba 
 on top of that to allow a seamless operation.
 I don't want the laptops to have their own cache and then having to 
 figure out how to solve the multiple different changes to documents 
 containing layouts. (MS Word and OpenDocument files).

Ok so your customers (hyperactive problem users) interface with your cluster
to do their work. When finished, you write things out to other servers
with all of the VM servers. Lots of really cool tools are emerging
in the cluster space.

I think these folks have mesos + spark + samba + nfs all in one box. [1]
Build rather than purchase? We have to figure out what you and Alan need on
a cluster, because it is what most folks need/want. It's the admin_advantage
part of clustering. (There are also the Big Science (me) and Web-centric needs.
Right now they are related projects, but things will coalesce, imho.) There is
even Spark SQL for postgres admins [2].

[1]
http://www.quantaqct.com/en/01_product/02_detail.php?mid=29sid=162id=163qs=102

[2] https://spark.apache.org/sql/
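
Since RDD (in-memory) cluster calculations keep coming up in this thread,
here is a minimal PySpark sketch of what such a calculation looks like. It
assumes pyspark is installed; the local[*] master and the numbers are purely
illustrative, not anything tested on the cluster discussed here.

    # Minimal sketch of an in-memory RDD calculation (assumes pyspark).
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAppName("rdd-sketch").setMaster("local[*]")
    sc = SparkContext(conf=conf)

    # Build an RDD, keep it in memory, and run a simple aggregation on it.
    samples = sc.parallelize(range(1000000), numSlices=8).cache()
    mean = samples.map(float).sum() / samples.count()
    print("mean =", mean)

    sc.stop()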


   We use Lustre for our high performance general storage. I don't 
   have any numbers, but I'm pretty sure it is *really* fast (10Gbit/s 
   over IB sounds familiar, but don't quote me on that).
  
  At UMich, you guys should test the FhGFS/btrfs combo. The folks
  at UCI swear by it, although they are only publishing a wee bit.
  (you know, water cooler gossip).. Surely the Wolverines do not
  want those californians getting up on them?

  Are you guys planning a mesos/spark test?

Personally, I would read up on these and see how they work. Then,
based on that, decide if they are likely to assist in the specific
situation you are interested in.

  It's a ton of reading. It's not apples-to-apple_cider type of reading.
  My head hurts.

 Take a walk outside. Clear air should help you with the headaches :P

Basketball, Boobs and Bourbon used to work quite well. Now it's mostly
basketball, but I'm working on someone very cute...

  I'm leaning to  DFS/LFS
  (2)  Lustre/btrfs  and FhGFS/btrfs

 I have insufficient knowledge to advise on either of these.
 One question, why BTRFS instead of ZFS?

I think btrfs has 

Re: [gentoo-user] Re: File system testing

2014-09-19 Thread Rich Freeman
On Fri, Sep 19, 2014 at 9:41 AM, James wirel...@tampabay.rr.com wrote:

 I think btrfs has tremendous potential. I tried ZFS a few times,
 but the installs are not part of gentoo, so they got borked;
 uEFI, grub, uuids, etc. were also in the mix. That was almost
 a year ago. For whatever reason, the clustering folks I have
 read and communicated with are using ext4, xfs and btrfs. Prolly
 mostly because those are the ones mostly used in their (systemd
 inspired) distros?

I do think that btrfs in the long-term is more likely to be mainstream
on linux, but I wouldn't be surprised if getting zfs working on Gentoo
is much easier now.  Richard Yao is both a Gentoo dev and significant
zfs on linux contributor, so I suspect he is doing much of the latter
on the former.


 Yep. the license issue with ZFS is a real killer for me. Besides,
 as an old state-machine, C hack, anything with B-tree is fabulous.
 Prejudices? Yep, but here, I'm sticking with my gut. Multi port
 ram can do marvelous things with B-tree data structures. The
 rest will become available/stable. Simply, I just trust btrfs, in
 my gut.

I don't know enough about zfs to compare them, but the design of btrfs
has a certain amount of beauty/symmetry/etc to it IMHO.  I only have
studied it enough to be dangerous and give some intro talks to my LUG,
but just about everything is stored in b-trees, the design allows both
fixed and non-fixed length nodes within the trees, and just about
everything about the filesystem is dynamic other than the superblocks,
which do little more than ID the filesystem and point to the current
tree roots.  The important stuff is all replicated and versioned.

I wouldn't be surprised if it shared many of these design features
with other modern filesystems, and I do not profess to be an expert on
modern filesystem design, so I won't make any claims about btrfs being
better/worse than other filesystems in this regard.  However, I would
say that anybody interested in data structures would do well to study
it.

--
Rich



Re: [gentoo-user] Re: File system testing

2014-09-19 Thread J. Roeleveld

On Friday, September 19, 2014 01:41:26 PM James wrote:
 J. Roeleveld joost at antarean.org writes:
  Out of curiosity, what do you want to simulate?
 
 subsurface flows in a porous medium, AKA carbon sequestration
 by injection wells. You know, provide proof that those
 who remove hydrocarbons actually put the CO2 back
 and significantly mitigate the effects of their ventures.

Interesting topic. Can't provide advice on that topic.

 It's like this. I have been struggling with my 17 year old genius
 son, who is a year away from entering medical school, with
 learning responsibility. So I got him a hyperactive, highly
 intelligent (mix-doberman) puppy to nurture, raise, train, love
 and be responsible for. It's one genius pup teaching another
 pup about being responsible.

Overactive kids, always fun.
I try to keep mine busy without computers and TVs for now. (She's going to be 
3 in November)

 So goes the earl_bidness...imho.
 
   Many folks are recommending skipping Hadoop/HDFS altogether
  
  I agree, Hadoop/HDFS is for data analysis. Like building a profile
  about people based on the information that companies like Facebook,
  Google, NSA, Walmart, Governments, Banks collect about their
  customers/users/citizens/slaves/
  
   and go straight to mesos/spark. RDD (in-memory)  cluster
   calculations are at the heart of my needs. The opposite end of the
   spectrum, loads of small files and small apps; I dunno about, but, I'm
   all
   ears.
   In the end, my (3) node scientific cluster will morph and support
   the typical myriad  of networked applications, but I can take
   a few years to figure that out, or just copy what smart guys like
   you and joost do.
  
   
  Nope, I'm simply following what you do and provide suggestions where I
  can.
  Most of the clusters and distributed computing stuff I do is based on
  adding machines to distribute the load. But the mechanisms for these are 
  implemented in the applications I work with, not what I design underneath.
  The filesystems I am interested in are different to the ones you want.
 
 Maybe. I do not know what I want yet. My vision is very lightweight
 workstations running lxqt (small memory footprint) or such, and a bad_arse
 cluster for the heavy lifting running on whatever heterogeneous resources I
 have. From what I've read, the cluster and the file systems are all
 redundant at the cluster level (mesos/spark anyway) regardless of what any
 given processor/system is doing. All of Alan's fantasies (needs) can be
 realized once the cluster stuff is mastered (chronos, ansible, etc.).

Alan = your son? or?
I would, from the workstation point of view, keep the cluster as a single 
entity, to keep things easier.
A cluster FS for workstation/desktop use is generally not suitable for a High 
Performance Cluster (HPC) (or vice-versa)

  I need to provide access to software installation files to a VM server
  and access to documentation which is created by the users. The
  VM server is physically next to what I already mentioned as server A.
  Access to the VM from the remote site will be using remote desktop
  connections.  But to allow faster and easier access to the
  documentation, I need a server B at the remote site which functions as
  described.  AFS might be suitable, but I need to be able to layer Samba
  on top of that to allow a seamless operation.
  I don't want the laptops to have their own cache and then having to
  figure out how to solve the multiple different changes to documents
  containing layouts. (MS Word and OpenDocument files).
 
 Ok so your customers (hyperactive problem users) interface with your cluster
 to do their work. When finished, you write things out to other servers
 with all of the VM servers. Lots of really cool tools are emerging
 in the cluster space.

Actually, slightly different scenario.
Most work is done at customers systems. Occasionally we need to test software 
versions prior to implementing these at customers. For that, we use VMs.

The VM-server we have is currently sufficient for this. When it isn't, we'll 
need to add a 2nd VMserver.

On the NAS, we store:
- Documentation about customers + Howto documents on how to best install the 
software.
- Installation files downloaded from vendors (We also deal with older versions 
that are no longer available. We need to have our own collection to handle 
that)

As we are looking into also working from a different location, we need:
- Access to the VM-server (easy, using VPN and Remote Desktops)
- Access to the files (I prefer to have a local 'cache' at the remote location)

It's the access to files part where I need to have some sort of distributed 
filesystem.

 I think these folks have mesos + spark + samba + nfs all in one box. [1]
 [1]
 http://www.quantaqct.com/en/01_product/02_detail.php?mid=29sid=162id=163q
 s=102

Had a quick look; these use MS Windows Storage 2012, which is only failover on 
the storage side. I don't see anything related to 

Re: [gentoo-user] Re: File system testing

2014-09-19 Thread J. Roeleveld

On Friday, September 19, 2014 10:56:59 AM Rich Freeman wrote:
 On Fri, Sep 19, 2014 at 9:41 AM, James wirel...@tampabay.rr.com wrote:
  I think btrfs has tremendous potential. I tried ZFS a few times,
  but the installs are not part of gentoo, so they got borked;
  uEFI, grub, uuids, etc. were also in the mix. That was almost
  a year ago. For whatever reason, the clustering folks I have
  read and communicated with are using ext4, xfs and btrfs. Prolly
  mostly because those are the ones mostly used in their (systemd
  inspired) distros?
 
 I do think that btrfs in the long-term is more likely to be mainstream
 on linux, but I wouldn't be surprised if getting zfs working on Gentoo
 is much easier now.  Richard Yao is both a Gentoo dev and significant
 zfs on linux contributor, so I suspect he is doing much of the latter
 on the former.

Don't have the link handy, but there is a howto about it that, when followed,
will give you a ZFS pool running on Gentoo in a very short time (emerge zfs is
the longest part of the whole thing).
No reboot even needed.

  Yep. the license issue with ZFS is a real killer for me. Besides,
  as an old state-machine, C hack, anything with B-tree is fabulous.
  Prejudices? Yep, but here, I'm sticking with my gut. Multi port
  ram can do marvelous things with B-tree data structures. The
  rest will become available/stable. Simply, I just trust btrfs, in
  my gut.
 
 I don't know enough about zfs to compare them, but the design of btrfs
 has a certain amount of beauty/symmetry/etc to it IMHO.  I only have
 studied it enough to be dangerous and give some intro talks to my LUG,
 but just about everything is stored in b-trees, the design allows both
 fixed and non-fixed length nodes within the trees, and just about
 everything about the filesystem is dynamic other than the superblocks,
 which do little more than ID the filesystem and point to the current
 tree roots.  The important stuff is all replicated and versioned.
 
 I wouldn't be surprised if it shared many of these design features
 with other modern filesystems, and I do not profess to be an expert on
 modern filesystem design, so I won't make any claims about btrfs being
 better/worse than other filesystems in this regard.  However, I would
 say that anybody interested in data structures would do well to study
 it.

I like the idea of both and hope BTRFS will also come with the raid-6-like 
features and good support for larger drive counts (I've got 16 available for 
the filestorage) to make it, for me, a viable alternative to ZFS.

--
Joost



Re: [gentoo-user] Re: File system testing

2014-09-19 Thread Kerin Millar

On 18/09/2014 14:12, Alec Ten Harmsel wrote:


On 09/18/2014 05:17 AM, Kerin Millar wrote:

On 17/09/2014 21:20, Alec Ten Harmsel wrote:

As far as HDFS goes, I would only set that up if you will use it for
Hadoop or related tools. It's highly specific, and the performance is
not good unless you're doing a massively parallel read (what it was
designed for). I can elaborate why if anyone is actually interested.


I, for one, am very interested.

--Kerin



Alright, here goes:

Rich Freeman wrote:


FYI - one very big limitation of hdfs is its minimum filesize is
something huge like 1MB or something like that.  Hadoop was designed
to take a REALLY big input file and chunk it up.  If you use hdfs to
store something like /usr/portage it will turn into the sort of
monstrosity that you'd actually need a cluster to store.


This is exactly correct, except we run with a block size of 128MB, and a large 
cluster will typically have a block size of 256MB or even 512MB.

HDFS has two main components: a NameNode, which keeps track of which blocks are 
a part of which file (in memory), and the DataNodes that actually store the 
blocks. No data ever flows through the NameNode; it negotiates transfers 
between the client and DataNodes and negotiates transfers for jobs. Since the 
NameNode stores metadata in-memory, small files are bad because RAM gets wasted.

What exactly is Hadoop/HDFS used for? The most common uses are generating 
search indices on data (which is a batch job) and doing non-realtime processing 
of log streams and/or data streams (another batch job) and allowing a large 
number of analysts to run disparate queries on the same large dataset (another 
batch job). Batch processing - processing the entire dataset - is really where 
Hadoop shines.

When you put a file into HDFS, it gets split based on the block size. This is 
done so that a parallel read will be really fast - each map task reads in a 
single block and processes it. Ergo, if you put in a 1GB file with a 128MB 
block size and run a MapReduce job, 8 map tasks will be launched. If you put in 
a 1TB file, 8192 tasks would be launched. Tuning the block size is important to 
optimize the overhead of launching tasks vs. potentially under-utilizing a 
cluster. Typically, a cluster with a lot of data has a bigger block size.
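
As a back-of-the-envelope check of the arithmetic above (and of why small
files hurt the NameNode), here is a small Python sketch. The ~150 bytes of
NameNode heap per file/block object is a commonly quoted rule of thumb, not a
figure from this thread, so treat it as an assumption.

    GB = 1024 ** 3
    TB = 1024 ** 4
    BLOCK = 128 * 1024 ** 2  # 128MB block size

    def map_tasks(file_size, block_size=BLOCK):
        # One map task per block, rounded up.
        return -(-file_size // block_size)

    print(map_tasks(1 * GB))   # 8 tasks, as in the 1GB example
    print(map_tasks(1 * TB))   # 8192 tasks, as in the 1TB example

    # Small files waste NameNode RAM: each file and each block is an
    # in-memory object (~150 bytes each, rule-of-thumb assumption).
    small_files = 1000000            # 1M files of ~2MB each
    objects = small_files * 2        # one file object + one block object each
    print(objects * 150 / GB)        # roughly 0.28 GB of heap just for metadata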

The downsides of HDFS:
* Seeked reads are not supported afaik because no one needs that for batch 
processing
* Seeked writes into an existing file are not supported because either blocks 
would be added in the middle of a file and wouldn't be 128MB, or existing 
blocks would be edited, resulting in blocks larger than 128MB. Both of these 
scenarios are bad.

Since HDFS users typically do not need seeked reads or seeked writes, these 
downsides aren't really a big deal.

If something's not clear, let me know.


Thank you for taking the time to explain.

--Kerin



Re: [gentoo-user] Re: File system testing

2014-09-18 Thread J. Roeleveld

On Wednesday, September 17, 2014 09:05:09 PM James wrote:
 J. Roeleveld joost at antarean.org writes:
  AFS has caching and can survive temporary disappearance of the 
server.
 
 Excellent for low bandwidth connections. Most DFS have mechanisms to
 deal with transient failures, but not as generous on the time-scale
 as AFS. I believe, if I recall correctly, these high-latency, low-bandwidth
 recovery mechanisms were key design parameters, at least baked in
 during the CMU development cycles, for AFS?
 
 While attractive for your situation, these features might actually
 be detrimental to a high-performance distributed cluster's needs for
 a DFS?

I tend to agree. I'm not sure how up-to-date AFS is, but from re-reading the 
wikipedia pages, it sounds like what I need. Provided I can get it to work 
together with Samba. I need to allow MS Windows laptops access to the 
files on the remote location.

  For me, I need to be able to provide Samba filesharing on top of that
  layer on 2 different locations as I don't  see the network bandwidth to
  be sufficient for normal operations. (ADSL uplinks tend to be dead 
slow)
 
 Yea, I'm not going to be testing OpenAFS for my needs, unless I read
 some compelling published data on its applicability as the best
 choice of DFS for high-end clusters.

I wouldn't either.

 It's probably great for SETI etc etc.

Doubtful :)

Did you see the following wikipedia page:
http://en.wikipedia.org/wiki/List_of_file_systems

It contains a nice long list of various distributed, clustered, filesystems.
I just miss an indication on how well these are still supported and on which 
OSs these (can) work.

--
Joost


Re: [gentoo-user] Re: File system testing

2014-09-18 Thread J. Roeleveld

On Wednesday, September 17, 2014 04:20:24 PM Alec Ten Harmsel wrote:
 As far as HDFS goes, I would only set that up if you will use it for
 Hadoop or related tools. It's highly specific, and the performance is
 not good unless you're doing a massively parallel read (what it was
 designed for). I can elaborate why if anyone is actually interested.
 
 We use Lustre for our high performance general storage. I don't have 
any
 numbers, but I'm pretty sure it is *really* fast (10Gbit/s over IB
 sounds familiar, but don't quote me on that).

I think any shared filesystem will be fast if you have a lot of bandwidth :)
When comparing network filesystems it makes sense to keep the hardware
identical and reduce the overhead to a percentage. Eg. what is the theoretical
maximum speed for the network used (10Gbit/s), and what is the actual
maximum speed you get with:
1) a single really large file (200GB)
2) a lot of (100,000) smaller files (2MB each)

Then you can make an estimate on what to expect when using a 1Gbit/s 
network. I somehow don't expect James to have InfiniBand available for his 
research?
Personally, when choosing between InfiniBand and Ethernet, I'm tempted 
to go with dedicated bonded 10Gbit/s links because of the price-
difference. (A quick research shows me that Infiniband is about 3x as 
expensive for the same throughput)
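
To make that concrete, here is a rough Python sketch of the kind of estimate
being described: ideal transfer times for the two test loads at 10Gbit/s and
1Gbit/s, and a measured run expressed as a percentage of the theoretical
maximum. The "measured" value is invented purely to show the calculation.

    def transfer_seconds(size_bytes, link_gbps):
        # Ideal time: payload bits divided by raw link speed.
        return size_bytes * 8 / (link_gbps * 1e9)

    GB = 1000 ** 3
    MB = 1000 ** 2
    big_file = 200 * GB              # one 200GB file
    many_small = 100000 * 2 * MB     # 100,000 files of 2MB each (also 200GB)

    for gbps in (10, 1):
        print(gbps, "Gbit/s:",
              round(transfer_seconds(big_file, gbps)), "s (large file),",
              round(transfer_seconds(many_small, gbps)), "s (small files)")

    # The ideal times are identical for both loads; the measured gap between
    # them is exactly the per-file overhead this comparison is meant to expose.
    measured = 220.0                        # seconds, hypothetical result
    ideal = transfer_seconds(big_file, 10)  # 160s at 10Gbit/s
    print("efficiency: %.0f%%" % (100 * ideal / measured))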

  Personally, I would read up on these and see how they work. Then,
  based on that, decide if they are likely to assist in the specific
  situation you are interested in.
 
 Always good advice.

It saves time to do some simple research (the reading type) before 
actually doing tests.

--
Joost


Re: [gentoo-user] Re: File system testing

2014-09-18 Thread J. Roeleveld

On Wednesday, September 17, 2014 08:56:28 PM James wrote:
 Alec Ten Harmsel alec at alectenharmsel.com writes:
  As far as HDFS goes, I would only set that up if you will use it for
  Hadoop or related tools. It's highly specific, and the performance is
  not good unless you're doing a massively parallel read (what it was
  designed for). I can elaborate why if anyone is actually interested.
 
 Actually, from my research and my goal (one really big scientific simulation
 running constantly).

Out of curiosity, what do you want to simulate?

 Many folks are recommending skipping Hadoop/HDFS altogether

I agree, Hadoop/HDFS is for data analysis. Like building a profile about
people based on the information that companies like Facebook, Google, NSA,
Walmart, Governments, Banks collect about their
customers/users/citizens/slaves/

 and go straight to mesos/spark. RDD (in-memory)  cluster
 calculations are at the heart of my needs. The opposite end of the
 spectrum, loads of small files and small apps; I dunno about, but, I'm all
 ears.
 In the end, my (3) node scientific cluster will morph and support
 the typical myriad  of networked applications, but I can take
 a few years to figure that out, or just copy what smart guys like
 you and joost do.

Nope, I'm simply following what you do and provide suggestions where I 
can.
Most of the clusters and distributed computing stuff I do is based on 
adding machines to distribute the load. But the mechanisms for these are 
implemented in the applications I work with, not what I design underneath.

The filesystems I am interested in are different to the ones you want.
I need to provide access to software installation files to a VM server and 
access to documentation which is created by the users.
The VM server is physically next to what I already mentioned as server A. 
Access to the VM from the remote site will be using remote desktop 
connections.
But to allow faster and easier access to the documentation, I need a 
server B at the remote site which functions as described.
AFS might be suitable, but I need to be able to layer Samba on top of that 
to allow a seamless operation.
I don't want the laptops to have their own cache and then having to figure 
out how to solve the multiple different changes to documents containing 
layouts. (MS Word and OpenDocument files)

  We use Lustre for our high performance general storage. I don't have 
any
  numbers, but I'm pretty sure it is *really* fast (10Gbit/s over IB
  sounds familiar, but don't quote me on that).
 
  At UMich, you guys should test the FhGFS/btrfs combo. The folks
  at UCI swear by it, although they are only publishing a wee bit.
 (you know, water cooler gossip).. Surely the Wolverines do not
 want those californians getting up on them?
 
 Are you guys planning a mesos/spark test?
 
   Personally, I would read up on these and see how they work. Then,
   based on that, decide if they are likely to assist in the specific
   situation you are interested in.
 
 It's a ton of reading. It's not apples-to-apple_cider type of reading.
 My head hurts.

Take a walk outside. Clear air should help you with the headaches :P

 I'm leaning to  DFS/LFS
 
  (2)  Lustre/btrfs  and FhGFS/btrfs
 
 Thoughts/comments?

I have insufficient knowledge to advise on either of these.
One question, why BTRFS instead of ZFS?

My current understanding is:
- ZFS is production ready, but due to licensing issues, not included in the 
kernel
- BTRFS is included, but not yet production ready with all planned features

For me, Raid6-like functionality is an absolute requirement and latest I 
know is that that isn't implemented in BTRFS yet. Does anyone know when 
that will be implemented and reliable? Eg. what time-frame are we talking 
about?

--
Joost


Re: [gentoo-user] Re: File system testing

2014-09-18 Thread Kerin Millar

On 17/09/2014 21:20, Alec Ten Harmsel wrote:

As far as HDFS goes, I would only set that up if you will use it for
Hadoop or related tools. It's highly specific, and the performance is
not good unless you're doing a massively parallel read (what it was
designed for). I can elaborate why if anyone is actually interested.


I, for one, am very interested.

--Kerin



Re: [gentoo-user] Re: File system testing

2014-09-18 Thread Rich Freeman
The HTML...it hurts my eyes...  :)

On Thu, Sep 18, 2014 at 4:24 AM, J. Roeleveld jo...@antarean.org wrote:

 On Wednesday, September 17, 2014 08:56:28 PM James wrote:

 Alec Ten Harmsel alec at alectenharmsel.com writes:

  As far as HDFS goes, I would only set that up if you will use it for
  Hadoop or related tools. It's highly specific, and the performance is
  not good unless you're doing a massively parallel read (what it was
  designed for). I can elaborate why if anyone is actually interested.


FYI - one very big limitation of hdfs is its minimum filesize is
something huge like 1MB or something like that.  Hadoop was designed
to take a REALLY big input file and chunk it up.  If you use hdfs to
store something like /usr/portage it will turn into the sort of
monstrosity that you'd actually need a cluster to store.


 My current understanding is:

 - ZFS is production ready, but due to licensing issues, not included in the
 kernel

 - BTRFS is included, but not yet production ready with all planned features


Your understanding of their maturity is fairly accurate.  They also
aren't 100% moving in the same direction - btrfs aims more to be a
general-purpose filesystem replacement especially for smaller systems,
and zfs is more focused on the enterprise, so it lacks features like
raid reshaping (who needs to add 1 disk to a raid5 when you can just
add 5 more disks to your 30 disk storage system).

I think btrfs has a bit more hope of being an ext4 replacement some
day for both this reason and the licensing issue.  That in no way
detracts from the usefulness of zfs, especially for larger deployments
where the few areas where btrfs is more flexible would probably be
looked at as gimmicks (kind of like being able to build your whole OS
from source :) ).

 For me, Raid6-like functionality is an absolute requirement and latest I
 know is that that isn't implemented in BTRFS yet. Does anyone know when that
 will be implemented and reliable? Eg. what time-frame are we talking about?


I suspect we're talking months before it is really implemented, and
much longer before it is reliable.  Right now btrfs can write raid6,
but it can't really read it.  That is, it operates just fine until you
actually lose a disk containing something other than parity, and then
it loses access to the data.  This code is only in the kernel for
development purposes and nobody advocates using it for production.
Most of the code in btrfs which is reliable has been around for years,
like raid1 support, and obviously it will be years until the raid5/6
code reaches that point.  I am using btrfs mainly because once that
day comes it will be much easier to migrate to it from btrfs raid1
than from zfs (which has no mechanism for migrating raid levels
in-place (that is, within an existing vdev) - you would need to add
new drives to the pool, migrate the data, and remove the old drives
from the pool, which is nice if you have a big stack of drives and
spare sata ports lying around like you would in a SAN).

--
Rich



Re: [gentoo-user] Re: File system testing

2014-09-18 Thread J. Roeleveld

On Thursday, September 18, 2014 05:48:58 AM Rich Freeman wrote:
 The HTML...it hurts my eyes...  :)

Apologies.

  My current understanding is:
  
  - ZFS is production ready, but due to licensing issues, not included in
  the
  kernel
  
  - BTRFS is included, but not yet production ready with all planned
  features
 
 Your understanding of their maturity is fairly accurate.  They also
 aren't 100% moving in the same direction - btrfs aims more to be a
 general-purpose filesystem replacement especially for smaller systems,
 and zfs is more focused on the enterprise, so it lacks features like
 raid reshaping (who needs to add 1 disk to a raid5 when you can just
 add 5 more disks to your 30 disk storage system).

Thank you for this info. I wasn't aware of this difference in 'design'.
Sounds like ZFS will be more suited for me then.

 I think btrfs has a bit more hope of being an ext4 replacement some
 day for both this reason and the licensing issue.  That in no way
 detracts from the usefulness of zfs, especially for larger deployments
 where the few areas where btrfs is more flexible would probably be
 looked at as gimmicks (kind of like being able to build your whole OS
 from source :) ).

Next time I am rebuilding the desktops, I will likely switch them to BTRFS.
Sounds like BTRFS will be more suited there.

  For me, Raid6-like functionality is an absolute requirement and latest I
  know is that that isn't implemented in BTRFS yet. Does anyone know when
  that will be implemented and reliable? Eg. what time-frame are we talking
  about?
 I suspect we're talking months before it is really implemented, and
 much longer before it is reliable.  Right now btrfs can write raid6,
 but it can't really read it.  That is, it operates just fine until you
 actually lose a disk containing something other than parity, and then
 it loses access to the data.  This code is only in the kernel for
 development purposes and nobody advocates using it for production.
 Most of the code in btrfs which is reliable has been around for years,
 like raid1 support, and obviously it will be years until the raid5/6
 code reaches that point.  I am using btrfs mainly because once that
 day comes it will be much easier to migrate to it from btrfs raid1
 than from zfs (which has no mechanism for migrating raid levels
 in-place (that is, within an existing vdev) - you would need to add
 new drives to the pool, migrate the data, and remove the old drives
 from the pool, which is nice if you have a big stack of drives and
 spare sata ports lying around like you would in a SAN).

Exactly, although I prefer not to change the filesystem on a live system 
anytime soon. When it comes to redoing the filesystem like that, restoring 
from backups will be the fastest solution.

--
Joost



Re: [gentoo-user] Re: File system testing

2014-09-18 Thread Alec Ten Harmsel

On 09/18/2014 05:17 AM, Kerin Millar wrote:
 On 17/09/2014 21:20, Alec Ten Harmsel wrote:
 As far as HDFS goes, I would only set that up if you will use it for
 Hadoop or related tools. It's highly specific, and the performance is
 not good unless you're doing a massively parallel read (what it was
 designed for). I can elaborate why if anyone is actually interested.

 I, for one, am very interested.

 --Kerin


Alright, here goes:

Rich Freeman wrote:

 FYI - one very big limitation of hdfs is its minimum filesize is
 something huge like 1MB or something like that.  Hadoop was designed
 to take a REALLY big input file and chunk it up.  If you use hdfs to
 store something like /usr/portage it will turn into the sort of
 monstrosity that you'd actually need a cluster to store.

This is exactly correct, except we run with a block size of 128MB, and a large 
cluster will typically have a block size of 256MB or even 512MB.

HDFS has two main components: a NameNode, which keeps track of which blocks are 
a part of which file (in memory), and the DataNodes that actually store the 
blocks. No data ever flows through the NameNode; it negotiates transfers 
between the client and DataNodes and negotiates transfers for jobs. Since the 
NameNode stores metadata in-memory, small files are bad because RAM gets wasted.

What exactly is Hadoop/HDFS used for? The most common uses are generating 
search indices on data (which is a batch job) and doing non-realtime processing 
of log streams and/or data streams (another batch job) and allowing a large 
number of analysts to run disparate queries on the same large dataset (another 
batch job). Batch processing - processing the entire dataset - is really where 
Hadoop shines.

When you put a file into HDFS, it gets split based on the block size. This is 
done so that a parallel read will be really fast - each map task reads in a 
single block and processes it. Ergo, if you put in a 1GB file with a 128MB 
block size and run a MapReduce job, 8 map tasks will be launched. If you put in 
a 1TB file, 8192 tasks would be launched. Tuning the block size is important to 
optimize the overhead of launching tasks vs. potentially under-utilizing a 
cluster. Typically, a cluster with a lot of data has a bigger block size.

The downsides of HDFS:
* Seeked reads are not supported afaik because no one needs that for batch 
processing
* Seeked writes into an existing file are not supported because either blocks 
would be added in the middle of a file and wouldn't be 128MB, or existing 
blocks would be edited, resulting in blocks larger than 128MB. Both of these 
scenarios are bad.

Since HDFS users typically do not need seeked reads or seeked writes, these 
downsides aren't really a big deal.

If something's not clear, let me know.

Alec




[gentoo-user] Re: File system testing

2014-09-18 Thread James
Hervé Guillemet herve at guillemet.org writes:

 
 On 16/09/2014 21:07, James wrote:
  
  By now many are familiar with my keen interest in clustering gentoo
  systems. So, what most cluster technologies use is a distributed file
  system on top of the local (HD/SDD) file system.


 Have you found this document :
 http://hal.inria.fr/hal-00789086/PDF/a_survey_of_dfs.pdf

Hello Herve,

Yes, I read the document and it is a good introduction to some
of my issues on which file system(s) to use for clustering. But, it's 
more of a survey than a comparison/benchmark study, which would be
really beneficial. 

DFS are moving so fast now, and their setups and features are
rarely a one-to-one match. For example, (currently) the best load balancing
you find is actually in the apps that run above the cluster software. [1]
Some of the performance/resource-utilization of the file systems/resources
is determined by real-time analytics with graphical displays. I'm
not sure that load balancing even belongs in a DFS, yet in the paper
you reference, it was prominently discussed. Things are moving so
fast in the distributed-*/cluster/cluster-tools/cluster-apps
space that one really needs a system set up to apply almost daily patches
for testing. I never realized just how much reading is necessary just
to understand the current landscape in clustering.

I'm trying to figure out an ecosystem where gentoo folks can experiment
with mesos clustering for scientific applications. After that, the
more general case should be mature enough for general purpose applications.
I'm avoiding the clustered web arena, as that is just too much for
me to digest; somebody else could champion that part of all of
those Apache-cluster technologies.

Thanks for the document link!

James

[1]












[gentoo-user] Re: File system testing

2014-09-17 Thread James
J. Roeleveld joost at antarean.org writes:


  Distributed File Systems (DFS):

  Local (Device) File Systems LFS:

 Is my understanding correct that the top list all require one of 
 the bottom  list?
 Eg. the clustering FSs only ensure the files on the LFSs are 
 duplicated/spread over the various nodes?

 I would normally expect the clustering FS to be either the full layer 
 or a  clustered block-device where an FS can be placed on top.

I have not performed these installations yet. My research indicates
that first you put the local FS on the drive, just like any installation
of Linux. Then you put the distributed FS on top of this. Some DFS might
not require an LFS, but FhGFS does, and so does HDFS. I will not actually
be able to accurately answer your questions until I start to build
up the 3-system cluster (a week or 2 away is my best guess).


 Otherwise it seems more like a network filesystem with caching 
 options (See  AFS).

OK, I'll add AFS. You may be correct on this one, or AFS might be both.

 I am also interested in these filesystems, but for a slightly different 
 scenario:

Ok, so as the test-dummy crash-victim, I'd be honored to have you,
Alan, Neil, Mic, etc. back-seat-drive on this adventure! (The more
I read, the more it's time for bourbon, bash, and a bit of cursing
to get started...)


 - 2 servers in remote locations (different offices)
 - 1 of these has all the files stored (server A) at the main office
 - The other (server B - remote office) needs to offer all files 
 from serverA  When server B needs to supply a file, it needs to 
 check if the local copy is still the valid version. 
 If yes, supply the local copy, otherwise download 
 from server A. When a file is changed, server A needs to be updated.
 While server B is sharing a file, the file needs to be locked on server A 
 preventing simultaneous updates.

Oooh, file locking (precious tells me that is always tricky).
(Psst, systemd is causing fits for the clustering geniuses;
some are espousing a variety of cgroup gymnastics for phantom kills.)
Spark is fault tolerant, regardless of node/memory/drive failures
above the fault tolerance that a file system configuration may support.
In fact, lost files can be 'regenerated', but it is computationally
expensive. You have to get your file system(s) set up. Then install
mesos-0.20.0 and then spark. I have mesos mostly ready. I should
have spark in alpha-beta this weekend. I'm fairly clueless on the
DFS/LFS issue, so a DFS that needs no LFS might be a good first choice
for testing the (3) system cluster.
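
For what it's worth, once mesos and spark are installed, pointing a Spark job
at the Mesos master is mostly a configuration detail. A hypothetical PySpark
sketch follows; the host name is a placeholder, 5050 is the usual Mesos master
port, and the memory setting is only illustrative, so check the Spark-on-Mesos
docs for the versions actually installed.

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("cluster-smoke-test")
            .setMaster("mesos://mesos-master.example:5050")  # placeholder host
            .set("spark.executor.memory", "2g"))             # illustrative value

    sc = SparkContext(conf=conf)
    # Trivial job to confirm work is being farmed out to the cluster nodes.
    print(sc.parallelize(range(1000), 6).map(lambda x: x * x).sum())
    sc.stop()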


 I prefer not to supply the same amount of storage at server B as 
 server A has. The remote location generally only needs access to 5% of 
 the total amount of files stored on server A. But not always the same 5%.
 Does anyone know of a filesystem that can handle this?

So in clustering, from what I have read, there are all kinds of files
passed around between the nodes and the master(s). Many are critical
files not part of the application or scientific calculations.
So in time, I think that in a clustering environment all you seek is
very possible, but that's a hunch, a gut feeling, not fact. I'd put
raid mirrors underneath that system, if it makes sense, for now,
or just dd the stuff with a script or something kludgy (Alan is the
king of kludge).

On gentoo planet one of the devs has Consul in his overlays. Read
up on that for ideas that may be relevant to what you need.


 Joost

James
 







Re: [gentoo-user] Re: File system testing

2014-09-17 Thread J. Roeleveld

On Wednesday, September 17, 2014 03:55:56 PM James wrote:
 J. Roeleveld joost at antarean.org writes:
   Distributed File Systems (DFS):
  
   Local (Device) File Systems LFS:
  Is my understanding correct that the top list all require one of
  the bottom  list?
  Eg. the clustering FSs only ensure the files on the LFSs are
  duplicated/spread over the various nodes?
  
  I would normally expect the clustering FS to be either the full layer
  or a  clustered block-device where an FS can be placed on top.
 
 I have not performed these installations yet. My research indicates
 that first you put the local FS on the drive, just like any installation
 of Linux. Then you put the distributed FS on top of this. Some DFS might
 not require an LFS, but FhGFS does, and so does HDFS. I will not actually
 be able to accurately answer your questions until I start to build
 up the 3-system cluster (a week or 2 away is my best guess).

Playing around with clusters is on my list, but due to other activities having 
a higher priority, I haven't had much time yet.

  Otherwise it seems more like a network filesystem with caching
  options (See  AFS).
 
 OK, I'll add AFS. You may be correct on this one, or AFS might be both.

Personally, I would read up on these and see how they work. Then, based 
on that, decide if they are likely to assist in the specific situation you are 
interested in.
AFS, NFS, CIFS,... can be used for clusters, but, apart from NFS, I wouldn't 
expect much performance out of them.
If you need it to be fault-tolerant and not overly rely on a single point of 
failure, I wouldn't be using any of these. Only AFS, from my original 
investigation, showed some fault-tolerance, but needed too many 
resources (disk-space) on the clients.

  I am also interested in these filesystems, but for a slightly different
 
  scenario:
 Ok, so as the test-dummy crash-victim, I'd be honored to have you,
 Alan, Neil, Mic, etc. back-seat-drive on this adventure! (The more
 I read, the more it's time for bourbon, bash, and a bit of cursing
 to get started...)

Good luck and even though I'd love to join in with the testing, I simply do 
not have the time to keep up. I would probably just slow you down.

  - 2 servers in remote locations (different offices)
  - 1 of these has all the files stored (server A) at the main office
  - The other (server B - remote office) needs to offer all files
  from serverA  When server B needs to supply a file, it needs to
  check if the local copy is still the valid version.
  If yes, supply the local copy, otherwise download
  from server A. When a file is changed, server A needs to be updated.
  While server B is sharing a file, the file needs to be locked on server A
  preventing simultaneous updates.
 
 Oooh, file locking (precious tells me that is always tricky).

I need it to be locked on server A while server B has a proper write-lock to 
avoid 2 modifications to compete with each other.

 (Psst, systemd is causing fits for the clustering geniuses;
 some are espousing a variety of cgroup gymnastics for phantom kills.)

phantom kills?

 Spark is fault tolerant, regardless of node/memory/drive failures
 above the fault tolerance that a file system configuration may support.
 In fact, lost files can be 'regenerated', but it is computationally
 expensive.

Too much for me.

 You have to get your file system(s) set up. Then install
 mesos-0.20.0 and then spark. I have mesos mostly ready. I should
 have spark in alpha-beta this weekend. I'm fairly clueless on the
 DFS/LFS issue, so a DFS that needs no LFS might be a good first choice
 for testing the (3) system cluster.

That, or a 4th node acting like a NAS sharing the filesystem over NFS.

  I prefer not to supply the same amount of storage at server B as
  server A has. The remote location generally only needs access to 5% 
of
  the total amount of files stored on server A. But not always the same 
5%.
  Does anyone know of a filesystem that can handle this?
 
 So in clustering, from what I have read, there are all kinds of files
 passed around between the nodes and the master(s). Many are critical
 files not part of the application or scientific calculations.
 So in time, I think that in a clustering environment all you seek is
 very possible, but that's a hunch, a gut feeling, not fact. I'd put
 raid mirrors underneath that system, if it makes sense, for now,
 or just dd the stuff with a script or something kludgy (Alan is the
 king of kludge).

Hmm... mirroring between servers. Always an option, except it will not work 
for me in this case:
1) Remote location will have a domestic ADSL line. I'll be lucky if it has a 
500kbps uplink
2) Server A, currently, has around 7TB of current data that also needs to 
be available on the remote site.

With an 8mbps downlink, waiting for a file to be copied to the remote site is 
acceptable. After modifications, the new version can be copied back to 
serverA slowly during network-idle-time or when server A 

Re: [gentoo-user] Re: File system testing

2014-09-17 Thread Alec Ten Harmsel
As far as HDFS goes, I would only set that up if you will use it for
Hadoop or related tools. It's highly specific, and the performance is
not good unless you're doing a massively parallel read (what it was
designed for). I can elaborate why if anyone is actually interested.

We use Lustre for our high performance general storage. I don't have any
numbers, but I'm pretty sure it is *really* fast (10Gbit/s over IB
sounds familiar, but don't quote me on that).

 Personally, I would read up on these and see how they work. Then,
 based on that, decide if they are likely to assist in the specific
 situation you are interested in.

Always good advice.

Alec



[gentoo-user] Re: File system testing

2014-09-17 Thread James
Alec Ten Harmsel alec at alectenharmsel.com writes:


 As far as HDFS goes, I would only set that up if you will use it for
 Hadoop or related tools. It's highly specific, and the performance is
 not good unless you're doing a massively parallel read (what it was
 designed for). I can elaborate why if anyone is actually interested.

Actually, from my research and my goal (one really big scientific simulation
running constantly): many folks are recommending skipping Hadoop/HDFS
altogether and going straight to mesos/spark. RDD (in-memory) cluster calculations
are at the heart of my needs. The opposite end of the spectrum, loads
of small files and small apps, I dunno about, but I'm all ears.
In the end, my (3) node scientific cluster will morph and support
the typical myriad  of networked applications, but I can take
a few years to figure that out, or just copy what smart guys like
you and joost do.


 We use Lustre for our high performance general storage. I don't have any
 numbers, but I'm pretty sure it is *really* fast (10Gbit/s over IB
 sounds familiar, but don't quote me on that).

At UMich, you guys should test the FhGFS/btrfs combo. The folks
at UCI swear by it, although they are only publishing a wee bit
(you know, water cooler gossip). Surely the Wolverines do not
want those Californians getting up on them?

Are you guys planning a mesos/spark test? 

  Personally, I would read up on these and see how they work. Then,
  based on that, decide if they are likely to assist in the specific
  situation you are interested in.

It's a ton of reading. It's not apples-to-apple_cider type of reading.
My head hurts.


I'm leaning to  DFS/LFS

(2)  Lustre/btrfs  and FhGFS/btrfs

Thoughts/comments?

James





[gentoo-user] Re: File system testing

2014-09-17 Thread James
J. Roeleveld joost at antarean.org writes:

 AFS has caching and can survive temporary disappearance of the server.

Excellent for low bandwidth connections. Most DFS have mechanisms to
deal with transient failures, but not as generous on the time-scale
as AFS. I believe, if I recall correctly, these high-latency, low-bandwidth
recovery mechanisms were key design parameters, at least baked in
during the CMU development cycles, for AFS?

While attractive for your situation, these features might actually
be detrimental to a high-performance distributed cluster's needs for
a DFS?


 For me, I need to be able to provide Samba filesharing on top of that 
 layer on 2 different locations as I don't  see the network bandwidth to 
 be sufficient for normal operations. (ADSL uplinks tend to be dead slow)

Yea, I'm not going to be testing OpenAFS for my needs, unless I read
some compelling published data on its applicability as the best
choice of DFS for high-end clusters.

It's probably great for SETI etc etc.


James