[PATCH] WORKING copy of readonly mfs2

Waldemar Kozaczuk Sat, 09 Dec 2017 14:22:45 -0800

Please note this is a working (non-final) version of the patch. The messiest 
part is mfs2_cache.cc, which implements some experimental features that do not 
seem to yield the expected benefits. I am still looking for a review, mostly 
from a high-level perspective, to see if there are no fundamental issues. I am 
also looking for any feedback on the locking logic and the usage of mutexes in 
mfs2_cache.cc. Lastly, I am aware that style-wise (proper C++, etc.) this code 
is pretty messy, so do not be too harsh.


Here are the build commands to try it:

./scripts/build image=node-fs-example -j4 fs=mfs
./scripts/build image=openjdk9-java-base,java-example -j4 fs=mfs
(Please note that due to the symlink bug you have to manually replace the jvm 
symlink with a directory to make this one work.)

./scripts/build image=node-fs-example -j4 fs=mfs2
./scripts/build image=openjdk9-java-base,java-example -j4 fs=mfs2

The layout, in the order stored on disk:
- Super Block (512 bytes) that contains the magic number and specifies meta 
information including the block size and the location and size of the tables 
containing i-nodes, dentries and symbolic links

- File data, where each file is padded to a multiple of the 512-byte block size

- Table of directory entries referenced by index in the directory i-node (each 
entry holds a string with the direntry name and an i-node number)

- Table of symlinks referenced by the symlink i-node (each entry holds the 
symbolic link path string)

- Table of i-nodes where each specifies the type (dir, file, symlink) and a data 
offset (for files it is a block on disk, for symlinks and directories it is an 
offset in one of the 2 tables above)
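
As a rough illustration of the file data area (a sketch, not code from 
gen-mfs2-img.py), the padding rule can be expressed as below. The function 
names are made up for this example, and placing the first file at block 1 
assumes the super block occupies block 0 and the file data follows it directly.

```python
BLOCK_SIZE = 512  # block size from the layout description above


def blocks_needed(size_in_bytes, block_size=BLOCK_SIZE):
    """Number of whole blocks a file occupies once padded to the block size."""
    return (size_in_bytes + block_size - 1) // block_size


def assign_file_blocks(file_sizes, first_data_block=1):
    """Return (start block of each file, first block after the file data).

    Files are laid out back to back; first_data_block=1 is an assumption
    (super block in block 0, file data right after it).
    """
    starts = []
    next_block = first_data_block
    for size in file_sizes:
        starts.append(next_block)
        next_block += blocks_needed(size)
    return starts, next_block
```

With this rule a 513-byte file occupies two blocks, so the tables would start 
at whatever block `assign_file_blocks` returns as the second value.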

So when OSv mounts mfs2 (mfs2_mount), it reads the super block in the first 
disk read and then all 3 tables (directory entries, symlinks and i-nodes) at 
once in a second disk read into memory. That way all structural data stays in 
memory and can be accessed by the corresponding vnops calls (readdir, lookup, 
etc.) without having to access the disk. Time-wise it is a small improvement 
over James Root's initial mfs in terms of file system speed, but it makes many 
things easier and cleaner.
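
The two-read mount can be sketched as follows. This is only an illustration: 
the super block fields, their order, the magic value and the helper names are 
all hypothetical and do not reflect the real mfs2 on-disk format. It relies 
only on the fact that the three tables are stored contiguously, so one read 
covers them all.

```python
import io
import struct

BLOCK_SIZE = 512
# Hypothetical super block fields, NOT the real mfs2 format:
# magic, block size, first block of the table region, region length in blocks.
SUPER_BLOCK_FORMAT = "<4Q"
MAGIC = 0x4D465332  # made-up value for this sketch


def mount(disk):
    """Read the super block, then all tables in a single second read."""
    reads = 0
    disk.seek(0)
    raw = disk.read(BLOCK_SIZE)  # first disk read: super block
    reads += 1
    magic, block_size, tables_block, tables_blocks = struct.unpack_from(
        SUPER_BLOCK_FORMAT, raw)
    if magic != MAGIC:
        raise ValueError("not an mfs2-like image")
    disk.seek(tables_block * block_size)
    tables = disk.read(tables_blocks * block_size)  # second read: all tables
    reads += 1
    return tables, reads


def make_image(file_data_blocks, tables):
    """Build a tiny in-memory image: super block, file data, then tables."""
    padded = tables + b"\0" * (-len(tables) % BLOCK_SIZE)
    tables_block = 1 + file_data_blocks
    sb = struct.pack(SUPER_BLOCK_FORMAT, MAGIC, BLOCK_SIZE,
                     tables_block, len(padded) // BLOCK_SIZE)
    sb += b"\0" * (BLOCK_SIZE - len(sb))
    return io.BytesIO(sb + b"\0" * (file_data_blocks * BLOCK_SIZE) + padded)
```

After `mount` returns, lookups and readdir would work purely on the in-memory 
`tables` buffer, which is the point of reading it in one go.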

To speed up mfs2_read I implemented a simple "read-ahead" cache-like layer that 
loads file data on demand in 16K segments and stores it in a hashmap for any 
subsequent read. Small files (less than 16K) are loaded in full in one shot. In 
essence it works somewhat like memory-mapping files: each file is logically 
divided into 16K segments, and when it is accessed at a given offset the 
corresponding segment key is calculated (file offset / 16K). On a hit the data 
is copied from memory; on a miss the whole segment is read from disk and put in 
the hashmap. 
Obviously I had to add a little bit of thread synchronization logic to make it 
all thread-safe. I tried to keep contention as low as possible by using mutexes 
at the segment level when reading data from disk. For now file segments loaded 
in memory stay there forever, but only the parts that are needed are loaded 
from disk, as opposed to the current ramdisk implementation. It should not be 
difficult to add a cache eviction thread that would evict unused segments after 
some time, which would be beneficial for any stateless apps that stay running 
longer. 
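
To make the idea concrete, here is a minimal sketch of such a segment cache. 
It is not a translation of mfs2_cache.cc: the class, the `read_from_disk` 
callback and the hit/miss counters are hypothetical names for this example, 
and the locking scheme (one map lock plus one lock per segment) is just one 
plausible way to get segment-level locking.

```python
import threading

SEGMENT_SIZE = 16 * 1024  # 16K segments, as described above


class SegmentCache:
    def __init__(self, read_from_disk):
        # read_from_disk(inode, offset, length) -> bytes (hypothetical callback)
        self._read_from_disk = read_from_disk
        self._segments = {}        # (inode, segment key) -> bytes
        self._segment_locks = {}   # (inode, segment key) -> Lock
        self._map_lock = threading.Lock()
        self.hits = 0
        self.misses = 0

    def _load_segment(self, inode, key, file_size):
        cache_key = (inode, key)
        with self._map_lock:
            data = self._segments.get(cache_key)
            if data is not None:
                self.hits += 1
                return data
            lock = self._segment_locks.setdefault(cache_key, threading.Lock())
        # Only readers of the same segment wait here while it is being loaded.
        with lock:
            with self._map_lock:
                data = self._segments.get(cache_key)
                if data is not None:  # another thread loaded it meanwhile
                    self.hits += 1
                    return data
            start = key * SEGMENT_SIZE
            length = min(SEGMENT_SIZE, file_size - start)
            data = self._read_from_disk(inode, start, length)
            with self._map_lock:
                self._segments[cache_key] = data
                self.misses += 1
            return data

    def read(self, inode, file_size, offset, length):
        end = min(offset + length, file_size)
        out = bytearray()
        while offset < end:
            key = offset // SEGMENT_SIZE  # segment key = file offset / 16K
            segment = self._load_segment(inode, key, file_size)
            within = offset - key * SEGMENT_SIZE
            chunk = segment[within:within + (end - offset)]
            if not chunk:  # short read from disk; stop instead of looping
                break
            out += chunk
            offset += len(chunk)
        return bytes(out)
```

The disk read happens outside the map lock, so a slow read of one segment does 
not block hits on other segments. An eviction thread would only need to take 
the map lock and drop entries from `_segments`.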

I also experimented with trying to detect sequential reads of a file and then 
loading it in larger segments (64K), hoping that reading more from disk at once 
(also fewer exits to the host) would speed things up even more. Unfortunately I 
did not see much of an improvement in terms of speed, even though I saw a 
better hit/miss ratio (80% vs 60%) on average. I was also disappointed to 
observe that bumping the file segment size to 32K or 64K did not speed things 
up much (maybe 5%), at the cost of even higher cache memory waste. 
That is weird, because my experiments with copying files using dd indicated 
that 64K or even 128K should yield the best read speed on the host. Is there 
some kind of speed limit on data transfer between host and guest on qemu? 
Performance tests with ramdisk images show that reading from disk and 
uncompressing 20MB in real mode takes on average 100ms, whereas reading 16MB 
from disk in 16K chunks in my mfs2 implementation takes on average 250ms.
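
For reference, the numbers above translate into the following effective 
throughput. This is plain arithmetic on the quoted averages, assuming 1MB = 
2^20 bytes, that the 16MB is read entirely in 16K requests, and that the 
quoted times cover only the transfers.

```python
MB = 1024 * 1024
KB = 1024


def throughput_mb_per_s(num_bytes, seconds):
    """Effective throughput in MB/s for a transfer of num_bytes."""
    return num_bytes / MB / seconds


ramdisk_mb_s = throughput_mb_per_s(20 * MB, 0.100)  # ramdisk: 20MB in 100ms
mfs2_mb_s = throughput_mb_per_s(16 * MB, 0.250)     # mfs2: 16MB in 250ms

segment_reads = (16 * MB) // (16 * KB)              # number of 16K reads
per_read_us = 0.250 / segment_reads * 1e6           # average time per read

print("ramdisk: %.0f MB/s, mfs2: %.0f MB/s" % (ramdisk_mb_s, mfs2_mb_s))
print("%d reads, %.0f us per 16K read" % (segment_reads, per_read_us))
```

That is roughly 200 MB/s for the ramdisk path versus 64 MB/s for mfs2, or 
about 244 microseconds per 16K read, which suggests a per-request cost rather 
than raw bandwidth may be what dominates.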

Signed-off-by: Waldemar Kozaczuk <jwkozac...@gmail.com>
---
 Makefile                  |   9 +
 analyze_mfs2.py           |  39 +++++
 core/shutdown.cc          |  27 +++
 fs/mfs/mfs.hh             | 113 +++++++++++++
 fs/mfs/mfs_cache.cc       |  34 ++++
 fs/mfs/mfs_inode.cc       |  90 ++++++++++
 fs/mfs/mfs_vfsops.cc      | 152 +++++++++++++++++
 fs/mfs/mfs_vnops.cc       | 356 +++++++++++++++++++++++++++++++++++++++
 fs/mfs2/mfs2.hh           |  75 +++++++++
 fs/mfs2/mfs2_cache.cc     | 421 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/mfs2/mfs2_vfsops.cc    | 231 +++++++++++++++++++++++++
 fs/mfs2/mfs2_vnops.cc     | 316 ++++++++++++++++++++++++++++++++++
 fs/vfs/main.cc            |  18 +-
 fs/vfs/vfs_conf.cc        |   6 +
 fs/vfs/vfs_fops.cc        |   1 +
 include/osv/shutdown.hh   |   3 +
 licenses/mfs.txt          |  45 +++++
 loader.cc                 |   2 +
 modules/java-base/java.cc |   4 +
 scripts/build             |  24 ++-
 scripts/gen-mfs-img.py    | 293 ++++++++++++++++++++++++++++++++
 scripts/gen-mfs2-img.py   | 342 +++++++++++++++++++++++++++++++++++++
 22 files changed, 2594 insertions(+), 7 deletions(-)
 create mode 100755 analyze_mfs2.py
 create mode 100644 fs/mfs/mfs.hh
 create mode 100644 fs/mfs/mfs_cache.cc
 create mode 100644 fs/mfs/mfs_inode.cc
 create mode 100644 fs/mfs/mfs_vfsops.cc
 create mode 100644 fs/mfs/mfs_vnops.cc
 create mode 100644 fs/mfs2/mfs2.hh
 create mode 100644 fs/mfs2/mfs2_cache.cc
 create mode 100644 fs/mfs2/mfs2_vfsops.cc
 create mode 100644 fs/mfs2/mfs2_vnops.cc
 create mode 100644 licenses/mfs.txt
 create mode 100755 scripts/gen-mfs-img.py
 create mode 100755 scripts/gen-mfs2-img.py

diff --git a/Makefile b/Makefile
index 8dd2371..9a499fa 100644
--- a/Makefile
+++ b/Makefile
@@ -1769,6 +1769,15 @@ fs_objs += ramfs/ramfs_vfsops.o \
 fs_objs += devfs/devfs_vnops.o \
        devfs/device.o
 
+fs_objs += mfs/mfs_vfsops.o \
+       mfs/mfs_vnops.o \
+       mfs/mfs_inode.o \
+       mfs/mfs_cache.o
+
+fs_objs += mfs2/mfs2_vfsops.o \
+       mfs2/mfs2_vnops.o \
+       mfs2/mfs2_cache.o
+
 fs_objs += procfs/procfs_vnops.o
 
 objects += $(addprefix fs/, $(fs_objs))
diff --git a/analyze_mfs2.py b/analyze_mfs2.py
new file mode 100755
index 0000000..fa20a62
--- /dev/null
+++ b/analyze_mfs2.py
@@ -0,0 +1,39 @@
+#!/usr/bin/python
+import sys, re
+
+class InodeRead(object):
+       def __init__(self,inode,block):
+               self.inode = inode
+               self.block = block
+
+inode_reads = []
+
+filename = sys.argv[1]
+file = open(filename,"r")
+
+for line in file.readlines():
+       mfs2_read = re.match(r".*inode: (\d+), \[(\d+) .*", line)
+       if mfs2_read and mfs2_read.groups():
+               inode = int(mfs2_read.group(1))
+               block = int(mfs2_read.group(2))
+               inode_read = InodeRead(inode,block)
+               inode_reads.append(inode_read)
+
+for (idx,inode_read) in enumerate(inode_reads):
+       idx2 = idx - 1
+       found = False
+       distance = 1000000
+       while (idx2 >= 0 and idx - idx2 < 128):
+               other_read = inode_reads[idx2]
+               if (inode_read.inode == other_read.inode): 
+                       if (inode_read.block == other_read.block + 8):
+                               found = True
+                               break
+                       else:
+                               distance = min(distance,abs(other_read.block-inode_read.block))
+               idx2 = idx2 - 1
+               
+       if found:
+               print("idx: %d, inode: %d, block: %d, %d_" % (idx,inode_read.inode,inode_read.block,idx-idx2))
+       else:
+               print("idx: %d, inode: %d, block: %d, NONE, %d_" % (idx,inode_read.inode,inode_read.block,distance))
diff --git a/core/shutdown.cc b/core/shutdown.cc
index 30b9512..439256b 100644
--- a/core/shutdown.cc
+++ b/core/shutdown.cc
@@ -1,3 +1,8 @@
+#include <iostream>
+#include <chrono>
+#include <ctime>
+#include <iomanip>
+
 #include <osv/shutdown.hh>
 #include <osv/power.hh>
 #include <osv/debug.hh>
@@ -8,8 +13,25 @@ extern void vfs_exit(void);
 
 namespace osv {
 
+static std::chrono::high_resolution_clock::time_point pre_vfs_mount;
+static std::chrono::high_resolution_clock::time_point app_begin;
+
+void mark_pre_vfs_mount()
+{
+    pre_vfs_mount = std::chrono::high_resolution_clock::now();
+}
+
+void mark_app_begins()
+{
+    app_begin = std::chrono::high_resolution_clock::now();
+}
+
 void shutdown()
 {
+    auto app_end = std::chrono::high_resolution_clock::now();
+    std::chrono::duration<double> sec = app_end - app_begin;
+    std::cout << "app main took " << std::fixed << std::setprecision(2) << 1000 * sec.count() << " ms" << std::endl;
+
     dhcp_release();
 
     // The vfs_exit() call below will forcibly unmount the filesystem. If any
@@ -34,6 +56,11 @@ void shutdown()
     }
 
     vfs_exit();
+
+    auto after_vfs_unmount = std::chrono::high_resolution_clock::now();
+    std::chrono::duration<double> sec2 = after_vfs_unmount - pre_vfs_mount;
+    std::cout << "from pre-VFS mount until post-VFS unmount took " << std::fixed << std::setprecision(2) << 1000 * sec2.count() << " ms" << std::endl;
+
     debug("Powering off.\n");
     osv::poweroff();
 }
diff --git a/fs/mfs/mfs.hh b/fs/mfs/mfs.hh
new file mode 100644
index 0000000..aa1aa13
--- /dev/null
+++ b/fs/mfs/mfs.hh
@@ -0,0 +1,113 @@
+/*
+ * Copyright (c) 2015 Carnegie Mellon University.
+ * All Rights Reserved.
+ *
+ * THIS SOFTWARE IS PROVIDED "AS IS," WITH NO WARRANTIES WHATSOEVER. CARNEGIE
+ * MELLON UNIVERSITY EXPRESSLY DISCLAIMS TO THE FULLEST EXTENT PERMITTEDBY LAW
+ * ALL EXPRESS, IMPLIED, AND STATUTORY WARRANTIES, INCLUDING, WITHOUT
+ * LIMITATION, THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
+ * PURPOSE, AND NON-INFRINGEMENT OF PROPRIETARY RIGHTS.
+ *
+ * Released under a modified BSD license. For full terms, please see mfs.txt in
+ * the licenses folder or contact permi...@sei.cmu.edu.
+ *
+ * DM-0002621
+ *
+ * Based on https://github.com/jdroot/mfs
+ */
+
+#ifndef __INCLUDE_MFS_H__
+#define __INCLUDE_MFS_H__
+
+#include <osv/vnode.h>
+#include <osv/mount.h>
+#include <osv/dentry.h>
+#include <osv/prex.h>
+#include <osv/buf.h>
+
+#define MFS_VERSION            1
+#define MFS_MAGIC              0xDEADBEEF
+#define MFS_FILENAME_MAXLEN    63
+#define MFS_ROOT_INODE_NUMBER  1
+
+#define MFS_SUPERBLOCK_SIZE sizeof(struct mfs_super_block)
+#define MFS_SUPERBLOCK_BLOCK 0
+
+
+#define MFS_INODE_SIZE ((uint64_t)sizeof(struct mfs_inode))
+#define MFS_INODES_PER_BLOCK(bs) ((bs) / MFS_INODE_SIZE)
+#define MFS_INODE_BLOCK(bs, i) ((i) / MFS_INODES_PER_BLOCK(bs))
+#define MFS_INODE_OFFSET(bs, i) ((i) % MFS_INODES_PER_BLOCK(bs))
+
+
+#define MFS_RECORD_SIZE (uint64_t)sizeof(struct mfs_dir_record)
+#define MFS_RECORDS_PER_BLOCK(bs) ((bs) / MFS_RECORD_SIZE)
+#define MFS_RECORD_BLOCK(bs, i) ((i) / MFS_RECORDS_PER_BLOCK(bs))
+#define MFS_RECORD_OFFSET(bs, i) ((i) % (MFS_RECORDS_PER_BLOCK(bs)))
+
+
+#define MFS_CACHE_SIZE 1024
+
+
+//#if 0
+//#define print(...) kprintf(__VA_ARGS__)
+//#else
+#define print(...)
+//#endif
+
+//#if 0
+#define MFS_STOPWATCH_START auto begin = std::chrono::high_resolution_clock::now();
+#define MFS_STOPWATCH_END(...) auto end = std::chrono::high_resolution_clock::now(); \
+std::chrono::duration<double> sec = end - begin; \
+print(__VA_ARGS__);
+#define MFS_STOPWATCH_END2(total) auto end = std::chrono::high_resolution_clock::now(); \
+std::chrono::duration<double> sec = end - begin; \
+total += ((long)(sec.count() * 1000000));
+//#else
+//#define MFS_STOPWATCH_START
+//#define MFS_STOPWATCH_END(...)
+//#define MFS_STOPWATCH_END2(...)
+//#endif
+
+extern struct vfsops mfs_vfsops;
+extern struct vnops mfs_vnops;
+
+struct mfs_super_block {
+    uint64_t magic;
+    uint64_t version;
+    uint64_t block_size;
+    uint64_t inodes_block;
+};
+
+
+struct mfs_inode {
+       mode_t   mode;
+       uint64_t inode_no;
+       uint64_t data_block_number;
+       union {
+               uint64_t file_size;
+               uint64_t dir_children_count;
+       };
+};
+
+struct mfs_dir_record {
+    // Add one for \0
+    char filename[MFS_FILENAME_MAXLEN + 1];
+    uint64_t inode_no;
+};
+
+// FIXME: The code is setup so a cache can be added pretty quickly if needed, but the
+// underlying bread function is already cached. To add a cache, the structure would be
+// added here, and then mfs_cache_read and mfs_cache_write would use it
+struct mfs {
+     struct mfs_super_block *sb;
+};
+
+struct mfs_inode *mfs_get_inode(struct mfs *mfs, struct device *dev, uint64_t inode_no);
+void              mfs_set_vnode(struct vnode* vnode, struct mfs_inode *inode);
+
+int  mfs_cache_read(struct mfs *mfs, struct device *device, uint64_t blkid, struct buf **bh);
+void mfs_cache_release(struct mfs *mfs, struct buf *bh);
+
+#endif
+
diff --git a/fs/mfs/mfs_cache.cc b/fs/mfs/mfs_cache.cc
new file mode 100644
index 0000000..d835852
--- /dev/null
+++ b/fs/mfs/mfs_cache.cc
@@ -0,0 +1,34 @@
+/*
+ * Copyright (c) 2015 Carnegie Mellon University.
+ * All Rights Reserved.
+ *
+ * THIS SOFTWARE IS PROVIDED "AS IS," WITH NO WARRANTIES WHATSOEVER. CARNEGIE
+ * MELLON UNIVERSITY EXPRESSLY DISCLAIMS TO THE FULLEST EXTENT PERMITTEDBY LAW
+ * ALL EXPRESS, IMPLIED, AND STATUTORY WARRANTIES, INCLUDING, WITHOUT
+ * LIMITATION, THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
+ * PURPOSE, AND NON-INFRINGEMENT OF PROPRIETARY RIGHTS.
+ *
+ * Released under a modified BSD license. For full terms, please see mfs.txt in
+ * the licenses folder or contact permi...@sei.cmu.edu.
+ *
+ * DM-0002621
+ *
+ * Based on https://github.com/jdroot/mfs
+ */
+
+#include "mfs.hh"
+
+extern std::atomic<long> mfs_block_read_count;
+extern std::atomic<long> mfs_block_read_ms;
+
+int mfs_cache_read(struct mfs *mfs, struct device *device, uint64_t blkid, struct buf **bh) {
+    MFS_STOPWATCH_START
+    mfs_block_read_count += 1;
+    int ret = bread(device, blkid, bh);
+    MFS_STOPWATCH_END2(mfs_block_read_ms)
+    return ret;
+}
+
+void mfs_cache_release(struct mfs *mfs, struct buf *bh) {
+    brelse(bh);
+}
diff --git a/fs/mfs/mfs_inode.cc b/fs/mfs/mfs_inode.cc
new file mode 100644
index 0000000..c08dcd1
--- /dev/null
+++ b/fs/mfs/mfs_inode.cc
@@ -0,0 +1,90 @@
+/*
+ * Copyright (c) 2015 Carnegie Mellon University.
+ * All Rights Reserved.
+ *
+ * THIS SOFTWARE IS PROVIDED "AS IS," WITH NO WARRANTIES WHATSOEVER. CARNEGIE
+ * MELLON UNIVERSITY EXPRESSLY DISCLAIMS TO THE FULLEST EXTENT PERMITTEDBY LAW
+ * ALL EXPRESS, IMPLIED, AND STATUTORY WARRANTIES, INCLUDING, WITHOUT
+ * LIMITATION, THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
+ * PURPOSE, AND NON-INFRINGEMENT OF PROPRIETARY RIGHTS.
+ *
+ * Released under a modified BSD license. For full terms, please see mfs.txt in
+ * the licenses folder or contact permi...@sei.cmu.edu.
+ *
+ * DM-0002621
+ *
+ * Based on https://github.com/jdroot/mfs
+ */
+
+#include "mfs.hh"
+
+#include <stdio.h>
+#include <sys/types.h>
+#include <osv/device.h>
+#include <osv/buf.h>
+#include <osv/debug.h>
+
+struct mfs_inode *mfs_get_inode(struct mfs *mfs, struct device *dev, uint64_t inode_no) {
+    struct mfs_super_block *sb    = mfs->sb;
+    struct mfs_inode       *inode = nullptr;
+    struct mfs_inode       *rv    = nullptr;
+    struct buf             *bh    = nullptr;
+    
+    uint64_t i            = inode_no - 1;
+    int      error        = -1;
+    uint64_t inode_block  = sb->inodes_block;
+    uint64_t inode_offset = 0;
+
+    inode_block += MFS_INODE_BLOCK(sb->block_size, i);
+    inode_offset = MFS_INODE_OFFSET(sb->block_size, i);
+
+    print("[mfs] looking for inode %llu in block %llu\n", inode_no, inode_block);
+
+    error = mfs_cache_read(mfs, dev, inode_block, &bh);
+    if (error) {
+        kprintf("[mfs] Error reading block [%llu]\n", inode_block);
+        return nullptr;
+    }
+
+    inode = (struct mfs_inode *)bh->b_data;
+    inode += inode_offset;
+
+    print("[mfs] got inode_no = %llu\n", inode->inode_no);
+
+    // Assert is somewhat dangerous here, but if this assert fails the filesystem
+    // has been corrupted somehow.
+    assert(inode->inode_no == inode_no);
+
+    rv = new mfs_inode;
+    memcpy(rv, inode, sizeof(struct mfs_inode));
+
+    mfs_cache_release(mfs, bh);
+
+    return rv;
+}
+
+void mfs_set_vnode(struct vnode* vnode, struct mfs_inode *inode) {
+    off_t size = 0;
+    if (vnode == nullptr || inode == nullptr) {
+        return;
+    }
+
+    vnode->v_data = inode;
+    vnode->v_ino = inode->inode_no;
+
+    // Set type
+    if (S_ISDIR(inode->mode)) {
+        size = MFS_INODE_SIZE;
+        vnode->v_type = VDIR;
+    } else if (S_ISREG(inode->mode)) {
+        size = inode->file_size;
+        vnode->v_type = VREG;
+    } else if (S_ISLNK(inode->mode)) {
+        size = 512; // Max size
+        vnode->v_type = VLNK;
+    }
+
+    vnode->v_mode = inode->mode;
+    vnode->v_size = size;
+}
+
diff --git a/fs/mfs/mfs_vfsops.cc b/fs/mfs/mfs_vfsops.cc
new file mode 100644
index 0000000..b3f5a17
--- /dev/null
+++ b/fs/mfs/mfs_vfsops.cc
@@ -0,0 +1,152 @@
+/*
+ * Copyright (c) 2015 Carnegie Mellon University.
+ * All Rights Reserved.
+ *
+ * THIS SOFTWARE IS PROVIDED "AS IS," WITH NO WARRANTIES WHATSOEVER. CARNEGIE
+ * MELLON UNIVERSITY EXPRESSLY DISCLAIMS TO THE FULLEST EXTENT PERMITTEDBY LAW
+ * ALL EXPRESS, IMPLIED, AND STATUTORY WARRANTIES, INCLUDING, WITHOUT
+ * LIMITATION, THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
+ * PURPOSE, AND NON-INFRINGEMENT OF PROPRIETARY RIGHTS.
+ *
+ * Released under a modified BSD license. For full terms, please see mfs.txt in
+ * the licenses folder or contact permi...@sei.cmu.edu.
+ *
+ * DM-0002621
+ *
+ * Based on https://github.com/jdroot/mfs
+ */
+ 
+#include "mfs.hh"
+#include <stdio.h>
+#include <sys/types.h>
+#include <osv/device.h>
+#include <osv/debug.h>
+#include <osv/bio.h>
+
+static int mfs_mount(struct mount *mp, const char *dev, int flags, const void *data);
+static int mfs_sync(struct mount *mp);
+static int mfs_statfs(struct mount *mp, struct statfs *statp);
+static int mfs_unmount(struct mount *mp, int flags);
+
+#define ramfs_vget     ((vfsop_vget_t)vfs_nullop)
+#define ramfs_statfs   ((vfsop_statfs_t)vfs_nullop)
+
+std::atomic<long> mfs_block_read_ms(0);
+std::atomic<long> mfs_block_read_count(0);
+
+struct vfsops mfs_vfsops = {
+       mfs_mount,                          /* mount */
+       mfs_unmount,                    /* unmount */
+       mfs_sync,                           /* sync */
+       ((vfsop_vget_t)vfs_nullop), /* vget */
+       mfs_statfs,                         /* statfs */
+       &mfs_vnops,                         /* vnops */
+};
+
+static int
+mfs_mount(struct mount *mp, const char *dev, int flags, const void *data) {
+    struct device          *device;
+    struct buf             *bh    = nullptr;
+    struct mfs             *mfs   = new struct mfs;
+    struct mfs_super_block *sb    = nullptr;
+    struct mfs_inode *root_inode  = nullptr;
+    int error = -1;
+
+    error = device_open(dev + 5, DO_RDWR, &device);
+    if (error) {
+        kprintf("[mfs] Error opening device!\n");
+        return error;
+    }
+
+
+    error = mfs_cache_read(mfs, device, MFS_SUPERBLOCK_BLOCK, &bh);
+    mfs_block_read_count += 1;
+    if (error) {
+        kprintf("[mfs] Error reading mfs superblock\n");
+        device_close(device);
+        delete mfs;
+        return error;
+    }
+
+    // We see if the file system is MFS, if not, return error and close everything
+    sb = (struct mfs_super_block*)bh->b_data;
+    if (sb->magic != MFS_MAGIC) {
+        print("[mfs] Error magics do not match!\n");
+        print("[mfs] Expecting %016llX but got %016llX\n", MFS_MAGIC, sb->magic);
+        mfs_cache_release(mfs, bh);
+        device_close(device);
+        delete mfs;
+        return -1; // TODO: Proper error code
+    }
+
+    if (sb->version != MFS_VERSION) {
+        kprintf("[mfs] Found mfs volume but incompatible version!\n");
+        kprintf("[mfs] Expecting %llu but found %llu\n", MFS_VERSION, sb->version);
+        mfs_cache_release(mfs, bh);
+        device_close(device);
+        delete mfs;
+        return -1;
+    }
+
+    print("[mfs] Got superblock version: 0x%016llX\n", sb->version);
+    print("[mfs] Got magic:              0x%016llX\n", sb->magic);
+    print("[mfs] Got block size:         0x%016llX\n", sb->block_size);
+    print("[mfs] Got inode block:        0x%016llX\n", sb->inodes_block);
+
+    // Since we have found MFS, we can copy the superblock now
+    sb = new mfs_super_block;
+    memcpy(sb, bh->b_data, MFS_SUPERBLOCK_SIZE);
+    mfs_cache_release(mfs, bh);
+    
+    mfs->sb    = sb;
+
+    // Save a reference to our superblock
+    mp->m_data = mfs;
+    mp->m_dev = device;
+
+    root_inode = mfs_get_inode(mfs, device, MFS_ROOT_INODE_NUMBER);
+
+    mfs_set_vnode(mp->m_root->d_vnode, root_inode);
+
+    return 0;
+}
+
+static int mfs_sync(struct mount *mp) {
+    return 0;
+}
+
+static int mfs_statfs(struct mount *mp, struct statfs *statp) {
+    struct mfs             *mfs = (struct mfs*)mp->m_data;
+    struct mfs_super_block *sb  = mfs->sb;
+
+    statp->f_bsize = sb->block_size;
+
+    // Total blocks, unknown...
+    statp->f_blocks = sb->inodes_block;
+    // Read only. 0 blocks free
+    statp->f_bfree = 0;
+    statp->f_bavail = 0;
+
+    statp->f_ffree = 0;
+    statp->f_files = sb->inodes_block; //Needs to be inode count
+
+    statp->f_namelen = MFS_FILENAME_MAXLEN;
+
+    return 0;
+}
+
+static int
+mfs_unmount(struct mount *mp, int flags) {
+    struct mfs             *mfs   = (struct mfs*)mp->m_data;
+    struct mfs_super_block *sb    = mfs->sb;
+    struct device          *dev   = mp->m_dev;
+
+    device_close(dev);
+    delete sb;
+    delete mfs;
+
+    std::cout << "Spent " << mfs_block_read_ms.load() / 1000 << " ms reading blocks" << std::endl;
+    std::cout << "Read " << mfs_block_read_count.load() << " blocks" << std::endl;
+
+    return 0;
+}
diff --git a/fs/mfs/mfs_vnops.cc b/fs/mfs/mfs_vnops.cc
new file mode 100644
index 0000000..aa9c8a3
--- /dev/null
+++ b/fs/mfs/mfs_vnops.cc
@@ -0,0 +1,356 @@
+/*
+ * Copyright (c) 2015 Carnegie Mellon University.
+ * All Rights Reserved.
+ *
+ * THIS SOFTWARE IS PROVIDED "AS IS," WITH NO WARRANTIES WHATSOEVER. CARNEGIE
+ * MELLON UNIVERSITY EXPRESSLY DISCLAIMS TO THE FULLEST EXTENT PERMITTEDBY LAW
+ * ALL EXPRESS, IMPLIED, AND STATUTORY WARRANTIES, INCLUDING, WITHOUT
+ * LIMITATION, THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR
+ * PURPOSE, AND NON-INFRINGEMENT OF PROPRIETARY RIGHTS.
+ *
+ * Released under a modified BSD license. For full terms, please see mfs.txt in
+ * the licenses folder or contact permi...@sei.cmu.edu.
+ *
+ * DM-0002621
+ *
+ * Based on https://github.com/jdroot/mfs
+ */
+
+#include <sys/stat.h>
+#include <dirent.h>
+#include <sys/param.h>
+
+#include <errno.h>
+#include <string.h>
+#include <stdlib.h>
+#include <fcntl.h>
+
+#include <osv/prex.h>
+#include <osv/vnode.h>
+#include <osv/file.h>
+#include <osv/mount.h>
+#include <osv/debug.h>
+#include <osv/bio.h>
+
+#include <sys/types.h>
+#include <osv/device.h>
+
+#include "mfs.hh"
+
+extern std::atomic<long> mfs_block_read_ms;
+extern std::atomic<long> mfs_block_read_count;
+
+// Used by extern declaration in fs/vfs/vfs_conf.cc
+int mfs_init(void) {
+    return 0;
+}
+
+static int mfs_open(struct file *fp) {
+    if ((file_flags(fp) & FWRITE)) {
+        // We do not allow writing! jerks
+        return (EPERM);
+    }
+    print("[mfs] mfs_open called\n");
+    return 0;
+}
+
+static int mfs_close(struct vnode *vp, struct file *fp) {
+    print("[mfs] mfs_close called\n");
+    // Nothing to do really...
+    return 0;
+}
+
+static size_t min(size_t a, size_t b) {
+    if (a > b) return b;
+    return a;
+}
+
+// FIXME: The current link implementation is very wasteful as far as disk
+// space goes. It leaves 512 - path length bytes un-used. For a base OSv
+// image this is not a big deal, because it has exactly 1 link, but if
+// an image is created that uses many links, it could start to be a large
+// waste of space. I think the least intrusive way to add more is to write
+// links sequentially (with \0 as the last character) and store their
+// data_block_number and offset in the corresponding inode. This makes
+// the generation script a little bit more complex.
+static int mfs_readlink(struct vnode *vnode, struct uio *uio) {
+    struct mfs             *mfs    = (struct mfs*) vnode->v_mount->m_data;
+    struct mfs_inode       *inode  = (struct mfs_inode*)vnode->v_data;
+    struct device          *device = vnode->v_mount->m_dev;
+    struct buf             *bh     = nullptr;
+    char                   *data   = nullptr;
+
+    int error = -1;
+
+    error = mfs_cache_read(mfs, device, inode->data_block_number, &bh);
+    if (error) {
+        kprintf("[mfs] Error reading link from inode->data_block_number\n");
+        return error;
+    }
+
+    data = (char *)bh->b_data;
+    error = uiomove(data, strlen(data) + 1, uio);
+    mfs_cache_release(mfs, bh);
+    return error;
+}
+
+static int mfs_read(struct vnode *vnode, struct file* fp, struct uio *uio, int ioflag) {
+
+    struct mfs             *mfs    = (struct mfs*) vnode->v_mount->m_data;
+    struct mfs_super_block *sb     = mfs->sb;
+    struct mfs_inode       *inode  = (struct mfs_inode*)vnode->v_data;
+    struct device          *device = vnode->v_mount->m_dev;
+    struct buf             *bh     = nullptr;
+    char                   *data   = nullptr;
+
+    size_t   len    =  0;
+    int      rv     =  0;
+    int      error  = -1;
+    uint64_t block  =  inode->data_block_number;
+    uint64_t offset =  0;
+
+    // Total read amount is what they requested, or what is left
+    uint64_t read_amt = min(inode->file_size - uio->uio_offset, uio->uio_resid);
+    uint64_t total  =  0;
+
+    // Calculate which block we actually need to read
+    block += uio->uio_offset / sb->block_size;
+    offset = uio->uio_offset % sb->block_size;
+
+    // Cant read directories
+    if (vnode->v_type == VDIR)
+        return EISDIR;
+    // Cant read anything but reg
+    if (vnode->v_type != VREG)
+        return EINVAL;
+    // Cant start reading before the first byte
+    if (uio->uio_offset < 0)
+        return EINVAL;
+    // Need to read at least 1 byte
+    if (uio->uio_resid == 0)
+        return 0;
+    // Cant read after the end of the file
+    if (uio->uio_offset >= (off_t)vnode->v_size)
+        return 0;
+
+    while (read_amt > 0) {
+        // Force the read to fit inside a block
+        len = min(sb->block_size - offset, read_amt);
+
+        error = mfs_cache_read(mfs, device, block, &bh);
+        if (error) {
+            kprintf("[mfs] Error reading block [%llu]\n", block);
+            return error;
+        }
+
+        data = (char *)bh->b_data;
+        rv = uiomove(data + offset, len, uio);
+        mfs_cache_release(mfs, bh);
+
+        // Move on to the next block
+        // Set offset to 0 to make sure we start at the start of the next block
+        offset    = 0;
+        read_amt -= len;
+        total    += len;
+        block++;
+    }
+
+    return rv;
+}
+
+static int mfs_readdir(struct vnode *vnode, struct file *fp, struct dirent *dir) {
+
+    //MFS_STOPWATCH_START
+    struct mfs             *mfs    = (struct mfs*)vnode->v_mount->m_data;
+    struct mfs_inode       *inode  = (struct mfs_inode*)vnode->v_data;
+    struct mfs_super_block *sb     = mfs->sb;
+    struct device          *device = vnode->v_mount->m_dev;
+    struct mfs_dir_record  *record = nullptr;
+    struct buf             *bh     = nullptr;
+    
+    int      error  = -1;
+    uint64_t index  =  0;
+    uint64_t block  =  inode->data_block_number;
+    uint64_t offset =  0;
+    
+    if (fp->f_offset == 0) {
+        dir->d_type = DT_DIR;
+        strlcpy((char *)&dir->d_name, ".", sizeof(dir->d_name));
+    } else if (fp->f_offset == 1) {
+        dir->d_type = DT_DIR;
+        strlcpy((char *)&dir->d_name, "..", sizeof(dir->d_name));
+    } else {
+        
+        index = fp->f_offset - 2;
+        if (index >= inode->dir_children_count) {
+            return ENOENT;
+        }
+    
+        block  += MFS_RECORD_BLOCK(sb->block_size, index);
+        offset  = MFS_RECORD_OFFSET(sb->block_size, index);
+
+        print("[mfs] readdir block: %llu\n", block);
+        print("[mfs] readdir offset: %llu\n", offset);
+
+        // Do as much as possible before the read
+        if (S_ISDIR(inode->mode))
+            dir->d_type = DT_DIR;
+        else
+            dir->d_type = DT_REG;
+    
+        dir->d_fileno = fp->f_offset;
+
+
+        error = mfs_cache_read(mfs, device, block, &bh);
+        if (error) {
+            kprintf("[mfs] Error reading block [%llu]\n", block);
+            return ENOENT;
+        }
+
+        record = (struct mfs_dir_record*)bh->b_data;
+        record += offset;
+
+        // Set the name
+        strlcpy((char *)&dir->d_name, record->filename, sizeof(dir->d_name));
+        dir->d_ino = record->inode_no;
+
+        mfs_cache_release(mfs, bh);
+    }
+
+    fp->f_offset++;
+    //MFS_STOPWATCH_END("[mfs] mfs_readdir of dir %s took %.2fms\n", dir->d_name, 1000 * sec.count())
+
+    return 0;
+}
+
+static int mfs_lookup(struct vnode *vnode, char *name, struct vnode **vpp) {
+    //MFS_STOPWATCH_START
+    struct mfs             *mfs     = (struct mfs*)vnode->v_mount->m_data;
+    struct mfs_inode       *inode   = (struct mfs_inode*)vnode->v_data;
+    struct mfs_super_block *sb      = mfs->sb;
+    struct device          *device  = vnode->v_mount->m_dev;
+    struct mfs_inode       *r_inode = nullptr;
+    struct mfs_dir_record  *records = nullptr;
+    struct buf             *bh      = nullptr;
+    struct vnode           *vp      = nullptr;
+
+    int      error  = -1;
+    uint64_t i      =  0;
+    uint64_t block  =  inode->data_block_number;
+    uint64_t c      =  0;
+
+    if (*name == '\0') {
+        return ENOENT;
+    }
+
+    while (r_inode == nullptr) {
+        error = mfs_cache_read(mfs, device, block, &bh);
+        if (error) {
+            kprintf("[mfs] Error reading block [%llu]\n", block);
+            return ENOENT;
+        }
+
+        records = (struct mfs_dir_record *)bh->b_data;
+        for (i = 0; i < MFS_RECORDS_PER_BLOCK(sb->block_size); i++) {
+            if (strcmp(name, records[i].filename) == 0) {
+                // Found!
+                print("[mfs] found the directory entry!\n");
+                r_inode = mfs_get_inode(mfs, device, records[i].inode_no);
+                break;
+            }
+            c++;
+            if (c >= inode->dir_children_count) {
+                break;
+            }
+        }
+
+        mfs_cache_release(mfs, bh);
+
+        // If we looked at every entry and still haven't found it
+        if (c >= inode->dir_children_count && r_inode == nullptr) {
+            return ENOENT;
+        } else {
+            // Move on to the next block!
+            block++;
+        }
+    }
+
+    print("[mfs] mfs_lookup [%s] using inode: %llu\n", name, r_inode->inode_no);
+
+    if (vget(vnode->v_mount, r_inode->inode_no, &vp)) {
+        print("[mfs] found vp in cache!\n");
+        // Found in cache?
+        *vpp = vp;
+        return 0;
+    }
+
+    print("[mfs] got vp: %p\n", vp);
+
+    if (!vp) {
+        delete r_inode;
+        return ENOMEM;
+    }
+
+    mfs_set_vnode(vp, r_inode);
+
+    *vpp = vp;
+
+    //MFS_STOPWATCH_END("[mfs] lookup took %.2fms\n", 1000 * sec.count())
+    //MFS_STOPWATCH_END2(mfs_lookup_total_ms)
+
+    return 0;
+}
+
+static int mfs_getattr(struct vnode *vnode, struct vattr *attr) {
+    struct mfs_inode *inode = (struct mfs_inode*)vnode->v_data;
+
+    // Doesn't seem to work, I think permissions are hard coded to 777
+    attr->va_mode = 00555;
+    
+    if (S_ISDIR(inode->mode)) {
+        attr->va_type = VDIR;
+    } else {
+        attr->va_type = VREG;
+    }
+
+    attr->va_nodeid = vnode->v_ino;
+    attr->va_size = vnode->v_size;
+
+    return 0;
+}
+
+#define mfs_seek        ((vnop_seek_t)vop_nullop)
+#define mfs_ioctl        ((vnop_ioctl_t)vop_nullop)
+#define mfs_inactive    ((vnop_inactive_t)vop_nullop)
+#define mfs_truncate    ((vnop_truncate_t)vop_nullop)
+#define mfs_link         ((vnop_link_t)vop_nullop)
+#define mfs_arc            ((vnop_cache_t) nullptr)
+#define mfs_fallocate    ((vnop_fallocate_t)vop_nullop)
+#define mfs_fsync        ((vnop_fsync_t)vop_nullop)
+#define mfs_symlink        ((vnop_symlink_t)vop_nullop)
+
+struct vnops mfs_vnops = {
+    mfs_open,       /* open */
+    mfs_close,      /* close */
+    mfs_read,       /* read */
+    nullptr,        /* write - not implemented */
+    mfs_seek,       /* seek */
+    mfs_ioctl,      /* ioctl */
+    mfs_fsync,      /* fsync */
+    mfs_readdir,    /* readdir */
+    mfs_lookup,     /* lookup */
+    nullptr,        /* create - not implemented */
+    nullptr,        /* remove - not implemented */
+    nullptr,        /* rename - not implemented */
+    nullptr,        /* mkdir - not implemented */
+    nullptr,        /* rmdir - not implemented */
+    mfs_getattr,    /* getattr */
+    nullptr,        /* setattr - not implemented */
+    mfs_inactive,   /* inactive */
+    mfs_truncate,   /* truncate */
+    mfs_link,       /* link */
+    mfs_arc,        /* arc */
+    mfs_fallocate,  /* fallocate */
+    mfs_readlink,   /* read link */
+    mfs_symlink     /* symbolic link */
+};
diff --git a/fs/mfs2/mfs2.hh b/fs/mfs2/mfs2.hh
new file mode 100644
index 0000000..27a1df8
--- /dev/null
+++ b/fs/mfs2/mfs2.hh
@@ -0,0 +1,75 @@
+#ifndef __INCLUDE_MFS2_H__
+#define __INCLUDE_MFS2_H__
+
+#include <osv/vnode.h>
+#include <osv/mount.h>
+#include <osv/dentry.h>
+#include <osv/prex.h>
+#include <osv/buf.h>
+
+#define MFS2_VERSION            1
+#define MFS2_MAGIC              0xDEADBEAD
+#define MFS2_ROOT_INODE_NUMBER  1
+
+#define MFS_INODE_SIZE ((uint64_t)sizeof(struct mfs2_inode))
+
+#define MFS2_SUPERBLOCK_SIZE sizeof(struct mfs2_super_block)
+#define MFS2_SUPERBLOCK_BLOCK 0
+
+//#if 0
+//#define print(...) kprintf(__VA_ARGS__)
+//#else
+#define print(...)
+//#endif
+
+//#if 0
+#define MFS2_STOPWATCH_START auto begin = std::chrono::high_resolution_clock::now();
+#define MFS2_STOPWATCH_END(...) auto end = std::chrono::high_resolution_clock::now(); \
+std::chrono::duration<double> sec = end - begin; \
+print(__VA_ARGS__);
+#define MFS2_STOPWATCH_END2(total) auto end = std::chrono::high_resolution_clock::now(); \
+std::chrono::duration<double> sec = end - begin; \
+total += ((long)(sec.count() * 1000000));
+//#else
+//#define MFS2_STOPWATCH_START
+//#define MFS2_STOPWATCH_END(...)
+//#define MFS2_STOPWATCH_END2(...)
+//#endif
+
+extern struct vfsops mfs2_vfsops;
+extern struct vnops mfs2_vnops;
+
+struct mfs2_super_block {
+    uint64_t magic;
+    uint64_t version;
+    uint64_t block_size;
+    uint64_t structure_info_first_block;
+    uint64_t structure_info_blocks_count;
+    uint64_t directory_entries_count;
+    uint64_t symlinks_count;
+    uint64_t inodes_count;
+};
+
+struct mfs2_inode {
+    mode_t   mode;
+    uint64_t inode_no;
+    uint64_t data_offset;
+    union {
+        uint64_t file_size;
+        uint64_t dir_children_count;
+    };
+};
+
+struct mfs2_dir_entry {
+    char *filename;
+    uint64_t inode_no;
+};
+
+struct mfs2 {
+    struct mfs2_super_block *sb;
+    struct mfs2_dir_entry* dir_entries;
+    char **symlinks;
+    struct mfs2_inode* inodes;
+};
+
+#endif
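
Reviewer-style note on the header above (not part of the patch): the superblock checks that mfs2_mount performs can be sketched in isolation as below. This is only a sketch under assumptions: `sb_sketch`, `sb_status` and `parse_super_block` are hypothetical names, the byte order is assumed to be the host's, and the zero block size check is an extra sanity test that the patch does not perform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same field order as struct mfs2_super_block in mfs2.hh.
struct sb_sketch {
    uint64_t magic;
    uint64_t version;
    uint64_t block_size;
    uint64_t structure_info_first_block;
    uint64_t structure_info_blocks_count;
    uint64_t directory_entries_count;
    uint64_t symlinks_count;
    uint64_t inodes_count;
};

constexpr uint64_t SKETCH_MAGIC   = 0xDEADBEAD; // value of MFS2_MAGIC
constexpr uint64_t SKETCH_VERSION = 1;          // value of MFS2_VERSION

// Hypothetical status codes; the patch itself returns -1 on a mismatch.
enum class sb_status { ok, too_short, bad_magic, bad_version, bad_block_size };

// Copies the superblock out of a raw block buffer and validates it.
// *out is only written when every check passes.
sb_status parse_super_block(const unsigned char *block, size_t len, sb_sketch *out) {
    assert(block != nullptr && out != nullptr);
    if (len < sizeof(sb_sketch)) {
        return sb_status::too_short;
    }
    sb_sketch sb;
    std::memcpy(&sb, block, sizeof(sb)); // avoids casting the raw buffer
    if (sb.magic != SKETCH_MAGIC) {
        return sb_status::bad_magic;
    }
    if (sb.version != SKETCH_VERSION) {
        return sb_status::bad_version;
    }
    if (sb.block_size == 0) { // assumption: extra check, not in the patch
        return sb_status::bad_block_size;
    }
    *out = sb;
    return sb_status::ok;
}
```

Distinct status values would let the mount path report which check failed instead of a bare -1.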
diff --git a/fs/mfs2/mfs2_cache.cc b/fs/mfs2/mfs2_cache.cc
new file mode 100644
index 0000000..237c4be
--- /dev/null
+++ b/fs/mfs2/mfs2_cache.cc
@@ -0,0 +1,421 @@
+
+#include "mfs2.hh"
+#include <list>
+#include <unordered_map>
+#include <include/osv/uio.h>
+#include <osv/debug.h>
+#include <osv/sched.hh>
+
+/*
+ * From the cache's perspective, each file is divided into a sequence of contiguous small (16K) or
+ * large (64K) segments. Files no larger than one small segment get loaded in one shot;
+ * others get loaded one small segment at a time. If the algorithm detects that a file is being read sequentially,
+ * the file gets loaded in large segments instead.
+ */
+
+//
+//TODO These 4 values can be made configurable
+//TODO large segment will be removed as it is only used when sequential read logic is used
+#define SMALL_SEGMENT_SIZE_IN_BLOCKS 32  // 16K
+#define LARGE_SEGMENT_SIZE_IN_BLOCKS 128 // 64K
+
+#define SMALL_SEGMENT_INDEX(offset) ((offset) >> 14)
+#define LARGE_SEGMENT_INDEX(offset) ((offset) >> 16)
+
+//
+// This enables logic to reclaim existing segments (instead of allocating new ones)
+// when sequential reads are detected -> will be removed in the final patch (SEE below)
+static bool reclaiming_enabled = false;
+//
+//This enables logic to detect sequential reads; however, given that it does not yield big performance
+//gains, the corresponding logic will be removed in the final patch
+static bool sequential_enabled = false;
+
+extern std::atomic<long> mfs2_block_allocated;
+
+int mfs2_read_blocks(struct device *device, uint64_t starting_block, uint64_t blocks_count, void* buf);
+
+struct inode_cache {
+    std::unordered_map<uint64_t,struct inode_cache_segment*> small_segments_by_index;
+    std::unordered_map<uint64_t,struct inode_cache_segment*> large_segments_by_index;
+    struct mfs2_inode *inode;
+    struct mfs2_super_block *sb;
+    mutex cache_lock;
+};
+
+//TODO: Some fields and methods will be eliminated in final patch
+// as they are only useful for sequential read detection and reclaiming
+class inode_cache_segment {
+private:
+    struct inode_cache* cache;
+    void *data;
+    uint64_t starting_block; // In 512-byte blocks, relative to the start of the inode's data
+    uint64_t block_count;
+    uint64_t hit_count;
+    uint64_t reading_count;
+    uint64_t last_block_hit; // In 512-byte blocks, relative to the start of the inode's data
+    bool _is_writing;
+
+public:
+    mutex write_lock;
+
+    inode_cache_segment(struct inode_cache* _cache, uint64_t _starting_block, uint64_t _block_count) {
+        this->cache = _cache;
+        this->starting_block = _starting_block;
+        this->block_count = _block_count;
+        this->hit_count = 0;
+        this->reading_count = 1;  // New segment always will be read from
+        this->last_block_hit = 0;
+        this->_is_writing = true; // New segment always will be written to
+        this->data = malloc(_cache->sb->block_size * _block_count);
+        print("[mfs2] [%d] -> created new inode_cache_segment for i-node %d starting block %d of size %d blocks \n",
+              sched::thread::current()->id(), _cache->inode->inode_no, starting_block, block_count);
+
+        mfs2_block_allocated += block_count;
+    }
+
+    ~inode_cache_segment() {
+        free(this->data);
+    }
+
+    void reclaim(uint64_t _starting_block) {
+        this->starting_block = _starting_block;
+        this->hit_count = 0;
+        this->reading_count = 1;  // Reclaimed segment always will be read from
+        this->_is_writing = true; // Reclaimed segment always will be written to
+        this->last_block_hit = 0;
+    }
+
+    bool is_large() {
+        return this->block_count == LARGE_SEGMENT_SIZE_IN_BLOCKS;
+    }
+
+    bool is_adjacent(uint64_t offset) {
+        uint64_t block = offset >> 9;
+        return block == this->last_block_hit + 1;
+    }
+
+    uint64_t get_starting_block() {
+        return this->starting_block;
+    }
+
+    uint64_t length() {
+        return this->block_count * this->cache->sb->block_size;
+    }
+
+    void mark_hit() {
+        this->hit_count++;
+    }
+
+    bool can_be_reclaimed() {
+        return reclaiming_enabled && this->reading_count <= 0;
+    }
+
+    void mark_reading() {
+        this->reading_count++;
+    }
+
+    void mark_reading_done() {
+        this->reading_count--;
+    }
+
+    void mark_reading_done(uint64_t last_block) {
+        this->reading_count--;
+        this->last_block_hit = last_block;
+    }
+
+    bool is_writing() {
+        return this->_is_writing;
+    }
+
+    void mark_writing(bool _writing) {
+        this->_is_writing = _writing;
+    }
+
+    int read(struct uio *uio, uint64_t offset_in_segment, uint64_t bytes_to_read) {
+        print("[mfs2] [%d] -> inode_cache_segment::read() i-node: %d, starting block %d, reading [%d] bytes at segment offset [%d]\n",
+              sched::thread::current()->id(), cache->inode->inode_no, starting_block, bytes_to_read, offset_in_segment);
+        return uiomove(static_cast<char*>(data) + offset_in_segment, bytes_to_read, uio);
+    }
+
+    int write(struct device *device) {
+        auto block = cache->inode->data_offset + starting_block;
+        auto bytes_remaining = cache->inode->file_size - starting_block * cache->sb->block_size;
+        auto blocks_remaining = bytes_remaining / cache->sb->block_size;
+        if(bytes_remaining % cache->sb->block_size > 0 ) {
+            blocks_remaining++;
+        }
+        auto block_count_to_read = std::min(block_count,blocks_remaining);
+        print("[mfs2] [%d] -> inode_cache_segment::write() i-node: %d, starting block %d, reading [%d] blocks at disk offset [%d]\n",
+              sched::thread::current()->id(), cache->inode->inode_no, starting_block, block_count_to_read, block);
+        return mfs2_read_blocks(device, block, block_count_to_read, data);
+    }
+};
+
+enum CacheOperationType {
+    READ = 1,
+    WRITE
+};
+
+// This represents an operation/transaction to read data from segment memory or/and from disk
+struct cache_segment_op {
+    struct inode_cache_segment* segment;
+    CacheOperationType type;
+    uint64_t segment_offset;
+    uint64_t bytes_to_read;
+    uint64_t last_block_to_read;
+
+    cache_segment_op(inode_cache_segment* _segment, uint64_t file_offset, uint64_t _bytes_to_read) {
+        this->segment = _segment;
+        if(_segment->is_writing()) {
+            this->type = CacheOperationType::WRITE;
+        }
+        else {
+            this->type = CacheOperationType::READ;
+        }
+        this->segment_offset = file_offset % segment->length();
+        this->bytes_to_read = std::min(segment->length() - segment_offset, _bytes_to_read);
+        this->last_block_to_read = (file_offset + this->bytes_to_read - 1) >> 9;
+    }
+};
+
+static std::unordered_map<uint64_t,struct inode_cache*> inode_cache_by_id;
+static mutex inode_cache_lock;
+
+static inode_cache_segment* find_previous_segment_if_sequential_read(struct inode_cache* cache, uint64_t offset) {
+    if(!sequential_enabled) {
+        return nullptr;
+    }
+
+    // a) Is the read at the end of previous large or small segment X AND
+    // b) Is last hit in segment X adjacent with THIS read
+    auto previous_small_segment = cache->small_segments_by_index.find(SMALL_SEGMENT_INDEX(offset) - 1);
+    if( previous_small_segment != cache->small_segments_by_index.end() && previous_small_segment->second->is_adjacent(offset)) {
+        return previous_small_segment->second;
+    }
+    else {
+        auto previous_large_segment = cache->large_segments_by_index.find(LARGE_SEGMENT_INDEX(offset) - 1);
+        if( previous_large_segment != cache->large_segments_by_index.end() && previous_large_segment->second->is_adjacent(offset)) {
+            return previous_large_segment->second;
+        }
+        else {
+            return nullptr;
+        }
+    }
+}
+
+static struct inode_cache* mfs2_cache_get_or_create_inode_cache(struct mfs2_inode *inode, struct mfs2_super_block *sb) {
+    WITH_LOCK(inode_cache_lock) {
+        auto cache_entry = inode_cache_by_id.find(inode->inode_no);
+        if(cache_entry == inode_cache_by_id.end()) {
+            struct inode_cache* new_cache = new inode_cache();
+            new_cache->inode = inode;
+            new_cache->sb = sb;
+            inode_cache_by_id.emplace(inode->inode_no, new_cache);
+            return new_cache;
+        }
+        else {
+            return cache_entry->second;
+        }
+    }
+}
+
+//
+// TODO: Greatly simplify this logic as most of it deals with sequential read detection
+// and segment reclaiming
+static std::vector<struct cache_segment_op> mfs2_plan_cache_operations(struct inode_cache *cache, struct uio *uio) {
+
+    std::vector<struct cache_segment_op> ops;
+
+    WITH_LOCK(cache->cache_lock) {
+        //
+        // Check if file is small enough to fit into small segment
+        if(cache->small_segments_by_index.empty() &&
+                cache->inode->file_size <= (SMALL_SEGMENT_SIZE_IN_BLOCKS * cache->sb->block_size)) {
+            auto block_count = cache->inode->file_size / cache->sb->block_size;
+            if(cache->inode->file_size % cache->sb->block_size > 0) {
+                block_count++;
+            }
+            auto new_small_segment = new inode_cache_segment(cache,0,block_count);
+            cache->small_segments_by_index.emplace(0,new_small_segment);
+            ops.push_back(cache_segment_op(new_small_segment,uio->uio_offset,uio->uio_resid));
+            print("[mfs2] [%d] -> mfs2_cache_get_segment_operations i-node: %d, read FULL file\n",
+                  sched::thread::current()->id(), cache->inode->inode_no);
+            return ops;
+        }
+        //
+        // Bigger continue
+        uint64_t file_offset = uio->uio_offset;
+        uint64_t bytes_to_read = uio->uio_resid;
+        while(bytes_to_read > 0) {
+            //
+            // First try to see if any large segment is hit
+            auto large_segment_index = LARGE_SEGMENT_INDEX(file_offset);
+            auto large_segment = cache->large_segments_by_index.find(large_segment_index);
+            if(large_segment != cache->large_segments_by_index.end()) {
+                print("[mfs2] [%d] -> mfs2_cache_get_segment_operations i-node: %d, large segment %d HIT at file offset %d\n",
+                      sched::thread::current()->id(), cache->inode->inode_no, large_segment_index, file_offset);
+                large_segment->second->mark_reading();
+                large_segment->second->mark_hit();
+                auto op = cache_segment_op(large_segment->second,file_offset,bytes_to_read);
+                file_offset += op.bytes_to_read;
+                bytes_to_read -= op.bytes_to_read;
+                ops.push_back(op);
+            }
+            //
+            // Next try to see if any small segment is hit
+            else {
+                auto small_segment_index = SMALL_SEGMENT_INDEX(file_offset);
+                auto small_segment = cache->small_segments_by_index.find(small_segment_index);
+                if(small_segment != cache->small_segments_by_index.end()) {
+                    print("[mfs2] [%d] -> mfs2_cache_get_segment_operations i-node: %d, small segment %d HIT at file offset %d\n",
+                          sched::thread::current()->id(), cache->inode->inode_no, small_segment_index, file_offset);
+                    small_segment->second->mark_reading();
+                    small_segment->second->mark_hit();
+                    auto op = cache_segment_op(small_segment->second,file_offset,bytes_to_read);
+                    file_offset += op.bytes_to_read;
+                    bytes_to_read -= op.bytes_to_read;
+                    ops.push_back(op);
+
+                }
+                // No hit at all -> see if sequential read
+                else {
+                    auto previous_segment_if_sequential = find_previous_segment_if_sequential_read(cache,file_offset);
+                    if(!previous_segment_if_sequential) {
+                        //SMALL
+                        print("[mfs2] [%d] -> mfs2_cache_get_segment_operations i-node: %d, small segment %d MISS at file offset %d\n",
+                              sched::thread::current()->id(), cache->inode->inode_no, small_segment_index, file_offset);
+                        uint64_t segment_starting_block = small_segment_index * SMALL_SEGMENT_SIZE_IN_BLOCKS;
+                        //TODO: Either allocate new or reclaim existing small one
+                        auto new_small_segment = new inode_cache_segment(cache,segment_starting_block,SMALL_SEGMENT_SIZE_IN_BLOCKS);
+                        cache->small_segments_by_index.emplace(small_segment_index,new_small_segment);
+                        auto op = cache_segment_op(new_small_segment,file_offset,bytes_to_read);
+                        file_offset += op.bytes_to_read;
+                        bytes_to_read -= op.bytes_to_read;
+                        ops.push_back(op);
+                    }
+                    else {
+                        //
+                        // Check small ones overlap with new potential big
+                        bool overlaps = false; //TODO
+                        if( !overlaps) {
+                            //BIG
+                            print("[mfs2] [%d] -> mfs2_cache_get_segment_operations i-node: %d, large segment %d SEQUENTIAL at file offset %d\n",
+                                  sched::thread::current()->id(), cache->inode->inode_no, large_segment_index, file_offset);
+                            uint64_t segment_starting_block = large_segment_index * LARGE_SEGMENT_SIZE_IN_BLOCKS;
+                            inode_cache_segment* large_segment = previous_segment_if_sequential; // Assume we can reclaim and is large
+                            if(large_segment->is_large() && large_segment->can_be_reclaimed()) {
+                                cache->large_segments_by_index.erase(large_segment->get_starting_block() / LARGE_SEGMENT_SIZE_IN_BLOCKS); // map is keyed by segment index
+                                large_segment->reclaim(segment_starting_block);
+                                print("[mfs2] [%d] -> mfs2_cache_get_segment_operations i-node: %d RECLAIMING\n",
+                                      sched::thread::current()->id(), cache->inode->inode_no);
+                            }
+                            else {
+                                large_segment = new inode_cache_segment(cache,segment_starting_block,LARGE_SEGMENT_SIZE_IN_BLOCKS);
+                            }
+                            cache->large_segments_by_index.emplace(large_segment_index,large_segment);
+
+                            auto op = cache_segment_op(large_segment,file_offset,bytes_to_read);
+                            file_offset += op.bytes_to_read;
+                            bytes_to_read -= op.bytes_to_read;
+                            ops.push_back(op);
+                        }
+                        else {
+                            print("[mfs2] [%d] -> mfs2_cache_get_segment_operations i-node: %d, small segment %d SEQUENTIAL at file offset %d\n",
+                                  sched::thread::current()->id(), cache->inode->inode_no, small_segment_index, file_offset);
+                            uint64_t segment_starting_block = small_segment_index * SMALL_SEGMENT_SIZE_IN_BLOCKS;
+                            //TODO: Either allocate new or reclaim existing small one
+                            auto new_small_segment = new inode_cache_segment(cache,segment_starting_block,SMALL_SEGMENT_SIZE_IN_BLOCKS);
+                            cache->small_segments_by_index.emplace(small_segment_index,new_small_segment);
+                            auto op = cache_segment_op(new_small_segment,file_offset,bytes_to_read);
+                            file_offset += op.bytes_to_read;
+                            bytes_to_read -= op.bytes_to_read;
+                            ops.push_back(op);
+                        }
+                    }
+                }
+            }
+        }
+    }
+
+    return ops;
+}
+
+int
+mfs2_cache_read(struct mfs2_inode *inode, struct device *device, struct mfs2_super_block *sb, struct uio *uio) {
+    //
+    // 1. Grab inode
+    struct inode_cache* cache = mfs2_cache_get_or_create_inode_cache(inode,sb);
+
+    //
+    // 2. Prepare list of cache operations (copy from memory
+    // or read from disk into cache memory and then copy into memory)
+    auto segment_ops = mfs2_plan_cache_operations(cache, uio);
+    print("[mfs2] [%d] mfs2_cache_read called for i-node [%d] at %d with %d ops\n",
+          sched::thread::current()->id(), inode->inode_no, uio->uio_offset, segment_ops.size());
+
+    int error = 0;
+
+    // 3. Iterate over the list of cache operation and either copy from memory
+    // or read from disk into cache memory and then copy into memory
+    std::vector<struct cache_segment_op>::iterator it;
+    for( it = segment_ops.begin(); it != segment_ops.end(); ++it) {
+        auto op = *it;
+        //
+        // 3a.
+        if(op.type == CacheOperationType::READ) {
+            // 3a.1 Simply memcpy data from segment to target buffer
+            error = op.segment->read(uio,op.segment_offset, op.bytes_to_read);
+            WITH_LOCK(cache->cache_lock) {
+                // 3a.2 WITH cache lock mark read complete (dec counter)
+                op.segment->mark_reading_done(op.last_block_to_read);
+            }
+        }
+        // 3b. Read from disk into segment missing in cache
+        else {
+            // 3b.1 WITH segment WRITE lock
+            WITH_LOCK(op.segment->write_lock) {
+                // 3b.2 WITH cache lock
+                auto write = false;
+                WITH_LOCK(cache->cache_lock) {
+                    // 3b.3 Check if write needs to be done; if not exit from WRITE lock
+                    write = op.segment->is_writing();
+                }
+                //
+                // 3b.4 if write needs to be done read from disk into cache segment
+                if( write ) {
+                    error = op.segment->write(device);
+                    WITH_LOCK(cache->cache_lock) {
+                        op.segment->mark_writing(error != 0);
+                        // 3b.5 WITH cache lock mark that segment has been written to and exit
+                    }
+                    if( error ) {
+                       print("!!!!! Error reading from disk\n");
+                    }
+                }
+            }
+            //
+            // 3b.6 Simply memcpy data from segment to target buffer
+            if( !error) {
+                error = op.segment->read(uio, op.segment_offset, op.bytes_to_read);
+                WITH_LOCK(cache->cache_lock) {
+                    // 3b.7 WITH cache lock mark read complete (dec counter)
+                    op.segment->mark_reading_done(op.last_block_to_read);
+                }
+            }
+        }
+
+        if(error) {
+            break;
+        }
+    }
+
+    WITH_LOCK(cache->cache_lock) {
+        for( ; it != segment_ops.end(); ++it) {
+            it->segment->mark_reading_done();
+        }
+    }
+
+    print("[mfs2] [%d] mfs2_cache_read completed for i-node [%d]\n", sched::thread::current()->id(), inode->inode_no);
+    return error;
+}
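
Reviewer-style note on mfs2_cache.cc above (not part of the patch): with sequential detection and reclaiming switched off, the planning step boils down to splitting one read into per-segment pieces, which is what cache_segment_op computes with `file_offset % segment->length()` and `std::min`. A minimal sketch, assuming fixed 16K segments; `segment_read` and `split_read` are hypothetical names, and clamping the request to the file size is an assumption of this sketch rather than something the cache code does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One piece of a read that falls entirely inside a single cache segment.
struct segment_read {
    uint64_t segment_index;     // which fixed-size segment of the file
    uint64_t offset_in_segment; // where the piece starts inside that segment
    uint64_t bytes;             // how many bytes to copy from that segment
};

// Splits the byte range [file_offset, file_offset + length) into pieces that
// never cross a segment boundary. The range is clamped to file_size first.
std::vector<segment_read> split_read(uint64_t file_offset, uint64_t length,
                                     uint64_t file_size,
                                     uint64_t segment_size = 16 * 1024) {
    assert(segment_size > 0);
    std::vector<segment_read> pieces;
    if (file_offset >= file_size) {
        return pieces; // nothing to read at or past end of file
    }
    uint64_t remaining = std::min(length, file_size - file_offset);
    while (remaining > 0) {
        uint64_t index = file_offset / segment_size;
        uint64_t offset_in_segment = file_offset % segment_size;
        uint64_t bytes = std::min(segment_size - offset_in_segment, remaining);
        pieces.push_back({index, offset_in_segment, bytes});
        file_offset += bytes;
        remaining -= bytes;
    }
    return pieces;
}
```

For example, a 1000-byte read at offset 16000 yields two pieces: the last 384 bytes of segment 0 and the first 616 bytes of segment 1.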
diff --git a/fs/mfs2/mfs2_vfsops.cc b/fs/mfs2/mfs2_vfsops.cc
new file mode 100644
index 0000000..e63ad1e
--- /dev/null
+++ b/fs/mfs2/mfs2_vfsops.cc
@@ -0,0 +1,231 @@
+
+#include "mfs2.hh"
+#include <stdio.h>
+#include <sys/types.h>
+#include <osv/device.h>
+#include <osv/debug.h>
+#include <osv/bio.h>
+
+static int mfs2_mount(struct mount *mp, const char *dev, int flags, const void *data);
+static int mfs2_sync(struct mount *mp);
+static int mfs2_statfs(struct mount *mp, struct statfs *statp);
+static int mfs2_unmount(struct mount *mp, int flags);
+
+int mfs2_read_blocks(struct device *device, uint64_t starting_block, uint64_t blocks_count, void* buf);
+void mfs2_set_vnode(struct vnode* vnode, struct mfs2_inode *inode);
+
+static std::atomic<long> mfs2_block_read_ms(0);
+static std::atomic<long> mfs2_block_read_count(0);
+std::atomic<long> mfs2_block_allocated(0);
+
+static std::list<uint64_t> blocks_read;
+
+static mutex blocks_read_lock;
+
+struct vfsops mfs2_vfsops = {
+        mfs2_mount,                        /* mount */
+        mfs2_unmount,                  /* unmount */
+        mfs2_sync,                         /* sync */
+        ((vfsop_vget_t)vfs_nullop), /* vget */
+        mfs2_statfs,                   /* statfs */
+        &mfs2_vnops,                   /* vnops */
+};
+
+static int
+mfs2_mount(struct mount *mp, const char *dev, int flags, const void *data) {
+    struct device *device;
+    struct mfs2 *mfs = nullptr;
+    struct mfs2_super_block *sb = nullptr;
+    int error = -1;
+
+    error = device_open(dev + 5, DO_RDWR, &device);
+    if (error) {
+        kprintf("[mfs2] Error opening device!\n");
+        return error;
+    }
+
+    void* buf = malloc(BSIZE); //Just enough for single block of 512 bytes
+    error = mfs2_read_blocks(device, MFS2_SUPERBLOCK_BLOCK, 1, buf);
+    if (error) {
+        kprintf("[mfs2] Error reading mfs superblock\n");
+        device_close(device);
+        free(buf);
+        return error;
+    }
+
+    // We see if the file system is MFS, if not, return error and close everything
+    sb = (struct mfs2_super_block*)buf;
+    if (sb->magic != MFS2_MAGIC) {
+        print("[mfs2] Error magics do not match!\n");
+        print("[mfs2] Expecting %016llX but got %016llX\n", MFS2_MAGIC, sb->magic);
+        free(buf);
+        device_close(device);
+        return -1; // TODO: Proper error code
+    }
+
+    if (sb->version != MFS2_VERSION) {
+        kprintf("[mfs2] Found mfs volume but incompatible version!\n");
+        kprintf("[mfs2] Expecting %llu but found %llu\n", MFS2_VERSION, sb->version);
+        free(buf);
+        device_close(device);
+        return -1;
+    }
+
+    print("[mfs2] Got superblock version:   0x%016llX\n", sb->version);
+    print("[mfs2] Got magic:                0x%016llX\n", sb->magic);
+    print("[mfs2] Got block size:                  %d\n", sb->block_size);
+    print("[mfs2] Got structure info first block:  %d\n", sb->structure_info_first_block);
+    print("[mfs2] Got structure info blocks count: %d\n", sb->structure_info_blocks_count);
+    print("[mfs2] Got directory entries count:     %d\n", sb->directory_entries_count);
+    print("[mfs2] Got symlinks count:              %d\n", sb->symlinks_count);
+    print("[mfs2] Got inode count:                 %d\n", sb->inodes_count);
+
+    // Since we have found MFS, we can copy the superblock now
+    sb = new mfs2_super_block;
+    memcpy(sb, buf, MFS2_SUPERBLOCK_SIZE);
+    free(buf);
+
+    //TODO: Read structure_info_blocks_count to construct array of directory entries, symlinks and i-nodes
+    buf = malloc(BSIZE * sb->structure_info_blocks_count);
+    error = mfs2_read_blocks(device, sb->structure_info_first_block, sb->structure_info_blocks_count, buf);
+    if (error) {
+        kprintf("[mfs2] Error reading mfs structure info blocks\n");
+        device_close(device);
+        free(buf);
+        return error;
+    }
+
+    mfs = new struct mfs2;
+    mfs->sb = sb;
+    mfs->dir_entries = (struct mfs2_dir_entry *)malloc(sizeof(struct mfs2_dir_entry) * sb->directory_entries_count);
+
+    char* data_ptr = static_cast<char*>(buf); // char*, so the pointer arithmetic below is well defined
+    //
+    // Read directory entries
+    for(unsigned int idx = 0; idx < sb->directory_entries_count; idx++) {
+        struct mfs2_dir_entry* dir_entry = &(mfs->dir_entries[idx]);
+        dir_entry->inode_no = *((uint64_t*) data_ptr);
+        data_ptr += sizeof(uint64_t);
+
+        unsigned short* filename_size = (unsigned short *)data_ptr;
+        data_ptr += sizeof(unsigned short);
+
+        dir_entry->filename = (char*)malloc(*filename_size + 1);
+        strncpy(dir_entry->filename,(char*)data_ptr,*filename_size);
+        dir_entry->filename[*filename_size] = 0;
+        print("[mfs2] i-node: %d -> directory entry: %s\n", dir_entry->inode_no, dir_entry->filename);
+        data_ptr += *filename_size * sizeof(char);
+    }
+    //
+    // Read symbolic links
+    mfs->symlinks = (char **)malloc(sizeof(char *) * sb->symlinks_count);
+
+    for(unsigned int idx = 0; idx < sb->symlinks_count; idx++) {
+        unsigned short* symlink_path_size = (unsigned short *)data_ptr;
+        data_ptr += sizeof(unsigned short);
+
+        mfs->symlinks[idx] = (char*)malloc(*symlink_path_size + 1);
+        strncpy(mfs->symlinks[idx],(char*)data_ptr,*symlink_path_size);
+        mfs->symlinks[idx][*symlink_path_size] = 0;
+        print("[mfs2] symlink: %s\n", mfs->symlinks[idx]);
+        data_ptr += *symlink_path_size * sizeof(char);
+    }
+    //
+    // Read i-nodes
+    mfs->inodes = (struct mfs2_inode *)malloc(sizeof(struct mfs2_inode) * sb->inodes_count);
+    memcpy(mfs->inodes, data_ptr, sb->inodes_count * sizeof(struct mfs2_inode));
+
+    /*
+    for(unsigned int idx = 0; idx < sb->inodes_count; idx++) {
+        print("[mfs2] inode: %d\n", mfs->inodes[idx].inode_no);
+    }*/
+
+    free(buf);
+
+    // Save a reference to our superblock
+    mp->m_data = mfs;
+    mp->m_dev = device;
+
+    mfs2_set_vnode(mp->m_root->d_vnode, mfs->inodes);
+
+    print("[mfs2] returning from mount\n");
+
+    return 0;
+}
+
+static int mfs2_sync(struct mount *mp) {
+    return 0;
+}
+
+static int mfs2_statfs(struct mount *mp, struct statfs *statp) {
+    struct mfs2             *mfs = (struct mfs2*)mp->m_data;
+    struct mfs2_super_block *sb  = mfs->sb;
+
+    statp->f_bsize = sb->block_size;
+
+    // Total blocks, unknown...
+    statp->f_blocks = sb->structure_info_blocks_count + sb->structure_info_first_block; //FIXME - this can be calculated
+    // Read only. 0 blocks free
+    statp->f_bfree = 0;
+    statp->f_bavail = 0;
+
+    statp->f_ffree = 0;
+    statp->f_files = sb->inodes_count; //Needs to be inode count
+
+    statp->f_namelen = 0; //FIXME - unlimited MFS_FILENAME_MAXLEN;
+
+    return 0;
+}
+
+static int
+mfs2_unmount(struct mount *mp, int flags) {
+    struct mfs2             *mfs   = (struct mfs2*)mp->m_data;
+    struct mfs2_super_block *sb    = mfs->sb;
+    struct device           *dev   = mp->m_dev;
+
+    device_close(dev);
+    delete sb;
+    delete mfs;
+
+    std::cout << "Spent " << mfs2_block_read_ms.load() / 1000 << " ms reading blocks" << std::endl;
+    std::cout << "Read " << mfs2_block_read_count.load() << " blocks" << std::endl;
+    std::cout << "Allocated " << mfs2_block_allocated.load() << " 512-byte blocks of cache memory" << std::endl;
+
+    return 0;
+}
+
+int
+mfs2_read_blocks(struct device *device, uint64_t starting_block, uint64_t blocks_count, void* buf) {
+    MFS2_STOPWATCH_START
+    struct bio *bio;
+    int    error  = -1;
+
+    bio = alloc_bio();
+    if (!bio)
+        return ENOMEM;
+
+    bio->bio_cmd = BIO_READ;
+    bio->bio_dev = device;
+    bio->bio_data = buf;
+    bio->bio_offset = starting_block << 9;
+    bio->bio_bcount = blocks_count * BSIZE;
+
+    //auto begin = std::chrono::high_resolution_clock::now();
+    //print("[mfs] mfs2_read_blocks called of %d blocks: before strategy %d -> %d\n", blocks_count, bio->bio_offset, bio->bio_bcount);
+    bio->bio_dev->driver->devops->strategy(bio);
+    error = bio_wait(bio);
+    //print("[mfs] mfs2_read_blocks after strategy\n");
+    //std::chrono::duration<double> sec = std::chrono::high_resolution_clock::now() - begin;
+    //print("[mfs2] Took %.2fms\n", 1000 * sec.count());
+
+    destroy_bio(bio);
+    //print("[mfs2] mfs2_read_blocks read %d block at %d offset\n", blocks_count, starting_block);
+    WITH_LOCK(blocks_read_lock) {
+        blocks_read.push_back(starting_block);
+    }
+
+    mfs2_block_read_count += blocks_count;
+    MFS2_STOPWATCH_END2(mfs2_block_read_ms)
+
+    return error;
+}
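
Reviewer-style note on mfs2_mount above (not part of the patch): the directory entry table is walked with raw pointer casts and no bounds checks, so a truncated or corrupt image could read past the buffer. A bounds-checked variant might look like the sketch below. It assumes the record layout that mfs2_mount reads (a 64-bit inode number, a 16-bit name length, then the name bytes without a terminator) in host byte order; `dir_entry_sketch`, `append_dir_entry` and `parse_dir_entries` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct dir_entry_sketch {
    uint64_t    inode_no;
    std::string filename;
};

// Hypothetical helper for trying the parser out: serializes one record in
// the assumed layout (inode number, name length, name bytes).
void append_dir_entry(std::vector<unsigned char> &buf, uint64_t inode_no,
                      const std::string &name) {
    assert(name.size() <= UINT16_MAX);
    uint16_t name_len = static_cast<uint16_t>(name.size());
    const unsigned char *p = reinterpret_cast<const unsigned char *>(&inode_no);
    buf.insert(buf.end(), p, p + sizeof(inode_no));
    p = reinterpret_cast<const unsigned char *>(&name_len);
    buf.insert(buf.end(), p, p + sizeof(name_len));
    buf.insert(buf.end(), name.begin(), name.end());
}

// Parses `count` records from buf. Returns false if the buffer ends before
// all records have been read; `consumed` is set only on success.
bool parse_dir_entries(const unsigned char *buf, size_t len, uint64_t count,
                       std::vector<dir_entry_sketch> &out, size_t &consumed) {
    size_t pos = 0; // invariant: pos <= len, so len - pos cannot underflow
    out.clear();
    for (uint64_t i = 0; i < count; i++) {
        uint64_t inode_no;
        uint16_t name_len;
        if (len - pos < sizeof(inode_no) + sizeof(name_len)) {
            return false;
        }
        std::memcpy(&inode_no, buf + pos, sizeof(inode_no)); // no unaligned cast
        pos += sizeof(inode_no);
        std::memcpy(&name_len, buf + pos, sizeof(name_len));
        pos += sizeof(name_len);
        if (len - pos < name_len) {
            return false;
        }
        out.push_back({inode_no,
                       std::string(reinterpret_cast<const char *>(buf + pos), name_len)});
        pos += name_len;
    }
    consumed = pos;
    return true;
}
```

The `consumed` offset is where the symlink table would start, mirroring how data_ptr is carried from one table to the next in mfs2_mount.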
diff --git a/fs/mfs2/mfs2_vnops.cc b/fs/mfs2/mfs2_vnops.cc
new file mode 100644
index 0000000..83a5129
--- /dev/null
+++ b/fs/mfs2/mfs2_vnops.cc
@@ -0,0 +1,316 @@
+#include <sys/stat.h>
+#include <dirent.h>
+#include <sys/param.h>
+
+#include <errno.h>
+#include <string.h>
+#include <stdlib.h>
+#include <fcntl.h>
+
+#include <osv/prex.h>
+#include <osv/vnode.h>
+#include <osv/file.h>
+#include <osv/mount.h>
+#include <osv/debug.h>
+
+#include <sys/types.h>
+#include <osv/device.h>
+#include <osv/sched.hh>
+
+#include "mfs2.hh"
+
+void mfs2_set_vnode(struct vnode* vnode, struct mfs2_inode *inode);
+int mfs2_read_blocks(struct device *device, uint64_t starting_block, uint64_t blocks_count, void* buf);
+
+int
+mfs2_cache_read(struct mfs2_inode *inode, struct device *device, struct mfs2_super_block *sb, struct uio *uio);
+
+int mfs2_init(void) {
+    return 0;
+}
+
+static int mfs2_open(struct file *fp) {
+    if ((file_flags(fp) & FWRITE)) {
+        // mfs2 is read-only, so refuse to open a file for writing
+        return (EPERM);
+    }
+    print("[mfs2] mfs2_open called\n");
+    return 0;
+}
+
+static int mfs2_close(struct vnode *vp, struct file *fp) {
+    print("[mfs2] mfs2_close called\n");
+    // Nothing to do really...
+    return 0;
+}
+
+static int mfs2_readlink(struct vnode *vnode, struct uio *uio) {
+    print("[mfs2] mfs2_readlink called\n");
+    struct mfs2             *mfs    = (struct mfs2*) vnode->v_mount->m_data;
+    struct mfs2_inode       *inode  = (struct mfs2_inode*)vnode->v_data;
+
+    if (!S_ISLNK(inode->mode)) {
+        return ENOENT; //TODO: Is this correct?
+    }
+
+    int error = -1;
+    char *link_path = mfs->symlinks[inode->data_offset]; //TODO check if offset correct
+    error = uiomove(link_path, strlen(link_path) + 1, uio);
+    return error;
+}
+
+static int mfs2_read(struct vnode *vnode, struct file* fp, struct uio *uio, int ioflag) {
+    struct mfs2             *mfs    = (struct mfs2*) vnode->v_mount->m_data;
+    struct mfs2_super_block *sb     = mfs->sb;
+    struct mfs2_inode       *inode  = (struct mfs2_inode*)vnode->v_data;
+    struct device           *device = vnode->v_mount->m_dev;
+
+    //print("[mfs2] mfs2_read\n");
+
+    // Can't read directories
+    if (vnode->v_type == VDIR)
+        return EISDIR;
+    // Can't read anything but regular files
+    if (vnode->v_type != VREG)
+        return EINVAL;
+    // Can't start reading before the first byte
+    if (uio->uio_offset < 0)
+        return EINVAL;
+    // Nothing to do for a zero-length read
+    if (uio->uio_resid == 0)
+        return 0;
+    // Can't read past the end of the file
+    if (uio->uio_offset >= (off_t)vnode->v_size)
+        return 0;
+
+    int      rv     =  0;
+    int      error  = -1;
+    uint64_t block  =  inode->data_offset;
+    uint64_t offset =  0;
+
+    // Total read amount is what they requested, or what is left
+    uint64_t read_amt = std::min<uint64_t>(inode->file_size - uio->uio_offset, uio->uio_resid);
+
+    // Calculate which block we actually need to read
+    block += uio->uio_offset / sb->block_size;
+    offset = uio->uio_offset % sb->block_size;
+
+    uint64_t block_count = ( offset + read_amt ) / sb->block_size;
+    if( (offset + read_amt ) % sb->block_size > 0)
+        block_count++;
+
+    //mfs_block_read += block_count; //TODO: Fix later
+    void* buf = malloc(BSIZE * block_count);
+
+    print("[mfs2] mfs2_read [%d], inode: %d, [%d -> %d] at %d of %d bytes\n",
+          sched::thread::current()->id(), inode->inode_no, block, block_count, uio->uio_offset, read_amt );
+
+    error = mfs2_read_blocks(device, block, block_count, buf);
+
+    if (error) {
+        kprintf("[mfs2_read] Error reading data\n");
+        free(buf);
+        return error;
+    }
+
+    rv = uiomove(static_cast<char*>(buf) + offset, read_amt, uio);
+    //print("[mfs2] mfs2_read after uiomove\n");
+
+    free(buf);
+    return rv;
+}
+
+static int mfs2_read_with_cache(struct vnode *vnode, struct file* fp, struct uio *uio, int ioflag) {
+    struct mfs2 *mfs = (struct mfs2 *) vnode->v_mount->m_data;
+    struct mfs2_super_block *sb = mfs->sb;
+    struct mfs2_inode *inode = (struct mfs2_inode *) vnode->v_data;
+    struct device *device = vnode->v_mount->m_dev;
+
+    // Can't read directories
+    if (vnode->v_type == VDIR)
+        return EISDIR;
+    // Can't read anything but regular files
+    if (vnode->v_type != VREG)
+        return EINVAL;
+    // Can't start reading before the first byte
+    if (uio->uio_offset < 0)
+        return EINVAL;
+    // Nothing to do for a zero-length read
+    if (uio->uio_resid == 0)
+        return 0;
+    // Can't read past the end of the file
+    if (uio->uio_offset >= (off_t)vnode->v_size)
+        return 0;
+
+    return mfs2_cache_read(inode,device,sb,uio);
+}
+
+static int mfs2_readdir(struct vnode *vnode, struct file *fp, struct dirent *dir) {
+    struct mfs2             *mfs     = (struct mfs2*)vnode->v_mount->m_data;
+    struct mfs2_inode       *inode   = (struct mfs2_inode*)vnode->v_data;
+
+    uint64_t index  =  0;
+
+    if (!S_ISDIR(inode->mode)) {
+        return ENOENT; //TODO: Is this correct?
+    }
+
+    if (fp->f_offset == 0) {
+        dir->d_type = DT_DIR;
+        strlcpy((char *)&dir->d_name, ".", sizeof(dir->d_name));
+    } else if (fp->f_offset == 1) {
+        dir->d_type = DT_DIR;
+        strlcpy((char *)&dir->d_name, "..", sizeof(dir->d_name));
+    } else {
+        index = fp->f_offset - 2;
+        if (index >= inode->dir_children_count) {
+            return ENOENT;
+        }
+
+        dir->d_fileno = fp->f_offset;
+
+        // Set the name
+        struct mfs2_dir_entry* directory_entry = mfs->dir_entries + (inode->data_offset + index);
+        strlcpy((char *)&dir->d_name, directory_entry->filename, sizeof(dir->d_name));
+        dir->d_ino = directory_entry->inode_no;
+
+        struct mfs2_inode *directory_entry_inode = mfs->inodes + (dir->d_ino -1);
+        // Do as much as possible before the read
+        if (S_ISDIR(directory_entry_inode->mode))
+            dir->d_type = DT_DIR;
+        else
+        if (S_ISLNK(directory_entry_inode->mode))
+            dir->d_type = DT_LNK;
+        else
+            dir->d_type = DT_REG;
+    }
+
+    fp->f_offset++;
+
+    return 0;
+}
+
+static int mfs2_lookup(struct vnode *vnode, char *name, struct vnode **vpp) {
+    struct mfs2             *mfs     = (struct mfs2*)vnode->v_mount->m_data;
+    struct mfs2_inode       *inode   = (struct mfs2_inode*)vnode->v_data;
+    struct vnode            *vp      = nullptr;
+
+    if (*name == '\0') {
+        return ENOENT;
+    }
+
+    print("[mfs2] looking up %s at inode %d\n", name, inode->inode_no );
+
+    if (!S_ISDIR(inode->mode)) {
+        print("[mfs2] ABORTED lookup of %s at inode %d because not a directory\n", name, inode->inode_no );
+        return ENOENT; //TODO: Is this correct?
+    }
+
+    for( unsigned int idx = 0; idx < inode->dir_children_count; idx++) {
+        if (strcmp(name, mfs->dir_entries[inode->data_offset + idx].filename) == 0) {
+            int inode_no = mfs->dir_entries[inode->data_offset + idx].inode_no;
+
+            if (vget(vnode->v_mount, inode_no, &vp)) { //TODO: Will it ever work? Revisit
+                print("[mfs2] found vp in cache!\n");
+                // Found in cache?
+                *vpp = vp;
+                return 0;
+            }
+
+            struct mfs2_inode *found_inode = mfs->inodes + (inode_no -1); //Check if exists
+            mfs2_set_vnode(vp, found_inode);
+
+            print("[mfs2] found the directory entry [%s] at inode %d -> %d!\n", name, inode->inode_no, found_inode->inode_no);
+
+            *vpp = vp;
+            return 0;
+        }
+    }
+
+    print("[mfs2] FAILED to find %s\n", name );
+
+    return ENOENT;
+}
+
+static int mfs2_getattr(struct vnode *vnode, struct vattr *attr) {
+    struct mfs2_inode *inode = (struct mfs2_inode*)vnode->v_data;
+
+    // Doesn't seem to work, I think permissions are hard coded to 777
+    attr->va_mode = 00555;
+
+    if (S_ISDIR(inode->mode)) {
+        attr->va_type = VDIR;
+    } else {
+        attr->va_type = VREG;
+    }
+
+    attr->va_nodeid = vnode->v_ino;
+    attr->va_size = vnode->v_size;
+
+    return 0;
+}
+
+void mfs2_set_vnode(struct vnode* vnode, struct mfs2_inode *inode) {
+    off_t size = 0;
+    if (vnode == nullptr || inode == nullptr) {
+        return;
+    }
+
+    vnode->v_data = inode;
+    vnode->v_ino = inode->inode_no;
+
+    // Set type
+    if (S_ISDIR(inode->mode)) {
+        size = MFS_INODE_SIZE; //TODO: Revisit
+        vnode->v_type = VDIR;
+    } else if (S_ISREG(inode->mode)) {
+        size = inode->file_size;
+        vnode->v_type = VREG;
+    } else if (S_ISLNK(inode->mode)) {
+        size = 512; // TODO: Revisit
+        vnode->v_type = VLNK;
+    }
+
+    vnode->v_mode = inode->mode;
+    vnode->v_size = size;
+}
+
+#define mfs2_seek        ((vnop_seek_t)vop_nullop)
+#define mfs2_ioctl        ((vnop_ioctl_t)vop_nullop)
+#define mfs2_inactive    ((vnop_inactive_t)vop_nullop)
+#define mfs2_truncate    ((vnop_truncate_t)vop_nullop)
+#define mfs2_link         ((vnop_link_t)vop_nullop)
+#define mfs2_arc            ((vnop_cache_t) nullptr)
+#define mfs2_fallocate    ((vnop_fallocate_t)vop_nullop)
+#define mfs2_fsync        ((vnop_fsync_t)vop_nullop)
+#define mfs2_symlink        ((vnop_symlink_t)vop_nullop)
+
+struct vnops mfs2_vnops = {
+        mfs2_open,       /* open */
+        mfs2_close,      /* close */
+        //mfs2_read,       /* read */
+        mfs2_read_with_cache, /* read */
+        nullptr,           /* write - not implemented */
+        mfs2_seek,       /* seek */
+        mfs2_ioctl,      /* ioctl */
+        mfs2_fsync,      /* fsync */
+        mfs2_readdir,    /* readdir */
+        mfs2_lookup,     /* lookup */
+        nullptr,        /* create - not implemented */
+        nullptr,        /* remove - not implemented */
+        nullptr,        /* rename - not implemented */
+        nullptr,        /* mkdir - not implemented */
+        nullptr,        /* rmdir - not implemented */
+        mfs2_getattr,    /* getattr */
+        nullptr,        /* setattr - not implemented */
+        mfs2_inactive,   /* inactive */
+        mfs2_truncate,   /* truncate */
+        mfs2_link,       /* link */
+        mfs2_arc,        /* arc */
+        mfs2_fallocate,  /* fallocate */
+        mfs2_readlink,   /* read link */
+        mfs2_symlink     /* symbolic link */
+};
+ 
+vnop_read_t read_fn_1 = mfs2_read;
+vnop_read_t read_fn_2 = mfs2_read_with_cache;
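
Reviewer aid, not part of the patch: the block arithmetic in mfs2_read above can be checked in isolation. Below is a minimal Python sketch; the function name block_range and its parameter names are hypothetical and only mirror the variables used in mfs2_read (data_offset, file_size, uio_offset, uio_resid, block_size). It assumes the caller has already rejected zero-length reads and reads starting at or past the end of the file, as mfs2_read does.

```python
def block_range(data_offset, file_size, uio_offset, uio_resid, block_size=512):
    """Mirror of the arithmetic in mfs2_read (illustrative only).

    Returns (first_block, offset_in_block, block_count, read_amt).
    """
    # Read what was requested, or whatever is left in the file
    read_amt = min(file_size - uio_offset, uio_resid)
    # First disk block that holds the requested byte
    first_block = data_offset + uio_offset // block_size
    # Position of the requested byte inside that block
    offset_in_block = uio_offset % block_size
    # Round up so a partially covered trailing block is included
    block_count = (offset_in_block + read_amt + block_size - 1) // block_size
    return first_block, offset_in_block, block_count, read_amt
```

For example, a 1000-byte read at offset 600 of a 2000-byte file whose data starts at block 10 begins 88 bytes into block 11 and spans 3 blocks (11 to 13).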
diff --git a/fs/vfs/main.cc b/fs/vfs/main.cc
index 939adac..0b78462 100644
--- a/fs/vfs/main.cc
+++ b/fs/vfs/main.cc
@@ -2249,6 +2249,7 @@ static void import_extra_zfs_pools(void)
 extern "C" void mount_zfs_rootfs(bool pivot_root)
 {
     int ret;
+    bool is_zfs = false;
 
     if (mkdir("/zfs", 0755) < 0)
         kprintf("failed to create /zfs, error = %s\n", strerror(errno));
@@ -2257,7 +2258,17 @@ extern "C" void mount_zfs_rootfs(bool pivot_root)
     if (ret)
         kprintf("failed to unmount /dev, error = %s\n", strerror(ret));
 
-    ret = sys_mount("/dev/vblk0.1", "/zfs", "zfs", 0, (void *)"osv/zfs");
+    // We attempt to mount as MFS, then as MFS2, and fall back to ZFS if both fail.
+    // We still try to mount on /zfs so there is only one mount point
+    ret = sys_mount("/dev/vblk0.1", "/zfs", "mfs", 0, 0);
+    if (ret) {
+        ret = sys_mount("/dev/vblk0.1", "/zfs", "mfs2", 0, 0);
+        if(ret) {
+            ret = sys_mount("/dev/vblk0.1", "/zfs", "zfs", 0, (void *)"osv/zfs");
+            is_zfs = true;
+        }
+    }
+
     if (ret)
         kprintf("failed to mount /zfs, error = %s\n", strerror(ret));
 
@@ -2291,8 +2302,9 @@ extern "C" void mount_zfs_rootfs(bool pivot_root)
         }
     }
     endmntent(ent);
-
-    import_extra_zfs_pools();
+    if (is_zfs) {
+        import_extra_zfs_pools();
+    }
 }
 
 extern "C" void unmount_rootfs(void)
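
Reviewer aid, not part of the patch: the fallback order in mount_zfs_rootfs above (mfs, then mfs2, then zfs) can be modelled as a loop over filesystem types. This is a minimal Python sketch under stated assumptions; mount_root and try_mount are hypothetical names, and try_mount stands in for sys_mount by returning 0 on success or an errno-style code on failure.

```python
def mount_root(try_mount, order=("mfs", "mfs2", "zfs")):
    """Try each filesystem type in order.

    Returns (fs_type, error): the type that mounted with error 0,
    or the last type tried together with its error code.
    """
    fs_type = None
    error = 0
    for fs_type in order:
        error = try_mount(fs_type)
        if error == 0:
            break
    return fs_type, error


def is_zfs(result):
    """True only when zfs was the type that actually mounted."""
    fs_type, error = result
    return fs_type == "zfs" and error == 0
```

One design point this makes visible: deriving the ZFS flag from the type that actually mounted avoids importing extra ZFS pools when the final zfs attempt also failed.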
diff --git a/fs/vfs/vfs_conf.cc b/fs/vfs/vfs_conf.cc
index 59e1b79..ca51191 100644
--- a/fs/vfs/vfs_conf.cc
+++ b/fs/vfs/vfs_conf.cc
@@ -46,12 +46,16 @@
 #include "vfs.h"
 
 extern struct vfsops ramfs_vfsops;
+extern struct vfsops mfs_vfsops;
+extern struct vfsops mfs2_vfsops;
 extern struct vfsops devfs_vfsops;
 extern struct vfsops nfs_vfsops;
 extern struct vfsops procfs_vfsops;
 extern struct vfsops zfs_vfsops;
 
 extern int ramfs_init(void);
+extern int mfs_init(void);
+extern int mfs2_init(void);
 extern int devfs_init(void);
 extern int nfs_init(void);
 extern int procfs_init(void);
@@ -66,5 +70,7 @@ const struct vfssw vfssw[] = {
        {"nfs",         nfs_init,       &nfs_vfsops},
        {"procfs",      procfs_init,    &procfs_vfsops},
        {"zfs",         zfs_init,       &zfs_vfsops},
+       {"mfs",         mfs_init,       &mfs_vfsops},
+       {"mfs2",        mfs2_init,      &mfs2_vfsops},
        {nullptr,       fs_noop,        nullptr},
 };
diff --git a/fs/vfs/vfs_fops.cc b/fs/vfs/vfs_fops.cc
index 3a8f98b..6f6d0ae 100644
--- a/fs/vfs/vfs_fops.cc
+++ b/fs/vfs/vfs_fops.cc
@@ -180,6 +180,7 @@ int vfs_file::get_arcbuf(void* key, off_t offset)
 
 std::unique_ptr<mmu::file_vma> vfs_file::mmap(addr_range range, unsigned flags, unsigned perm, off_t offset)
 {
+       //kprintf("Mapping file ...\n");
        auto fp = this;
        struct vnode *vp = fp->f_dentry->d_vnode;
        if (!vp->v_op->vop_cache || (vp->v_size < (off_t)mmu::page_size)) {
diff --git a/include/osv/shutdown.hh b/include/osv/shutdown.hh
index 0b25a34..a8a9aea 100644
--- a/include/osv/shutdown.hh
+++ b/include/osv/shutdown.hh
@@ -15,6 +15,9 @@ namespace osv {
 * Unmounts file systems.
 */
 void shutdown() __attribute__((noreturn));
+void mark_app_begins();
+void mark_pre_vfs_mount();
+
 
 }
 
diff --git a/licenses/mfs.txt b/licenses/mfs.txt
new file mode 100644
index 0000000..e38f597
--- /dev/null
+++ b/licenses/mfs.txt
@@ -0,0 +1,45 @@
+Copyright (c) 2015 Carnegie Mellon University.
+All Rights Reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+    
+    1. Redistributions of source code must retain the above copyright notice,
+       this list of conditions and the following acknowledgments and
+       disclaimers.
+
+    2. Redistributions in binary form must reproduce the above copyright
+       notice, this list of conditions and the following acknowledgments and
+       disclaimers in the documentation and/or other materials provided with
+       the distribution.
+
+    3. Products derived from this software may not include “Carnegie Mellon
+       University,” "SEI” and/or “Software Engineering Institute" in the name
+       of such derived product, nor shall “Carnegie Mellon University,” "SEI”
+       and/or “Software Engineering Institute" be used to endorse or promote
+       products derived from this software without prior written permission.
+       For written permission, please contact permi...@sei.cmu.edu.
+
+ACKNOWLEDGMENTS AND DISCLAIMERS:
+Copyright 2015 Carnegie Mellon University
+
+This material is based upon work funded and supported by the Department of
+Defense under Contract No. FA8721-05-C-0003 with Carnegie Mellon University
+for the operation of the Software Engineering Institute, a federally funded
+research and development center.
+
+Any opinions, findings and conclusions or recommendations expressed in this
+material are those of the author(s) and do not necessarily reflect the views of
+the United States Department of Defense.
+
+NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE
+MATERIAL IS FURNISHED ON AN “AS-IS” BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO
+WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER
+INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR
+MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL.
+CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT
+TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT.
+
+This material has been approved for public release and unlimited distribution.
+
+DM-0002621
\ No newline at end of file
diff --git a/loader.cc b/loader.cc
index c3a7256..7bf5951 100644
--- a/loader.cc
+++ b/loader.cc
@@ -334,6 +334,7 @@ void* do_main_thread(void *_main_args)
     }
     boot_time.event("drivers loaded");
 
+    osv::mark_pre_vfs_mount();
     if (opt_mount) {
         zfsdev::zfsdev_init();
         mount_zfs_rootfs(opt_pivot);
@@ -407,6 +408,7 @@ void* do_main_thread(void *_main_args)
     }
 
     boot_time.event("Total time");
+    osv::mark_app_begins();
 
     if (opt_bootchart) {
         boot_time.print_chart();
diff --git a/modules/java-base/java.cc b/modules/java-base/java.cc
index 74e0d86..ab647c8 100644
--- a/modules/java-base/java.cc
+++ b/modules/java-base/java.cc
@@ -263,6 +263,7 @@ static int java_main(int argc, char **argv)
 extern "C"
 int main(int argc, char **argv)
 {
+    auto begin = std::chrono::high_resolution_clock::now();
     int res = 0;
     std::thread t([&res](int argc, char **argv) {
         res = java_main(argc, argv);
@@ -278,5 +279,8 @@ int main(int argc, char **argv)
     while(!osv::application::unsafe_stop_and_abandon_other_threads()) {
         usleep(100000);
     }
+    auto end = std::chrono::high_resolution_clock::now();
+    std::chrono::duration<double> sec = end - begin;
+    printf("java main took %.2fms\n", 1000 * sec.count());
     return res;
 }
diff --git a/scripts/build b/scripts/build
index f7cd0ee..bc4293b 100755
--- a/scripts/build
+++ b/scripts/build
@@ -135,8 +135,12 @@ fs_type=${vars[fs]-zfs}
 usrskel_arg=
 case $fs_type in
 zfs)   ;; # Nothing to change here. This is our default behavior
+mfs)   ;; # Nothing to change here.
+mfs2)  ;; # Nothing to change here.
 ramfs) manifest=$OUT/usr.manifest
-       usrskel_arg="--usrskel usr_nozfs.manifest.skel";;
+       usrskel_arg="--usrskel usr_nozfs.manifest.skel"
+       rm -f $OUT/arch/$arch/boot*
+       rm -f $OUT/boot.bin;;
 *)     echo "Unknown filesystem \"$fs_type\"" >&2
        exit 2
 esac
@@ -170,8 +174,9 @@ then
 fi
 
 loader_size=`stat --printf %s $OUT/loader.img`
-zfs_start=$(($loader_size+2097151 & ~2097151))
-zfs_size=$(($fs_size - $zfs_start))
+kernel_end=$(($loader_size+2097151 & ~2097151))
+zfs_size=$(($fs_size - $kernel_end))
+
 
 # The python scripts called below assume the current directory is $OUT (as was
 # the case in our old build.mk).
@@ -180,7 +185,7 @@ cd $OUT
 case $fs_type in
 zfs)
        cp loader.img bare.raw
-       $SRC/scripts/imgedit.py setpartition "-f raw bare.raw" 2 $zfs_start $zfs_size
+       $SRC/scripts/imgedit.py setpartition "-f raw bare.raw" 2 $kernel_end $zfs_size
 
        qemu-img convert -f raw -O qcow2 bare.raw usr.img
        qemu-img resize usr.img ${fs_size}b >/dev/null 2>&1
@@ -193,6 +198,17 @@ zfs)
                 $SRC/scripts/export_manifest.py -e "$export_dir" -m usr.manifest -D jdkbase=$jdkbase -D gccbase=$gccbase -D glibcbase=$glibcbase -D miscbase=$miscbase
        fi
        ;;
+mfs | mfs2)
+       rm -rf mfs.img 
+       $SRC/scripts/gen-${fs_type}-img.py -o mfs.img -m usr.manifest -D jdkbase=$jdkbase -D gccbase=$gccbase -D glibcbase=$glibcbase -D miscbase=$miscbase
+       mfs_size=`stat --printf %s mfs.img` 
+       img_size=$((kernel_end + mfs_size)) 
+       cp loader.img bare.raw
+       $SRC/scripts/imgedit.py setpartition "-f raw bare.raw" 2 $kernel_end $mfs_size
+       qemu-img resize bare.raw ${img_size}b >/dev/null 2>&1 
+       dd if=mfs.img of=bare.raw obs=${kernel_end} seek=1 >/dev/null 2>&1 
+       qemu-img convert -f raw -O qcow2 bare.raw usr.img 
+       ;;
 ramfs)
        qemu-img convert -f raw -O qcow2 loader.img usr.img
        ;;
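
Reviewer aid, not part of the patch: the mfs/mfs2 branch above relies on two pieces of arithmetic. kernel_end rounds the loader size up to the next 2 MiB boundary, and dd with obs=$kernel_end seek=1 skips exactly one output block of kernel_end bytes, so mfs.img lands at byte offset kernel_end. The Python sketch below reproduces those numbers; align_up and image_layout are hypothetical helper names.

```python
TWO_MIB = 2 * 1024 * 1024  # 2097152; scripts/build uses 2097151 as the mask


def align_up(size, alignment=TWO_MIB):
    """Round size up to a multiple of alignment (must be a power of two)."""
    return (size + alignment - 1) & ~(alignment - 1)


def image_layout(loader_size, mfs_size):
    """Return (kernel_end, img_size) the way scripts/build computes them."""
    kernel_end = align_up(loader_size)
    # The filesystem image starts at kernel_end, so the whole disk image
    # is kernel_end bytes of loader area plus the filesystem itself.
    return kernel_end, kernel_end + mfs_size
```

For a 6000000-byte loader.img, kernel_end is 6291456 (three 2 MiB units), which is also the partition start passed to imgedit.py.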
diff --git a/scripts/gen-mfs-img.py b/scripts/gen-mfs-img.py
new file mode 100755
index 0000000..0b19f1c
--- /dev/null
+++ b/scripts/gen-mfs-img.py
@@ -0,0 +1,293 @@
+#!/usr/bin/python
+
+import os, optparse, io
+from struct import *
+from ctypes import *
+from manifest_common import add_var, expand, unsymlink, read_manifest, defines, strip_file
+
+OSV_BLOCK_SIZE = 512
+
+DIR_MODE  = int('0x4000', 16)
+REG_MODE  = int('0x8000', 16)
+LINK_MODE = int('0xA000', 16)
+
+block = 0
+
+class SuperBlock(Structure):
+    _fields_ = [
+        ('magic', c_ulonglong),
+        ('version', c_ulonglong),
+        ('block_size', c_ulonglong),
+        ('inodes_block', c_ulonglong)
+    ]
+
+
+class Inode(Structure):
+    _fields_ = [
+        ('mode', c_ulonglong),
+        ('inode_no', c_ulonglong),
+        ('data_block_number', c_ulonglong),
+        ('count', c_ulonglong) # either file size or children count
+    ]
+
+class Record(Structure):
+    _fields_ = [
+        ('filename', c_char * 64),
+        ('inode_no', c_ulonglong)
+    ]
+
+class Link(Structure):
+    _fields_ = [
+        ('path', c_char * OSV_BLOCK_SIZE)
+    ]
+
+
+inodes = []
+inodes_count = 1
+
+def nextInode():
+    global inodes_count
+    global inodes
+
+    inode = Inode()
+    inode.inode_no = inodes_count
+    inodes_count += 1
+    inodes.append(inode)
+
+    return inode
+
+
+def pad(fp, size):
+    fp.write('\0' * size)
+    return size
+
+
+def write_initial_superblock(fp):
+    global block
+    print "block %d: Writing initial superblock" % block
+    pad(fp, OSV_BLOCK_SIZE) # superblock is empty at first
+    block += 1
+
+
+def writelink(fp, link):
+    global block
+    print "block %d: Writing link %s" % (block,link)
+    length = len(link)
+    l = Link()
+    l.path = link
+    fp.write(l)
+    # There is no need to pad because a Link occupies exactly one block
+    block += 1 # Max length is 512 - 1
+
+
+def writefile(fp, path, targetpath):
+    global block
+    print "block %d: Writing file %s" % (block,targetpath)
+
+    total = 0
+    last = 0
+
+    with open(path, 'rb') as f:
+        while True:
+            chunk = f.read(OSV_BLOCK_SIZE)
+            if chunk:
+                last = len(chunk)
+                total += last
+                block += 1
+                fp.write(chunk)
+            else:
+                break
+
+    pad(fp, (OSV_BLOCK_SIZE - last) % OSV_BLOCK_SIZE) # no padding for empty or block-aligned files
+
+    return total
+
+
+def writeArray(fp, vals, size):
+    global block
+    c = 0
+    perBlock = OSV_BLOCK_SIZE / size
+    padding = OSV_BLOCK_SIZE - (perBlock * size)
+    blocksNeeded = (len(vals) / perBlock) + (1 if len(vals) % perBlock > 0 else 0)
+
+    print "block %d: Writing array of %d elements using %d blocks" % (block,len(vals), blocksNeeded)
+
+    for v in vals:
+        fp.write(v)
+        c += 1
+        if c == perBlock:
+            c = 0
+            pad(fp, padding)
+
+    block += blocksNeeded
+
+    if c != 0:
+        pad(fp, OSV_BLOCK_SIZE - (c * size))
+
+
+def writedir(fp, manifest, dirpath):
+    global block
+    records = []
+    def nextRecord():
+        rec = Record()
+        records.append(rec)
+        return rec
+
+    print "Writing directory %s" % dirpath
+    for entry in manifest:
+        if len(entry) > 63:
+            continue
+
+        #print "--> Entry [%s]" % entry
+        val = manifest.get(entry)
+        if not type(val) is dict:
+            if os.path.islink(val):
+                #print "--> Skipping link %s" % val
+                continue
+
+        inode = nextInode()
+        rec = nextRecord()
+
+        rec.inode_no = inode.inode_no
+        rec.filename = entry
+
+        if type(val) is dict: # folder
+            inode.mode = DIR_MODE
+            count, block_no = writedir(fp, val, dirpath + '/' + entry)
+            inode.count = count
+            inode.data_block_number = block_no
+        else: # file
+            inode.data_block_number = block
+            if val.startswith('->'):
+                inode.mode = LINK_MODE
+                writelink(fp, val[2:])
+            else:
+                inode.mode = REG_MODE
+                inode.count = writefile(fp, val, entry)
+
+
+    block_no = block
+    writeArray(fp, records, sizeof(Record))
+    return (len(records), block_no)
+
+
+def writefs(fp, manifest):
+    global block
+    global inodes
+
+    root_inode = nextInode()
+    root_inode.mode = DIR_MODE
+    
+    count, block_no = writedir(fp, manifest.get(''), '')
+    root_inode.count = count
+    root_inode.data_block_number = block_no
+
+    # Write inodes!
+    block_no = block
+    print "Writing inodes"
+    for inode in inodes:
+        print "Inode_no %d, mode:%d, data_block_number:%d, count:%d" % (inode.inode_no, inode.mode, inode.data_block_number, inode.count)
+
+    writeArray(fp, inodes, sizeof(Inode))
+
+    return block_no
+
+
+def genImage(out, manifest):
+    print "Writing image"
+    fp = open(out, 'wb')
+
+    # write the initial superblock
+    write_initial_superblock(fp)
+
+    inodes_block = writefs(fp, manifest)
+
+    sb = SuperBlock()
+    sb.version = 1
+    sb.magic = int('0xDEADBEEF', 16)
+    sb.block_size = OSV_BLOCK_SIZE
+    sb.inodes_block = inodes_block
+
+    fp.seek(0)
+    fp.write(sb)
+
+    fp.close()
+
+
+def parseManifest(manifest):
+    manifest = [(x, y % defines) for (x, y) in manifest]
+    files = list(expand(manifest))
+    files = [(x, unsymlink(y)) for (x, y) in files]
+
+    file_dict = {}
+
+    for name, hostname in files:
+        print "^^ [%s] : [%s]" % (name, hostname)
+        if os.path.isdir(hostname):
+            print "Adding directory %s" % name
+            p = file_dict
+            if os.path.islink(hostname):
+                link = os.readlink(hostname)
+                print "--> Link [%s] -> [%s]" % (hostname,link)
+                dirname = os.path.dirname(name)
+                basename = os.path.basename(name)
+                for token in dirname.split('/'):
+                    p = p.setdefault(token, {})
+                p[basename] = "->/usr/lib/jvm/jdk-9.0.1-java-base" #HACK for now: "->%s" % name
+            else:
+                for token in name.split('/'):
+                    p = p.setdefault(token, {})
+        else:
+            if hostname.startswith('->'):
+                if len(hostname) - 2 > OSV_BLOCK_SIZE - 1:
+                    print "%s is too long, max length is %d" % (hostname[2:], OSV_BLOCK_SIZE - 1)
+                    continue
+            if hostname.endswith("-stripped.so"):
+                continue
+            hostname = strip_file(hostname)
+            #print "Adding %s" % name
+            dirname = os.path.dirname(name)
+            basename = os.path.basename(name)
+            p = file_dict
+            if dirname == '/':
+                p = p.setdefault('', {})
+            else:
+                for token in dirname.split('/'):
+                    p = p.setdefault(token, {})
+            p[basename] = hostname
+
+    return file_dict
+
+
+def main():
+    make_option = optparse.make_option
+
+    opt = optparse.OptionParser(option_list=[
+            make_option('-o',
+                        dest='output',
+                        help='write to FILE',
+                        metavar='FILE'),
+            make_option('-m',
+                        dest='manifest',
+                        help='read manifest from FILE',
+                        metavar='FILE'),
+            make_option('-D',
+                        type='string',
+                        help='define VAR=DATA',
+                        metavar='VAR=DATA',
+                        action='callback',
+                        callback=add_var),
+    ])
+
+    (options, args) = opt.parse_args()
+
+    manifest = read_manifest(options.manifest)
+
+    outfile = os.path.abspath(options.output)
+
+    manifest = parseManifest(manifest)
+
+    genImage(outfile, manifest)
+
+if __name__ == "__main__":
+    main()
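
Reviewer aid, not part of the patch: the on-disk record sizes used by gen-mfs-img.py above, and the number of blocks writeArray consumes, can be checked with the struct module. This is a minimal Python 3 sketch; it assumes a little-endian layout without implicit padding, which should match the ctypes structures on x86-64. The names SUPERBLOCK, INODE, RECORD, blocks_needed and pack_superblock are hypothetical.

```python
import struct

BLOCK_SIZE = 512

# Assumption: little-endian, fields laid out back to back.
SUPERBLOCK = struct.Struct("<4Q")  # magic, version, block_size, inodes_block
INODE = struct.Struct("<4Q")       # mode, inode_no, data_block_number, count
RECORD = struct.Struct("<64sQ")    # filename[64], inode_no


def blocks_needed(count, item_size, block_size=BLOCK_SIZE):
    """Blocks used by writeArray: items never straddle a block boundary."""
    per_block = block_size // item_size
    return (count + per_block - 1) // per_block


def pack_superblock(inodes_block):
    """Superblock bytes followed by zeros up to one full block."""
    raw = SUPERBLOCK.pack(0xDEADBEEF, 1, BLOCK_SIZE, inodes_block)
    return raw + b"\0" * (BLOCK_SIZE - len(raw))
```

With 72-byte records, seven fit in a 512-byte block, so a directory with 8 entries takes 2 blocks; 32-byte inodes pack 16 to a block.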
diff --git a/scripts/gen-mfs2-img.py b/scripts/gen-mfs2-img.py
new file mode 100755
index 0000000..a463ff3
--- /dev/null
+++ b/scripts/gen-mfs2-img.py
@@ -0,0 +1,342 @@
+#!/usr/bin/python
+
+import os, optparse, io
+from struct import *
+from ctypes import *
+from manifest_common import add_var, expand, unsymlink, read_manifest, defines, strip_file
+
+OSV_BLOCK_SIZE = 512
+
+DIR_MODE  = int('0x4000', 16)
+REG_MODE  = int('0x8000', 16)
+LINK_MODE = int('0xA000', 16)
+
+block = 0
+
+class SuperBlock(Structure):
+    _fields_ = [
+        ('magic', c_ulonglong),
+        ('version', c_ulonglong),
+        ('block_size', c_ulonglong),
+        ('structure_info_first_block', c_ulonglong),
+        ('structure_info_blocks_count', c_ulonglong),
+        ('directory_entries_count', c_ulonglong),
+        ('symlinks_count', c_ulonglong),
+        ('inodes_count', c_ulonglong)
+    ]
+
+# data_offset and count represent different things depending on mode:
+# file - number of first block on disk and size in bytes (number of blocks can be deduced)
+# directory - index of the first entry in the directory entries array and number of entries
+# symlink - index of the entry in the symlink entries array and 1
+class Inode(Structure):
+    _fields_ = [
+        ('mode', c_ulonglong),
+        ('inode_no', c_ulonglong), #redundant
+        ('data_offset', c_ulonglong),
+        ('count', c_ulonglong) # either file size or children count
+    ]
+
+# Represents directory entry - file, subdirectory or symlink
+# It has a name and i-node number so to know what type of
+# entry it is one has to read the i-node
+# filename (length: unsigned short followed by characters)
+class DirectoryEntry(object):
+    def __init__(self,filename,inode_no):
+        self.filename = filename
+        self.inode_no = inode_no
+
+    def write(self,fp):
+        pos = fp.tell()
+        fp.write(c_ulonglong(self.inode_no))
+        fp.write(c_ushort(len(self.filename)))
+        for c in self.filename:
+            fp.write(c_char(c))
+
+        return fp.tell() - pos
+
+class SymbolicLink(object):
+    def __init__(self,path):
+        self.path = path
+
+    def write(self,fp):
+        pos = fp.tell()
+        fp.write(c_ushort(len(self.path)))
+        for c in self.path:
+            fp.write(c_char(c))
+
+        return fp.tell() - pos
+
+directory_entries = []
+directory_entries_count = 0
+
+symlinks = []
+symlinks_count = 0
+
+inodes = []
+inodes_count = 1
+
+def next_directory_entry(filename,inode_no):
+    global directory_entries
+    global directory_entries_count
+
+    directory_entry = DirectoryEntry(filename,inode_no)
+    directory_entries_count += 1
+    directory_entries.append(directory_entry)
+
+    return directory_entry
+
+def next_symlink(path):
+    global symlinks
+    global symlinks_count
+
+    symlink = SymbolicLink(path)
+    symlinks_count += 1
+    symlinks.append(symlink)
+
+    return symlink
+
+def next_inode():
+    global inodes_count
+    global inodes
+
+    inode = Inode()
+    inode.inode_no = inodes_count
+    inodes_count += 1
+    inodes.append(inode)
+
+    return inode
+
+def pad(fp, size):
+    fp.write('\0' * size)
+    return size
+
+def write_initial_superblock(fp):
+    global block
+    print "block %d: Writing initial superblock" % block
+    pad(fp, OSV_BLOCK_SIZE) # superblock is empty at first
+    block += 1
+
+def write_file(fp, path, target_path):
+    global block
+    print "block %d: Writing file %s" % (block,target_path)
+
+    total = 0
+    last = 0
+
+    with open(path, 'rb') as f:
+        while True:
+            chunk = f.read(OSV_BLOCK_SIZE)
+            if chunk:
+                last = len(chunk)
+                total += last
+                block += 1
+                fp.write(chunk)
+            else:
+                break
+
+    if total > 0:
+        pad(fp, OSV_BLOCK_SIZE - last)
+    else:
+        print "!!!Empty file!!!"
+
+    return total
+
+def write_inodes(fp):
+    global inodes
+
+    for inode in inodes:
+        fp.write(inode)
+
+    return len(inodes) * sizeof(Inode)
+
+def write_array(fp, vals):
+    bytes_written = 0
+
+    for val in vals:
+        bytes_written += val.write(fp)
+
+    return bytes_written
+
+def write_dir(fp, manifest, dirpath):
+    global directory_entries_count
+
+    print "Writing directory %s" % dirpath
+    directory_entry_inodes = []
+    for entry in manifest:
+        #print "--> Entry [%s]" % entry
+        val = manifest.get(entry)
+        # TODO: revisit how to fix this, as it ignores some symlinks
+        if not type(val) is dict:
+            if os.path.islink(val):
+                print "--> Skipping link %s to a directory" % val
+                continue
+
+        inode = next_inode()
+        directory_entry_inodes.append((entry,inode))
+
+        if type(val) is dict: # folder
+            inode.mode = DIR_MODE
+            count, directory_entries_index = write_dir(
+                fp, val, dirpath + '/' + entry)
+            inode.count = count
+            inode.data_offset = directory_entries_index
+        else: # file or symlink
+            if val.startswith('->'): #symlink
+                print "Symlink %s" % entry
+                inode.mode = LINK_MODE
+                global symlinks_count
+                inode.data_offset = symlinks_count
+                inode.count = 1
+                next_symlink(val[2:])
+            else: #file
+                inode.mode = REG_MODE
+                global block
+                inode.data_offset = block
+                inode.count = write_file(fp, val, entry)
+
+    this_directory_entries_index = directory_entries_count
+    for entry_name, entry_inode in directory_entry_inodes:
+        next_directory_entry(entry_name, entry_inode.inode_no)
+
+    this_directory_entries_count = len(directory_entry_inodes)
+    print "--> Writing directory %s with %d entries" % (
+        dirpath, this_directory_entries_count)
+    return (this_directory_entries_count, this_directory_entries_index)
+
+def write_fs(fp, manifest):
+    global block
+    global inodes
+    global directory_entries
+    global symlinks
+
+    root_inode = next_inode()
+    root_inode.mode = DIR_MODE
+    
+    count, directory_entries_index = write_dir(fp, manifest.get(''), '')
+    root_inode.count = count
+    root_inode.data_offset = directory_entries_index
+
+    block_no = block
+
+    # Write the directory entries table, then the symlinks table
+    bytes_written = write_array(fp,directory_entries)
+    bytes_written += write_array(fp,symlinks)
+
+    # Write inodes!
+    print "Writing inodes"
+    for inode in inodes:
+        print "Inode_no %d, mode:%d, data_offset:%d, count:%d" % (
+            inode.inode_no, inode.mode, inode.data_offset, inode.count)
+
+    bytes_written += write_inodes(fp)
+
+    return (block_no, bytes_written)
+
+def gen_image(out, manifest):
+    print "Writing image"
+    fp = open(out, 'wb')
+
+    # write the initial superblock
+    write_initial_superblock(fp)
+
+    system_structure_block, bytes_written = write_fs(fp, manifest)
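+    structure_info_last_block_bytes = bytes_written % OSV_BLOCK_SIZE
+    structure_info_blocks_count = bytes_written // OSV_BLOCK_SIZE
+    if structure_info_last_block_bytes > 0:
+        # round up to a whole block and pad out the partial last block;
+        # no padding is needed when the tables end on a block boundary
+        structure_info_blocks_count += 1
+        pad(fp, OSV_BLOCK_SIZE - structure_info_last_block_bytes)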
+
+    global inodes
+    global directory_entries
+    global symlinks
+
+    sb = SuperBlock()
+    sb.version = 1
+    sb.magic = 0xDEADBEAD
+    sb.block_size = OSV_BLOCK_SIZE
+    sb.structure_info_first_block = system_structure_block
+    sb.structure_info_blocks_count = structure_info_blocks_count
+    sb.directory_entries_count = len(directory_entries)
+    sb.symlinks_count = len(symlinks)
+    sb.inodes_count = len(inodes)
+
+    print "First block: %d, blocks count: %d" % (
+        sb.structure_info_first_block, sb.structure_info_blocks_count)
+    print "Directory entries count %d" % sb.directory_entries_count
+    print "Symlinks count %d" % sb.symlinks_count
+    print "Inodes count %d" % sb.inodes_count
+
+    fp.seek(0)
+    fp.write(sb)
+
+    fp.close()
+
+def parse_manifest(manifest):
+    manifest = [(x, y % defines) for (x, y) in manifest]
+    files = list(expand(manifest))
+    files = [(x, unsymlink(y)) for (x, y) in files]
+
+    file_dict = {}
+
+    for name, hostname in files:
+        print "^^ [%s] : [%s]" % (name, hostname)
+        if os.path.isdir(hostname):
+            print "Adding directory %s" % name
+            p = file_dict
+            if os.path.islink(hostname):
+                link = os.readlink(hostname)
+                print "--> Link [%s] -> [%s]" % (hostname,link)
+                dirname = os.path.dirname(name)
+                basename = os.path.basename(name)
+                for token in dirname.split('/'):
+                    p = p.setdefault(token, {})
+                # TODO: check if adding '/' always works
+                p[basename] = "->%s/" % link
+            else:
+                for token in name.split('/'):
+                    p = p.setdefault(token, {})
+        else:
+            if hostname.endswith("-stripped.so"):
+                continue
+            hostname = strip_file(hostname)
+            #print "Adding %s" % name
+            dirname = os.path.dirname(name)
+            basename = os.path.basename(name)
+            p = file_dict
+            if dirname == '/':
+                p = p.setdefault('', {})
+            else:
+                for token in dirname.split('/'):
+                    p = p.setdefault(token, {})
+            p[basename] = hostname
+
+    return file_dict
+
+
+def main():
+    make_option = optparse.make_option
+
+    opt = optparse.OptionParser(option_list=[
+            make_option('-o',
+                        dest='output',
+                        help='write to FILE',
+                        metavar='FILE'),
+            make_option('-m',
+                        dest='manifest',
+                        help='read manifest from FILE',
+                        metavar='FILE'),
+            make_option('-D',
+                        type='string',
+                        help='define VAR=DATA',
+                        metavar='VAR=DATA',
+                        action='callback',
+                        callback=add_var),
+    ])
+
+    (options, args) = opt.parse_args()
+
+    manifest = read_manifest(options.manifest)
+
+    outfile = os.path.abspath(options.output)
+
+    manifest = parse_manifest(manifest)
+
+    gen_image(outfile, manifest)
+
+if __name__ == "__main__":
+    main()
-- 
2.7.4
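
Not part of the patch: a minimal sketch of the block rounding that write_file
and gen_image rely on, in case it helps review. The helper names blocks_needed
and padding_needed are hypothetical and do not appear in the script; the
512-byte block size is taken from the layout description at the top of this
message.

```python
OSV_BLOCK_SIZE = 512  # assumed here; matches the 512-byte blocks in the layout


def blocks_needed(byte_count, block_size=OSV_BLOCK_SIZE):
    """Return how many whole blocks are needed to hold byte_count bytes."""
    return (byte_count + block_size - 1) // block_size


def padding_needed(byte_count, block_size=OSV_BLOCK_SIZE):
    """Return how many zero bytes reach the next block boundary.

    The result is 0 when byte_count already ends on a boundary.
    """
    return (-byte_count) % block_size


if __name__ == "__main__":
    for n in (0, 1, 511, 512, 513, 1024):
        print(n, blocks_needed(n), padding_needed(n))
```

With these helpers, byte_count + padding_needed(byte_count) always equals
blocks_needed(byte_count) * OSV_BLOCK_SIZE, which is the invariant the block
counter and the superblock's structure_info_blocks_count depend on.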

