New webrev for bug 7960:
http://cr.opensolaris.org/~bpytlik/ips-7960-v3/

Bug:
7960 client and depot need different organization of files

Note:
Compared to v2, it resyncs with the gate and adds more info to the block comments for file_manager and layout.
I've included the block comments inline below.

Brock Pytlik wrote:
New webrev for bug 7960:
http://cr.opensolaris.org/~bpytlik/ips-7960-v2/

Bug:
7960 client and depot need different organization of files


Notes:
This adds block comments to file_manager and layout. I think it resolves the other issues that came up as well.

It also removes the gen_copy_list.py under util/publisher since there seem to be no consumers and no one responded to my mail asking if anyone needed to keep it.

Thanks,
Brock


Thanks,
Brock
file_manager.py
# The purpose of the FileManager class is to provide a central location to
# insert, lookup, and remove files, stored in the download directory of the
# client or the file directory of the repo.  It provides a way to change the
# way files are placed into the directory structure by allowing for an ordered
# sequence of layouts.  The FileManager overlays the layouts on top of each
# other. This means that layouts have certain requirements (described in the
# layout module), but allows the reuse of shared directory structures when
# possible. When a file is inserted, it is placed in the directory structure
# according to the first layout.  When a file is retrieved, each layout is
# checked in turn to determine whether the file is present.  If the file is
# present but not located where the first layout says it should be, and the
# FileManager has permission to move the file, it will be moved to that
# location.  When a file is removed, the layouts are checked in turn until
# the file is found and removed.  The FileManager also provides a way to
# generate all hashes stored by the FileManager.
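To make the overlay behavior concrete, here is a minimal sketch of the idea (the class and method names are simplified for illustration and are not the actual pkg(5) API): inserts always follow the first layout, lookups probe each layout in order, and a file found under a later layout is migrated to the first layout's location when the manager is allowed to move files.

```python
import os
import shutil

class FlatLayout:
    """Hypothetical layout: the hash itself is the relative path."""
    def lookup(self, hashval):
        return hashval

class FileManager:
    """Simplified sketch: an ordered sequence of layouts overlaid on
    one directory.  Inserts follow the first layout; lookups probe each
    layout in turn and, when permitted, migrate stray files to the
    first layout's preferred location."""

    def __init__(self, root, layouts, readonly=False):
        self.root = root
        self.layouts = layouts
        self.readonly = readonly

    def _path(self, layout, hashval):
        return os.path.join(self.root, layout.lookup(hashval))

    def insert(self, hashval, src):
        # Files are always placed according to the first layout.
        dest = self._path(self.layouts[0], hashval)
        d = os.path.dirname(dest)
        if not os.path.isdir(d):
            os.makedirs(d)
        shutil.copy(src, dest)

    def lookup(self, hashval):
        preferred = self._path(self.layouts[0], hashval)
        for lay in self.layouts:
            cand = self._path(lay, hashval)
            if os.path.exists(cand):
                if cand != preferred and not self.readonly:
                    # Migrate the file to the first layout's location.
                    pdir = os.path.dirname(preferred)
                    if not os.path.isdir(pdir):
                        os.makedirs(pdir)
                    shutil.move(cand, preferred)
                    return preferred
                return cand
        return None
```

Because later layouts are only consulted on lookup and removal, an existing directory tree keeps working unchanged while new files drift toward the first layout's structure.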

layout.py
# The Layout class hierarchy encapsulates bijective mappings between a hash
# (or file name since those are equivalent in our system) and a relative path
# that describes where to place that file in the file system.  This bijective
# relation should hold when the union of all layouts is considered as a single
# set of mappings.  In practical terms, this means that only one layout may
# potentially deposit a hash into any particular location.  This is not a
# difficult requirement to satisfy since each layout may append a unique
# identifier to the file name or choose to carve out its own namespace at some
# level of directory hierarchy.
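The namespace-carving point can be sketched with two hypothetical layouts (the names and the `p256` subdirectory are illustrative, not the actual layout.py classes): because the second layout prefixes all of its paths with its own subdirectory, no path it produces can ever collide with one produced by the first.

```python
import os

class FlatLayout:
    """Hypothetical layout: the hash itself is the relative path."""
    def lookup(self, hashval):
        return hashval

class FanoutLayout:
    """Hypothetical layout that carves out its own namespace under a
    dedicated subdirectory ("p256"), so none of its paths can collide
    with a path produced by FlatLayout."""
    def __init__(self, subdir="p256"):
        self.subdir = subdir

    def lookup(self, hashval):
        # Directory chosen by the hash's first two hex digits.
        return os.path.join(self.subdir, hashval[:2], hashval)
```

With disjoint path sets like these, the union of the two layouts remains a bijection, which is exactly the requirement stated above.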

# The Original layout has the problem that, at the sizes of typical images
# (on the order of 300-500k files) and repos (on the order of 1M files), the
# second directory level usually contains a single file.  This imposes a
# substantial penalty for removing or resyncing the directories because a
# readdir must be done for each directory, and readdir is two orders of
# magnitude slower than the open or read ZFS operations, and one order of
# magnitude slower than ZFS remove.  Reducing the number of directories used
# to hold the downloaded files was a goal for the next layout.

# The simplest layout, and the one with the fewest directory levels, is to
# place all files in a single directory (henceforth called the flat layout).
# Such a layout performs well on ZFS, but it has two issues.  The first is
# that it makes operating on the directory expensive.  Running ls on a
# directory with one million files uses about 370M of memory, while using
# python's os.listdir consumes about 100M.  The second and more serious issue
# is that some file systems cannot support having 1M or 10M files in a single
# directory.  While creating a ZFS-specific implementation alongside a
# general one was possible, it was considered less desirable because it meant
# both maintaining separate code paths and imposed difficulties for moving
# images between systems.

# In order to determine the impact of the different layouts on performance,
# experiments were run using different layout strategies with the number of
# files varying between 1 file and 10M files.  The results clearly suggested
# that having a large number of directories was detrimental to performance
# and that having a single level of directories was preferable to a deep
# structure.  Since the current hash uses hex digits, using multiples of 16
# for the number of directories was convenient (though others were explored).
# Using a fanout of 16 was discarded since it could only support 524k total
# files, assuming that the other file systems under consideration could
# support 32,768 files in a single directory but not 65,536 files.  524k was
# dangerously close to our estimates for a client image and well below what
# a server might be expected to hold.  A fanout of 4096 was also considered
# but had poor performance compared to the flat layout.  A fanout of 256
# turned out to provide the right trade-offs.  Instead of 524k total files,
# it can hold over 8M files.  Its performance was also competitive with the
# flat layout up to 10M files, the largest number of files tested.
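The capacity arithmetic above can be checked with a short sketch (the exact path scheme and function name here are illustrative assumptions; layout.py defines the real mapping): keying the directory on the hash's first two hex digits gives 256 directories, and at the assumed 32,768-files-per-directory limit that is 256 x 32,768 = 8,388,608 files, versus 16 x 32,768 = 524,288 for a fanout of 16.

```python
import os

FANOUT = 256           # first two hex digits of the hash: 16 * 16 = 256
FILES_PER_DIR = 32768  # per-directory limit assumed for the weakest FS

def fanout256_path(hashval):
    """Hypothetical fanout-256 layout: the directory is chosen by the
    first two hex digits of the hash."""
    return os.path.join(hashval[:2], hashval)

# Capacity under the stated assumption:
assert FANOUT * FILES_PER_DIR == 8388608  # fanout 256: over 8M files
assert 16 * FILES_PER_DIR == 524288       # fanout 16: the ~524k ceiling
```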
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
