New webrev for bug 7960:
http://cr.opensolaris.org/~bpytlik/ips-7960-v3/
Bug:
7960 client and depot need different organization of files
Note:
Compared to v2, it resyncs with the gate and adds more info to the block
comments for file_manager and layout.
I've included the block comments inline below.
Brock Pytlik wrote:
New webrev for bug 7960:
http://cr.opensolaris.org/~bpytlik/ips-7960-v2/
Bug:
7960 client and depot need different organization of files
Notes:
This adds block comments to file_manager and layout. I think it
resolves the other issues that came up as well.
It also removes the gen_copy_list.py under util/publisher since there
seem to be no consumers and no one responded to my mail asking if
anyone needed to keep it.
Thanks,
Brock
Thanks,
Brock
file_manager.py
# The purpose of the FileManager class is to provide a central location to
# insert, look up, and remove files stored in the download directory of the
# client or the file directory of the repo. It provides a way to change how
# files are placed into the directory structure by allowing for an ordered
# sequence of layouts. The FileManager overlays the layouts on top of each
# other. This means that layouts have certain requirements (described in the
# layout module), but it allows the reuse of shared directory structures when
# possible. When a file is inserted, it is placed in the directory structure
# according to the first layout. When a file is retrieved, each layout is
# checked in turn to determine whether the file is present. If the file is
# present but not at the location the first layout specifies, and the
# FileManager has permission to move the file, it will be moved to that
# location. When a file is removed, the layouts are checked in turn until a
# file is found and removed. The FileManager also provides a way to generate
# all hashes stored by the FileManager.
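To make the insert/lookup/remove behavior described above concrete, here is a
minimal sketch. It is not the code in the webrev: SketchFileManager, its
readonly flag, and the two layout functions are hypothetical names, and the
directory shapes they use are assumptions chosen only for illustration.

```python
import os
import shutil
import tempfile


def new_layout(hashval):
    # Assumed shape: one directory level keyed on the first two hex digits.
    return os.path.join(hashval[:2], hashval)


def old_layout(hashval):
    # Assumed shape: two directory levels (not spelled out in this message).
    return os.path.join(hashval[:2], hashval[2:8], hashval)


class SketchFileManager:
    """Hypothetical stand-in for FileManager; not the real pkg API."""

    def __init__(self, root, layouts, readonly=False):
        self.root = root
        self.layouts = list(layouts)  # ordered; the first one is preferred
        self.readonly = readonly      # True means "may not move files"

    def _path(self, layout, hashval):
        return os.path.join(self.root, layout(hashval))

    def insert(self, hashval, src_path):
        # New files always go where the first layout says.
        dest = self._path(self.layouts[0], hashval)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.move(src_path, dest)
        return dest

    def lookup(self, hashval):
        preferred = self._path(self.layouts[0], hashval)
        for layout in self.layouts:
            cur = self._path(layout, hashval)
            if not os.path.exists(cur):
                continue
            if cur != preferred and not self.readonly:
                # Found under an older layout: migrate to the preferred spot.
                os.makedirs(os.path.dirname(preferred), exist_ok=True)
                shutil.move(cur, preferred)
                return preferred
            return cur
        return None

    def remove(self, hashval):
        # Check layouts in order and delete the first copy found.
        for layout in self.layouts:
            cur = self._path(layout, hashval)
            if os.path.exists(cur):
                os.remove(cur)
                return True
        return False
```

With layouts ordered as [new_layout, old_layout], a file found under the old
shape is moved to the new one on first lookup, so migration happens lazily
rather than as a separate pass over the directory tree.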
layout.py
# The Layout class hierarchy encapsulates bijective mappings between a hash
# (or file name since those are equivalent in our system) and a relative path
# that describes where to place that file in the file system. This bijective
# relation should hold when the union of all layouts is considered as a single
# set of mappings. In practical terms, this means that only one layout may
# potentially deposit a hash into any particular location. This is not a
# difficult requirement to satisfy since each layout may append a unique
# identifier to the file name or choose to carve out its own namespace at some
# level of directory hierarchy.
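The "only one layout may deposit a hash into any particular location" rule
can be checked mechanically. The sketch below is an illustration only:
PrefixLayout, SuffixLayout, path_to_hash, and layouts_conflict are
hypothetical names, not the classes in layout.py. SuffixLayout shows the
"append a unique identifier" option mentioned above.

```python
import hashlib
import os


class PrefixLayout:
    """Hypothetical layout: <first n hex digits>/<hash>."""

    def __init__(self, n):
        self.n = n

    def lookup(self, hashval):
        return os.path.join(hashval[:self.n], hashval)

    def path_to_hash(self, relpath):
        # Inverse mapping: the file name is the hash.
        return os.path.basename(relpath)


class SuffixLayout(PrefixLayout):
    """Same directories, but a unique suffix keeps its files distinct."""

    def __init__(self, n, tag):
        super().__init__(n)
        self.tag = tag

    def lookup(self, hashval):
        return super().lookup(hashval) + "." + self.tag

    def path_to_hash(self, relpath):
        name = os.path.basename(relpath)
        return name[:-(len(self.tag) + 1)]  # strip ".<tag>"


def layouts_conflict(layouts, hashes):
    """True if two (layout, hash) pairs would land on the same path."""
    owner = {}
    for idx, layout in enumerate(layouts):
        for h in hashes:
            path = layout.lookup(h)
            if path in owner:
                return True
            owner[path] = (idx, h)
    return False


if __name__ == "__main__":
    # SHA-1 of a counter stands in for real file content hashes.
    sample = [hashlib.sha1(str(i).encode()).hexdigest() for i in range(1000)]
    print(layouts_conflict([PrefixLayout(2), SuffixLayout(2, "alt")], sample))
```

A check over sample hashes cannot prove the property for every possible hash,
but it catches the obvious mistake of stacking two layouts that produce the
same paths.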
# The Original layout has the problem that, for the size of images (on the
# order of 300-500k files) and repos (on the order of 1M files), the second
# directory level usually contains a single file. This imposes a substantial
# penalty for removing or resyncing the directories because a readdir must be
# done for each directory, and readdir is two orders of magnitude slower than
# the open or read ZFS operations and one order of magnitude slower than ZFS
# remove. Reducing the number of directories used to hold the downloaded files
# was a goal for the next layout.
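A rough way to see why the second directory level ends up holding a single
file is to count how many leaf directories a set of hashes would occupy. This
message does not spell out the directory shapes, so the sketch assumes the
Original layout is hash[0:2]/hash[2:8]/hash and the replacement is
hash[0:2]/hash. All function names are hypothetical.

```python
import hashlib
import os


def deep_dir(hashval):
    # Assumed shape of the Original layout: two directory levels.
    return os.path.join(hashval[:2], hashval[2:8])


def fanout256_dir(hashval):
    # Assumed shape of the 256-way layout: one directory level.
    return hashval[:2]


def count_dirs(dir_of, hashes):
    """Number of distinct leaf directories the hashes would occupy."""
    return len({dir_of(h) for h in hashes})


def sample_hashes(n):
    # SHA-1 of a counter stands in for real file content hashes.
    return [hashlib.sha1(str(i).encode()).hexdigest() for i in range(n)]


if __name__ == "__main__":
    hashes = sample_hashes(20000)
    print("deep layout leaf dirs:", count_dirs(deep_dir, hashes))
    print("256-way layout leaf dirs:", count_dirs(fanout256_dir, hashes))
```

Under these assumptions the deep layout needs roughly one leaf directory per
file, so every removal or resync pays for a readdir per file, while the
one-level layout never exceeds 256 directories.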
# The simplest layout, and the one with the fewest directory layers, is to
# place all files in a single directory (henceforth called the flat layout).
# Such a layout performs well on ZFS, but it has two issues. The first is that
# it makes operating on the directory expensive. Running ls on a directory
# with one million files uses about 370M of memory, while using python's
# os.listdir consumes about 100M. The second and more serious issue is that
# some file systems cannot support having 1M or 10M files in a single
# directory. While creating a ZFS-specific implementation and a general one
# was possible, it was considered less desirable because it meant maintaining
# separate code paths and made moving images between systems difficult.
# In order to determine the impact of the different layouts on performance,
# experiments were run using different layout strategies with the number of
# files varying between 1 file and 10M files. The results clearly suggested
# that having a large number of directories was detrimental to performance and
# that having a single level of directories was preferable to a deep
# structure. Since the current hash uses hex digits, using powers of 16 for
# the number of directories was convenient (though others were explored). A
# fanout of 16 was discarded since it could only support 524k total files,
# assuming that the other file systems under consideration could support
# 32,768 files in a single directory but not 65,536 files. 524k was
# dangerously close to our estimates for a client image and well below what a
# server might be expected to hold. A fanout of 4096 was also considered but
# had poor performance compared to the flat layout. A fanout of 256 turned out
# to provide the right trade-offs. Instead of 524k total files, it can hold
# over 8M files. Its performance was also competitive with the flat layout up
# to 10M files, the largest number of files tested.
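The capacity figures quoted above follow from multiplying the fanout by the
assumed per-directory limit of 32,768 files. A small sketch of that
arithmetic (capacity and PER_DIR_LIMIT are hypothetical names used only
here):

```python
# Per-directory file limit assumed in the comment for the most restrictive
# file system under consideration.
PER_DIR_LIMIT = 32768


def capacity(fanout, per_dir_limit=PER_DIR_LIMIT):
    """Total files a one-level layout can hold with `fanout` directories."""
    return fanout * per_dir_limit


if __name__ == "__main__":
    for hex_digits in (1, 2, 3):
        fanout = 16 ** hex_digits  # 16, 256, 4096 directories
        print(fanout, capacity(fanout))
```

A fanout of 16 gives 524,288 files (the "524k" above) and a fanout of 256
gives 8,388,608 (the "over 8M").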
_______________________________________________
pkg-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/pkg-discuss