Author: allison
Date: Sat Mar  1 22:23:29 2008
New Revision: 26178

Modified:
   trunk/docs/pdds/draft/pdd18_security.pod

Log:
[pdd] Rewritten Security PDD.


Modified: trunk/docs/pdds/draft/pdd18_security.pod
==============================================================================
--- trunk/docs/pdds/draft/pdd18_security.pod    (original)
+++ trunk/docs/pdds/draft/pdd18_security.pod    Sat Mar  1 22:23:29 2008
@@ -1,6 +1,6 @@
 =head1 NAME
 
-docs/pdds/pdd18_security.pod - Parrot's security model
+docs/pdds/pdd18_security.pod - Security Model
 
 =head1 ABSTRACT
 
@@ -12,140 +12,209 @@
 
 =head1 DESCRIPTION
 
-{{ NOTE: This PDD is inadequate. We want to provide a much more
-extensive level of sandboxing. }}
-
-There are three basic subsystems in Parrot's security system. They are:
+Parrot will be used in a variety of different application contexts, each with
+its own unique security needs.
 
 =over 4
 
-=item Quotas
+=item * Small devices such as cell phones need tight control over resource
+usage (CPU, memory, etc).
 
-To ensure that an interpreter doesn't use more CPU time, memory, or system
-resources than is allowed.
+=item * Web applications need filtering and validation of incoming data and
+blocks to prevent the use of unfiltered data in execution contexts (SQL, system
+calls, runtime eval, etc).
+
+=item * Web browser embedding, i.e. client-side execution of high-level
+languages, needs control over resource access on the client machine (local disk
+access, local network connections), sandboxing for downloaded code, limits on
+what code can be loaded and executed, and limits on certain dynamic features
+(runtime eval of code, modification of global namespaces).
+
+=item * Database engine embedding, i.e. server-side execution of high-level
+languages as stored procedures, also needs control over resource access (disk
+access, network connections), and limits on loaded code, but additionally needs
+adminstrator configured lists of allowed libraries and library paths.
 
-=item Privilege checks
+=item * Security auditing tools need hooks in the compilation process for
+static analysis.
 
-To restrict access to what an interpreter can do.
+=back
 
-=item Parameter checks
+=head1 IMPLEMENTATION
 
-To double-check bytecode parameters for basic sanity.
+Parrot's security infrastructure is not an independent, encapsulated subsystem.
+It is a series of related features and functionality spread throughout the
+virtual machine. 
+
+=head2 Resource Quotas
+
+Resource quotas ensure that an interpreter doesn't use more CPU time, memory,
+or system resources than are allowed. Quotas are most useful when running code
+in a managed environment such as a web, database, or game server where no one
+interpreter is allowed to consume too many resources and impact the system too
+badly. CPU time is managed by the runloop. The memory system handles memory
+quotas, the I/O system handles file open and pending I/O count quotas, and so
+on. 
 
-=back
+=head2 Privileges
 
-Each of these can be enabled or disabled separately, and each has a particular
-purpose. Often two or more systems will be engaged at once, but this isn't
-required.
-
-=head2 Quotas
-
-The purpose of a quota system is to ensure that an interpreter doesn't use up
-too many resources -- usually memory and CPU, but there may be other scarce
-resources, such as files, that need managing. In a shared environment it
-prevents an interpreter from hogging too many resources, either explicitly (as
-in a DOS attack) or implicitly (through poor programming or poorly scheduled
-runs), and preventing other interpreters from running.
-
-=head2 Privilege checks
-
-A privilege system is used to restrict code from performing certain actions.
-When privilege checking is in force you may need a particular privilege to load
-a library, or open a file.
-
-Each interpreter has three sets of privileges. The first set is the I<current>
-privilege set, which is the set of privileges  currently in force. The second
-set is the set of I<authorized>  privileges. These are the privileges that the
-interpreter is  allowed to put into its current set. The third set are the
-I<sub> privileges. These are the privileges that a sub has intrinsic to itself,
-regardless of what the interpreter privileges currently are. (Subs with
-privileges attached to them are called I<privileged subs>, oddly enough).
-
-An interpreter may drop any privilege it likes from the current set. It may
-also at any time enable a privilege which is in the authorized set but not in
-the current set. It is possible for a sub to have a privilege in its current
-set which isn't in its authorized set. Those privileges are, once dropped,
-gone.
-
-Subroutines may also have a set of I<required privileges> attached to them. The
-current interpreter B<must> have those privileges in its current or sub set to
-call a subroutine so tagged. If the interpreter doesn't have the privileges
-then a privilege violation exception is thrown.
+A privilege system is used to restrict code from performing certain
+actions. When privilege checking is in force the code may need a particular
+privilege to load a library, or open a file.
+
+Privileges can be quite broad, on the order of "allow file I/O", or as
+fine-grained as allowing/denying the right to run one particular opcode.
+Privileges are discrete entities, They are also hierarchical, one privilege can
+be specified to follow from another privilege (the privilege FOO may be
+automatically granted to anything with the privilege BAR). Anything with the
+ALL privilege is automatically granted all other privileges in the system.
+Privileges are user-definable, but user-defined privileges can only give grants
+of rights, they cannot take them (BAZ may grant its privileges to any user with
+FOO privileges, but it can't automatically grant itself all the FOO
+privileges).
 
-=head2 Parameter checking
+A few example privileges:
 
-In normal operation the interpreter assumes that the bytecode that it executes
-is valid -- that is, any parameters to opcodes are sane, data structures are
-intact, and the world, generally, is a good place. When parameter checking is
-enabled, however, we assume that bytecode is not necessarily valid. The
-interpreter then, at runtime, makes sure that all specified register numbers
-are within valid range, and string and PMC structures used are valid.
-
-=head2 Feature usage
-
-Each of the three features has a separate use. Parameter checking is most
-useful when executing code which may come from an unsafe source, for example
-from the network. Quotas are most useful when running code in a managed
-environment such as a web, database, or game server where no one interpreter is
-allowed to consume too many resources and impact the system too badly.
-Privileges are used when running untrusted code in a trusted environment, again
-such as a database or game server, where Parrot can't disable certain features,
-but must limit their use to trusted code.
-
-It's unlikely that any one of these features will be enabled individually,
-though there are certainly reasons to do so. Each feature is separately
-implemented, however, and as such can be taken singly and discussed.
+=over 4
 
-=head1 IMPLEMENTATION
+=item ALL
 
-=head2 Quotas
+Granted all privileges.
 
-Quota management is split into two separate parts, CPU time and everything
-else.
+=item IO
 
-=head3 CPU time
+May run I/O operations.
 
-CPU time is managed by the runloop. There's a certain unavoidable overhead, but
-there's no way around this, at least not reliably. (We may be able to play
-interesting games with timer events and system event handlers. We'll see).
-
-=head3 Everything else
-
-The rest of the quotas are enforced by code scattered across the interpreter.
-The memory system handles memory quotas, the IO system handles file open and
-pending IO count quotas, and so on. There's not a whole lot for this, though we
-should abstract out all the high-level operations that do things which may have
-quotas applied so we can wrap these functions, so as not to pay the price when
-quotas aren't in force.
-
-For example, rather than having checks in the memory subsystem to see if quotas
-are enabled (a check that would have to be done on each memory allocation) we
-should instead access the memory allocation via a function pointer stored off
-the interpreter somewhere. This way when quotas are enabled we can swap in an
-alternate function pointer, one that points to a function which checks quotas
-before calling into the memory subsystem.
-
-This should be relatively painless as most, if not all, of the functions which
-should have quotas applied to them are also functions which embedders may wish
-to override, and thus already need to be accessed indirectly.
+=item INVOKE
 
-=head2 Privileges
+May invoke subroutines, methods, or return continuations.
+
+=item RETURN
+
+May invoke return continuations (not subroutines or methods). (RETURN
+privileges are granted to anyone with INVOKE privileges.)
 
+=item SYSCALL
 
+May run a system call.
 
-=head2 Parameter Checking
+=item COMPILE
 
-Parameter checking is done with an alternate runloop, one where the opcodes
-first check their parameters before executing. This is fairly expensive, which
-is why it isn't the default mode for operations. (The JIT may, at its choice,
-check parameters at JIT time). Checking at load time is also somewhat
-problematic, as it is also somewhat expensive, means the checker needs to know
-the signatures for ops which may not have been loaded yet, and precludes code
-doing overly clever things. (Which itself probably ought be forbidden, but
-that's a separate problem).
+May compile code from string source at runtime (eval).
 
-The checking op variants are automatically generated by the op file
-preprocessor, the same way that it generates all the other oploop variants.
+=item LOAD
+
+May load libraries at runtime.
+
+=back
+
+=head3 Users
+
+For the most part, a "user" in the Parrot privilege system doesn't correspond
+to a literal user (though it may, if Parrot is running embedded in a database
+engine or multi-user gaming system). A user is a bundle of privileges,
+identified by a user ID, and authenticated with a pass key. The privilege ID
+can be cheaply passed around, and validated whenever a restricted action is
+performed.
+
+=head3 Opcode Disabling
+
+All opcodes in Parrot can be selectively disabled, by short name (C<print>),
+long name (including signature, C<print_sc>), or by group (C<io>, C<net>,
+C<load>, C<compile>). They can also be selectively enabled, by defining a
+privilege with "disable all", and then allowing only specific opcodes.
+
+When running in a secured mode, all dynamically loaded opcode libraries are
+disabled by default, and have to be explicitly enabled (individually, as a
+group, or by a system-wide configuration).
+
+Opcodes are tagged with their group in their definition, and may be tagged in
+multiple groups, as in:
+
+  inline op print(in INT) :base_io,io {
+      ...
+  }
+
+=head3 Library Loading
+
+In certain environments, it's desirable to be able to restrict what libraries
+may be loaded by code running on the virtual machine. The allowed library list
+is defined in a system-wide configuration file, or set at runtime by a user
+with administrative privileges. Libraries may be signed, with the key specified
+in the library list and verified on loading. Libraries that can't be signed (C
+libraries), can be check-summed to ensure that the library you load is the
+exact file you expect.
+
+Generally, library loading restrictions are useful in an embedded environment
+like a database engine or web browser, or a multi-user environment like a web
+hosting server, where arbitrarily extended behavior is a security risk. It can
+also be a useful development tool, as running your daily development
+environment with library loading restrictions turned on means you always know
+exactly what dependencies the code base has.
+
+=head3 Resource Access
+
+Access to resources such as the local disk, network, are controlled through the
+privilege system. Resource access limitations are a combination of disabled
+opcodes, blocked library loading, and privilege checks within standard
+libraries for I/O, network, etc.
+
+=head2 Sandboxing
+
+A sandbox is a virtual machine within the virtual machine. It's a safe zone to
+contain code from an untrusted source. In the extreme case, a sandbox is
+completely isolated from all outside code, with no access to read or write to
+the surrounding environment. In the general case, a sandbox will have the
+ability to read from, but not write to the surrounding environment (global
+namespaces, for example), with a very narrow and carefully filtered route to
+send some data back to the code that called it. The sandbox system works
+together with the privileges system, in that by default code in the sandbox has
+no privileges outside the sandbox, but may be granted privileges.
+
+=head2 Data Firewall
+
+Any data that originates from user input (command-line, user prompt, web form,
+file access, network operation) is a potential security risk. The best place to
+trap bad data is at the point of entry, before it touches a single line of
+code. When the data firewall is enabled, all data entering from an external
+source or crossing a sandbox barrier is subjected to filter rules. The filter
+rules in force are configurable, and filters can be selectively enabled and
+disabled for particular types and sources of data.
+
+Data filters can be sanitizing or validating. Sanitizing data filters modify
+the data as it passes (escaping quotes, encoding entities, etc). Validating
+filters check that the data meets certain conditions (the presence or absence
+of specific features), and when it fails to meet those conditions, the data is
+blocked from passing the firewall (returning an empty string or PMCNULL instead
+of the expected data). Data filters can also be user-defined, as a regular
+expression (PGE rule) or subroutine.
+
+The same filter rules applied within the data firewall can be called explicitly
+on any data.
+
+=head2 Bytecode Validation
+
+In normal operation the interpreter assumes that the bytecode that it executes
+is valid -- that is, any parameters to opcodes are sane, data structures are
+intact. When bytecode validation is enabled, however, Parrot assumes that
+bytecode is not necessarily valid. The interpreter then, at runtime, makes sure
+that all specified register numbers are within valid range, and string and PMC
+structures used are valid.
+
+=head2 Auditing Hooks
+
+Even in dynamic languages, it's possible to perform a degree of static analysis
+for security risks. The opcode syntax tree (OST) produced by the language
+compiler is a good data source for static analysis checks, because it's
+low-level enough that you can check for individual opcodes that will be called
+(checking for I/O, networking, and other similar operations, or unknown
+dynamically-loaded opcodes), and high-level enough that you still have access
+to substantial metadata from the parse. The standard compiler tools already
+have the ability to add and remove stages from the compilation process. Static
+analysis tools can be implemented by stopping the standard compilation at the
+OST phase, and inserting an additional phase to scan the OST. Because the OST
+form is standard across high-level languages running on Parrot, the tools can
+be written once and applied to many languages.
 
 =head1 ATTACHMENTS
 
@@ -159,6 +228,10 @@
 
 "Safe ERB": http://agilewebdevelopment.com/plugins/safe_erb
 
+pecl/filter: http://us2.php.net/filter
+
+Rasmus Lerdorf for the term "data firewall".
+
 =cut
 
 __END__

Reply via email to