Re: [openstack-dev] [Congress] data-source renovation

2014-08-04 Thread Alex Yip
Hi all,

I favor the first approach because it solves the usability problem of wide 
tables without limiting Congress' ability to use wide tables, or adding extra 
complexity.

There are legitimate uses for wide tables, so Congress should be able to 
support them.  For example, Congress will need to support very large data 
sources in the future (TB in size).  It is best if Congress uses those 
databases in place, without creating a local copy of the database, so 
supporting wide tables and making them easy to use in the policy language will 
be a win for the future.

For Con (i) (we will need to invert the preprocessor when showing 
rules/traces/etc. to the user), we can keep the translated policies hidden from 
the user.  The user should only see policies that he wrote.

For Con (ii) (a layer of translation makes debugging difficult), the 
translation layer would be akin to a C preprocessor.  It will be possible to 
match up items on both sides of the translation layer.
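
As a rough illustration of matching items across the translation layer, here is a minimal Python sketch.  It assumes the engine only ever sees translated rules and that we keep a lookup table back to the user's text.  RuleStore, TranslatedRule and toy_translate are hypothetical names invented for this example, not existing Congress code, and the three-column expansion is a toy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslatedRule:
    original: str    # rule text exactly as the user wrote it
    translated: str  # expanded rule handed to the engine


class RuleStore:
    """Keeps both sides of the translation so results can be mapped back."""

    def __init__(self, translate):
        self._translate = translate
        self._by_translated = {}

    def add(self, original):
        rule = TranslatedRule(original, self._translate(original))
        self._by_translated[rule.translated] = rule
        return rule.translated

    def user_view(self, translated):
        """Return the user-written rule behind an engine-side rule."""
        return self._by_translated[translated].original


# Stand-in translator for illustration only; a real one would expand
# named arguments into the full column list of the table.
def toy_translate(rule):
    return rule.replace("neutron:ports(port_id=id, name=nme)",
                        "neutron:ports(id, _v0, nme)")


store = RuleStore(toy_translate)
engine_rule = store.add("p(id, nme) :- neutron:ports(port_id=id, name=nme)")
print(store.user_view(engine_rule))
```

Traces produced by the engine could be passed through user_view before display, so the user never sees the expanded form.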

- Alex


 Option 2 looks like a better idea keeping in mind the data model
 consistency with Neutron/Nova.
 Could we write something similar to a view which becomes a layer on top of
 this data model?


From: Tim Hinrichs
Sent: Tuesday, July 29, 2014 3:03 PM
To: openstack-dev@lists.openstack.org
Cc: Alex Yip
Subject: [Congress] data-source renovation


Re: [openstack-dev] [Congress] data-source renovation

2014-08-01 Thread Rajdeep Dua
Option 2 looks like a better idea keeping in mind the data model
consistency with Neutron/Nova.
Could we write something similar to a view which becomes a layer on top of
this data model?


On Wed, Jul 30, 2014 at 3:33 AM, Tim Hinrichs thinri...@vmware.com wrote:


[openstack-dev] [Congress] data-source renovation

2014-07-29 Thread Tim Hinrichs
Hi all,

As I mentioned in a previous IRC meeting, when writing our first few policies
I had trouble using the tables we currently use to represent external data
sources like Nova/Neutron.

The main problem is that wide tables (those with many columns) are hard to use:
(a) it is hard to remember what all the columns are,
(b) it is easy to mistakenly use the same variable in two different tables in
    the body of a rule, i.e., to create an accidental join, and
(c) changes to the datasource drivers can require tedious, error-prone
    modifications to policy.
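
To illustrate problem (b), here is a small Python sketch of how reusing a variable name silently turns a conjunction into a join.  The join function is a naive stand-in for rule-body evaluation, and the networks data and all row values are made up for the example.

```python
def join(rows_a, vars_a, rows_b, vars_b):
    """Naive conjunction of two literals: shared variable names must agree."""
    shared = [v for v in vars_a if v in vars_b]
    results = []
    for a in rows_a:
        bind_a = dict(zip(vars_a, a))
        for b in rows_b:
            bind_b = dict(zip(vars_b, b))
            if all(bind_a[v] == bind_b[v] for v in shared):
                results.append({**bind_a, **bind_b})
    return results


# Made-up data: (id, name) for ports and for a hypothetical networks table.
ports = [("p1", "web"), ("p2", "db")]
networks = [("n1", "prod"), ("n2", "web")]

# Distinct variable names: every port is paired with every network.
intended = join(ports, ["port_id", "port_name"],
                networks, ["net_id", "net_name"])

# "name" is reused in both literals, so only rows with equal names survive.
accidental = join(ports, ["port_id", "name"],
                  networks, ["net_id", "name"])

print(len(intended), len(accidental))
```

With 16 columns per table, spotting a reused name like this by eye is much harder than in the two-column toy above.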

I see several options.  Once we choose something, I’ll write up a spec and 
include the other options as alternatives.


1) Add a preprocessor to the policy engine that makes it easier to deal with 
large tables via named-argument references.

Instead of writing a rule like

p(port_id, name) :-
neutron:ports(port_id, addr_pairs, security_groups, extra_dhcp_opts, 
binding_cap, status, name, admin_state_up, network_id, tenant_id, binding_vif, 
device_owner, mac_address, fixed_ips, router_id, binding_host)

we would write

p(id, nme) :-
neutron:ports(port_id=id, name=nme)

The preprocessor would fill in all the missing variables and hand the resulting
rule, now in the original positional form, off to the Datalog engine.
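
A minimal Python sketch of that expansion step follows.  It assumes the preprocessor has the column order of each table available; the column list is copied from the rule above, while SCHEMAS, expand_literal and the _vN naming of fresh variables are hypothetical choices made for this example.

```python
import itertools

# Column order for neutron:ports, copied from the wide rule above.
SCHEMAS = {
    "neutron:ports": [
        "port_id", "addr_pairs", "security_groups", "extra_dhcp_opts",
        "binding_cap", "status", "name", "admin_state_up", "network_id",
        "tenant_id", "binding_vif", "device_owner", "mac_address",
        "fixed_ips", "router_id", "binding_host",
    ],
}


def expand_literal(table, named_args, fresh, schemas=SCHEMAS):
    """Turn table(col=var, ...) into a positional literal.

    fresh is a shared counter; columns the author did not mention get a
    new variable _vN (assumed not to clash with user-written names), so
    two expanded literals in one rule never share a filler variable.
    """
    columns = schemas[table]
    unknown = set(named_args) - set(columns)
    if unknown:
        raise KeyError(f"{table} has no column(s): {sorted(unknown)}")
    args = [
        named_args[col] if col in named_args else f"_v{next(fresh)}"
        for col in columns
    ]
    return f"{table}({', '.join(args)})"


fresh = itertools.count()
expanded = expand_literal("neutron:ports",
                          {"port_id": "id", "name": "nme"}, fresh)
print(f"p(id, nme) :- {expanded}")
```

Sharing one counter across all literals of a rule matters: reusing the same filler name in two literals would reintroduce the accidental joins the preprocessor is meant to prevent.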

Pros: (i) leverages vanilla database technology under the hood
      (ii) policy is robust to changes in the fields of the original data
           because the Congress data model is different from the Nova/Neutron
           data models
Cons: (i) we will need to invert the preprocessor when showing
          rules/traces/etc. to the user
      (ii) a layer of translation makes debugging difficult

2) Be disciplined about writing narrow tables and write 
tutorials/recommendations demonstrating how.

Instead of a table like...
neutron:ports(port_id, addr_pairs, security_groups, extra_dhcp_opts, 
binding_cap, status, name, admin_state_up, network_id, tenant_id, binding_vif, 
device_owner, mac_address, fixed_ips, router_id, binding_host)

we would have many tables...
neutron:ports(port_id)
neutron:ports.addr_pairs(port_id, addr_pairs)
neutron:ports.security_groups(port_id, security_groups)
neutron:ports.extra_dhcp_opts(port_id, extra_dhcp_opts)
neutron:ports.name(port_id, name)
...

People writing policy would write rules such as ...

p(x) :- neutron:ports.name(port, name), ...

[Here, the period (e.g., in ports.name) is not an operator, just a convenient
way to spell the table name.]

To do this, Congress would need to know which columns in a table are sufficient 
to uniquely identify a row, which in most cases is just the ID.

Pros: (i) this requires only changes in the datasource drivers; everything
          else remains the same
      (ii) still leverages database technology under the hood
      (iii) policy is robust to changes in the fields of the original data
Cons: (i) a datasource driver can force the policy writer to use wide tables
      (ii) this data model is very different from the original data models
      (iii) we need primary-key information about tables
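
As a sketch of what a datasource driver might do under option 2, the Python below splits rows of one wide table into one narrow table per column, keyed by the row's ID.  split_wide_table is a hypothetical helper, the column list is shortened, and the rows are made-up data.

```python
def split_wide_table(table, columns, rows, key="port_id"):
    """Split rows of one wide table into one narrow table per column.

    Returns a dict mapping table name to a set of tuples.  The key column
    is assumed to identify a row uniquely; this is the primary-key
    information mentioned in Cons (iii).
    """
    key_index = columns.index(key)
    narrow = {table: set()}
    for row in rows:
        row_key = row[key_index]
        narrow[table].add((row_key,))
        for col, value in zip(columns, row):
            if col != key:
                narrow.setdefault(f"{table}.{col}", set()).add((row_key, value))
    return narrow


columns = ["port_id", "status", "name"]  # shortened, illustrative column list
rows = [("p1", "ACTIVE", "web"), ("p2", "DOWN", "db")]  # made-up data
tables = split_wide_table("neutron:ports", columns, rows)
for name in sorted(tables):
    print(name, sorted(tables[name]))
```

Adding a new field to the wide table only adds a new narrow table, so existing rules that mention neutron:ports.name are unaffected.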

3) Enhance the Congress policy language to handle objects natively.

Instead of writing a rule like the following ...

p(port_id, name, group) :-
neutron:ports(port_id, addr_pairs, security_groups, extra_dhcp_opts, 
binding_cap, status, name, admin_state_up, network_id, tenant_id, binding_vif, 
device_owner, mac_address, fixed_ips, router_id, binding_host),
neutron:ports.security_groups(security_groups, group)

we would write a rule such as

p(port_id, name, group) :-
neutron:ports(port),
port.name(name),
port.id(port_id),
port.security_groups(group)

The big difference here is that the period (.) is an operator in the language, 
just as in C++/Java.

Pros:
(i) The data model we use in Congress is almost exactly the same as the data 
model we use in Neutron/Nova.

(ii) Policy is robust to changes in the Neutron/Nova data model as long as 
those changes only ADD fields.

(iii) Programmers may be slightly more comfortable with this language.

Cons:

(i) The obvious implementation (changing the engine to implement the (.) 
operator directly) is quite a change from traditional database technology.  At 
this point, that seems risky.

(ii) It is unclear how to implement this via a preprocessor (thereby leveraging 
database technology).  The key problem I see is that we would need to translate 
port.name(...) into something like TABLE.name(port, ...) from option (2) above.  
The difficulty is that TABLE is not fixed: the object could sometimes be a port, 
sometimes a network, sometimes a subnet, etc.

(iii) Requires some extra syntactic restrictions to ensure we don't lose 
decidability.

(iv) Because the Congress and Nova/Neutron models are the same, changes to the 
Nova/Neutron model can require rewriting policy.
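
To make Con (ii) concrete, here is a rough Python sketch of a preprocessor for the (.) operator.  It assumes a variable's table can be read off a one-argument literal such as neutron:ports(port) in the same rule body, and it fails when no such literal exists, which is exactly the difficulty described above.  OBJECT_TABLES and translate_dot_literals are hypothetical names, and the networks/subnets table names are guesses.

```python
# Hypothetical set of tables whose rows are treated as objects.
OBJECT_TABLES = {"neutron:ports", "neutron:networks", "neutron:subnets"}


def translate_dot_literals(body):
    """Rewrite var.field(arg) literals into option-(2)-style literals.

    body is a list of (predicate, args) pairs.  A variable's table is
    taken from the one-argument literal that introduces it, e.g.
    ("neutron:ports", ["port"]).
    """
    var_tables = {}
    for pred, args in body:
        if pred in OBJECT_TABLES and len(args) == 1:
            var_tables.setdefault(args[0], set()).add(pred)

    out = []
    for pred, args in body:
        if ":" in pred or "." not in pred:
            out.append((pred, list(args)))  # ordinary table literal
            continue
        var, field = pred.split(".", 1)
        tables = var_tables.get(var, set())
        if len(tables) != 1:
            # TABLE is unknown or ambiguous for this variable.
            raise ValueError(f"cannot determine the table for variable {var!r}")
        (table,) = tables
        out.append((f"{table}.{field}", [var] + list(args)))
    return out


body = [
    ("neutron:ports", ["port"]),
    ("port.name", ["name"]),
    ("port.security_groups", ["group"]),
]
translated = translate_dot_literals(body)
for pred, args in translated:
    print(f"{pred}({', '.join(args)})")
```

A rule that uses x.name(...) without pinning x to a single table cannot be translated this way, so a real design would need either type inference or extra syntactic restrictions.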



Thoughts?
Tim