Hi Mike,

Thanks for putting this together. Comments below.

Thanks,
Octave

On 7/23/2017 3:02 PM, Michael Bayer wrote:
I've been working with Octave Oregon to assist with new rules and
datatypes that would allow projects to support the NDB storage engine
with MySQL.

To that end, we've made changes to oslo.db in [1] to support this, and
there are now a bunch of proposals such as [2] and [3] to implement new
ndb-specific structures in projects.

The changes for all downstream projects except Cinder are still under
review. While we have a chance to avoid a future naming problem, I am
making the following proposal:

Rather than having all the projects make use of
oslo_db.sqlalchemy.ndb.AutoStringTinyText / AutoStringSize, we add new
generic types to oslo.db:

oslo_db.sqlalchemy.types.SmallString
oslo_db.sqlalchemy.types.String

(or similar)

Internally, the ndb module would map its implementation of
AutoStringTinyText and AutoStringSize to these types.  Functionality
would be identical; the naming convention exported to downstream
consuming projects would simply no longer refer to "ndb.<name>" for
datatypes.
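Roughly, the idea looks like this. The sketch below is stdlib-only Python standing in for what would really be SQLAlchemy types inside oslo.db. The class names follow the proposal, but the ddl() method, the ndb_length parameter and the backend strings are hypothetical, chosen only to show the ndb decision moving into the library while consuming projects declare just the intent of each column.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical backend identifiers; oslo.db's real backend detection is not shown.
MYSQL_INNODB = "mysql"
MYSQL_NDB = "mysql+ndb"


@dataclass(frozen=True)
class SmallString:
    """Hypothetical generic type: a short string of up to 255 characters."""

    length: int = 255

    def ddl(self, backend: str) -> str:
        # Under NDB a short VARCHAR becomes TINYTEXT (what AutoStringTinyText
        # does today); on every other backend it stays a VARCHAR.
        if backend == MYSQL_NDB:
            return "TINYTEXT"
        return f"VARCHAR({self.length})"


@dataclass(frozen=True)
class String:
    """Hypothetical generic type: a VARCHAR that may shrink under NDB."""

    length: int
    ndb_length: Optional[int] = None

    def ddl(self, backend: str) -> str:
        # Under NDB use the smaller size if one was given (what AutoStringSize
        # does today); otherwise keep the full declared size.
        if backend == MYSQL_NDB and self.ndb_length is not None:
            return f"VARCHAR({self.ndb_length})"
        return f"VARCHAR({self.length})"


# A consuming project declares intent only; "ndb." never appears in its code.
columns = {"name": SmallString(), "status": String(255, ndb_length=16)}
for backend in (MYSQL_INNODB, MYSQL_NDB):
    print(backend, {col: t.ddl(backend) for col, t in columns.items()})
```

A DB2 (or any other) rule would then be one more branch inside the library, with no renaming in downstream projects.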

I think this would make sense.


Reasons for doing so include:

1. OpenStack projects should be relying upon oslo.db to make the best
decisions for any given database backend, hardcoding as few
database-specific details as possible.  While it's unavoidable that
migration files will have some "if ndb:" kinds of blocks, for the
datatypes themselves, the "ndb." namespace defeats extensibility.  If
IBM wanted OpenStack to run on DB2 (again?) and wanted to add a
"db2.String" implementation to oslo.db, for example, the naming and
datatypes would need to be opened up as above in any case, so we might
as well make the change now before the patch sets are merged.

Agreed that this extra layer of abstraction could be used by DB2, MongoDB, etc.

2. The names "AutoStringTinyText" and "AutoStringSize" themselves are
confusing and inconsistent with each other (e.g. what is "auto"?  One is
"auto" if it's String or TinyText and the other is "auto" if it's
String, and..."size"?)

For these, here is a brief synopsis:

AutoStringTinyText converts a column to the TINYTEXT type. It is used where a VARCHAR(255) string needs to become a text blob so that the row fits within the NDB limits. If you are using NDB, it converts the column to TINYTEXT; otherwise it leaves it alone. TINYTEXT was chosen because it holds up to 255 bytes, the same as a 255-character VARCHAR of single-byte data, while saving space.

AutoStringText does the same as the above, but converts the type to TEXT and is meant for cases where you need more than 255 characters of space. Good examples are columns into which the output of hypervisor and OVS commands is dumped.

AutoStringSize takes two parameters: the non-NDB size and the NDB size. It is meant for cases where you need to reduce the size of the column to fit within the NDB limits but want to keep the String (VARCHAR) type because the column might be used in a key, index, etc. I only use it where the impact is very low, for example a column that tracks status (up, down, active, inactive, etc.) and doesn't need 255 characters.
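The three behaviours described above can be summarised in a small stdlib-only sketch. Each hypothetical helper returns the DDL a column would get on a non-NDB backend and on NDB. These functions are illustrative stand-ins, not the oslo.db implementation (which works through SQLAlchemy type compilation); the length parameter on auto_string_text, the ValueError sanity check and the example column names are assumptions made for this sketch.

```python
from typing import Tuple

# Each hypothetical helper returns (DDL on non-NDB backends, DDL on NDB).


def auto_string_tiny_text(length: int = 255) -> Tuple[str, str]:
    # Short VARCHAR elsewhere; TINYTEXT under NDB to keep the row small.
    return f"VARCHAR({length})", "TINYTEXT"


def auto_string_text(length: int) -> Tuple[str, str]:
    # For columns holding long output (e.g. hypervisor or OVS command dumps):
    # VARCHAR elsewhere; TEXT under NDB.
    return f"VARCHAR({length})", "TEXT"


def auto_string_size(length: int, ndb_size: int) -> Tuple[str, str]:
    # Stays a VARCHAR on both backends (so it can still back a key or index),
    # but with a smaller size under NDB.
    if ndb_size > length:
        # Sanity check added for this sketch only.
        raise ValueError("ndb_size should not exceed the non-NDB length")
    return f"VARCHAR({length})", f"VARCHAR({ndb_size})"


# Hypothetical columns showing when each helper would be picked.
examples = {
    "hostname": auto_string_tiny_text(),
    "ovs_output": auto_string_text(1024),
    "status": auto_string_size(255, 16),
}
for name, (default, ndb) in examples.items():
    print(f"{name}: {default} elsewhere, {ndb} on NDB")
```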

In many cases, the use of these types could be removed by simply changing the columns to more appropriate types and sizes. There is a tremendous amount of wasted space in many of the databases. I'm more than willing to help with this if teams decide they would rather do that as the long-term solution. Until then, these types let the same schema work on both InnoDB and NDB with minimal impact.
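To gauge how much right-sizing buys, a rough estimate of a row's worst-case VARCHAR footprint can help. This sketch assumes utf8 (utf8mb3, up to 3 bytes per character) and MySQL's usual 1- or 2-byte VARCHAR length prefix. It is not NDB's exact storage accounting, ignores non-VARCHAR columns, and leaves the row-size limit as a parameter because the actual limit depends on the NDB version. The table and column names are hypothetical.

```python
from typing import Dict

# Assumptions for this rough estimate (not NDB's exact storage format):
# utf8 (utf8mb3) stores up to 3 bytes per character, and a VARCHAR carries
# a 1-byte length prefix when its maximum byte length is <= 255, else 2 bytes.
BYTES_PER_CHAR = 3


def varchar_bytes(chars: int, bytes_per_char: int = BYTES_PER_CHAR) -> int:
    """Worst-case bytes for one VARCHAR(chars) value, including its prefix."""
    max_bytes = chars * bytes_per_char
    prefix = 1 if max_bytes <= 255 else 2
    return max_bytes + prefix


def estimate_row_bytes(varchar_columns: Dict[str, int]) -> int:
    """Sum the worst-case VARCHAR widths of a table (column name -> length)."""
    return sum(varchar_bytes(length) for length in varchar_columns.values())


def fits_within(varchar_columns: Dict[str, int], limit_bytes: int) -> bool:
    """True if the estimated VARCHAR footprint is within a caller-given limit."""
    return estimate_row_bytes(varchar_columns) <= limit_bytes


# Hypothetical table where every column was declared String(255) "just in case".
before = {"host": 255, "status": 255, "state": 255}
# The same table with the status-like columns right-sized.
after = {"host": 255, "status": 16, "state": 16}

print(estimate_row_bytes(before), estimate_row_bytes(after))  # 2301 865
```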

Another thing to keep in mind is that the only services that I've had to adjust column sizes for are:

Cinder
Neutron
Nova
Magnum

The other services that I'm working on like Keystone, Barbican, Murano, Glance, etc. only need changes to:

1. Ensuring that foreign keys are dropped and created in the correct order when changing things like indexes, constraints, etc. Many services already do these steps properly; there are just cases where this has been missed because InnoDB is very forgiving about it, but other databases are not.
2. Fixing the database migration and sync operations to use oslo.db, pass the right parameters, etc. This should have been done in the first place but hasn't been, so this is more of a housekeeping step to ensure that services are using oslo.db correctly.
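The ordering in the first item can be sketched as a small helper that wraps an index or constraint change with a foreign-key drop before it and a re-create after it. The helper, the ForeignKey tuple and the table, column and constraint names are hypothetical; the statements use MySQL syntax, and in a real migration they would be issued through the project's migration tool rather than built as strings.

```python
from typing import List, NamedTuple


class ForeignKey(NamedTuple):
    """Hypothetical description of one foreign key on the table being altered."""

    name: str
    column: str
    references: str  # e.g. "parent (id)"


def with_fk_rebuild(table: str, fks: List[ForeignKey], change: List[str]) -> List[str]:
    """Order DDL so foreign keys are dropped first and recreated last."""
    drops = [f"ALTER TABLE {table} DROP FOREIGN KEY {fk.name}" for fk in fks]
    creates = [
        f"ALTER TABLE {table} ADD CONSTRAINT {fk.name} "
        f"FOREIGN KEY ({fk.column}) REFERENCES {fk.references}"
        for fk in fks
    ]
    return drops + change + creates


# Hypothetical example: replacing the index that backs child.parent_id.
fks = [ForeignKey("fk_child_parent_id", "parent_id", "parent (id)")]
change = [
    "DROP INDEX ix_child_parent_id ON child",
    "CREATE INDEX ix_child_parent_id_status ON child (parent_id, status)",
]
for statement in with_fk_rebuild("child", fks, change):
    print(statement)
```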

The only other oddball case is disabling nested transactions, which only Neutron does.

On the flip side, here is a short list of services that I haven't had to make ANY changes for other than having oslo.db 4.24 or above:

aodh
gnocchi
heat
ironic
manila


3. It's not clear (I don't even know right now by looking at these
reviews) when one would use "AutoStringTinyText" or "AutoStringSize".
For example, in
https://review.openstack.org/#/c/446643/10/nova/db/sqlalchemy/migrate_repo/versions/216_havana.py
I see a list of String(255)'s changed to one type or the other without
any clear notion of why one would use one or the other.  Names that
simply describe the declared nature of the type would be most
appropriate.

One has to look at what the column is being used for and decide what appropriate remediation steps are. This takes time and one must research what kind of data goes in the column, what puts it there, what consumes it, and what remediation would have the least amount of impact.


I can add these names to oslo.db, and then we would just need to
spread them through all the open ndb reviews and also patch
up Cinder, which seems to be the only ndb implementation that's been
merged so far.

Keep in mind this is really me trying to correct my own mistake, as I
helped design and approved of the original approach here, where
projects would be consuming against the "ndb." namespace.  However,
after seeing in reviews how prevalent the use of this extremely
backend-specific name is, I think the name should appear much
less frequently throughout projects, and only around logic that is
purely to do with the ndb backend and no others.  At the datatype
level, the chance of future naming conflicts is very high, and we
should fix this mistake (my mistake) before it gets committed
throughout many downstream projects.


[1] https://review.openstack.org/#/c/427970/

[2] https://review.openstack.org/#/c/446643/

[3] https://review.openstack.org/#/c/446136/

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: [email protected]?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev