On Sat, Jun 13, 2009 at 11:42:21PM +0100, Simon Slavin scratched on the wall:
> Do any of you have experience with doing this ? Are there some > conclusive points which will make me decide immediately that I should > do it one way or the other ? I accept reasoned argument, URLs, > anecdotes, or anything else relevant. Every system that uses a database must deal with this problem, and there are no real easy answers. But here are a few deeper points to consider. From the view point of Relational Model, the formal mathematical model that most RDBMS engines are based off (in theory, anyways), you should push as many constraints, rules, and general enforcement into the database as possible. The database is meant to represent "truth" and under the Model it should be impossible (or at least as difficult as possible) to make the database become inconsistent-- that is, be in a state that doesn't represent truth. There are two main issues with this. First, an modern SQL database is not a Relational Model database. SQL only offers a number of different constraints, but it is not as flexible or complete as the theoretical model presented by the Relational Model. Second, the Relational Model doesn't account for access methods or the concept of middle-ware or tiered applications. It is assumed that anyone and everything from applications to command-line systems will be accessing the database. If you have a user typing in raw SQL commands and you want your database to remain consistent, you have no choice but to push as much as possible into the database and do lots of audits. Thankfully, most real-world applications do not work this way. You provide some piece of software that accesses the database, and (in general) the only people that have command line access are DBAs and other admin types. In that case, you can relax some of the constraints or rules that are difficult to express in the database and move those out to the application. This can be good or bad. In the case of an old-school mainframe application, where you logged into a central system via your VT100 or TN3270 terminal and ran the application on the mainframe, the software was centralized and easy to maintain. A system-wide application upgrade consists of replacing one binary. This made it relatively safe to push rules and constraints into the application, as it was easy to keep the end-user application and the database -- including schema changes and format updates -- in sync. After all, both the DB and the application were sitting on one machine. Then came the era of desktop systems. Now it was common to have a GUI desktop application that was used to access and manipulate the database. The problem is, it is nearly impossible to keep every desktop system in perfect sync. If a schema change required an update to a query, that required a new version of the desktop application, which required a network-wide upgrade. Really fast, you learned to either make your upgrades backwards compatible OR you had an extremely simply automatic update system. Regardless, there were dangers associated with pushing too many high-level rules into the application. If someone managed to access the database with an older client that had a different set of rules, bad things could happen. To get around these problems people learned to push more and more logic into the database, including complex stored functions and procedures to do just about any update or adjustment and views for all but the most basic filtering of a query. This would allow for schema changes without client updates. As the web gained popularity, along with middle-ware and multi-tier architectures, we returned more or less to the model of "everything on one machine" or, at least, under one control. If the application is completely web driven, the problem of upgrading the SQL access application (e.g. the web server code) and the database are greatly reduced compared to the desktop model, allowing more and more logic to return to the code. So much of the answer to your question depends on the environment you're trying to support and how much control you have over all the bits of code that have direct access to the database. If you control the database AND the next tier down, pushing logic, rules, and constraints into that layer has fewer issues and ramifications than keeping them in the database. From a pure software engineering standpoint, the best solution is to keep the rules and constraints as close to the data as possible. Just like the Relational Model thinking, the fewer ways there are to screw up the data, the better. If you have control over the next layer you can "let it hang out" a bit. There is still the danger that some admin can screw something up with a command line tool, but in theory those people know what they're doing and either keep their mitts off, or are being constructive with their access. Either way, they shouldn't be doing application tasks via the command line. Actually, that can sometimes be useful if you need to over-ride a point of policy or something similar. That one exec that made an exception to the rules for themselves, and now that data needs to make it into the database somehow. Sometimes having the rules higher up allows the command line tools to deal with those cases. There is also the question of cost of change. If you need to alter a constraint, and that requires dropping and re-creating a whole table, that can be a big pain in the butt. Most database changes are. So if your application is still somewhat immature, or operates in an environment where the business rules and policies are in great flux, it is often better to push the rules out a level to where they are easier to audit/change/update. The main issue with that is that you expose a level of access that can get around the rules. All in all, I try to balance things out in my own designs. I try to push constraints and limitations that are inherent in the data model into the database. Basically try to keep the database from modeling the impossible. Business rules and policy usually gets implemented in the access layer. For one thing, business rules and policy have a tendency to change, while (if you did it right) the data model should more or less remain constant. The line is a bit more black and white if you're modeling some physical thing, like computers on a network, or items in a library. They get a bit more gray when you're modeling something that's defined by the business rules-- like what *exactly* an invoice is, and what it can contain. Still, you want to keep the absurd data out of the database and the fluid rules in more mailable code. It can still take a lot of auditing and control. With a complex application, it is all too easy to have two different functions or update procedures that access the same data in slightly different ways that apply a different set of update rules or sanity checking. As backwards as it sounds, if the logic is pushed out of the database, it is often even more important that all the queries and updates be written by one person (or audited by one person). If all the constraints are in the database, it keeps everyone -- including the application programmers -- from doing something silly. If the rules are enforced by the programmer(s), we assume they know what they're doing. This is often a weak assumption if you have a large(r) team working from a poorly documented data model (in other words, most software projects). SQLite fills a lot of roles that normal databases and DBAs don't have to deal with, which can present even more complex problems. For example, if you're using SQLite as an application file format, you need to get the file format pretty darn tight. If you find out version 2 of the application requires a new schema for the file format, not only do you need to write an automatic update and conversion function, you need to maintain that function for the rest of the application's lifetime. Same for every other schema change and update that changes the file version/format. If you're a professional software engineer, you've faced similar questions a hundred times before. As usual, the big question isn't so much the code you want to write today, but the update, maintenance, and roll-out issues in the future. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Our opponent is an alien starship packed with atomic bombs. We have a protractor." "I'll go home and see if I can scrounge up a ruler and a piece of string." --from Anathem by Neal Stephenson _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users