RE: [sqlite] Subject: Re: [sqlite] Let us make SQLite more powerful
On Tue, 2003-12-30 at 06:46, Steve O'Hara wrote: > My understanding is that logically, by their very nature, joins are always > likely to be slower than single table queries - I thought that this was a > given downside to normalisation or am I incorrect? Maybe the downside is > not pronounced in SQLite but I'm sure that you see it in other RDBMS (if you > crank up the results set volume enough). Not necessarily; JOIN is an algebraic operation. Most RDBMS can handle JOINs efficiently - especially with the aid of indexes. Some RDBMS even translate "multi-occurrence columns" into JOIN operations transparently. Some database engines cannot optimize all forms of JOIN, nor can they optimize all syntax for a join, but these are deficiencies in the database engine, not in SQL itself. > > > > If you think this is easier to read, then consider creating VIEWs > > > > to > > > > store intermediate queries. > > > > > > I have never considered this as I am not a power user of SQLite. > > > Sorry...! > > > > VIEWs are quite normal in other SQL providers. That said, you may want > > to become more familiar with SQL before you start suggesting extensions > > to it- at the very least, be aware that syntactical additions may very > > well be fine, but you will face criticism if your reasoning is flawed. > > > > In SQLite, VIEWs are simply an expedient for expressing your query - there > is no performance upside to using them opposed to a fully specified SQL > command. Agreed. They were covered because another reason for the "new syntax" was because they're "easier to read" (something I happen to disagree with). > > If you have a table which has a value that can return multiple values > > you MUST alter the API of the callback function or of the fetch > > function. Otherwise how is older code supposed to deal with a table > > having multiple values in an attribute? > > > > Why? > Most systems will return you the whole column value with delimiter > character(s) separating the sub-fields. > I think the alterations to the API will not be in the callback at all, but > WILL be in the query processor and index engine. > We are talking about changing the search engine to delineate certain field > values when querying. It's easier for the indexer, which simply creates > multiple postings for the same record but with different terms (each > sub-field). No, we weren't talking about that. That's one solution that I suggested (using a user-defined function to split out various portions of the value). We were talking about adding new syntax to SQL(ite) that performs implicit joins making it possible to return and store multiple _values_ within a single column/row. > I fully understand your point about normalisation and fully understanding > issues before pronouncing on them. > However, I think that your not seeing the full picture of the usefulness of > multi-occurance columns with the simple example mentioned previously. > Take a trivial names and address scenario - each person can have multiple > addresses, phone numbers, emails, fax numbers. > In a normalised world where we have a table for each of these, you soon run > into a complicated query if you want to find a person who has a particular > address, a particular phone number, a particular email etc. Don't we have > to do unions of a number of joins? > Whereas, in a multi-occurrence schema it's simply > select * from names where address='vghg' and phone='776' and email='jhjh' > etc. and how is the callback called for each _phone_ where: select * from names where address='yghg'; How many times is callback called? for which rows? How do we tell which inner-structure we are in? Is the callback going to get a folded string? an incomplete one? or just some dummy blob to be passed to another sqlite function? These are very important questions that _must_ be resolved. If they aren't- then compatibility is broken. Thus I suggested moving it entirely into functions, that is: select * from names where any_equals('address','yghg'); where any_equals() is a function that splits/checks values as appropriate. besides not making any changes to the callback API, this method is future proof, although as you might have noticed, it cannot presently take advantage of indexes (sqlite doesn't presently index functions, IIRC). if the question is simple enough- and all your operations are like any_equals() you can optimize this to use the LIKE operator which doesn't require any new functions AND can utilize indexes (my other suggestion). - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: [sqlite] Subject: Re: [sqlite] Let us make SQLite more powerful
> -Original Message- > From: Mrs. Brisby [mailto:[EMAIL PROTECTED] > Sent: 29 December 2003 23:41 > To: Yogesh Vachhani > Cc: [EMAIL PROTECTED] > Subject: Re: [sqlite] Subject: Re: [sqlite] Let us make SQLite more > powerful > > > On Sat, 2003-12-27 at 07:16, Yogesh Vachhani wrote: > > > Why exactly do you think the JOIN operator is the wrong tool for > > > this? > > > > I feel JOIN operations are taxing on the Process as well as on RAM > > and HDD (in form of temporary files) > > All right then. SQLite doesn't produce temporary files for JOIN, and > your query _can_ be expressed relationally as a JOIN, so it should never > be any faster than a JOIN operation- except due to parsing. Can you > verify that parsing is taking "too long" here? > My understanding is that logically, by their very nature, joins are always likely to be slower than single table queries - I thought that this was a given downside to normalisation or am I incorrect? Maybe the downside is not pronounced in SQLite but I'm sure that you see it in other RDBMS (if you crank up the results set volume enough). > > > > If you think this is easier to read, then consider creating VIEWs > > > to > > > store intermediate queries. > > > > I have never considered this as I am not a power user of SQLite. > > Sorry...! > > VIEWs are quite normal in other SQL providers. That said, you may want > to become more familiar with SQL before you start suggesting extensions > to it- at the very least, be aware that syntactical additions may very > well be fine, but you will face criticism if your reasoning is flawed. > In SQLite, VIEWs are simply an expedient for expressing your query - there is no performance upside to using them opposed to a fully specified SQL command. > > > > > So one can see that one would not lose anything by using SQLite's > > > > extended storage model (if implemented). Even though the > > > LineItems > > > > information is embedded into the Invoices table, one can still > > > write > > > > a simple query to access it as if it were in an independent > > > table. > > > > > > Backwards compatibility gets lost. > > > API changes so programs need to be altered. > > I did say at the start that we should extend the existing > > functionality so that who are comfortable with the existing on will > > keep on using this one and other can try out the other one! > > > > This not lead to lost in BACKWARD compatibility?! > > If you have a table which has a value that can return multiple values > you MUST alter the API of the callback function or of the fetch > function. Otherwise how is older code supposed to deal with a table > having multiple values in an attribute? > Why? Most systems will return you the whole column value with delimiter character(s) separating the sub-fields. I think the alterations to the API will not be in the callback at all, but WILL be in the query processor and index engine. We are talking about changing the search engine to delineate certain field values when querying. It's easier for the indexer, which simply creates multiple postings for the same record but with different terms (each sub-field). > > > > That's an idea > > > anyway, it can already be done with SQLite by supplying your own > > > function. Additionally, you could add lists like this yourself > > > (pick a terminator, and make a function that indexes) > > > > By the way does anyone know whether SQLite supports User Defined > > Functions? If such a thing is possible then all such functionality > > can be implementd out side SQLite in a separate DLL. What do u all > > think? > > Yes, SQLite supports "user-defined" functions. The Wiki has a great deal > of documentation about this... > > > > Generally speaking, I think that trying to store an unknown number > > > of distinct values in one record column is a bad idea, and it > > > violates level 1 normalization. You really should use a separate > > > table for the phone numbers and/or addresses, one record per > > > instance. > > > > It does not violate some of the normalization rules but then how many > > follow this pracitcally. In fact I have noticed that in practicaly > > implementations many times a developer has to avoid normalization > > rules to implement some pecular requirements of their customers > > I do not think you know what you are talking about. "level 1 > normalization" comes from graph theory and describes a manner in which > every graph can be translated into
Re: [sqlite] Subject: Re: [sqlite] Let us make SQLite more powerful
On Sat, 2003-12-27 at 07:16, Yogesh Vachhani wrote: > > Why exactly do you think the JOIN operator is the wrong tool for > > this? > > I feel JOIN operations are taxing on the Process as well as on RAM > and HDD (in form of temporary files) All right then. SQLite doesn't produce temporary files for JOIN, and your query _can_ be expressed relationally as a JOIN, so it should never be any faster than a JOIN operation- except due to parsing. Can you verify that parsing is taking "too long" here? > > If you think this is easier to read, then consider creating VIEWs > > to > > store intermediate queries. > > I have never considered this as I am not a power user of SQLite. > Sorry...! VIEWs are quite normal in other SQL providers. That said, you may want to become more familiar with SQL before you start suggesting extensions to it- at the very least, be aware that syntactical additions may very well be fine, but you will face criticism if your reasoning is flawed. > > > So one can see that one would not lose anything by using SQLite's > > > extended storage model (if implemented). Even though the > > LineItems > > > information is embedded into the Invoices table, one can still > > write > > > a simple query to access it as if it were in an independent > > table. > > > > Backwards compatibility gets lost. > > API changes so programs need to be altered. > I did say at the start that we should extend the existing > functionality so that who are comfortable with the existing on will > keep on using this one and other can try out the other one! > > This not lead to lost in BACKWARD compatibility?! If you have a table which has a value that can return multiple values you MUST alter the API of the callback function or of the fetch function. Otherwise how is older code supposed to deal with a table having multiple values in an attribute? > > That's an idea > > anyway, it can already be done with SQLite by supplying your own > > function. Additionally, you could add lists like this yourself > > (pick a terminator, and make a function that indexes) > > By the way does anyone know whether SQLite supports User Defined > Functions? If such a thing is possible then all such functionality > can be implementd out side SQLite in a separate DLL. What do u all > think? Yes, SQLite supports "user-defined" functions. The Wiki has a great deal of documentation about this... > > Generally speaking, I think that trying to store an unknown number > > of distinct values in one record column is a bad idea, and it > > violates level 1 normalization. You really should use a separate > > table for the phone numbers and/or addresses, one record per > > instance. > > It does not violate some of the normalization rules but then how many > follow this pracitcally. In fact I have noticed that in practicaly > implementations many times a developer has to avoid normalization > rules to implement some pecular requirements of their customers I do not think you know what you are talking about. "level 1 normalization" comes from graph theory and describes a manner in which every graph can be translated into another graph as utilizing no nested functions - which are called "level 1 normal forms". I don't think the responder quite spoke correctly either. The structure you desire _can_ be folded down (normalized) into the view normal people have of SQL: You aren't suggesting that anything be made available that isn't presently available- but you want to write these systems without learning SQL, or because you believe giving syntactical hints will make queries run faster. I do think that this reasoning is erroneous. Profile, don't speculate, and get the very basics of relational calculus down before you decide SQL is too primitive to support the data structures you need (hint: there _are_ many structures that map very poorly to SQL. the one you selected however, maps very _nicely_ to SQL). - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: [sqlite] Let us make SQLite more powerful
All I was demonstrating was how to use a single field. It's obviously better to use a second table and build a relationship using JOIN. I don't see where "multi-occurrence" fields can do something that JOIN-ed relationships can't. SELECT LastName FROM Contacts JOIN Phones USING (phoneID) HAVING PhoneNumber='2064814442'; or perhaps: SELECT LastName FROM Contacts WHERE phoneID IN (SELECT phoneID FROM Phones WHERE PhoneNumber='2064814442'; Thus it would seem that this can simply be implemented in terms of syntax (automatic table creation, rewriting queries for implicit joins, etc). You'll note I never said that this feature is "uncommon" - merely unnecessary, kludgy, and queries written in this manner are difficult to read, and as at least you've noticed, difficult to port. It would be nice for the engine to do it automatically for you. But it would make the engine more complex- for a direction that isn't standard, that doesn't add the ability to answer questions that previously could not, nor the ability to answer questions faster than before. That is, as near as I can tell, it only provides the ability to answer questions with less typing. Even if BASIS is slower for these implicit joins, that doesn't mean other systems are (PostgreSQL isn't, for example). What it does mean is that BASIS has a crappy query optimizer. That said, I really would like to see user-defined "structures"- but I suspect this will have to wait for SQLite 3.0 or whenever we get non-null-terminating values... On Mon, 2003-12-15 at 07:25, Steve O'Hara wrote: > Here's my tuppence worth; > > I've been working with BASIS for donkeys years and it is one of the few > RDBMS that fully supports multi-occurence fields. > To search for a value in such a field you simply say > SELECT LastName FROM Contacts > WHERE Phones = "2064814442" > This means search in any occurance within Phones > > If you want a specific entry then you say > SELECT LastName FROM Contacts > WHERE Phones(3) = "2064814442" > > Interestingly, if you have a number of fields that have matched values e.g. > Phone, Extension etc you can do this > SELECT LastName FROM Contacts > WHERE Phones="2064814442" WITH Extension="234" > This means find records where the same occurance of Phones and Extension > meets the criteria. > > Although BASIS is fully relational, this feature is very commonly used > instead of creating linked lookup tables and doing a join, because of the > poorer performance of joins. The techniques shown here by Mrs Brisby are > fine but because they use LIKE, they don't use an index and so will be very > slow for large databases. > > However, there are some disadvantages to this stuff; > > Multi-occurance fields are not common to RDBMS > > Sorting can be interesting (you might have to do an "exploding" sort) e.g. > SELECT LastName FROM Contacts > WHERE Phones = "2064814442" ORDER BY EVERY Phones >or > SELECT LastName FROM Contacts > WHERE Phones = "2064814442" ORDER BY Phones(1) > > More complexity in SQLite > > Steve > > > > > -Original Message- > From: Mrs. Brisby [mailto:[EMAIL PROTECTED] > Sent: 12 December 2003 13:56 > To: Yogesh Vachhani > Cc: [EMAIL PROTECTED] > Subject: Re: [sqlite] Let us make SQLite more powerful > > > On Fri, 2003-12-12 at 08:09, Yogesh Vachhani wrote: > > For example, if the Phones column is a collection, one can find all > > rows that contain the phone number 206-481-4442 by specifying the > > following SELECT statement: > > > > SELECT LastName > > FROM Contacts > > WHERE Phones(ANY) = "2064814442" > > Why not encode Phones as ":phonenumber:" > that way you can already do: > SELECT LastName FROM Contacts WHERE Phones LIKE '%:phonenumber:%' > > (btw, your quoting is wrong) > > > If one always store fax numbers in the third element of the Phones > > collection, one could find all rows that contain a particular fax > > number > > like this: > > > > SELECT LastName > > FROM Contacts > > WHERE Phones(3) = "2064814442" > > SELECT LastName FROM Contacts WHERE Phones LIKE ':%::%::phonenumber:%'; > > > > For example, lets return to the contact database, which contains a > > Phones column whose > > data type is PhoneStructure, so the Phones column contains the > > columns PhoneType and PhoneNum. The following query could returns > > just the
Re: [sqlite] Let us make SQLite more powerful
Generally speaking, I think that trying to store an unknown number of distinct values in one record column is a bad idea, and it violates level 1 normalization. You really should use a separate table for the phone numbers and/or addresses, one record per instance. However, I don't mind extensions to the current functionality as long as they conform to the SQL standard. If you look at the SQL-99 standard, on which the ANSI and ISO ones are identical, they include support for user-defined data types. So if SQLite is going to implement them, it should conform to standard. In regards to joins returning duplicate data in multiple columns from one table, the solution to this is simple: do two selects. One select gets one copy of the non-multi-instance data from the master table, and the second gets the multiple-instance data like phone numbers and addresses. Just because you are doing a join, you don't have to return every column the join can see. -- Darren Duncan - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: [sqlite] Let us make SQLite more powerful
On Fri, 2003-12-12 at 08:09, Yogesh Vachhani wrote: > For example, if the Phones column is a collection, one can find all > rows that contain the phone number 206-481-4442 by specifying the > following SELECT statement: > > SELECT LastName > FROM Contacts > WHERE Phones(ANY) = "2064814442" Why not encode Phones as ":phonenumber:" that way you can already do: SELECT LastName FROM Contacts WHERE Phones LIKE '%:phonenumber:%' (btw, your quoting is wrong) > If one always store fax numbers in the third element of the Phones > collection, one could find all rows that contain a particular fax > number > like this: > > SELECT LastName > FROM Contacts > WHERE Phones(3) = "2064814442" SELECT LastName FROM Contacts WHERE Phones LIKE ':%::%::phonenumber:%'; > For example, lets return to the contact database, which contains a > Phones column whose > data type is PhoneStructure, so the Phones column contains the > columns PhoneType and PhoneNum. The following query could returns > just the phone number 206-402-9400 and not the phone type for that > number: > > SELECT Phones(*)!PhoneNum > FROM Contacts > WHERE LastName = "Calder" This didn't require any special syntax. > The following query would return the phone numbers for every contact > who has a fax number: > > SELECT Phones(*)!PhoneNum > FROM Contacts > WHERE Phones(ANY).PhoneType = "fax" Instead of encoding phone numbers as :phonenumber:, you could also use :type=phonenumber: > Suppose that we put the PhoneType and PhoneNum in a separate table > instead of in a PhoneStructure. To tie rows in the Phones table to a > contact in the Contacts table, we then add a RefTo collection column > called PhoneRef to the Contacts table. This RefTo column contains the > Row IDs of one or more rows in the Phones table. To access phone > numbers in the Phones table, one uses an identical query (except for > minor name changes for clarity): > > SELECT PhoneRef(*)!PhoneNum > FROM Contacts > WHERE PhoneRef(ANY).PhoneType = "fax" This is redundant. > One can query data in a collection or structure-valued column just as > one can query data in a conventional table. To the query interface, a > collection of structures should appears as a logical table with its > parent table name automatically prefixed. So, for instance, a query > to find all LineItems in the Invoices table would look like this: > > SELECT PartNo, QtyOrdered, UnitPrice, QtyOnHand, EditDate > FROM Invoices.LineItems Why exactly do you think the JOIN operator is the wrong tool for this? If you think this is easier to read, then consider creating VIEWs to store intermediate queries. > So one can see that one would not lose anything by using SQLite's > extended storage model (if implemented). Even though the LineItems > information is embedded into the Invoices table, one can still write > a simple query to access it as if it were in an independent table. Backwards compatibility gets lost. API changes so programs need to be altered. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]