RE: [sqlite] Subject: Re: [sqlite] Let us make SQLite more powerful

2003-12-30 Thread Mrs. Brisby
On Tue, 2003-12-30 at 06:46, Steve O'Hara wrote:
> My understanding is that logically, by their very nature, joins are always
> likely to be slower than single table queries - I thought that this was a
> given downside to normalisation or am I incorrect?  Maybe the downside is
> not pronounced in SQLite but I'm sure that you see it in other RDBMS (if you
> crank up the results set volume enough).

Not necessarily; JOIN is an algebraic operation. Most RDBMS can handle
JOINs efficiently - especially with the aid of indexes. Some RDBMS even
translate "multi-occurrence columns" into JOIN operations transparently.

Some database engines cannot optimize all forms of JOIN, nor can they
optimize all syntax for a join, but these are deficiencies in the
database engine, not in SQL itself.


> > > > If you think this is easier to read, then consider creating VIEWs
> > > > to
> > > > store intermediate queries.
> > >
> > > I have never considered this as I am not a power user of SQLite.
> > > Sorry...!
> >
> > VIEWs are quite normal in other SQL providers. That said, you may want
> > to become more familiar with SQL before you start suggesting extensions
> > to it- at the very least, be aware that syntactical additions may very
> > well be fine, but you will face criticism if your reasoning is flawed.
> >
> 
> In SQLite, VIEWs are simply an expedient for expressing your query - there
> is no performance upside to using them opposed to a fully specified SQL
> command.

Agreed. They were covered because another reason for the "new syntax"
was because they're "easier to read" (something I happen to disagree
with).

> > If you have a table which has a value that can return multiple values
> > you MUST alter the API of the callback function or of the fetch
> > function. Otherwise how is older code supposed to deal with a table
> > having multiple values in an attribute?
> >
> 
> Why?
> Most systems will return you the whole column value with delimiter
> character(s) separating the sub-fields.
> I think the alterations to the API will not be in the callback at all, but
> WILL be in the query processor and index engine.
> We are talking about changing the search engine to delineate certain field
> values when querying.  It's easier for the indexer, which simply creates
> multiple postings for the same record but with different terms (each
> sub-field).

No, we weren't talking about that. That's one solution that I suggested
(using a user-defined function to split out various portions of the
value). We were talking about adding new syntax to SQL(ite) that
performs implicit joins making it possible to return and store multiple
_values_ within a single column/row.


> I fully understand your point about normalisation and fully understanding
> issues before pronouncing on them.
> However, I think that your not seeing the full picture of the usefulness of
> multi-occurance columns with the simple example mentioned previously.



> Take a trivial names and address scenario - each person can have multiple
> addresses, phone numbers, emails, fax numbers.
> In a normalised world where we have a table for each of these, you soon run
> into a complicated query if you want to find a person who has a particular
> address, a particular phone number, a particular email etc.  Don't we have
> to do unions of a number of joins?
> Whereas, in a multi-occurrence schema it's simply
>   select * from names where address='vghg' and phone='776' and email='jhjh'
> etc.

and how is the callback called for each _phone_ where:
select * from names where address='yghg';

How many times is callback called? for which rows? How do we tell which
inner-structure we are in? Is the callback going to get a folded string?
an incomplete one? or just some dummy blob to be passed to another
sqlite function?

These are very important questions that _must_ be resolved. If they
aren't- then compatibility is broken.

Thus I suggested moving it entirely into functions, that is:

select * from names where any_equals('address','yghg');

where any_equals() is a function that splits/checks values as
appropriate.

besides not making any changes to the callback API, this method is
future proof, although as you might have noticed, it cannot presently
take advantage of indexes (sqlite doesn't presently index functions,
IIRC).

if the question is simple enough- and all your operations are like
any_equals() you can optimize this to use the LIKE operator which
doesn't require any new functions AND can utilize indexes (my other
suggestion).



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: [sqlite] Subject: Re: [sqlite] Let us make SQLite more powerful

2003-12-30 Thread Steve O'Hara
> -Original Message-
> From: Mrs. Brisby [mailto:[EMAIL PROTECTED]
> Sent: 29 December 2003 23:41
> To: Yogesh Vachhani
> Cc: [EMAIL PROTECTED]
> Subject: Re: [sqlite] Subject: Re: [sqlite] Let us make SQLite more
> powerful
>
>
> On Sat, 2003-12-27 at 07:16, Yogesh Vachhani wrote:
> > > Why exactly do you think the JOIN operator is the wrong tool for
> > > this?
> >
> > I feel JOIN operations are taxing on the Process as well as on RAM
> > and HDD (in form of temporary files)
>
> All right then. SQLite doesn't produce temporary files for JOIN, and
> your query _can_ be expressed relationally as a JOIN, so it should never
> be any faster than a JOIN operation- except due to parsing. Can you
> verify that parsing is taking "too long" here?
>

My understanding is that logically, by their very nature, joins are always
likely to be slower than single table queries - I thought that this was a
given downside to normalisation or am I incorrect?  Maybe the downside is
not pronounced in SQLite but I'm sure that you see it in other RDBMS (if you
crank up the results set volume enough).

>
> > > If you think this is easier to read, then consider creating VIEWs
> > > to
> > > store intermediate queries.
> >
> > I have never considered this as I am not a power user of SQLite.
> > Sorry...!
>
> VIEWs are quite normal in other SQL providers. That said, you may want
> to become more familiar with SQL before you start suggesting extensions
> to it- at the very least, be aware that syntactical additions may very
> well be fine, but you will face criticism if your reasoning is flawed.
>

In SQLite, VIEWs are simply an expedient for expressing your query - there
is no performance upside to using them opposed to a fully specified SQL
command.

>
> > > > So one can see that one would not lose anything by using SQLite's
> > > > extended storage model (if implemented). Even though the
> > > LineItems
> > > > information is embedded into the Invoices table, one can still
> > > write
> > > > a simple query to access it as if it were in an independent
> > > table.
> > >
> > > Backwards compatibility gets lost.
> > > API changes so programs need to be altered.
> > I did say at the start that we should extend the existing
> > functionality so that who are comfortable with the existing on will
> > keep on using this one and other can try out the other one!
> >
> > This not lead to lost in BACKWARD compatibility?!
>
> If you have a table which has a value that can return multiple values
> you MUST alter the API of the callback function or of the fetch
> function. Otherwise how is older code supposed to deal with a table
> having multiple values in an attribute?
>

Why?
Most systems will return you the whole column value with delimiter
character(s) separating the sub-fields.
I think the alterations to the API will not be in the callback at all, but
WILL be in the query processor and index engine.
We are talking about changing the search engine to delineate certain field
values when querying.  It's easier for the indexer, which simply creates
multiple postings for the same record but with different terms (each
sub-field).

>
> > > That's an idea
> > > anyway, it can already be done with SQLite by supplying your own
> > > function. Additionally, you could add lists like this yourself
> > > (pick a terminator, and make a function that indexes)
> >
> > By the way does  anyone know whether SQLite supports User Defined
> > Functions? If such a thing is possible then all such functionality
> > can be implementd out side SQLite in a separate DLL. What do u all
> > think?
>
> Yes, SQLite supports "user-defined" functions. The Wiki has a great deal
> of documentation about this...
>
> > > Generally speaking, I think that trying to store an unknown number
> > > of distinct values in one record column is a bad idea, and it
> > > violates level 1 normalization.  You really should use a separate
> > > table for the phone numbers and/or addresses, one record per
> > > instance.
> >
> > It does not violate some of the normalization rules but then how many
> > follow this pracitcally. In fact I have noticed that in practicaly
> > implementations many times a developer has to avoid normalization
> > rules to implement some pecular requirements of their customers
>
> I do not think you know what you are talking about. "level 1
> normalization" comes from graph theory and describes a manner in which
> every graph can be translated into

Re: [sqlite] Subject: Re: [sqlite] Let us make SQLite more powerful

2003-12-29 Thread Mrs. Brisby
On Sat, 2003-12-27 at 07:16, Yogesh Vachhani wrote:
> > Why exactly do you think the JOIN operator is the wrong tool for
> > this?
> 
> I feel JOIN operations are taxing on the Process as well as on RAM
> and HDD (in form of temporary files)

All right then. SQLite doesn't produce temporary files for JOIN, and
your query _can_ be expressed relationally as a JOIN, so it should never
be any faster than a JOIN operation- except due to parsing. Can you
verify that parsing is taking "too long" here?


> > If you think this is easier to read, then consider creating VIEWs
> > to
> > store intermediate queries.
> 
> I have never considered this as I am not a power user of SQLite.
> Sorry...!

VIEWs are quite normal in other SQL providers. That said, you may want
to become more familiar with SQL before you start suggesting extensions
to it- at the very least, be aware that syntactical additions may very
well be fine, but you will face criticism if your reasoning is flawed.


> > > So one can see that one would not lose anything by using SQLite's
> > > extended storage model (if implemented). Even though the
> > LineItems
> > > information is embedded into the Invoices table, one can still
> > write
> > > a simple query to access it as if it were in an independent
> > table.
> > 
> > Backwards compatibility gets lost.
> > API changes so programs need to be altered.
> I did say at the start that we should extend the existing
> functionality so that who are comfortable with the existing on will
> keep on using this one and other can try out the other one!
> 
> This not lead to lost in BACKWARD compatibility?!

If you have a table which has a value that can return multiple values
you MUST alter the API of the callback function or of the fetch
function. Otherwise how is older code supposed to deal with a table
having multiple values in an attribute?


> > That's an idea
> > anyway, it can already be done with SQLite by supplying your own
> > function. Additionally, you could add lists like this yourself
> > (pick a terminator, and make a function that indexes)
> 
> By the way does  anyone know whether SQLite supports User Defined
> Functions? If such a thing is possible then all such functionality
> can be implementd out side SQLite in a separate DLL. What do u all
> think?

Yes, SQLite supports "user-defined" functions. The Wiki has a great deal
of documentation about this...

> > Generally speaking, I think that trying to store an unknown number
> > of distinct values in one record column is a bad idea, and it
> > violates level 1 normalization.  You really should use a separate
> > table for the phone numbers and/or addresses, one record per
> > instance.
> 
> It does not violate some of the normalization rules but then how many
> follow this pracitcally. In fact I have noticed that in practicaly
> implementations many times a developer has to avoid normalization
> rules to implement some pecular requirements of their customers

I do not think you know what you are talking about. "level 1
normalization" comes from graph theory and describes a manner in which
every graph can be translated into another graph as utilizing no nested
functions - which are called "level 1 normal forms".

I don't think the responder quite spoke correctly either. The structure
you desire _can_ be folded down (normalized) into the view normal people
have of SQL: You aren't suggesting that anything be made available that
isn't presently available- but you want to write these systems without
learning SQL, or because you believe giving syntactical hints will make
queries run faster.

I do think that this reasoning is erroneous. Profile, don't speculate,
and get the very basics of relational calculus down before you decide
SQL is too primitive to support the data structures you need (hint:
there _are_ many structures that map very poorly to SQL. the one you
selected however, maps very _nicely_ to SQL).


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: [sqlite] Let us make SQLite more powerful

2003-12-15 Thread Mrs. Brisby
All I was demonstrating was how to use a single field. It's obviously
better to use a second table and build a relationship using JOIN.

I don't see where "multi-occurrence" fields can do something that
JOIN-ed relationships can't.

SELECT LastName FROM Contacts JOIN Phones USING (phoneID) HAVING
PhoneNumber='2064814442';

or perhaps:

SELECT LastName FROM Contacts WHERE phoneID IN (SELECT phoneID FROM
Phones WHERE PhoneNumber='2064814442';

Thus it would seem that this can simply be implemented in terms of
syntax (automatic table creation, rewriting queries for implicit joins,
etc).

You'll note I never said that this feature is "uncommon" - merely
unnecessary, kludgy, and queries written in this manner are difficult to
read, and as at least you've noticed, difficult to port. It would be
nice for the engine to do it automatically for you.

But it would make the engine more complex- for a direction that isn't
standard, that doesn't add the ability to answer questions that
previously could not, nor the ability to answer questions faster than
before. That is, as near as I can tell, it only provides the ability to
answer questions with less typing.

Even if BASIS is slower for these implicit joins, that doesn't mean
other systems are (PostgreSQL isn't, for example). What it does mean is
that BASIS has a crappy query optimizer.

That said, I really would like to see user-defined "structures"- but I
suspect this will have to wait for SQLite 3.0 or whenever we get
non-null-terminating values...


On Mon, 2003-12-15 at 07:25, Steve O'Hara wrote:
> Here's my tuppence worth;
> 
> I've been working with BASIS for donkeys years and it is one of the few
> RDBMS that fully supports multi-occurence fields.
> To search for a value in such a field you simply say
> SELECT LastName FROM Contacts
> WHERE Phones = "2064814442"
> This means search in any occurance within Phones
> 
> If you want a specific entry then you say
> SELECT LastName FROM Contacts
> WHERE Phones(3) = "2064814442"
> 
> Interestingly, if you have a number of fields that have matched values e.g.
> Phone, Extension etc you can do this
> SELECT LastName FROM Contacts
> WHERE Phones="2064814442" WITH Extension="234"
> This means find records where the same occurance of Phones and Extension
> meets the criteria.
> 
> Although BASIS is fully relational, this feature is very commonly used
> instead of creating linked lookup tables and doing a join, because of the
> poorer performance of joins. The techniques shown here by Mrs Brisby are
> fine but because they use LIKE, they don't use an index and so will be very
> slow for large databases.
> 
> However, there are some disadvantages to this stuff;
> 
>   Multi-occurance fields are not common to RDBMS
> 
>   Sorting can be interesting (you might have to do an "exploding" sort) e.g.
> SELECT LastName FROM Contacts
> WHERE Phones = "2064814442" ORDER BY EVERY Phones
>or
> SELECT LastName FROM Contacts
> WHERE Phones = "2064814442" ORDER BY Phones(1)
> 
>   More complexity in SQLite
> 
> Steve
> 
> 
> 
> 
> -Original Message-
> From: Mrs. Brisby [mailto:[EMAIL PROTECTED]
> Sent: 12 December 2003 13:56
> To: Yogesh Vachhani
> Cc: [EMAIL PROTECTED]
> Subject: Re: [sqlite] Let us make SQLite more powerful
> 
> 
> On Fri, 2003-12-12 at 08:09, Yogesh Vachhani wrote:
> > For example, if the Phones column is a collection, one can find all
> > rows that contain the phone number 206-481-4442 by specifying the
> > following SELECT statement:
> >
> > SELECT LastName
> > FROM Contacts
> > WHERE Phones(ANY) = "2064814442"
> 
> Why not encode Phones as ":phonenumber:"
> that way you can already do:
> SELECT LastName FROM Contacts WHERE Phones LIKE '%:phonenumber:%'
> 
> (btw, your quoting is wrong)
> 
> > If one always store fax numbers in the third element of the Phones
> > collection, one could find all rows that contain a particular fax
> > number
> > like this:
> >
> > SELECT LastName
> > FROM Contacts
> > WHERE Phones(3) = "2064814442"
> 
> SELECT LastName FROM Contacts WHERE Phones LIKE ':%::%::phonenumber:%';
> 
> 
> > For example, lets return to the contact database, which contains a
> > Phones column whose
> > data type is PhoneStructure, so the Phones column contains the
> > columns PhoneType and PhoneNum. The following query could returns
> > just the 

Re: [sqlite] Let us make SQLite more powerful

2003-12-13 Thread Darren Duncan
Generally speaking, I think that trying to store an unknown number of distinct values 
in one record column is a bad idea, and it violates level 1 normalization.  You really 
should use a separate table for the phone numbers and/or addresses, one record per 
instance.

However, I don't mind extensions to the current functionality as long as they conform 
to the SQL standard.  If you look at the SQL-99 standard, on which the ANSI and ISO 
ones are identical, they include support for user-defined data types.  So if SQLite is 
going to implement them, it should conform to standard.

In regards to joins returning duplicate data in multiple columns from one table, the 
solution to this is simple: do two selects.  One select gets one copy of the 
non-multi-instance data from the master table, and the second gets the 
multiple-instance data like phone numbers and addresses.  Just because you are doing a 
join, you don't have to return every column the join can see.

-- Darren Duncan

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [sqlite] Let us make SQLite more powerful

2003-12-12 Thread Mrs. Brisby
On Fri, 2003-12-12 at 08:09, Yogesh Vachhani wrote:
> For example, if the Phones column is a collection, one can find all
> rows that contain the phone number 206-481-4442 by specifying the
> following SELECT statement:
> 
> SELECT LastName 
>   FROM Contacts 
>   WHERE Phones(ANY) = "2064814442"

Why not encode Phones as ":phonenumber:"
that way you can already do:
SELECT LastName FROM Contacts WHERE Phones LIKE '%:phonenumber:%'

(btw, your quoting is wrong)

> If one always store fax numbers in the third element of the Phones
> collection, one could find all rows that contain a particular fax
> number
> like this:
> 
> SELECT LastName
>   FROM Contacts
>   WHERE Phones(3) = "2064814442"

SELECT LastName FROM Contacts WHERE Phones LIKE ':%::%::phonenumber:%';


> For example, lets return to the contact database, which contains a
> Phones column whose
> data type is PhoneStructure, so the Phones column contains the
> columns PhoneType and PhoneNum. The following query could returns
> just the phone number 206-402-9400 and not the phone type for that
> number:
> 
> SELECT Phones(*)!PhoneNum
>   FROM Contacts
>   WHERE LastName = "Calder"

This didn't require any special syntax.

> The following query would return the phone numbers for every contact
> who has a fax number:
> 
> SELECT Phones(*)!PhoneNum
>   FROM Contacts
>   WHERE Phones(ANY).PhoneType = "fax"

Instead of encoding phone numbers as :phonenumber:, you could also use
:type=phonenumber:

> Suppose that we put the PhoneType and PhoneNum in a separate table
> instead of in a PhoneStructure. To tie rows in the Phones table to a
> contact in the Contacts table, we then add a RefTo collection column
> called PhoneRef to the Contacts table. This RefTo column contains the
> Row IDs of one or more rows in the Phones table. To access phone
> numbers in the Phones table, one uses an identical query (except for
> minor name changes for clarity):
> 
> SELECT PhoneRef(*)!PhoneNum
>   FROM Contacts
>   WHERE PhoneRef(ANY).PhoneType = "fax"

This is redundant.


> One can query data in a collection or structure-valued column just as
> one can query data in a conventional table. To the query interface, a
> collection of structures should appears as a logical table with its
> parent table name automatically prefixed. So, for instance, a query
> to find all LineItems in the Invoices table would look like this:
> 
> SELECT PartNo, QtyOrdered, UnitPrice, QtyOnHand, EditDate
>   FROM Invoices.LineItems

Why exactly do you think the JOIN operator is the wrong tool for this?
If you think this is easier to read, then consider creating VIEWs to
store intermediate queries.


> So one can see that one would not lose anything by using SQLite's
> extended storage model (if implemented). Even though the LineItems
> information is embedded into the Invoices table, one can still write
> a simple query to access it as if it were in an independent table.

Backwards compatibility gets lost.
API changes so programs need to be altered.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



[sqlite] Let us make SQLite more powerful

2003-12-12 Thread Yogesh Vachhani
Hello everyone,

This is espcially for Dr. Hipp.

First I will try to highlite the reasons as to why I am raising this
topic just to establish proper grounds for what I am going to Say and
Suggest.

I am a software developer using VB as my main development tool and an
trying to learn Delphi but the process is very slow for me as
learning Delphi does'nt come in my priority. I don't know C or C++
(in fact I left C/C++ behind when I saw VB version 2.0) and am not
well versed with SQL scripting either.

I also work as a freelance consultant for a few firms here in my
region (Rajkot, Gujarat, INDIA). My areas of consultancy are
computerising business operatins, helping my clients to convert their
manual process to computer, help clients in purchasing the best
computer harware/software appropriate to their business requirement,
managing software development of clients and helping them develop
their own software inhouse by their own inhouse comuter talents,
suggest them which back end database to use for their requirements,
etc.. etc...

I have been following SQLite's progress for quite some time now. I
have got a few IDEAS regarding enhancements to SQLite to make it even
more powerfull and faster than what it is today.

Before I go any further. Let me confess that I have never used SQLite
in any commercial projects because I feel it requires more work to
use SQLite in VB. And for people like me working in VB the question
of SIZE is immaterial, what we people prefer the mose it ease of use.


The features that I am suggesting here will not affect the current
functionality of SQLite in any but it will just enhance it in multi
folds. So here are my IDEAS:

---


Storing Collections Of Values In A Single Column
-
Traditional relational databases restrict one to storing one value in
each column. Let us add such a facility to SQLite to store
collections of values in a single column. With collections, one would
no longer need to create multiple columns (or separate tables) for
multiple street addresses, phone numbers, status codes, and so on.
Instead, one just define a column as a collection, and can store as
many or as few values in that column as one wants. There should be no
limit on the number of values a collection column can hold.

I think Collection columns are well suited for situations where one
has an unknown number of data items of the same type. Consider the
case of the Phone column. Instead of allowing only one telephone
number or perhaps a small, fixed number (Phone1, Phone2, and so on),
one can define the Phone column as a collection. If some rows have a
dozen phone numbers while most others have only one or two, a
collection works well (I feel so). When multiple values are added to
the collection, SQLite would (probably) delimits multiple values with
a separator character.

I think that in many cases, one can use collections to replace
foreign tables and, thereby, eliminate joins that would have to be
executed to retrive data. This can dramatically improve query
performance. 

To query collections, SQLite can inmplement an ANY clause in the
SELECT statement as an extension to ANSI SQL to conveniently access
the collection values either individually or as a group.


Storing User-Defined Structures In A Single Column
--
SQLite can provide facility to allow the user to create and define
structures, or groups of columns separately (kind of separate logical
table), and store those structures in a single column in the main
table. This grouping can make it easier to define tables that happen
to contain the same kind of information. Structures should be allowed
to include other structures, but should not be allowed to be defined
recursively. As per previous suggestion just like normal columns,
structures can be collections. In effect, this lets one nest an
entire table in a single column. Moreover, because one can nest
collections and structures to any depth, one can store even highly
complex objects in a single column.

At the SQL from just add an extensions that lets one access the
entire structure as a single entity, or directly access specific
components within the structure.

For example, one might define an Address structure that contains
Street, City, State, and ZipCode columns. Let us assume that this
Address structure appears both in the Customers table and in the
Employees table. Say when one's company goes multi-national and one
need to redefine the Address structure to allow for international
address formats generally one whould have to make changes in all such
Tables which contain such informations so loking at the example one
would have to make changes in teh Customer Table as well as in
emplouees Table. Imagine how easy it can get as one will only have to
make change in the schema in one place: the definition