Re: [sqlite] Creating indexes on non-existent columns with double quotes

2017-01-25 Thread Ersin Akinci
On Wed, Jan 25, 2017 at 9:55 AM, Simon Slavin  wrote:
>
> On 25 Jan 2017, at 5:45pm, Ersin Akinci  wrote:
>
>> Thanks Richard and Simon for your insights. I think I'm still missing
>> a few things, though.
>>
>> 1. What does it mean when SQLite tries to create an index on a string?
>> Simon suggested that it's creating a calculated index, but I'm not
>> sure what means. (Does it just mean an "index" literally just on the
>> word "yearz_doesnt_exist"? That seems quite silly.)
>
> SQLite supports indexes on calculations.  Or other expressions.  So you 
> should be able to do things like
>
> CREATE INDEX fred ON MyTable (CASE WHEN stock < 0 THEN 0 ELSE stock END)
>
> Since a constant string is an expression, and single quotes must be used to 
> quote strings, this means you can do something like
>
> CREATE INDEX mabel ON MyTable ('hello mom')
>
> It won’t do anything useful, but it should work.
>
>> 2. And regardless, why is there a different result when using single
>> quotes vs. double quotes?
>
> The two quotes mean completely different things to SQLite.  Single quotes are 
> used to quote strings.  Double quotes are used to quote entity names (like 
> column names, table names, etc.).  Properly speaking the version of your 
> command with double quotes should be rejected for the reason you expected 
> but, as DRH explained, for historical reasons if there’s no column with the 
> given name SQLite thinks you want the contents of the quotes as a string.  
> And that’s what it’s doing.
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@mailinglists.sqlite.org
> http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users

Aha. Makes sense. And as you noted in your bug report, Simon, the
single quoted version should create a computed index, as well, but for
some reason it doesn't.

Thanks for the clarification!

-Ersin

-- 
Ersin Y. Akinci -- ersinakinci.com
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Creating indexes on non-existent columns with double quotes

2017-01-25 Thread Simon Slavin

On 25 Jan 2017, at 5:45pm, Ersin Akinci  wrote:

> Thanks Richard and Simon for your insights. I think I'm still missing
> a few things, though.
> 
> 1. What does it mean when SQLite tries to create an index on a string?
> Simon suggested that it's creating a calculated index, but I'm not
> sure what means. (Does it just mean an "index" literally just on the
> word "yearz_doesnt_exist"? That seems quite silly.)

SQLite supports indexes on calculations.  Or other expressions.  So you should 
be able to do things like

CREATE INDEX fred ON MyTable (CASE WHEN stock < 0 THEN 0 ELSE stock END)

Since a constant string is an expression, and single quotes must be used to 
quote strings, this means you can do something like

CREATE INDEX mabel ON MyTable ('hello mom')

It won’t do anything useful, but it should work.

> 2. And regardless, why is there a different result when using single
> quotes vs. double quotes?

The two quotes mean completely different things to SQLite.  Single quotes are 
used to quote strings.  Double quotes are used to quote entity names (like 
column names, table names, etc.).  Properly speaking the version of your 
command with double quotes should be rejected for the reason you expected but, 
as DRH explained, for historical reasons if there’s no column with the given 
name SQLite thinks you want the contents of the quotes as a string.  And that’s 
what it’s doing.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Creating indexes on non-existent columns with double quotes

2017-01-25 Thread Ersin Akinci
Thanks Richard and Simon for your insights. I think I'm still missing
a few things, though.

1. What does it mean when SQLite tries to create an index on a string?
Simon suggested that it's creating a calculated index, but I'm not
sure what means. (Does it just mean an "index" literally just on the
word "yearz_doesnt_exist"? That seems quite silly.) My assumption is
that a calculated index is just an index on a view or some other kind
of dynamically rendered column, but according to
http://sqlite.1065341.n5.nabble.com/Computed-column-or-create-index-on-a-view-td89442.html,
SQLite doesn't support indexes on views.

2. And regardless, why is there a different result when using single
quotes vs. double quotes?

-- 
Ersin Y. Akinci -- ersinakinci.com
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Creating indexes on non-existent columns with double quotes

2017-01-25 Thread Simon Slavin

On 23 Jan 2017, at 9:33pm, Ersin Akinci  wrote:

> CREATE INDEX index_reports_on_yearz_doesnt_exist ON reports
> (yearz_doesnt_exist);
> CREATE INDEX index_reports_on_yearz_doesnt_exist ON reports
> ('yearz_doesnt_exist');
> CREATE INDEX index_reports_on_yearz_doesnt_exist ON reports
> ("yearz_doesnt_exist");
> 
> The first two of these statements will fail with the error 'Error: no such
> column: yearz_doesnt_exist', but the last one goes through.

The first one is doing what you said.

In the second index you are, technically, asking it to create a computed index 
on a constant string (that’s what the apostrophes mean).  I don’t understand 
why this doesn’t work for me.  I’m using SQLite 3.14.0.

The third one /should/ do what the first one does, since double-quotes indicate 
entity-names.

> After executing
> the last statement, if we look at the output of index_list and index_info,
> we get:
> 
> sqlite> PRAGMA index_list(reports);
> 0|index_reports_on_yearz_doesnt_exist|0|c|0
> sqlite> PRAGMA index_info(index_reports_on_yearz_doesnt_exist);
> 0|-2|

I don’t get that.  Instead I get

sqlite> CREATE INDEX index_reports_quotes ON reports
   ...> ("yearz_doesnt_exist");
sqlite> PRAGMA index_info('index_reports_quotes');
0|-2|
sqlite> PRAGMA index_xinfo('index_reports_quotes');
0|-2||0|BINARY|1
1|-1||0|BINARY|0

The "-2" in each case is undocumented.  I suspect that that it means 
'calculated column' and it’s just not documented yet.

[later] I see DRH has cleared this up.  SQLite failed to find a column with 
that name and is interpreting the string as a text string.  Therefore it is 
indeed creating a calculated index.

Simon.
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Creating indexes on non-existent columns with double quotes

2017-01-25 Thread Richard Hipp
On 1/23/17, Ersin Akinci  wrote:
> CREATE INDEX index_reports_on_yearz_doesnt_exist ON reports
> ("yearz_doesnt_exist");
>
>
> I understand that the double quotation syntax is used to indicate
> identifiers.  Why am I allowed to create an index on a non-existent column
> when I use double quotes?
>

SQLite has a (mis-)feature that double-quoted names that cannot be
resolved to a table or column name are treated as strings.  This was a
very early design decision, made long before SQLite went viral and
found itself running in everything device on the planet, and was
intended to make SQLite more compatible with MySQL, which at the time
was the most widely deployed database engine in the world.  I regret
that choice now, but I cannot undo it without breaking backwards
compatibility.

On the other hand, the ability to index an expression is a relatively
new feature.  Maybe we can disable the double-quote-to-string
mis-feature for that one isolated case.  Maybe.  I'll look into it.

-- 
D. Richard Hipp
d...@sqlite.org
___
sqlite-users mailing list
sqlite-users@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-26 Thread Paul Sanderson
Sorry - generally the sorts will be on one column - but they may choose at
a later time to sort by another column. They will (but rarely - sort by two
or more columns at the same time).


On 26 November 2012 14:20, Clemens Ladisch  wrote:

> Paul Sanderson wrote:
> > My software creates a large table containing anything between about 250K
> > and Millions of rows when first run, the indexes are created immediately
> > after the table is populated and the tables do not change afterwards.
> >
> > The reason for the indexes is that the data is then reviewed by a user
> and
> > each user will have a different end result in mind and so will be looking
> > at (lots of) different subsets of the data but will want pretty much
> > instant results, i.e. a few seconds not minutes. I have no idea which
> > columns, other than a few of the main columns, that they will need to do
> > selects and sorts on.
>
> If they can doing sorts on more than one column, you need indexes for
> all combinations of those columns.  (For n columns, that's "only" n!
> combinations.  Database size might become an issue ...)
>
> > The size of the database is never an issue. Memory is unlikely to be an
> > issue. The initial processing time (data loading and indexing) is not an
> > issue.
>
> Then why not creating covering indexes for every combination of sort
> columns?  (This is essentially the same as creating lots of sorted
> copies of the table.)
>
>
> Regards,
> Clemens
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-26 Thread Clemens Ladisch
Paul Sanderson wrote:
> My software creates a large table containing anything between about 250K
> and Millions of rows when first run, the indexes are created immediately
> after the table is populated and the tables do not change afterwards.
>
> The reason for the indexes is that the data is then reviewed by a user and
> each user will have a different end result in mind and so will be looking
> at (lots of) different subsets of the data but will want pretty much
> instant results, i.e. a few seconds not minutes. I have no idea which
> columns, other than a few of the main columns, that they will need to do
> selects and sorts on.

If they can doing sorts on more than one column, you need indexes for
all combinations of those columns.  (For n columns, that's "only" n!
combinations.  Database size might become an issue ...)

> The size of the database is never an issue. Memory is unlikely to be an
> issue. The initial processing time (data loading and indexing) is not an
> issue.

Then why not creating covering indexes for every combination of sort
columns?  (This is essentially the same as creating lots of sorted
copies of the table.)


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-26 Thread Paul Sanderson
Thanks for the replies - I'll try and read through them all thoroughly a
bit later.

But for now a bit of background.

My software creates a large table containing anything between about 250K
and Millions of rows when first run, the indexes are created immediately
after the table is populated and the tables do not change afterwards. This
process generally takes 10 minutes to an hour or so.

The reason for the indexes is that the data is then reviewed by a user and
each user will have a different end result in mind and so will be looking
at (lots of) different subsets of the data but will want pretty much
instant results, i.e. a few seconds not minutes. I have no idea which
columns, other than a few of the main columns, that they will need to do
selects and sorts on. It's unlikely to be all of them for a specific user
but all users are very likely to use all columns but wont really know which
until they start looking at the data and even then the desired end reult is
sometimes fluid so requirements change.

The size of the database is never an issue. Memory is unlikely to be an
issue. The initial processing time (data loading and indexing) is not an
issue.

The whole target here is to drive the end user experience and reduce any
wait time when they are at the keyboard to a minimum.

I have considered just indexing the main columns and indexing other columns
on demand (first use if you like) but can't see the benefit of that other
than saving time and disk space at stage 1, but there is no need for this




On 26 November 2012 01:52, Simon Slavin  wrote:

>
> On 25 Nov 2012, at 11:30pm, "Jay A. Kreibich"  wrote:
>
> > If you view an index as an optimization, then the idea
> >  is usually to increase overall performance, so that there is net win.
> [snip]
> >
> >  I disagree with this idea, as it implies there is a 1:1 exchange in
> >  read performance gains and write performance loss.
>
> To enlarge upon Jay's monotribe, I'll put these two together.
>  Optimization of a system sometimes doesn't mean optimizing /net/
> performance, it means speeding things up when time matters.  I once got an
> ovation from a roomfull of staff by deleting a couple of indexes, not
> adding them.  Because they spent most of their time entering data, but
> reports were generated by other people they never met and didn't care
> about. My changes reduced a two-second pause after hitting 'send' to half a
> second, massively reducing false keystrokes.
>
> The report-needing people, on the other hand didn't care.  It already took
> the system 30 minutes to generate the reports: they pressed the button and
> came back an hour later.  Changing the 30 minutes to 50 minutes didn't
> impact their workflow badly.  Out of courtesy I explained why the
> end-of-day reports were taking longer and nobody complained.  (By the way
> this sort of thing is why banks used to close to customers at 3:30pm even
> though the staff still worked until 5pm.)
>
> This was decades ago on a pre-SQL system.  One thing I like about SQL is
> that the language used to change and consult the database doesn't say
> anything about indexes (except in some non-standard extensions).  So you
> can add or delete indexes when you want without needing to change a line of
> code in any of your software and having to deploy new versions of your
> software.  [insert dramatic evil genius music here]
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-26 Thread Petite Abeille

On Nov 26, 2012, at 12:30 AM, Jay A. Kreibich  wrote:

>  "Using SQLite" (http://shop.oreilly.com/product/9780596521196.do)
>  has a very lengthy discussion of indexes and how they work,
>  specifically because it is difficult to generalize the use of
>  indexes.  One must really look at each application and each database
>  (and the data in that database!) with a solid knowledge of what an
>  index can-- or cannot-- provide in order to find places where an
>  index will make a positive difference.

All true, and well articulated. 

But, at the end of the day, SQLite only provides b-tree indexes and that is 
that.

And b-trees are not that useful and/or detrimental for a wide range of use case.

What about providing bitmap indexes?

http://en.wikipedia.org/wiki/Bitmap_index

Bitmap Index vs. B-tree Index: Which and When?

http://www.oracle.com/technetwork/articles/sharma-indexes-093638.html

Now, that would be a breakthrough :)



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Simon Slavin

On 25 Nov 2012, at 11:30pm, "Jay A. Kreibich"  wrote:

> If you view an index as an optimization, then the idea 
>  is usually to increase overall performance, so that there is net win. [snip]
> 
>  I disagree with this idea, as it implies there is a 1:1 exchange in
>  read performance gains and write performance loss.

To enlarge upon Jay's monotribe, I'll put these two together.  Optimization of 
a system sometimes doesn't mean optimizing /net/ performance, it means speeding 
things up when time matters.  I once got an ovation from a roomfull of staff by 
deleting a couple of indexes, not adding them.  Because they spent most of 
their time entering data, but reports were generated by other people they never 
met and didn't care about. My changes reduced a two-second pause after hitting 
'send' to half a second, massively reducing false keystrokes.

The report-needing people, on the other hand didn't care.  It already took the 
system 30 minutes to generate the reports: they pressed the button and came 
back an hour later.  Changing the 30 minutes to 50 minutes didn't impact their 
workflow badly.  Out of courtesy I explained why the end-of-day reports were 
taking longer and nobody complained.  (By the way this sort of thing is why 
banks used to close to customers at 3:30pm even though the staff still worked 
until 5pm.)

This was decades ago on a pre-SQL system.  One thing I like about SQL is that 
the language used to change and consult the database doesn't say anything about 
indexes (except in some non-standard extensions).  So you can add or delete 
indexes when you want without needing to change a line of code in any of your 
software and having to deploy new versions of your software.  [insert dramatic 
evil genius music here]

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Jay A. Kreibich
On Sun, Nov 25, 2012 at 12:41:21PM -0700, Keith Medcalf scratched on the wall:
> > On Sunday, 25 November, 2012, 11:58, Jay A. Kreibich wrote:
> 
> >   each column is usually undesirable.  A given SELECT can usually only
> >   use one index per query (or sub-query), so it rarely makes sense to
> >   stack up the indexes... adding unused indexes only slows down
> >   insert/update/deletes, as well as makes the file size much larger.
> 
> Generally speaking indexes should be treated as a performance trade-off. 

  Ideally they're a performance gain.  That's kind of the point.  If
  they're not an overall gain, you likely shouldn't be using them.

> When you create an index (other than a UNIQUE index used to enforce a
> constraint, or an index on a parent or child key in a foreign-key
> relationship, where such an index may greatly increase INSERT or UPDATE
> performance) you are "moving" execution time from the retrieval
> processing to the maintenance processing of your application.

  Yes and no.  If you view an index as an optimization, then the idea 
  is usually to increase overall performance, so that there is net win.
  It is true that an index will (hopefully) increase the performance
  of many queries at the possible cost of additional maintenance processing,
  but the net change depends on the application's read/write ratios,
  and the maintenance patterns.

  In that sense, I'd say the indexes do not move execution *time*, so
  much as move and adjust execution *cost*.  If the application's needs
  and access patterns are such that the refactored set of costs is lower,
  the indexes are usually seen as a correct and positive optimization.
  If the new set of costs results in lower overall performance, the
  index is seen as a negative thing.

  There is also nothing special about an automatic UNIQUE index.
  The costs are the same.  The only difference is the motivation for
  creating it, accepting the write performance cost as the price of
  enforcing the constraint.  Adding a UNIQUE constraint (and the index
  that goes with it) is not an optimization, but an application requirement.
  
> When you add an index, you are (usually) optimizing retrieval and
> query operations.  The execution time saved during such query
> operations does not disappear (it is not recovered).  These processor
> cycles and I/O operations are "removed" from retrieval operations
> and "spent" when you perform updates to the database to maintain
> the indexes.

  I disagree with this idea, as it implies there is a 1:1 exchange in
  read performance gains and write performance loss.  That is very
  rarely the case.  Even the relative costs of a single read vs a
  single write are not linear or particularly related.  All of these
  things are highly dependent on the table size, the different indexes,
  and the operation being performed.
  
  It isn't about moving costs from category A to category B, it is
  about finding efficient work flows that work faster and better.  We
  already know there are some situations when an index will provide
  significant performance benefits, and other cases when the same index
  may slow things down.  A big part of what the query optimizer must do
  is identify these situations and pick the best use of the available
  resources-- in many situations that may be to ignore the index.
  That's not about shifting execution time, it is about getting rid of
  it, and picking the fastest path for the situation at hand.  The
  balance is that an index opens up more options to the query
  optimizer, but there is also an associated cost-- and that cost
  should only be paid if the index is used from time to time.

  This balance is true of all operations, not just SELECTs.  For
  example, a good index can make UPDATE and DELETE operations faster,
  just as it may make SELECTs faster.  That kind of blows the theory of
  moving execution time around, since there can be advantages in both
  types of operations.
  
  Index optimization is in no way a zero-sum game.  The bag of water is
  *not* fixed, and it is just as possible to remove water as it is to
  add it.

> >   Additionally, indexing any column (even one used heavily in a query
> >   filter) is not useful if the index doesn't reduce the working data
> >   set to ~10% of the rows.  In other words, having an index on a column
> >   that has all one value (be it NULL or anything else) accomplishes
> >   nothing but significantly slowing down queries that use the index.
> >   Same is true, for example, of a True/False column with an even
> >   distribution.  Generally, unless the index can be used to eliminate
> >   the vast majority of rows, it will slow down, rather than speed up, a
> >   query.
> 
> The exception to this is, of course, where the index created is a
> covering index because using a covering index, while it may not
> necessarily reduce the number of rows significantly, eliminates the
> accesses to the base table to retrieve data 

Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Petite Abeille

On Nov 25, 2012, at 8:41 PM, Keith Medcalf  wrote:

>  (ie, be careful not to just add water into the bag -- the objective is to 
> poke it around, not just add more water)

+1 for the, hmmm, treading water metaphore

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Keith Medcalf

> On Sunday, 25 November, 2012, 11:58, Jay A. Kreibich wrote:

>   each column is usually undesirable.  A given SELECT can usually only
>   use one index per query (or sub-query), so it rarely makes sense to
>   stack up the indexes... adding unused indexes only slows down
>   insert/update/deletes, as well as makes the file size much larger.

Generally speaking indexes should be treated as a performance trade-off.  When 
you create an index (other than a UNIQUE index used to enforce a constraint, or 
an index on a parent or child key in a foreign-key relationship, where such an 
index may greatly increase INSERT or UPDATE performance) you are "moving" 
execution time from the retrieval processing to the maintenance processing of 
your application.

When you add an index, you are (usually) optimizing retrieval and query 
operations.  The execution time saved during such query operations does not 
disappear (it is not recovered).  These processor cycles and I/O operations are 
"removed" from retrieval operations and "spent" when you perform updates to the 
database to maintain the indexes.  In other words, optimization does not make 
all operations faster -- the time required for your application to perform its 
functions can be viewed as a fixed size bag of water.  When you optimize some 
operation by adding an index you are "pushing in" the bag of water in some 
specific place.  It bulges out somewhere else, and some other operation becomes 
slower because now you have to perform additional operations elsewhere to 
maintain the indexes.

The objective is to save a "huge" amount of processing time in one operation 
and spend that savings in "wee bits" distributed over many other operations 
where it is not noticeable.  For example, adding a particular index may 
decrease the time to perform a certain query from 5 minutes to 5 seconds.  
However, it will add a few hundred milliseconds to each update or insert 
operation.  If this trade-off is acceptable, then you have made a good 
trade-off.  On the other hand, if the indexes you added to optimize the query 
increases each update/insert by a few seconds, the trade-off may not be 
acceptable.  You may need to find an acceptable middle ground ...

>   Additionally, indexing any column (even one used heavily in a query
>   filter) is not useful if the index doesn't reduce the working data
>   set to ~10% of the rows.  In other words, having an index on a column
>   that has all one value (be it NULL or anything else) accomplishes
>   nothing but significantly slowing down queries that use the index.
>   Same is true, for example, of a True/False column with an even
>   distribution.  Generally, unless the index can be used to eliminate
>   the vast majority of rows, it will slow down, rather than speed up, a
>   query.

The exception to this is, of course, where the index created is a covering 
index because using a covering index, while it may not necessarily reduce the 
number of rows significantly, eliminates the accesses to the base table to 
retrieve data that might be being used in further operations.

Care needs to be taken to not prematurely add indexes that will add maintenance 
cost but not significantly improve query performance (ie, be careful not to 
just add water into the bag -- the objective is to poke it around, not just add 
more water).  Adding indexes required to enforce or optimize constraint and 
foreign key operations are almost always required -- but other indexes should 
not be added until you are sure that they will actually be required because 
overuse of indexes can severely hamper overall performance.

---
()  ascii ribbon campaign against html e-mail
/\  www.asciiribbon.org



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Jay A. Kreibich
On Sun, Nov 25, 2012 at 01:29:48PM +, Paul Sanderson scratched on the wall:
> Yes NULL -
> 
> I underastand that ecvery coumn needs to be read, that is self evident,
> however my feeling (not tested) is that the process is much slower than it
> needs to be, i..e the process of creating an index on a column whos values
> are all NULL takes longer than just reading all of the columns - I have to
> admit to not testing/timimg this though, I'll have a go when time permits.

  An index is essentially a specialized table that uses a different key.
  Tables and indexes use an almost identical data structure when
  stored.  The main difference is that tables are sorted and keyed via
  the ROWID column, while indexes are sorted and keyed by one or more
  indexed columns.  Every index has one row for each row in the table.

  So the question then becomes, if you were creating a one column table
  and inserted a large number of rows, would you expect a significant
  difference in the insert time if all the values were the same, or if
  they were different?

> That does lead to another question. Is their a method of creating multiple
> indexes at the same time, e.g. create an index on each (or specified)
> column in a table in one pass - rather than do each column in turn. This
> would save on the overhead of reading the entire table for each column.

  Scanning the table is not the expensive part of creating an index.
  Creating a large index requires a significant amount of memory to sort
  and rebalance the B-Tree used to store the index data.  Doing more
  than one index at a time would make that problem much worse, and
  would likely be slower, overall, then creating the indexes one at a
  time.


  As others have pointed out, it might be worth backing and asking what
  you're trying to achieve with all these indexes.  Putting an index on
  each column is usually undesirable.  A given SELECT can usually only
  use one index per query (or sub-query), so it rarely makes sense to
  stack up the indexes... adding unused indexes only slows down
  insert/update/deletes, as well as makes the file size much larger.
  Additionally, indexing any column (even one used heavily in a query
  filter) is not useful if the index doesn't reduce the working data
  set to ~10% of the rows.  In other words, having an index on a column
  that has all one value (be it NULL or anything else) accomplishes
  nothing but significantly slowing down queries that use the index.
  Same is true, for example, of a True/False column with an even
  distribution.  Generally, unless the index can be used to eliminate 
  the vast majority of rows, it will slow down, rather than speed up, a
  query.

  Indexes are not magic bullets, and using them properly requires
  understanding how they work and how they are used.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Simon Slavin

On 25 Nov 2012, at 6:13pm, Jay A. Kreibich  wrote:

> On Sun, Nov 25, 2012 at 04:56:44PM +, Simon Slavin scratched on the wall:
>> 
>> In SQLite, all columns are in all indexes even if the column contains a
>> NULL.  NULL has a sorting order, and anything that does
> 
>  Rows, Simon, rows... not columns.  Watch your terminology or your
>  answers will be more confusing than the questions.

Whoops.  Right.  Thanks.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Jay A. Kreibich
On Sun, Nov 25, 2012 at 04:56:44PM +, Simon Slavin scratched on the wall:
> 
> On 25 Nov 2012, at 1:29pm, Paul Sanderson  
> wrote:
> 
> > I underastand that ecvery coumn needs to be read, that is self evident,
> > however my feeling (not tested) is that the process is much slower than it
> > needs to be, i..e the process of creating an index on a column whos values
> > are all NULL takes longer than just reading all of the columns - I have to
> > admit to not testing/timimg this though, I'll have a go when time permits.
> 
> In SQLite, all columns are in all indexes even if the column contains a
> NULL.  NULL has a sorting order, and anything that does

  Rows, Simon, rows... not columns.  Watch your terminology or your
  answers will be more confusing than the questions.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Clemens Ladisch
Paul Sanderson wrote:
> I underastand that ecvery coumn needs to be read, that is self evident,
> however my feeling (not tested) is that the process is much slower than it
> needs to be, i..e the process of creating an index on a column whos values
> are all NULL takes longer than just reading all of the columns

There's also the time needed for sorting the values and constructing and
writing the index.

> I have to admit to not testing/timimg this though

sqlite> .timer on
sqlite> SELECT COUNT(*) FROM t;
4194304
CPU Time: user 0.008000 sys 0.00
sqlite> SELECT DISTINCT x FROM t;

CPU Time: user 0.608000 sys 0.012000
sqlite> CREATE INDEX i ON t(x);
CPU Time: user 6.592000 sys 0.064000
sqlite> DROP INDEX i;
CPU Time: user 0.08 sys 0.004000
sqlite> UPDATE t SET x=1;
CPU Time: user 4.556000 sys 0.052000
sqlite> SELECT DISTINCT x FROM t;
1
CPU Time: user 0.628000 sys 0.00
sqlite> CREATE INDEX i ON t(x);
CPU Time: user 6.648000 sys 0.08

> That does lead to another question. Is their a method of creating multiple
> indexes at the same time, e.g. create an index on each (or specified)
> column in a table in one pass - rather than do each column in turn. This
> would save on the overhead of reading the entire table for each column.

No such commands exists.  Just have enough file cache.  :-)


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Simon Slavin

On 25 Nov 2012, at 1:29pm, Paul Sanderson  wrote:

> I underastand that ecvery coumn needs to be read, that is self evident,
> however my feeling (not tested) is that the process is much slower than it
> needs to be, i..e the process of creating an index on a column whos values
> are all NULL takes longer than just reading all of the columns - I have to
> admit to not testing/timimg this though, I'll have a go when time permits.

In SQLite, all columns are in all indexes even if the column contains a NULL.  
NULL has a sorting order, and anything that does

SELECT * FROM myTable ORDER BY myIndexedColumn

still has to return all rows.

> That does lead to another question. Is their a method of creating multiple
> indexes at the same time, e.g. create an index on each (or specified)
> column in a table in one pass - rather than do each column in turn. This
> would save on the overhead of reading the entire table for each column.

It is very rare to need to index every column of a table.  That is generally a 
sign that what you really want is some sort of three-column attribute store 
rather than a conventional database.  You should know that any SELECT will only 
use either zero or one index: once you've used one index for sorting, the 
others are useless because they list rows in an unsorted order.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Simon Slavin

On 25 Nov 2012, at 1:29pm, Paul Sanderson  wrote:

> That does lead to another question. Is their a method of creating multiple
> indexes at the same time, e.g. create an index on each (or specified)
> column in a table in one pass - rather than do each column in turn. This
> would save on the overhead of reading the entire table for each column.

Looking again at this I'm not sure I understand why you're indexing every 
column.  Do you understand the difference between creating one index for each 
column and creating one index on all columns ?  It is worth the huge increase 
in filesize ?  Is the slow-down on data entry (where entering one row must 
modify many indexes) worth the speed-up in your SELECTs ?  Do you understand 
that any SELECT will use only one index ?

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Paul Sanderson
Yes NULL -

I underastand that ecvery coumn needs to be read, that is self evident,
however my feeling (not tested) is that the process is much slower than it
needs to be, i..e the process of creating an index on a column whos values
are all NULL takes longer than just reading all of the columns - I have to
admit to not testing/timimg this though, I'll have a go when time permits.

That does lead to another question. Is their a method of creating multiple
indexes at the same time, e.g. create an index on each (or specified)
column in a table in one pass - rather than do each column in turn. This
would save on the overhead of reading the entire table for each column.


On 24 November 2012 14:40, Clemens Ladisch  wrote:

> Paul Sanderson wrote:
> > Whilst building a new app I created an index on every column some of
> which
> > were empty.
>
> And with "empty", you mean that every value in that column is NULL?
>
> > The database is reasonably large (400K rows) and I notcied that
> > it seems to take as long to create an index on a column in which all the
> > rows are empty as it does on one in which all the rows are unique.
>
> SQLite still has to read all records to determine what the values are,
> and to create the index with all these values and RowIDs.  That these
> values all happen to have the same value does not change the algorithm.
>
>
> Regards,
> Clemens
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-24 Thread Clemens Ladisch
Paul Sanderson wrote:
> Whilst building a new app I created an index on every column some of which
> were empty.

And with "empty", you mean that every value in that column is NULL?

> The database is reasonably large (400K rows) and I notcied that
> it seems to take as long to create an index on a column in which all the
> rows are empty as it does on one in which all the rows are unique.

SQLite still has to read all records to determine what the values are,
and to create the index with all these values and RowIDs.  That these
values all happen to have the same value does not change the algorithm.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-24 Thread Igor Tandetnik
Paul Sanderson  wrote:
> Whilst building a new app I created an index on every column some of which
> were empty. The database is reasonably large (400K rows) and I notcied that
> it seems to take as long to create an index on a column in which all the
> rows are empty as it does on one in which all the rows are unique.
> 
> I don't really have an idea of how sqlite works internally, but this seems
> like an area where there could be some optimisation - or anm I totally off
> track?

What kind of optimization do you have in mind? How exactly do you propose 
determining that a column has empty values in all rows, any faster than looking 
at every row? Also, what constitutes an "empty value" - NULL, empty string, 
zero-length blob, integer 0, something else?
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes

2010-01-24 Thread Roger Binns
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

alenD wrote:
> Should an index be created before insertions or after insertions?:super:

What performance difference did you see after trying both options?

Roger
-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iEYEARECAAYFAktc24gACgkQmOOfHg372QT9yQCfbIPLPg/HNZ1xqkv6cnbdDfxP
2KQAoKqgGHzqiz/MCHa90NyC4vfIhxvy
=W1wk
-END PGP SIGNATURE-
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Creating Indexes

2008-08-06 Thread Kees Nuyt
On Wed, 6 Aug 2008 16:45:35 -0400, Jeffrey Becker wrote:

> On Wed, Aug 6, 2008 at 4:10 PM, 
> Kees Nuyt <[EMAIL PROTECTED]> wrote:

>> Additionally: NodeID and ParentID shouldn't be blobs.
>> Integer is the most suitable type for IDs.
>
>The blobs I'm inserting are actually a binary representation designed
>_specifically_ to index properly.  So yes they should be blobs.

Aha, ok, all clear now.
-- 
  (  Kees Nuyt
  )
c[_]
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Creating Indexes

2008-08-06 Thread Jeffrey Becker
On Wed, Aug 6, 2008 at 4:10 PM, Kees Nuyt <[EMAIL PROTECTED]> wrote:
> On Tue, 05 Aug 2008 17:22:05 -0500, you wrote:
>
>>Jeffrey Becker wrote:
>>> I have a table 'SiteMap' defined as :
>>>
>>> Create Table SiteMap
>>> (
>>>  NodeID blob not null PRIMARY KEY,
>>>  Title text NOT NULL UNIQUE,
>>>  Url text NOT NULL
>>> );
>>>
>>> I'd like to index on the node's parent value as defined by the
>>> expression ancestor(NodeID,1).  'ancestor' being a user defined
>>> function.
>>> However sqlite doesnt permit indicies on expressions, only columns.
>>> Why is this? More importantly what's my best bet on achieveing
>>> something similar?
>>
>>Create Table SiteMap
>>(
>>   NodeID blob not null PRIMARY KEY,
>>   ParentID blob not null,
>>   Title text NOT NULL UNIQUE,
>>   Url text NOT NULL
>>);
>>
>>Then index on (ParentID, NodeID) or something like that.
>>
>>-Steve
>
> Additionally: NodeID and ParentID shouldn't be blobs.
> Integer is the most suitable type for IDs.
>
> http://article.gmane.org/gmane.comp.db.sqlite.general/17286/match=managing+trees+database
> and
> http://www.sitepoint.com/article/hierarchical-data-database
> may help on maintaining trees in a relational database.
>
> Or search the net on "Joe Celko" and "nested sets".
> --
>  (  Kees Nuyt
>  )
> c[_]

The blobs I'm inserting are actually a binary representation designed
_specifically_ to index properly.  So yes they should be blobs.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Creating Indexes

2008-08-06 Thread Kees Nuyt
On Tue, 05 Aug 2008 17:22:05 -0500, you wrote:

>Jeffrey Becker wrote:
>> I have a table 'SiteMap' defined as :
>> 
>> Create Table SiteMap
>> (
>>  NodeID blob not null PRIMARY KEY,
>>  Title text NOT NULL UNIQUE,
>>  Url text NOT NULL
>> );
>> 
>> I'd like to index on the node's parent value as defined by the
>> expression ancestor(NodeID,1).  'ancestor' being a user defined
>> function.
>> However sqlite doesnt permit indicies on expressions, only columns.
>> Why is this? More importantly what's my best bet on achieveing
>> something similar?
>
>Create Table SiteMap
>(
>   NodeID blob not null PRIMARY KEY,
>   ParentID blob not null,
>   Title text NOT NULL UNIQUE,
>   Url text NOT NULL
>);
>
>Then index on (ParentID, NodeID) or something like that.
>
>-Steve

Additionally: NodeID and ParentID shouldn't be blobs.
Integer is the most suitable type for IDs.

http://article.gmane.org/gmane.comp.db.sqlite.general/17286/match=managing+trees+database
and
http://www.sitepoint.com/article/hierarchical-data-database
may help on maintaining trees in a relational database.

Or search the net on "Joe Celko" and "nested sets".
-- 
  (  Kees Nuyt
  )
c[_]
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] Creating Indexes

2008-08-05 Thread Stephen Woodbridge
Jeffrey Becker wrote:
> I have a table 'SiteMap' defined as :
> 
> Create Table SiteMap
> (
>   NodeID blob not null PRIMARY KEY,
>   Title text NOT NULL UNIQUE,
>   Url text NOT NULL
> );
> 
> I'd like to index on the node's parent value as defined by the
> expression ancestor(NodeID,1).  'ancestor' being a user defined
> function.
> However sqlite doesnt permit indicies on expressions, only columns.
> Why is this? More importantly what's my best bet on achieveing
> something similar?

Create Table SiteMap
(
NodeID blob not null PRIMARY KEY,
ParentID blob not null,
Title text NOT NULL UNIQUE,
Url text NOT NULL
);

Then index on (ParentID, NodeID) or something like that.

-Steve
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes in attached databases

2006-08-19 Thread Jonathan Ellis

On 8/19/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

"Jonathan Ellis" <[EMAIL PROTECTED]> wrote:
> sqlite> attach 'foo.db' as foo;
> sqlite> create table foo.bar (i int);
> sqlite> create index foo.bar_i on foo.bar(i);


Should be:

   create index foo.bar_i on bar(i);


Thanks, that fixed it.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-



Re: [sqlite] creating indexes in attached databases

2006-08-19 Thread drh
"Jonathan Ellis" <[EMAIL PROTECTED]> wrote:
> sqlite> attach 'foo.db' as foo;
> sqlite> create table foo.bar (i int);
> sqlite> create index foo.bar_i on foo.bar(i);


Should be:

   create index foo.bar_i on bar(i);


> SQL error: near ".": syntax error
> sqlite>
> 
> Is there another way to do this?
> 
> -
> To unsubscribe, send email to [EMAIL PROTECTED]
> -
> 
> 
> .



-
To unsubscribe, send email to [EMAIL PROTECTED]
-