Re: [sqlite] creating indexes on empty columns

2012-11-26 Thread Paul Sanderson
Sorry - generally the sorts will be on one column - but they may choose at
a later time to sort by another column. They will (but rarely - sort by two
or more columns at the same time).


On 26 November 2012 14:20, Clemens Ladisch  wrote:

> Paul Sanderson wrote:
> > My software creates a large table containing anything between about 250K
> > and Millions of rows when first run, the indexes are created immediately
> > after the table is populated and the tables do not change afterwards.
> >
> > The reason for the indexes is that the data is then reviewed by a user
> and
> > each user will have a different end result in mind and so will be looking
> > at (lots of) different subsets of the data but will want pretty much
> > instant results, i.e. a few seconds not minutes. I have no idea which
> > columns, other than a few of the main columns, that they will need to do
> > selects and sorts on.
>
> If they can doing sorts on more than one column, you need indexes for
> all combinations of those columns.  (For n columns, that's "only" n!
> combinations.  Database size might become an issue ...)
>
> > The size of the database is never an issue. Memory is unlikely to be an
> > issue. The initial processing time (data loading and indexing) is not an
> > issue.
>
> Then why not creating covering indexes for every combination of sort
> columns?  (This is essentially the same as creating lots of sorted
> copies of the table.)
>
>
> Regards,
> Clemens
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>



-- 
Paul
www.sandersonforensics.com
skype: r3scue193
twitter: @sandersonforens
Tel +44 (0)1326 572786
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-26 Thread Clemens Ladisch
Paul Sanderson wrote:
> My software creates a large table containing anything between about 250K
> and Millions of rows when first run, the indexes are created immediately
> after the table is populated and the tables do not change afterwards.
>
> The reason for the indexes is that the data is then reviewed by a user and
> each user will have a different end result in mind and so will be looking
> at (lots of) different subsets of the data but will want pretty much
> instant results, i.e. a few seconds not minutes. I have no idea which
> columns, other than a few of the main columns, that they will need to do
> selects and sorts on.

If they can doing sorts on more than one column, you need indexes for
all combinations of those columns.  (For n columns, that's "only" n!
combinations.  Database size might become an issue ...)

> The size of the database is never an issue. Memory is unlikely to be an
> issue. The initial processing time (data loading and indexing) is not an
> issue.

Then why not creating covering indexes for every combination of sort
columns?  (This is essentially the same as creating lots of sorted
copies of the table.)


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-26 Thread Paul Sanderson
Thanks for the replies - I'll try and read through them all thoroughly a
bit later.

But for now a bit of background.

My software creates a large table containing anything between about 250K
and Millions of rows when first run, the indexes are created immediately
after the table is populated and the tables do not change afterwards. This
process generally takes 10 minutes to an hour or so.

The reason for the indexes is that the data is then reviewed by a user and
each user will have a different end result in mind and so will be looking
at (lots of) different subsets of the data but will want pretty much
instant results, i.e. a few seconds not minutes. I have no idea which
columns, other than a few of the main columns, that they will need to do
selects and sorts on. It's unlikely to be all of them for a specific user
but all users are very likely to use all columns but wont really know which
until they start looking at the data and even then the desired end reult is
sometimes fluid so requirements change.

The size of the database is never an issue. Memory is unlikely to be an
issue. The initial processing time (data loading and indexing) is not an
issue.

The whole target here is to drive the end user experience and reduce any
wait time when they are at the keyboard to a minimum.

I have considered just indexing the main columns and indexing other columns
on demand (first use if you like) but can't see the benefit of that other
than saving time and disk space at stage 1, but there is no need for this




On 26 November 2012 01:52, Simon Slavin  wrote:

>
> On 25 Nov 2012, at 11:30pm, "Jay A. Kreibich"  wrote:
>
> > If you view an index as an optimization, then the idea
> >  is usually to increase overall performance, so that there is net win.
> [snip]
> >
> >  I disagree with this idea, as it implies there is a 1:1 exchange in
> >  read performance gains and write performance loss.
>
> To enlarge upon Jay's monotribe, I'll put these two together.
>  Optimization of a system sometimes doesn't mean optimizing /net/
> performance, it means speeding things up when time matters.  I once got an
> ovation from a roomfull of staff by deleting a couple of indexes, not
> adding them.  Because they spent most of their time entering data, but
> reports were generated by other people they never met and didn't care
> about. My changes reduced a two-second pause after hitting 'send' to half a
> second, massively reducing false keystrokes.
>
> The report-needing people, on the other hand didn't care.  It already took
> the system 30 minutes to generate the reports: they pressed the button and
> came back an hour later.  Changing the 30 minutes to 50 minutes didn't
> impact their workflow badly.  Out of courtesy I explained why the
> end-of-day reports were taking longer and nobody complained.  (By the way
> this sort of thing is why banks used to close to customers at 3:30pm even
> though the staff still worked until 5pm.)
>
> This was decades ago on a pre-SQL system.  One thing I like about SQL is
> that the language used to change and consult the database doesn't say
> anything about indexes (except in some non-standard extensions).  So you
> can add or delete indexes when you want without needing to change a line of
> code in any of your software and having to deploy new versions of your
> software.  [insert dramatic evil genius music here]
>
> Simon.
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-26 Thread Petite Abeille

On Nov 26, 2012, at 12:30 AM, Jay A. Kreibich  wrote:

>  "Using SQLite" (http://shop.oreilly.com/product/9780596521196.do)
>  has a very lengthy discussion of indexes and how they work,
>  specifically because it is difficult to generalize the use of
>  indexes.  One must really look at each application and each database
>  (and the data in that database!) with a solid knowledge of what an
>  index can-- or cannot-- provide in order to find places where an
>  index will make a positive difference.

All true, and well articulated. 

But, at the end of the day, SQLite only provides b-tree indexes and that is 
that.

And b-trees are not that useful and/or detrimental for a wide range of use case.

What about providing bitmap indexes?

http://en.wikipedia.org/wiki/Bitmap_index

Bitmap Index vs. B-tree Index: Which and When?

http://www.oracle.com/technetwork/articles/sharma-indexes-093638.html

Now, that would be a breakthrough :)



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Simon Slavin

On 25 Nov 2012, at 11:30pm, "Jay A. Kreibich"  wrote:

> If you view an index as an optimization, then the idea 
>  is usually to increase overall performance, so that there is net win. [snip]
> 
>  I disagree with this idea, as it implies there is a 1:1 exchange in
>  read performance gains and write performance loss.

To enlarge upon Jay's monotribe, I'll put these two together.  Optimization of 
a system sometimes doesn't mean optimizing /net/ performance, it means speeding 
things up when time matters.  I once got an ovation from a roomfull of staff by 
deleting a couple of indexes, not adding them.  Because they spent most of 
their time entering data, but reports were generated by other people they never 
met and didn't care about. My changes reduced a two-second pause after hitting 
'send' to half a second, massively reducing false keystrokes.

The report-needing people, on the other hand didn't care.  It already took the 
system 30 minutes to generate the reports: they pressed the button and came 
back an hour later.  Changing the 30 minutes to 50 minutes didn't impact their 
workflow badly.  Out of courtesy I explained why the end-of-day reports were 
taking longer and nobody complained.  (By the way this sort of thing is why 
banks used to close to customers at 3:30pm even though the staff still worked 
until 5pm.)

This was decades ago on a pre-SQL system.  One thing I like about SQL is that 
the language used to change and consult the database doesn't say anything about 
indexes (except in some non-standard extensions).  So you can add or delete 
indexes when you want without needing to change a line of code in any of your 
software and having to deploy new versions of your software.  [insert dramatic 
evil genius music here]

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Jay A. Kreibich
On Sun, Nov 25, 2012 at 12:41:21PM -0700, Keith Medcalf scratched on the wall:
> > On Sunday, 25 November, 2012, 11:58, Jay A. Kreibich wrote:
> 
> >   each column is usually undesirable.  A given SELECT can usually only
> >   use one index per query (or sub-query), so it rarely makes sense to
> >   stack up the indexes... adding unused indexes only slows down
> >   insert/update/deletes, as well as makes the file size much larger.
> 
> Generally speaking indexes should be treated as a performance trade-off. 

  Ideally they're a performance gain.  That's kind of the point.  If
  they're not an overall gain, you likely shouldn't be using them.

> When you create an index (other than a UNIQUE index used to enforce a
> constraint, or an index on a parent or child key in a foreign-key
> relationship, where such an index may greatly increase INSERT or UPDATE
> performance) you are "moving" execution time from the retrieval
> processing to the maintenance processing of your application.

  Yes and no.  If you view an index as an optimization, then the idea 
  is usually to increase overall performance, so that there is net win.
  It is true that an index will (hopefully) increase the performance
  of many queries at the possible cost of additional maintenance processing,
  but the net change depends on the application's read/write ratios,
  and the maintenance patterns.

  In that sense, I'd say the indexes do not move execution *time*, so
  much as move and adjust execution *cost*.  If the application's needs
  and access patterns are such that the refactored set of costs is lower,
  the indexes are usually seen as a correct and positive optimization.
  If the new set of costs results in lower overall performance, the
  index is seen as a negative thing.

  There is also nothing special about an automatic UNIQUE index.
  The costs are the same.  The only difference is the motivation for
  creating it, accepting the write performance cost as the price of
  enforcing the constraint.  Adding a UNIQUE constraint (and the index
  that goes with it) is not an optimization, but an application requirement.
  
> When you add an index, you are (usually) optimizing retrieval and
> query operations.  The execution time saved during such query
> operations does not disappear (it is not recovered).  These processor
> cycles and I/O operations are "removed" from retrieval operations
> and "spent" when you perform updates to the database to maintain
> the indexes.

  I disagree with this idea, as it implies there is a 1:1 exchange in
  read performance gains and write performance loss.  That is very
  rarely the case.  Even the relative costs of a single read vs a
  single write are not linear or particularly related.  All of these
  things are highly dependent on the table size, the different indexes,
  and the operation being performed.
  
  It isn't about moving costs from category A to category B, it is
  about finding efficient work flows that work faster and better.  We
  already know there are some situations when an index will provide
  significant performance benefits, and other cases when the same index
  may slow things down.  A big part of what the query optimizer must do
  is identify these situations and pick the best use of the available
  resources-- in many situations that may be to ignore the index.
  That's not about shifting execution time, it is about getting rid of
  it, and picking the fastest path for the situation at hand.  The
  balance is that an index opens up more options to the query
  optimizer, but there is also an associated cost-- and that cost
  should only be paid if the index is used from time to time.

  This balance is true of all operations, not just SELECTs.  For
  example, a good index can make UPDATE and DELETE operations faster,
  just as it may make SELECTs faster.  That kind of blows the theory of
  moving execution time around, since there can be advantages in both
  types of operations.
  
  Index optimization is in no way a zero-sum game.  The bag of water is
  *not* fixed, and it is just as possible to remove water as it is to
  add it.

> >   Additionally, indexing any column (even one used heavily in a query
> >   filter) is not useful if the index doesn't reduce the working data
> >   set to ~10% of the rows.  In other words, having an index on a column
> >   that has all one value (be it NULL or anything else) accomplishes
> >   nothing but significantly slowing down queries that use the index.
> >   Same is true, for example, of a True/False column with an even
> >   distribution.  Generally, unless the index can be used to eliminate
> >   the vast majority of rows, it will slow down, rather than speed up, a
> >   query.
> 
> The exception to this is, of course, where the index created is a
> covering index because using a covering index, while it may not
> necessarily reduce the number of rows significantly, eliminates the
> accesses to the base table to retrieve data th

Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Petite Abeille

On Nov 25, 2012, at 8:41 PM, Keith Medcalf  wrote:

>  (ie, be careful not to just add water into the bag -- the objective is to 
> poke it around, not just add more water)

+1 for the, hmmm, treading water metaphore

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Keith Medcalf

> On Sunday, 25 November, 2012, 11:58, Jay A. Kreibich wrote:

>   each column is usually undesirable.  A given SELECT can usually only
>   use one index per query (or sub-query), so it rarely makes sense to
>   stack up the indexes... adding unused indexes only slows down
>   insert/update/deletes, as well as makes the file size much larger.

Generally speaking indexes should be treated as a performance trade-off.  When 
you create an index (other than a UNIQUE index used to enforce a constraint, or 
an index on a parent or child key in a foreign-key relationship, where such an 
index may greatly increase INSERT or UPDATE performance) you are "moving" 
execution time from the retrieval processing to the maintenance processing of 
your application.

When you add an index, you are (usually) optimizing retrieval and query 
operations.  The execution time saved during such query operations does not 
disappear (it is not recovered).  These processor cycles and I/O operations are 
"removed" from retrieval operations and "spent" when you perform updates to the 
database to maintain the indexes.  In other words, optimization does not make 
all operations faster -- the time required for your application to perform its 
functions can be viewed as a fixed size bag of water.  When you optimize some 
operation by adding an index you are "pushing in" the bag of water in some 
specific place.  It bulges out somewhere else, and some other operation becomes 
slower because now you have to perform additional operations elsewhere to 
maintain the indexes.

The objective is to save a "huge" amount of processing time in one operation 
and spend that savings in "wee bits" distributed over many other operations 
where it is not noticeable.  For example, adding a particular index may 
decrease the time to perform a certain query from 5 minutes to 5 seconds.  
However, it will add a few hundred milliseconds to each update or insert 
operation.  If this trade-off is acceptable, then you have made a good 
trade-off.  On the other hand, if the indexes you added to optimize the query 
increases each update/insert by a few seconds, the trade-off may not be 
acceptable.  You may need to find an acceptable middle ground ...

>   Additionally, indexing any column (even one used heavily in a query
>   filter) is not useful if the index doesn't reduce the working data
>   set to ~10% of the rows.  In other words, having an index on a column
>   that has all one value (be it NULL or anything else) accomplishes
>   nothing but significantly slowing down queries that use the index.
>   Same is true, for example, of a True/False column with an even
>   distribution.  Generally, unless the index can be used to eliminate
>   the vast majority of rows, it will slow down, rather than speed up, a
>   query.

The exception to this is, of course, where the index created is a covering 
index because using a covering index, while it may not necessarily reduce the 
number of rows significantly, eliminates the accesses to the base table to 
retrieve data that might be being used in further operations.

Care needs to be taken to not prematurely add indexes that will add maintenance 
cost but not significantly improve query performance (ie, be careful not to 
just add water into the bag -- the objective is to poke it around, not just add 
more water).  Adding indexes required to enforce or optimize constraint and 
foreign key operations are almost always required -- but other indexes should 
not be added until you are sure that they will actually be required because 
overuse of indexes can severely hamper overall performance.

---
()  ascii ribbon campaign against html e-mail
/\  www.asciiribbon.org



___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Jay A. Kreibich
On Sun, Nov 25, 2012 at 01:29:48PM +, Paul Sanderson scratched on the wall:
> Yes NULL -
> 
> I underastand that ecvery coumn needs to be read, that is self evident,
> however my feeling (not tested) is that the process is much slower than it
> needs to be, i..e the process of creating an index on a column whos values
> are all NULL takes longer than just reading all of the columns - I have to
> admit to not testing/timimg this though, I'll have a go when time permits.

  An index is essentially a specialized table that uses a different key.
  Tables and indexes use an almost identical data structure when
  stored.  The main difference is that tables are sorted and keyed via
  the ROWID column, while indexes are sorted and keyed by one or more
  indexed columns.  Every index has one row for each row in the table.

  So the question then becomes, if you were creating a one column table
  and inserted a large number of rows, would you expect a significant
  difference in the insert time if all the values were the same, or if
  they were different?

> That does lead to another question. Is their a method of creating multiple
> indexes at the same time, e.g. create an index on each (or specified)
> column in a table in one pass - rather than do each column in turn. This
> would save on the overhead of reading the entire table for each column.

  Scanning the table is not the expensive part of creating an index.
  Creating a large index requires a significant amount of memory to sort
  and rebalance the B-Tree used to store the index data.  Doing more
  than one index at a time would make that problem much worse, and
  would likely be slower, overall, then creating the indexes one at a
  time.


  As others have pointed out, it might be worth backing and asking what
  you're trying to achieve with all these indexes.  Putting an index on
  each column is usually undesirable.  A given SELECT can usually only
  use one index per query (or sub-query), so it rarely makes sense to
  stack up the indexes... adding unused indexes only slows down
  insert/update/deletes, as well as makes the file size much larger.
  Additionally, indexing any column (even one used heavily in a query
  filter) is not useful if the index doesn't reduce the working data
  set to ~10% of the rows.  In other words, having an index on a column
  that has all one value (be it NULL or anything else) accomplishes
  nothing but significantly slowing down queries that use the index.
  Same is true, for example, of a True/False column with an even
  distribution.  Generally, unless the index can be used to eliminate 
  the vast majority of rows, it will slow down, rather than speed up, a
  query.

  Indexes are not magic bullets, and using them properly requires
  understanding how they work and how they are used.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Simon Slavin

On 25 Nov 2012, at 6:13pm, Jay A. Kreibich  wrote:

> On Sun, Nov 25, 2012 at 04:56:44PM +, Simon Slavin scratched on the wall:
>> 
>> In SQLite, all columns are in all indexes even if the column contains a
>> NULL.  NULL has a sorting order, and anything that does
> 
>  Rows, Simon, rows... not columns.  Watch your terminology or your
>  answers will be more confusing than the questions.

Whoops.  Right.  Thanks.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Jay A. Kreibich
On Sun, Nov 25, 2012 at 04:56:44PM +, Simon Slavin scratched on the wall:
> 
> On 25 Nov 2012, at 1:29pm, Paul Sanderson  
> wrote:
> 
> > I underastand that ecvery coumn needs to be read, that is self evident,
> > however my feeling (not tested) is that the process is much slower than it
> > needs to be, i..e the process of creating an index on a column whos values
> > are all NULL takes longer than just reading all of the columns - I have to
> > admit to not testing/timimg this though, I'll have a go when time permits.
> 
> In SQLite, all columns are in all indexes even if the column contains a
> NULL.  NULL has a sorting order, and anything that does

  Rows, Simon, rows... not columns.  Watch your terminology or your
  answers will be more confusing than the questions.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Clemens Ladisch
Paul Sanderson wrote:
> I underastand that ecvery coumn needs to be read, that is self evident,
> however my feeling (not tested) is that the process is much slower than it
> needs to be, i..e the process of creating an index on a column whos values
> are all NULL takes longer than just reading all of the columns

There's also the time needed for sorting the values and constructing and
writing the index.

> I have to admit to not testing/timimg this though

sqlite> .timer on
sqlite> SELECT COUNT(*) FROM t;
4194304
CPU Time: user 0.008000 sys 0.00
sqlite> SELECT DISTINCT x FROM t;

CPU Time: user 0.608000 sys 0.012000
sqlite> CREATE INDEX i ON t(x);
CPU Time: user 6.592000 sys 0.064000
sqlite> DROP INDEX i;
CPU Time: user 0.08 sys 0.004000
sqlite> UPDATE t SET x=1;
CPU Time: user 4.556000 sys 0.052000
sqlite> SELECT DISTINCT x FROM t;
1
CPU Time: user 0.628000 sys 0.00
sqlite> CREATE INDEX i ON t(x);
CPU Time: user 6.648000 sys 0.08

> That does lead to another question. Is their a method of creating multiple
> indexes at the same time, e.g. create an index on each (or specified)
> column in a table in one pass - rather than do each column in turn. This
> would save on the overhead of reading the entire table for each column.

No such commands exists.  Just have enough file cache.  :-)


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Simon Slavin

On 25 Nov 2012, at 1:29pm, Paul Sanderson  wrote:

> I underastand that ecvery coumn needs to be read, that is self evident,
> however my feeling (not tested) is that the process is much slower than it
> needs to be, i..e the process of creating an index on a column whos values
> are all NULL takes longer than just reading all of the columns - I have to
> admit to not testing/timimg this though, I'll have a go when time permits.

In SQLite, all columns are in all indexes even if the column contains a NULL.  
NULL has a sorting order, and anything that does

SELECT * FROM myTable ORDER BY myIndexedColumn

still has to return all rows.

> That does lead to another question. Is their a method of creating multiple
> indexes at the same time, e.g. create an index on each (or specified)
> column in a table in one pass - rather than do each column in turn. This
> would save on the overhead of reading the entire table for each column.

It is very rare to need to index every column of a table.  That is generally a 
sign that what you really want is some sort of three-column attribute store 
rather than a conventional database.  You should know that any SELECT will only 
use either zero or one index: once you've used one index for sorting, the 
others are useless because they list rows in an unsorted order.

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Simon Slavin

On 25 Nov 2012, at 1:29pm, Paul Sanderson  wrote:

> That does lead to another question. Is their a method of creating multiple
> indexes at the same time, e.g. create an index on each (or specified)
> column in a table in one pass - rather than do each column in turn. This
> would save on the overhead of reading the entire table for each column.

Looking again at this I'm not sure I understand why you're indexing every 
column.  Do you understand the difference between creating one index for each 
column and creating one index on all columns ?  It is worth the huge increase 
in filesize ?  Is the slow-down on data entry (where entering one row must 
modify many indexes) worth the speed-up in your SELECTs ?  Do you understand 
that any SELECT will use only one index ?

Simon.
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-25 Thread Paul Sanderson
Yes NULL -

I underastand that ecvery coumn needs to be read, that is self evident,
however my feeling (not tested) is that the process is much slower than it
needs to be, i..e the process of creating an index on a column whos values
are all NULL takes longer than just reading all of the columns - I have to
admit to not testing/timimg this though, I'll have a go when time permits.

That does lead to another question. Is their a method of creating multiple
indexes at the same time, e.g. create an index on each (or specified)
column in a table in one pass - rather than do each column in turn. This
would save on the overhead of reading the entire table for each column.


On 24 November 2012 14:40, Clemens Ladisch  wrote:

> Paul Sanderson wrote:
> > Whilst building a new app I created an index on every column some of
> which
> > were empty.
>
> And with "empty", you mean that every value in that column is NULL?
>
> > The database is reasonably large (400K rows) and I notcied that
> > it seems to take as long to create an index on a column in which all the
> > rows are empty as it does on one in which all the rows are unique.
>
> SQLite still has to read all records to determine what the values are,
> and to create the index with all these values and RowIDs.  That these
> values all happen to have the same value does not change the algorithm.
>
>
> Regards,
> Clemens
> ___
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
>
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-24 Thread Clemens Ladisch
Paul Sanderson wrote:
> Whilst building a new app I created an index on every column some of which
> were empty.

And with "empty", you mean that every value in that column is NULL?

> The database is reasonably large (400K rows) and I notcied that
> it seems to take as long to create an index on a column in which all the
> rows are empty as it does on one in which all the rows are unique.

SQLite still has to read all records to determine what the values are,
and to create the index with all these values and RowIDs.  That these
values all happen to have the same value does not change the algorithm.


Regards,
Clemens
___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users


Re: [sqlite] creating indexes on empty columns

2012-11-24 Thread Igor Tandetnik
Paul Sanderson  wrote:
> Whilst building a new app I created an index on every column some of which
> were empty. The database is reasonably large (400K rows) and I notcied that
> it seems to take as long to create an index on a column in which all the
> rows are empty as it does on one in which all the rows are unique.
> 
> I don't really have an idea of how sqlite works internally, but this seems
> like an area where there could be some optimisation - or anm I totally off
> track?

What kind of optimization do you have in mind? How exactly do you propose 
determining that a column has empty values in all rows, any faster than looking 
at every row? Also, what constitutes an "empty value" - NULL, empty string, 
zero-length blob, integer 0, something else?
-- 
Igor Tandetnik

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users