Re: [PERFORM] browsing table with 2 million records

2005-10-27 Thread Merlin Moncure
Christopher wrote:
  - Present a nifty date selector to choose the records from any day,
  hour, minute, second
  - show them, with next day and previous day buttons
 
  - It's more useful to the user (most likely he wants to know what
  happened on 01/05/2005 rather than view page 2857)
  - It's faster (no more limit/offset! just "date BETWEEN a AND b",
  indexed of course)
  - no more new items pushing old ones to the next page while you
  browse
  - you can pretend to your boss it's just like a paginated list
 
 All very well and good, but now do it generically...

I've done it...
First of all, I totally agree with PFC's rant regarding absolute
positioning while browsing datasets.  Among other things, it has serious
problems if you have multiple users updating your table.  Also it's kind of
silly to be doing this in a set-based data paradigm.

The 'SQL' way to browse a dataset is by key.  If your key has multiple
parts or you are trying to sort on two or more fields, you are supposed
to use the row constructor:

select * from t where (x, y) > (xc, yc) order by x, y;

Unfortunately, this gives the wrong answer in postgresql :(.

The alternative is to use boolean logic.  Here is a log snippet from my
ISAM driver (in ISAM, you are *always* browsing datasets):

prepare system_read_next_menu_item_favorite_file_0 (character varying,
int4, int4, int4)
as select * from system.menu_item_favorite_file
where mif_user_id >= $1 and
(mif_user_id > $1 or mif_menu_item_id >= $2) and
(mif_user_id > $1 or mif_menu_item_id > $2 or
mif_sequence_no > $3)
order by mif_user_id, mif_menu_item_id, mif_sequence_no
limit $4

This is a boolean-based 'get next record' on a 3-part key plus a
parameterized limit.  You can do this without using prepared statements,
of course, but with the prepared version you can at least do

execute system_read_next_menu_item_favorite_file_0('abc', 1, 2, 1);

Merlin




Re: [PERFORM] browsing table with 2 million records

2005-10-27 Thread PFC




I've done it...
First of all, I totally agree with PFC's rant regarding absolute
positioning while browsing datasets.  Among other things, it has serious
problems if you have multiple users updating your table.  Also it's kind of
silly to be doing this in a set-based data paradigm.


	Recently I was browsing some site and it had this problem: as users
kept adding new entries while I was browsing the list page after page, when
I hit "next page" I got half of what I had already seen on the previous
page.  Of course the webmaster had set the visited-links color the same as
the unvisited-links color, so I couldn't tell, and had to use my brain,
which was quite upsetting XDDD


	And bookmarking a page to resume browsing at some later time does not  
work either, because if I bookmark page 15, then when I come back, users  
have added 10 pages of content and what I bookmarked is now on page 25...



All very well and good, but now do it generically...


Hehe. I like ranting...
	It is not possible to do it in a generic way that works in all cases.
For instance:


The forum-topic case:
- posts are added at the bottom, not at the top
- the page number is relevant and meaningful

However, in most cases, you can use a multipart key and get it right.
	Suppose, for instance, that you have a base of several million records,
organized according to:


- date (like the original poster)
or:
- country, region, city, customer last name, first name.

	You could ask for the first three, but then you'll get 5 Smiths in  
New York and 1 Van Bliezinsky.


	Or you could precalculate, once a week, a key-interval distribution that
creates reasonably sized intervals (for instance, 100 values in each),
maybe requiring that each interval contain only one city.  So you
would get:


Country  Region  City  LastName    FirstName
USA      NYC     NY    Smith,      ''
USA      NYC     NY    Smith,      Albert
USA      NYC     NY    Smith,      Bernard
...
USA      NYC     NY    Smith,      William
...
USA      NYC     NY    Von Braun
...

	So you'd predetermine your page breaks ahead of time, and recompute
them once in a while.  You won't get identically sized pages, but if the
statistical distribution of the data plays nice, you should get pages of
roughly even size.


	The interesting part is that you can present the user with a selector
that shows meaningful and useful data, AND is fast to compute, AND is
fast to use.
	In this case, it would amount to selecting country, region, and city,
then displaying a list like this:

Smith, ...Albert
Smith, Albus...Bernard
...
Smith, William...
...
Von Braun...Von Schwarts
...

	So Jeannette Smith would be easy to find, being in the link
"Smith, Jean...John", for instance.
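
A rough sketch of that precomputation, with hypothetical names (a people
table, and a page_breaks table holding the first key of each page).
PostgreSQL 7.4 has no window functions, so this numbers rows through a
temporary serial column; it could run from a weekly cron job:

-- Hypothetical schema: people(country, region, city, last_name, first_name)
-- and page_breaks(page_no, country, region, city, last_name, first_name).

-- Number every row in key order into a scratch table (this relies on the
-- INSERT evaluating the serial default in SELECT order, which PostgreSQL
-- does in practice)...
CREATE TEMP TABLE ranked (n serial, country text, region text, city text,
                          last_name text, first_name text);

INSERT INTO ranked (country, region, city, last_name, first_name)
SELECT country, region, city, last_name, first_name
FROM people
ORDER BY country, region, city, last_name, first_name;

-- ...then keep every 100th key as the start of a page.
DELETE FROM page_breaks;

INSERT INTO page_breaks (page_no, country, region, city, last_name, first_name)
SELECT (n - 1) / 100 + 1, country, region, city, last_name, first_name
FROM ranked
WHERE (n - 1) % 100 = 0;

Page N is then fetched with an ordinary key-range query bounded by the
break keys of pages N and N+1 (the multipart comparison can be spelled out
with boolean logic, as in Merlin's snippet above).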


	If the aim is to quickly locate a particular record, I like
javascript-powered autocompletion better; but for browsing, this
pagination method is cool.


Regards!


Re: [PERFORM] browsing table with 2 million records

2005-10-26 Thread Mark Lewis
Do you have an index on the date column?  Can you post an EXPLAIN
ANALYZE for the slow query?

-- Mark Lewis

On Wed, 2005-10-26 at 13:41 -0700, aurora wrote:
 I am running PostgreSQL 7.4 on FreeBSD. The main table has 2 million
 records (we would like to do at least 10 million or more). It is mainly a
 FIFO structure with maybe 200,000 new records coming in each day that
 displace the older records.
 
 We have a GUI that lets users browse through the records page by page,
 about 25 records at a time. (Don't ask me why, but we have to have this
 GUI). This translates to something like
 
   select count(*) from table   -- to give feedback about the DB size
   select * from table order by date limit 25 offset 0
 
 The tables seem properly indexed, with vacuum and analyze run regularly.
 Still, these very basic SQL statements take up to a minute to run.
 
 I read some recent messages that select count(*) needs a table
 scan in PostgreSQL. That's disappointing. But I can accept an
 approximation if there is some way to get one. But how can I optimize
 "select * from table order by date limit x offset y"? One minute
 response time is not acceptable.
 
 Any help would be appreciated.
 
 Wy
 
 




Re: [PERFORM] browsing table with 2 million records

2005-10-26 Thread Scott Marlowe
On Wed, 2005-10-26 at 15:41, aurora wrote:
 I am running PostgreSQL 7.4 on FreeBSD. The main table has 2 million
 records (we would like to do at least 10 million or more). It is mainly a
 FIFO structure with maybe 200,000 new records coming in each day that
 displace the older records.
 
 We have a GUI that lets users browse through the records page by page,
 about 25 records at a time. (Don't ask me why, but we have to have this
 GUI). This translates to something like
 
   select count(*) from table   -- to give feedback about the DB size
   select * from table order by date limit 25 offset 0
 
 The tables seem properly indexed, with vacuum and analyze run regularly.
 Still, these very basic SQL statements take up to a minute to run.
 
 I read some recent messages that select count(*) needs a table
 scan in PostgreSQL. That's disappointing. But I can accept an
 approximation if there is some way to get one. But how can I optimize
 "select * from table order by date limit x offset y"? One minute
 response time is not acceptable.

Have you run your script without the select count(*) part and timed it?

What does

explain analyze select * from table order by date limit 25 offset 0

say? 

Is date indexed?
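
For reference, a minimal sketch of both steps; "table" and "date" here
just stand in for the real names from the original post:

-- Index the sort column so ORDER BY date LIMIT 25 can use an index scan.
CREATE INDEX table_date_idx ON "table" (date);

-- Then see what the planner actually does with the paged query.
EXPLAIN ANALYZE SELECT * FROM "table" ORDER BY date LIMIT 25 OFFSET 0;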



Re: [PERFORM] browsing table with 2 million records

2005-10-26 Thread Joshua D. Drake

 We have a GUI that lets users browse through the records page by page,
 about 25 records at a time. (Don't ask me why, but we have to have this
 GUI). This translates to something like
 
   select count(*) from table   -- to give feedback about the DB size

Do you have an integer field that is an ID that increments, e.g. serial?

   select * from table order by date limit 25 offset 0

You could use a cursor.
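
A minimal sketch of the cursor approach (names are placeholders; the cursor
has to live inside a transaction, and the connection must stay open between
pages):

BEGIN;

-- Open one cursor over the whole ordered result...
DECLARE browse_cur SCROLL CURSOR FOR
    SELECT * FROM "table" ORDER BY date;

-- ...then pull one screenful at a time instead of re-running the
-- query with a growing OFFSET.
FETCH FORWARD 25 FROM browse_cur;   -- page 1
FETCH FORWARD 25 FROM browse_cur;   -- page 2

CLOSE browse_cur;
COMMIT;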

Sincerely,

Joshua D. Drake


 
 The tables seem properly indexed, with vacuum and analyze run regularly.
 Still, these very basic SQL statements take up to a minute to run.
 
 I read some recent messages that select count(*) needs a table
 scan in PostgreSQL. That's disappointing. But I can accept an
 approximation if there is some way to get one. But how can I optimize
 "select * from table order by date limit x offset y"? One minute
 response time is not acceptable.
 
 Any help would be appreciated.
 
 Wy
 
 
-- 
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/




Re: [PERFORM] browsing table with 2 million records

2005-10-26 Thread Alex Turner
You could also create your own "index", so to speak: a table that
simply contains a list of primary keys and an order-value field that
you can use as your offset.  This can be kept in sync with the master
table using triggers pretty easily.  2 million rows is not very much if
you only have an integer pkey and an integer order value, and you can
then join it against the main table.

create table my_index_table (
    primary_key_value int,
    order_val int,
    primary key (primary_key_value));

create index my_index_table_order_val_i on my_index_table (order_val);

select * from main_table a, my_index_table b
where b.order_val >= 25 and b.order_val < 50
  and a.primary_key_id = b.primary_key_value;

If the data updates a lot, though, this won't work as well, since the
index table will require frequent updates to a potentially large number
of records (although a small number of pages, so it still won't be
horrible).
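
A rough sketch of such a trigger, under the assumption that main_table has
an integer primary key named id; this version only handles INSERT and
DELETE, assigns new rows the next order_val, and leaves gaps on delete
rather than renumbering:

-- plpgsql must already be installed in the database (createlang plpgsql).
CREATE OR REPLACE FUNCTION sync_my_index_table() RETURNS trigger AS '
BEGIN
    IF TG_OP = ''INSERT'' THEN
        INSERT INTO my_index_table (primary_key_value, order_val)
        SELECT NEW.id, coalesce(max(order_val), 0) + 1 FROM my_index_table;
        RETURN NEW;
    ELSE  -- DELETE
        DELETE FROM my_index_table WHERE primary_key_value = OLD.id;
        RETURN OLD;
    END IF;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER main_table_sync_index
    AFTER INSERT OR DELETE ON main_table
    FOR EACH ROW EXECUTE PROCEDURE sync_my_index_table();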

Alex Turner
NetEconomist

On 10/26/05, Joshua D. Drake [EMAIL PROTECTED] wrote:

  We have a GUI that lets users browse through the records page by page,
  about 25 records at a time. (Don't ask me why, but we have to have this
  GUI). This translates to something like
 
    select count(*) from table   -- to give feedback about the DB size

 Do you have an integer field that is an ID that increments, e.g. serial?

select * from table order by date limit 25 offset 0

 You could use a cursor.

 Sincerely,

 Joshua D. Drake


 
  The tables seem properly indexed, with vacuum and analyze run regularly.
  Still, these very basic SQL statements take up to a minute to run.
 
  I read some recent messages that select count(*) needs a table
  scan in PostgreSQL. That's disappointing. But I can accept an
  approximation if there is some way to get one. But how can I optimize
  "select * from table order by date limit x offset y"? One minute
  response time is not acceptable.
 
  Any help would be appreciated.
 
  Wy
 
 
 --
 The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support
 Managed Services, Shared and Dedicated Hosting
 Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/






Re: [PERFORM] browsing table with 2 million records

2005-10-26 Thread Tom Lane
aurora [EMAIL PROTECTED] writes:
 It would still be helpful if select count(*) could perform well.

If you can settle for an approximate count, pg_class.reltuples might
help you.
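
For example, a minimal lookup of that estimate ('your_table' is a
placeholder; the value is whatever the last VACUUM/ANALYZE recorded, not an
exact count):

SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'your_table';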

regards, tom lane



Re: [PERFORM] browsing table with 2 million records

2005-10-26 Thread PFC



I am running PostgreSQL 7.4 on FreeBSD. The main table has 2 million records
(we would like to do at least 10 million or more). It is mainly a FIFO
structure with maybe 200,000 new records coming in each day that displace
the older records.


I'm so sorry, but I have to rant XDDD

	People who present a list of 100 items, paginated with 10 items per page
so that it fits on half of an 800x600 screen, should be shot.

I can scroll with my mousewheel and use text search in my browser...

	People who present a paginated view with 100,000 pages, where you have to
apply bisection search by hand to find that records starting with "F" are
on page 38651, should be forced to use a keyboard with just one key and
type in Morse code.


	The problem with pagination is that the page number is meaningless and
rather useless to the user.  It is also meaningless to the database, which
means you have to use slow kludges like count() and limit/offset.  And as
people insert stuff into the table while you browse, when you hit "next
page" you will see, at the top, half of what was on the previous page,
because it was pushed down by new records.  Or you might miss records.


	So, rather than using a meaningless record offset as a page number, you
can use something meaningful, like a date, the first letter of a name, a
region, etc.


	Of course, MySQL, always eager to encourage sucky-sucky practices,
provides a neat SQL_CALC_FOUND_ROWS hack which, while not being super
SQL-standard compliant, allows you to retrieve the number of rows the query
would have returned had you not used LIMIT, so you can compute the number
of pages and grab one page with only one query.


	So people use paginators instead of intelligent solutions, like
xmlhttp+javascript-enabled autocompletion in forms, etc.  And you have to
scroll to page 38651 to find the letter F.


So if you need to paginate on your site:

CHEAT

Who needs a paginated view with 100,000 pages?

- Select min(date) and max(date) from your table
- Present a nifty date selector to choose the records from any day, hour,
  minute, second
- show them, with next day and previous day buttons

- It's more useful to the user (most likely he wants to know what
  happened on 01/05/2005 rather than view page 2857)
- It's faster (no more limit/offset! just "date BETWEEN a AND b",
  indexed of course)

- no more new items pushing old ones to the next page while you browse
- you can pretend to your boss it's just like a paginated list
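
A minimal sketch of those queries (hypothetical table/column names, with an
ordinary index on the date column):

-- Bounds for the date selector.
SELECT min(date), max(date) FROM records;

-- One "page": everything for the chosen day, via an index range scan,
-- with no LIMIT/OFFSET needed.  (The half-open range is the BETWEEN from
-- above, written so it also works for timestamp columns.)
SELECT *
FROM records
WHERE date >= '2005-01-05' AND date < '2005-01-06'
ORDER BY date;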



Re: [PERFORM] browsing table with 2 million records

2005-10-26 Thread Christopher Kings-Lynne
We have a GUI that lets users browse through the records page by page,
about 25 records at a time. (Don't ask me why, but we have to have this
GUI). This translates to something like


  select count(*) from table   -- to give feedback about the DB size
  select * from table order by date limit 25 offset 0


Heh, sounds like phpPgAdmin...I really should do something about that.

The tables seem properly indexed, with vacuum and analyze run regularly.
Still, these very basic SQL statements take up to a minute to run.


Yes, COUNT(*) on a large table is always slow in PostgreSQL.  Search the 
mailing lists for countless discussions about it.


Chris




Re: [PERFORM] browsing table with 2 million records

2005-10-26 Thread Christopher Kings-Lynne

Who needs a paginated view with 100,000 pages?

- Select min(date) and max(date) from your table
- Present a nifty date selector to choose the records from any day, hour,
  minute, second

- show them, with next day and previous day buttons

- It's more useful to the user (most likely he wants to know what
  happened on 01/05/2005 rather than view page 2857)
- It's faster (no more limit/offset! just "date BETWEEN a AND b",
  indexed of course)

- no more new items pushing old ones to the next page while you browse
- you can pretend to your boss it's just like a paginated list


All very well and good, but now do it generically...

