[PERFORM] Hi!

2004-06-20 Thread Janio Rosa da Silva



Hi,

I am trying to install postgresql-7.4.3 (a simple
installation). I ran the ./configure command in the PostgreSQL source
directory. During the configure process I received the following message:

checking for tar... /bin/tar
checking for strip... strip
checking whether it is possible to strip libraries... yes
checking for bison... bison -y
*** The installed version of Bison is too old.  PostgreSQL needs
*** Bison version 1.875 or later.
checking for perl... /usr/bin/perl
checking for main in -lbsd... no
checking for setproctitle in -lutil... no
checking for main in -lm... yes

But after this message the configure process continues. The problem is
that it never finishes; I think the configure process is stuck in a loop.
Does this have anything to do with my hardware configuration? The computer
where I did this is an AMD K6-II 200 MHz with 64 MB of memory. I would
like to know why the configure process never finishes.

Regards,

Janio


[PERFORM] Indexing question

2004-06-20 Thread Stan Bielski
Hello all,

I have a rather large table (~20 GB) of network logs taken over the period
of a month and stored under postgres 7.3. I'm trying to create an indexing
scheme so that I can do a query with a where time > foo and time < bar
and get the results without too much waiting. With this in mind, I created
a new database ordering each log entry by the time values (seconds, and
microseconds), and then created an index on the time value for seconds. I
didn't use the clusterdb command, but I did do a select * into foo_tbl
order by time, which I understand to be the functional equivalent. The
result looks something like this: 

 Table public.allflow_tv_mydoom
   Column   |  Type   | Modifiers
------------+---------+-----------
 tv_s   | bigint  |
 tv_us  | bigint  |
 src| inet|
 dst| inet|
 sport  | integer |
 dport  | integer |
 bytes_in   | bigint  |
 bytes_out  | bigint  |
 isn_in | bigint  |
 isn_out| bigint  |
 num_starts | integer |
 result | integer |
 flags  | integer |
 age_s  | bigint  |
 age_us | bigint  |
 ttl_left   | bigint  |
 end_s  | bigint  |
 end_us | bigint  |
Indexes: allflow_tv_mydoom_x btree (tv_s) 

with tv_s, of course, being my value in seconds. I followed this up with
an ANALYZE so postgres could absorb this structure into its logic.
However, whenever I want to do my query, it *insists* on doing a
sequential scan, even when I explicitly turn sequential scan off in
postgres.conf...  here's an example: 

standb=# explain select * from allflow_tv_mydoom where tv_s > 1074200099
and tv_s < 107506499;
QUERY PLAN
---
 Seq Scan on allflow_tv_mydoom  (cost=1.00..102303307.94 rows=1
width=132)
   Filter: ((tv_s > 1074200099) AND (tv_s < 107506499))
(2 rows)  

In this query I'm basically asking for a day's worth of data. It should be
straightforward: it's all laid out in order and indexed, but it still
insists on (what I believe to be) tearing through the entire DB and
filtering out the other time values as it sees them. Even if I want just
the data from a particular second, it does the same thing:

standb=# explain select * from allflow_tv_mydoom where tv_s = 1074200099;
 QUERY PLAN
-
 Seq Scan on allflow_tv_mydoom  (cost=1.00..102140682.45 rows=145
width=132)
   Filter: (tv_s = 1074200099)
(2 rows) 

Naturally, this is incredibly frustrating because it takes forever,
regardless of the query.

The funny thing though is that I have the same data in another table
where it is ordered and indexed by IP address, and the queries use the
index and work the way I want them to. Example:

standb=# explain select * from flow_ip_mydoom where src = '10.0.5.5';
 QUERY PLAN
-
 Index Scan using flow_ip_mydoom_x on flow_ip_mydoom  (cost=0.00..333.41
rows=376 width=132)
   Index Cond: (src = '10.0.5.5'::inet)
(2 rows)

Can someone please explain to me what's going on and how to fix it? If
there's an easier way to 'jump' to different time regions in the table
without indexing all the different seconds values, I'd like to know this
as well. 

Thanks a bunch,
-S





---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] 50 000 000 Table entries and I have to do a keyword search HELP NEEDED

2004-06-20 Thread Aaron

The words for the keyword can be made up of a sentence, i.e. 10 or more
keywords to one entry.
Also, in case I didn't answer before, we are using TSearch2 and all tables
have been fully analyzed and indexed.
Any other suggestions?
How long do searches take when 10,000 rows are returned?
We cannot use a LIMIT of 100 because we need to analyze the entire data
set returned.
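(For reference, the kind of TSearch2 lookup being discussed here looks
roughly like the sketch below. The table name "items" and tsvector column
"title_fti" are hypothetical, not from this thread; the tsvector column is
assumed to be maintained by the usual tsearch2 trigger.)

SELECT id, title
FROM items
WHERE title_fti @@ to_tsquery('default', 'keyword1 & keyword2');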
Thanks,

  - Original Message -
  From: Jeremy Dunn
  To: 'borajetta'
  Cc: Postgresql Performance
  Sent: Tuesday, June 15, 2004 5:43 AM
  Subject: RE: [PERFORM] 50 000 000 Table entries and I have to do a keyword search HELP NEEDED
  
  One option that does not take advantage of any fancy indexing methods is
  to create a trigger on the table, on insert/update/delete, which extracts
  each individual word from the field you care about, and creates an entry
  in another 'keyword' table, id = 'word', value = pk of your original
  table. Then index the keyword table on the 'keyword' field, and do your
  searches from there. This should improve performance substantially, even
  on very large return sets, because the keyword table rows are very small
  and thus a lot of them fit in a disk block.

  - Jeremy
  

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of borajetta
Sent: Monday, June 07, 2004 5:47 PM
To: [EMAIL PROTECTED]
Subject: [PERFORM] 50 000 000 Table entries and I have to do a keyword search HELP NEEDED

So I have a table with about 50 million entries in it, and I have to do a
keyword search.

The keyword search is done on the title of the entry. For example, an
entry could be "This is a title string which could be searched".

I have tried a few ways to search but I get horrible search times. Some
keywords will come up with as many as 0.25 million matches, but most are
around 1000-5000.

I use an index which narrows the table down to about 1.5-2 million
entries.

I used 2 tables which had a 1:1 correspondence. One held a GiST index on
an int field which was searched for the keyword. Then I would join that
table to the other to retrieve the rest of the information about the items
it matched.

This was slow even for returning 100 entries: about 10 seconds, sometimes
5. But when I start getting 1xxx entries it's about 30-50 seconds. The
rest is just horrible.

How should I set up my indexes and/or tables?

We were thinking of putting the index inside one table so the join would
not have to be done, but this still returns rather slow results.

I have not fully tested this method, but it looks like when run for just
the keyword search on the title, with no joining, it can return in about
10 seconds or less.

This is a great improvement, but I am currently going to make the table
all-in-one and see how long it will take. I believe it will not be much
more, as there will be no join needed, only the returning of some
attribute fields.

This is still not the kind of time I would like to see; I wanted
something around 2 seconds or less.

I know there is a lot of information, especially if 0.25 million rows are
to be returned, but if there are only 1xxx-9xxx rows to be returned, I
believe 2 seconds seems about right.

How do search engines do it?
Any suggestions are welcome,

Thanks
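(Jeremy's trigger-maintained keyword table, from the quoted message above,
can be sketched as follows. All table and column names here are
hypothetical, and the trigger body that actually splits the title into
words is omitted; this is an illustration of the layout, not a tested
implementation.)

-- One small row per (word, item) pair, extracted from the title
-- by an insert/update/delete trigger on the original table.
CREATE TABLE keyword (
    word    text,      -- one word from the title
    item_id integer    -- pk of the original row
);
CREATE INDEX keyword_word_x ON keyword (word);

-- Search: find items whose title contains a given word.
SELECT i.*
FROM items i
JOIN keyword k ON k.item_id = i.id
WHERE k.word = 'searched';

Because each keyword row is tiny, many of them fit in a disk block, which
is where the suggested speedup comes from.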


Re: [PERFORM] Indexing question

2004-06-20 Thread Tom Lane
Stan Bielski <[EMAIL PROTECTED]> writes:
  Table public.allflow_tv_mydoom
Column   |  Type   | Modifiers
 ------------+---------+-----------
  tv_s   | bigint  |
           ^^^^^^
 Indexes: allflow_tv_mydoom_x btree (tv_s) 

 standb=# explain select * from allflow_tv_mydoom where tv_s > 1074200099
 and tv_s < 107506499;
 [ gives seqscan ]

This is a FAQ :-(.  Unadorned integer constants are taken to be int4
not int8 (unless they are too large for int4), and cross-data-type
comparisons are not indexable in existing releases.  So you have to
explicitly cast the comparison values to int8:

explain select * from allflow_tv_mydoom where tv_s > 1074200099::bigint
and tv_s < 107506499::bigint;

(or use the standard CAST syntax if you prefer).

7.5 will have a fix for this ancient annoyance.

BTW, is there a reason to be using tv_s+tv_us and not just a single
timestamptz column?
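(For what it's worth, a single timestamptz can be derived from the two
epoch columns with the standard 'epoch' idiom; column names are the ones
from the table above, and this is a sketch rather than something tested
against 7.3:)

SELECT TIMESTAMP WITH TIME ZONE 'epoch'
       + (tv_s + tv_us / 1000000.0) * INTERVAL '1 second' AS tv
FROM allflow_tv_mydoom;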

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend