Re: [PERFORM] pg_fetch_array
> Does php need to read the database every time pg_fetch_array is executed in
> the while loop, or are all the rows already in memory after pg_query?

You may need to ask the php people about this one. The PostgreSQL protocol would allow data to continue streaming in at the same time as you are processing other rows (asynchronous retrieval). So, optionally, they may fetch and cache all rows in local memory at pg_query time, OR grab them in sets of 1000 rows and cache those (fetching the next set when the first set runs out), OR grab one row for each fetch.

You could run a simple select that fetches 100M rows from a table with no WHERE clause. See how quickly the first row comes in, and how much memory is used by the process.

I suspect they fetch all of the rows at pg_query execution; otherwise they wouldn't know how to respond to a pg_num_rows() call.

On a side note, that is a rather unique email address.
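If the driver does fetch and cache the whole result at pg_query time and memory becomes a problem, one way to get the "sets of N rows" behaviour yourself is a server-side cursor. A minimal sketch, assuming a hypothetical table named big_table and an arbitrary batch size of 1000:

BEGIN;
-- Declare a cursor for the query; rows are then pulled in batches,
-- so only one batch at a time has to sit in client memory.
DECLARE big_cur CURSOR FOR SELECT * FROM big_table;  -- big_table is a stand-in name
FETCH 1000 FROM big_cur;  -- first batch of up to 1000 rows
FETCH 1000 FROM big_cur;  -- next batch; repeat until FETCH returns no rows
CLOSE big_cur;
COMMIT;

Each FETCH is an ordinary SQL statement, so it can be issued from the client like any other query and the application controls how much data is resident at once.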
Re: [PERFORM] Indexing question
Stan Bielski <[EMAIL PROTECTED]> writes:
>       Table "public.allflow_tv_mydoom"
>    Column   |  Type   | Modifiers
> ------------+---------+-----------
>  tv_s       | bigint  |
>               ^^
> Indexes: allflow_tv_mydoom_x btree (tv_s)

> standb=# explain select * from allflow_tv_mydoom where tv_s < 1074200099
> and tv_s > 107506499;
> [ gives seqscan ]

This is a FAQ :-(. Unadorned integer constants are taken to be int4, not int8 (unless they are too large for int4), and cross-data-type comparisons are not indexable in existing releases. So you have to explicitly cast the comparison values to int8:

explain select * from allflow_tv_mydoom
 where tv_s < 1074200099::bigint and tv_s > 107506499::bigint;

(or use the standard CAST syntax if you prefer). 7.5 will have a fix for this ancient annoyance.

BTW, is there a reason to be using tv_s+tv_us and not just a single timestamptz column?

			regards, tom lane
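For completeness, the same fix written with the standard CAST syntax that Tom mentions; this is purely a notational variant of the ::bigint casts above:

explain select * from allflow_tv_mydoom
 where tv_s < CAST(1074200099 AS bigint)
   and tv_s > CAST(107506499 AS bigint);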
Re: [PERFORM] 50 000 000 Table entries and I have to do a keyword search HELP NEEDED
The words for the keyword search can be made up of a sentence, i.e. 10 or more keywords for one entry. Also, in case I didn't answer before, we are using TSearch2 and all tables have been fully analyzed and indexed. Any other suggestions?

How long do searches take when 10 000 rows are returned? We cannot use a limit of 100 because we need to analyze the entire data set returned.

Thanks,

----- Original Message -----
From: Jeremy Dunn
To: 'borajetta'
Cc: Postgresql Performance
Sent: Tuesday, June 15, 2004 5:43 AM
Subject: RE: [PERFORM] 50 000 000 Table entries and I have to do a keyword search HELP NEEDED

One option that does not take advantage of any fancy indexing methods is to create a trigger on the table, on insert/update/delete, which extracts each individual word from the field you care about and creates an entry in another 'keyword' table: id = 'word', value = pk of your original table. Then index the keyword table on the 'keyword' field and do your searches from there. This should improve performance substantially, even on very large return sets, because the keyword table rows are very small and thus a lot of them fit in a disk block.

- Jeremy

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of borajetta
Sent: Monday, June 07, 2004 5:47 PM
To: [EMAIL PROTECTED]
Subject: [PERFORM] 50 000 000 Table entries and I have to do a keyword search HELP NEEDED

I have a table with about 50 million entries in it, and I have to do a keyword search. The keyword search is done on the title of the entry. For example, an entry could be "This is a title string which could be searched".

I have tried a few ways to search but I get horrible search times. Some keywords will come up with matches as big as 0.25 million rows, but most are around 1000-5000. I use an index which narrows the table down to about 1.5-2 million entries.

I used two tables which had a 1:1 correspondence. One held a GiST index on an int field which was searched for the keyword. Then I would join that table to the other to retrieve the rest of the information about the items it matched. This was slow even for returning 100 entries: about 10 seconds, sometimes 5. But when I start getting 1xxx entries it takes about 30-50 seconds. The rest is just horrible.

How should I set up my indexes and/or tables? We were thinking of putting the index inside one table so the join would not have to be done, but this still returns rather slow results. I have not fully tested this method, but it looks like when run for just the keyword search on the title, with no joining, it can return in about 10 seconds or less. This is a great improvement, but I am currently going to merge the tables into one and see how long it will take. I believe it will not be much more, as there will be no join needed, only the returning of some attribute fields.

This is still not the kind of time I would like to see; I wanted something around 2 seconds or less. I know there is a lot of information, especially if 0.25 million rows are to be returned, but when only 1xxx-9xxx rows are returned I believe 2 seconds seems about right. How do search engines do it?

Any suggestions are welcome,

Thanks
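A minimal sketch of the keyword-table approach Jeremy describes above, assuming a hypothetical source table items(id, title); the table, index, and function names are illustrative only, and the PL/pgSQL below uses dollar quoting and regexp_split_to_table, which require a newer PostgreSQL release than the versions current in this thread:

-- Hypothetical source table standing in for the 50-million-row table
CREATE TABLE items (
    id    bigint PRIMARY KEY,
    title text NOT NULL
);

-- One row per (word, source row) pair, as suggested in the reply above
CREATE TABLE item_keywords (
    keyword text   NOT NULL,
    item_id bigint NOT NULL REFERENCES items(id) ON DELETE CASCADE
);
CREATE INDEX item_keywords_kw_idx ON item_keywords (keyword);

-- Trigger function: (re)derive the keyword rows whenever a title is inserted or changed
CREATE OR REPLACE FUNCTION items_keywords_sync() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'UPDATE' THEN
        DELETE FROM item_keywords WHERE item_id = OLD.id;
    END IF;
    INSERT INTO item_keywords (keyword, item_id)
    SELECT DISTINCT lower(w), NEW.id
    FROM regexp_split_to_table(NEW.title, E'\\W+') AS w
    WHERE w <> '';
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_keywords_trg
AFTER INSERT OR UPDATE ON items
FOR EACH ROW EXECUTE PROCEDURE items_keywords_sync();

-- A keyword search then becomes an index lookup on the small keyword table
-- plus a join back to the wide table on its primary key:
SELECT i.*
FROM items i
JOIN item_keywords k ON k.item_id = i.id
WHERE k.keyword = 'searchterm';

Deletes are handled here by the ON DELETE CASCADE foreign key rather than by the trigger.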
[PERFORM] Indexing question
Hello all,

I have a rather large table (~20 GB) of network logs taken over the period of a month and stored under postgres 7.3. I'm trying to create an indexing scheme so that I can do a query with a "where time > foo and time < bar" and get the results without too much waiting.

With this in mind, I created a new database ordering each log entry by the time values (seconds and microseconds), and then created an index on the time value for seconds. I didn't use the clusterdb command, but I did do a "select * into foo_tbl order by time", which I understand to be the functional equivalent. The result looks something like this:

      Table "public.allflow_tv_mydoom"
   Column   |  Type   | Modifiers
------------+---------+-----------
 tv_s       | bigint  |
 tv_us      | bigint  |
 src        | inet    |
 dst        | inet    |
 sport      | integer |
 dport      | integer |
 bytes_in   | bigint  |
 bytes_out  | bigint  |
 isn_in     | bigint  |
 isn_out    | bigint  |
 num_starts | integer |
 result     | integer |
 flags      | integer |
 age_s      | bigint  |
 age_us     | bigint  |
 ttl_left   | bigint  |
 end_s      | bigint  |
 end_us     | bigint  |
Indexes: allflow_tv_mydoom_x btree (tv_s)

with tv_s, of course, being my value in seconds. I followed this up with an ANALYZE so postgres could absorb this structure into its logic.

However, whenever I want to do my query, it *insists* on doing a sequential scan, even when I explicitly turn sequential scans off in postgresql.conf... here's an example:

standb=# explain select * from allflow_tv_mydoom where tv_s < 1074200099
and tv_s > 107506499;
                             QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on allflow_tv_mydoom  (cost=1.00..102303307.94 rows=1 width=132)
   Filter: ((tv_s < 1074200099) AND (tv_s > 107506499))
(2 rows)

In this query I'm basically asking for a day's worth of data. It should be straightforward: it's all laid out in order and indexed, but it still insists on (what I believe to be) tearing through the entire DB and filtering out the other time values as it sees them. Even if I want just the data from a particular second, it does the same thing:

standb=# explain select * from allflow_tv_mydoom where tv_s = 1074200099;
                             QUERY PLAN
---------------------------------------------------------------------
 Seq Scan on allflow_tv_mydoom  (cost=1.00..102140682.45 rows=145 width=132)
   Filter: (tv_s = 1074200099)
(2 rows)

Naturally, this is incredibly frustrating because it takes forever, regardless of the query. The funny thing, though, is that I have the same data in another table where it is ordered and indexed by IP address, and the queries use the index and work the way I want them to. Example:

standb=# explain select * from flow_ip_mydoom where src = '10.0.5.5';
                             QUERY PLAN
---------------------------------------------------------------------
 Index Scan using flow_ip_mydoom_x on flow_ip_mydoom  (cost=0.00..333.41 rows=376 width=132)
   Index Cond: (src = '10.0.5.5'::inet)
(2 rows)

Can someone please explain to me what's going on and how to fix it? If there's an easier way to 'jump' to different time regions in the table without indexing all the different seconds values, I'd like to know this as well.

Thanks a bunch,
-S
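For reference, a sketch of the setup described above in runnable form; the name raw_logs for the source of the SELECT INTO is a hypothetical stand-in, and the commented CLUSTER line shows the pre-8.3 syntax for what clusterdb would have done once the index existed:

-- One-time physical ordering of the data, as described in the message:
SELECT * INTO allflow_tv_mydoom
FROM raw_logs              -- hypothetical name for the unordered log table
ORDER BY tv_s, tv_us;

CREATE INDEX allflow_tv_mydoom_x ON allflow_tv_mydoom (tv_s);
ANALYZE allflow_tv_mydoom;

-- The index-based equivalent (PostgreSQL 7.x syntax), which reorders the
-- existing table by the named index:
-- CLUSTER allflow_tv_mydoom_x ON allflow_tv_mydoom;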
[PERFORM] pg_fetch_array
Hello,

I would like to know about the performance of pg_fetch_array. Consider the code:

$query = "select * from foo";
$result = pg_query( $db, $query );
while ($row = pg_fetch_array($result)) {
    $a = $row["a"];
    $b = $row["b"];
    $c = $row["c"];
    $d = $row["d"];
}

Does php need to read the database every time pg_fetch_array is executed in the while loop, or are all the rows already in memory after pg_query?

If a database read is needed each time, is there any method to copy all the rows into memory using some other command? (I have an application which needs a large amount of database updates/retrievals and I wish to make the overall application run faster.) Or is there another method you would recommend in order to get faster response times?

Thank you in advance.
[PERFORM] Hi!
Hi,

I am trying to install postgresql-7.4.3 as a simple installation. I ran the ./configure command in the postgresql source directory. During the configure process I received the following messages:

checking for tar... /bin/tar
checking for strip... strip
checking whether it is possible to strip libraries... yes
checking for bison... bison -y
*** The installed version of Bison is too old.  PostgreSQL needs
*** Bison version 1.875 or later.
checking for perl... /usr/bin/perl
checking for main in -lbsd... no
checking for setproctitle in -lutil... no
checking for main in -lm... yes

After this message the configure process continues, but the problem is that it never finishes. I think the configure process is stuck in a loop. Could this have anything to do with my hardware configuration? The computer where I ran this is an AMD K6-II 200 MHz with 64 MB of memory.

I would like to know why the configure process never finishes.

Regards,
Janio