Re: [PERFORM] Queries with conditions using bitand operator
One of the possibilities would be to decompose your bitmap into an array of bit positions and then create a GIN (or GiST) index on that array (intarray contrib package). This makes sense if your articles are distributed relatively equally and if you do not run big ORDER BY plus LIMIT/OFFSET queries, which usually need to sort the results fetched through the GIN index. As you are probably also doing some tsearch queries on the articles, you can actually build a combined (tsvector, intarray) GIN/GiST index to optimize your searches.

A simple function that strips your bitmap integer into an array of bit positions could look like:

-- DROP FUNCTION utils.bitmap_to_position_intarray(bitmap integer);
CREATE OR REPLACE FUNCTION utils.bitmap_to_position_intarray(bitmap integer)
  RETURNS integer[] AS
$BODY$
-- test
-- select utils.bitmap_to_position_intarray(5);
-- test performance
-- select utils.bitmap_to_position_intarray(s.i) from generate_series(1, 1) as s(i);
SELECT ARRAY(
  SELECT s.i + 1 -- here we do +1 to make the position of the first bit 1
    FROM generate_series(0, 31) as s(i)
   WHERE $1 & ( 1 << s.i ) <> 0
);
$BODY$
LANGUAGE SQL IMMUTABLE STRICT;

You can create a GIN index directly over this function applied to your bitmap field; array set operations will then make the planner use the GIN index (more information about these indexes here: http://www.postgresql.org/docs/8.4/interactive/textsearch-indexes.html):

CREATE INDEX idx_article_status_gin ON article USING gin ( (utils.bitmap_to_position_intarray(STATUS)) );

and then you can do (&& is the intarray overlap operator):

SELECT * FROM article WHERE utils.bitmap_to_position_intarray(STATUS) && ARRAY[1,5];

or

SELECT * FROM article WHERE utils.bitmap_to_position_intarray(STATUS) && utils.bitmap_to_position_intarray(5);

Have a look at the possible array set operations in http://www.postgresql.org/docs/8.4/interactive/intarray.html.

Otherwise, the solution from Joe Conway of creating a separate index for each bit is also worth looking at.
That approach has the drawback that you cannot look up combinations of bits efficiently. As an advantage, with Joe's per-bit indexes you can efficiently do ORDER BY article (or by any other field that you add into these limited indexes).

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
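For a quick sanity check of the bit arithmetic that the SQL function above performs, the same decomposition can be sketched in Python (the function name here is mine, not from the thread):

```python
def bitmap_to_positions(bitmap: int) -> list[int]:
    """Return the 1-based positions of the set bits of a 32-bit bitmap,
    mirroring utils.bitmap_to_position_intarray."""
    return [i + 1 for i in range(32) if bitmap & (1 << i)]

print(bitmap_to_positions(5))  # bits 0 and 2 are set -> positions [1, 3]
```

The GIN index then works on these position arrays, so a query for articles with bit 0 or bit 4 set becomes an intarray overlap against `[1, 5]`.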
Re: [PERFORM] Full text search with ORDER BY performance issue
On Jul 21, 6:06 am, scott.marl...@gmail.com (Scott Marlowe) wrote:
> On Mon, Jul 20, 2009 at 9:35 PM, Krade <kr...@krade.com> wrote:
>> But I think I might just do:
>> select * from a where comment_tsv @@ plainto_tsquery('query') and timestamp > cast(floor(extract(epoch from CURRENT_TIMESTAMP) - 864000) as integer) order by timestamp desc limit 24 offset 0;
>> And if I get less than 24 rows, issue the regular query:
>> select * from a where comment_tsv @@ plainto_tsquery('query') order by timestamp desc limit 24 offset 0;
>
> Couldn't you do the second query as a WITH query, then run another query to limit that result to everything greater than now() - x days?

Hi,

there is a problem with GIN and GiST indexes: they cannot be used by ORDER BY. Maybe it would be a nice idea to ask Oleg to make it possible to use b-tree columns in GiST or GIN to make the sort easier, but I have no idea how difficult it would be to implement in the current GIN or GiST structures. I think Oleg or even Tom would be the right people to ask :) But even if it is possible, it will not be implemented until at least 8.5, which will need a year to come, so until then...

It is possible to split your table into several smaller ones, put them on different machines, and then split your query with DBLINK. This distributes the burden of sorting to several machines, each of which only has to sort a smaller part. After you have your 24 ids from each of the machines, you can merge them, sort again and limit as you wish. Large offsets will still be problematic, but faster anyway in most reasonable offset ranges. (Load-balancing tools like pgpool can automate this task, but I do not have practical experience using them for that purpose.)

Yet another very interesting technology: Sphinx search (http://www.sphinxsearch.com/).
It can distribute data over several machines automatically, but it will probably be too expensive to start using (if this task is not your main one :)): they do not have standard automation scripts, it does not support live updates (so you will always have some minutes of delay), and it is a standalone service that needs to be maintained, configured and synchronized with your main database separately (though you can use pl/python to access it from Postgres).

Good luck with your task :)

-- Valentine Gogichashvili
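The two-phase strategy quoted at the top of this post (cheap time-restricted query first, full query only as a fallback when it comes up short) is plain control flow and can be sketched like this; `run_query` and the toy data set are stand-ins for the real database calls:

```python
def recent_then_full(run_query, cutoff, limit=24):
    """Try the cheap, time-restricted query first; fall back to the
    expensive unrestricted query only when it returns too few rows."""
    rows = run_query(cutoff=cutoff, limit=limit)
    if len(rows) < limit:
        rows = run_query(cutoff=None, limit=limit)
    return rows

# toy stand-in for the table: rows are (timestamp, text) pairs, newest first
data = sorted([(t, f"comment {t}") for t in range(100)], reverse=True)

def run_query(cutoff, limit):
    matching = [r for r in data if cutoff is None or r[0] >= cutoff]
    return matching[:limit]

# only 10 rows have timestamp >= 90, so the fallback query fills the page
print(len(recent_then_full(run_query, cutoff=90)))
```

The trade-off is that a sparse query term pays for both queries; a common term is satisfied by the cheap one alone.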
Re: [PERFORM] Terrible Write Performance of a Stored Procedure
On Jun 26, 9:30 pm, goofyheadedp...@gmail.com (Brian Troutwine) wrote:
> Hello, all.
>
> CREATE OR REPLACE FUNCTION item_data_insert(
>     iasin TEXT, iauthor TEXT, ibinding TEXT, icurrency_code TEXT,
>     iisbn TEXT, iheight INTEGER, iwidth INTEGER, ilength INTEGER,
>     iweight INTEGER, ilist_price INTEGER, iproduct_group TEXT,
>     isales_rank INTEGER, ititle TEXT, ioffer_pages INTEGER, ioffers TEXT)
> RETURNS VOID AS $$
> DECLARE
>     y integer[];
> BEGIN
>     y[1] := iwidth; y[2] := ilength; y[3] := iheight; y[4] := iweight;
>     BEGIN
>         INSERT INTO item_details (isbn, title, author, binding, list_price, dimensions)
>         VALUES (iisbn, ititle, iauthor, ibinding, ilist_price, y);
>     EXCEPTION WHEN unique_violation THEN
>         UPDATE item_details
>            SET title = ititle, author = iauthor, binding = ibinding,
>                list_price = ilist_price, dimensions = y
>          WHERE isbn = iisbn;
>     END;
>     BEGIN
>         INSERT INTO amazon_items (asin, sales_rank, offers, offer_pages, isbn)
>         VALUES (iasin, isales_rank, crunch(ioffers), ioffer_pages, iisbn);
>     EXCEPTION WHEN unique_violation THEN
>         IF isales_rank IS NOT NULL THEN
>             UPDATE amazon_items SET sales_rank = isales_rank WHERE asin = iasin;
>         END IF;
>         IF ioffers IS NOT NULL THEN
>             UPDATE amazon_items
>                SET offers = crunch(ioffers),
>                    offers_last_updated = CURRENT_TIMESTAMP,
>                    offer_pages = ioffer_pages
>              WHERE asin = iasin;
>         END IF;
>     END;
> END;
> $$ LANGUAGE plpgsql;

Hi,

did the index on the isbn field help?

Another note, which is more fine tuning than the real cause of the slow execution of your procedure: if you expect to update more often than to insert, you probably should not wait for the exception to be thrown, as BEGIN ... EXCEPTION ... END blocks are more expensive to execute than simple calls.
Have a look here: http://www.postgresql.org/docs/8.3/interactive/plpgsql-control-structures.html#PLPGSQL-ERROR-TRAPPING

Also note that if you UPDATE first, and then try to INSERT only when the UPDATE could not find anything to update, you really HAVE to expect the INSERT to fail and then retry the UPDATE, as another, parallel transaction could be fast enough to INSERT a record after you tried to update and before your transaction starts to insert.

With best regards,

-- Valentine Gogichashvili
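The update-first-then-retry pattern described above can be sketched with a dict standing in for the table; the `concurrent_insert` hook injects the race that the real unique_violation would signal (all names here are illustrative, not from the thread):

```python
def upsert(table: dict, key, value, concurrent_insert=None):
    """UPDATE-first upsert with the retry the post describes. If the UPDATE
    finds nothing, a concurrent transaction may still INSERT the row before
    our INSERT runs; the INSERT then hits unique_violation and we must loop
    back and UPDATE instead."""
    for attempt in range(2):                 # one retry covers a single race
        if key in table:                     # UPDATE ... WHERE isbn = key
            table[key] = value
            return "updated"
        if concurrent_insert and attempt == 0:
            concurrent_insert(table)         # the racing transaction commits first
        if key in table:                     # our INSERT would raise unique_violation
            continue                         # ... so retry the UPDATE
        table[key] = value                   # INSERT succeeds
        return "inserted"

t = {}
print(upsert(t, "isbn-1", 10))  # -> inserted
print(upsert(t, "isbn-1", 11))  # -> updated
```

Without the retry loop, the race window between "UPDATE found nothing" and "INSERT" loses a row whenever two sessions upsert the same key at once.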
Re: [PERFORM] Performance with temporary table
I see, I am having practically the same problem... utilizing the partitioning idea http://www.postgresql.org/docs/8.3/interactive/ddl-partitioning.html via table inheritance. I have prepared a post with some trigger and rule examples for you: http://valgogtech.blogspot.com/2008/04/table-partitioning-automation-triggers.html . I hope you will find it useful, if you are not already doing it yourself :-).

About the use of the temporary table: you could actually try to add a special row-status flag column (I use char for such flags) to your partitioned tables to mark some rows as unused, and then create some conditional (partial) indexes that consider this flag for your data operations... That would make it possible to avoid creating temporary tables, I hope...

With best regards,

-- Valentine

On Apr 10, 1:33 am, [EMAIL PROTECTED] (samantha mahindrakar) wrote:
> The partitions are used to separate the data according to months. I have run a query to find bad data from each such partition. The imputation algorithm that I use requires data from the 10 previous weeks in order to impute the data. This historical data I store in a temporary table, then I query this data so that I can take an average of all the historical data. Before taking the average some computations are performed. Since I need the historical data for every minute of data that I need to impute, I have to store the data in some intermediate table. Hence the temporary table. Now I have changed the code to use a permanent table that is truncated after one set of data is imputed. I hope this makes sense.
> Samantha
>
> On Wed, Apr 9, 2008 at 6:44 AM, valgog [EMAIL PROTECTED] wrote:
>> On Apr 7, 8:27 pm, [EMAIL PROTECTED] (samantha mahindrakar) wrote:
>>> Hi
>>> I have written a program that imputes (or rather corrects) data within my database. I am using a temporary table into which I put data from another partitioned table. I then query this table to get the desired data. But the thing is, this temporary table has to be created for every record that I need to correct, and there are thousands of such records that need to be corrected. So the program necessarily creates a temporary table every time it has to correct a record. However, this table is dropped after each record is corrected. The program works fine, but it runs for a very long time, or it runs for days. I am particularly finding that it takes more time during this statement:
>>>
>>> NOTICE: theQuery in createtablevolumelaneshist CREATE TEMPORARY TABLE predictiontable(lane_id, measurement_start, speed, volume, occupancy) AS SELECT lane_id, measurement_start, speed, volume, occupancy FROM samantha.lane_data_I_495 WHERE lane_id IN (1317) AND measurement_start BETWEEN '2007-11-18 09:25:00' AND '2007-11-19 01:39:06'
>>>
>>> I am not sure if I can use a cursor to replicate the functionality of the temp table. Is the performance bad because of the creation and deletion of the temp table?
>>> Thanks
>>> Samantha
>>
>> And why do you copy data from the partition tables? Did you try to manipulate the data directly in the needed tables? Or are you aggregating some of the data there? How is the partitioning actually designed? Do you use table inheritance?
>> -- Valentine
Re: [PERFORM] response time when querying via JDBC and via psql differs
Do not use the setString() method to pass the parameter to the PreparedStatement in JDBC. Construct the SQL query string as you write it here, and query the database with this new SQL string. This makes the planner recreate a plan every time for every new SQL string per session (which is not usually good), but it also makes the planner choose a correct plan: with a bound parameter the LIKE pattern is unknown at plan time, so the planner cannot use the functional lower(href) index.

-- Valentine Gogichashvili

On Feb 25, 11:06 am, [EMAIL PROTECTED] (Pavel Rotek) wrote:
> Hi all,
> I have a strange problem with performance in PostgreSQL (8.1.9). My problem, shortly: I'm using PostgreSQL via the JDBC driver (postgresql-8.1-404.jdbc3.jar) and asking the database to search a table with approximately 3,000,000 records. I have created a functional index table(lower(href) varchar_pattern_ops) because of lower-case LIKE searching. When I ask the database directly from psql, it returns the result in 0.5 ms, but when I issue the same command via the JDBC driver, it returns in 10,000 ms. Where can the problem be?? Any problem with PostgreSQL tuning?? The command is:
>
> select df.id as id, df.c as c, df.href as href, df.existing as existing, df.filesize as filesize from documentfile df where (lower(href) like 'aba001!_223581.djvu' escape '!' ) order by id limit 1
>
> Thank you very much for any help,
> Kind regards,
> Pavel Rotek

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Index usage when bitwise operator is used
Hi Tom,

do you think it would be a good idea to ask the GIN index team to implement an int-based bitmap set indexing operator for GIN/GiST based indexes? Or will there be a possibility to somehow optimally index arrays of enumerations to implement such bitmap structures in 8.3 or later PostgreSQL versions?

With best regards,

-- Valentine

On Sep 17, 3:37 pm, [EMAIL PROTECTED] (Tom Lane) wrote:
> Kevin Grittner [EMAIL PROTECTED] writes:
>> On Mon, Sep 17, 2007 at 2:49 AM, in message [EMAIL PROTECTED], valgog [EMAIL PROTECTED] wrote:
>>> Are you sure you understood what the question was? Is TBL1.CATEGORY = TBL2.CATEGORY the same as TBL1.CATEGORY & TBL2.CATEGORY <> 0?
>> Yes, given that he stipulated that one and only one bit would be set.
>
> Really? In that case, isn't this bit-field just a bad implementation of an enum-style field?
>
> regards, tom lane
Re: [PERFORM] Index usage when bitwise operator is used
Hi,

I could not find a normal solution for that issue, but I am using some workarounds. The solution that I am using now is to create an index for every bit of your bitmap field. So, something like:

CREATE INDEX idx_hobbybit_0_limited
  ON versionA.user_fast_index
  USING btree (gender, dateofbirth) -- gender and dateofbirth are the fields we usually ORDER BY in the select statements; you can play with the needed fields
  WHERE (hobby_bitmap & 1) <> 0;

By creating such an index for every used bit and combining WHERE (hobby_bitmap & 1) <> 0 style conditions, the planner will choose the right index to use.

Another workaround, which I think will be more applicable in your case, is to create a functional GIN index on your bitmap field, using a static function that creates an array of bitmap keys from your bitmap field:

CREATE OR REPLACE FUNCTION versionA.bitmap_to_bit_array(source_bitmap integer)
  RETURNS integer[] AS
'select ARRAY( select (1 << s.i)
                 from generate_series(0, 31) as s(i)
                where ( 1 << s.i ) & $1 <> 0 )'
LANGUAGE 'sql' IMMUTABLE STRICT;

Then create a GIN index on the needed field using this stored procedure. After that, it is possible to use intarray set operators on the result of that function, which also makes it possible to use that GIN index.

Actually, it would be much better if it were possible to build GIN indexes directly on the bitmap fields, but that is to be implemented by the GIN and GiST index development teams. It would probably not be a bad idea to file a feature request with them.

With best regards,

Valentine Gogichashvili

On Sep 13, 2:30 pm, [EMAIL PROTECTED] (W. Alphonse HAROUNY) wrote:
> Hello,
> My question is about index usage when bitwise operations are invoked.
>
> Situation context:
> Let's suppose we have 2 tables TBL1 and TBL2 as the following:
> TBL1 { ... ; integer categoryGroup; // categoryGroup is declared as an index on TBL1 ... ; }
> TBL2 { ... ; integer categoryGroup; // categoryGroup is declared as an index on TBL2 ... ; }
>
> By conception, I suppose that:
> - [categoryGroup] may hold a limited number of values, less than 32 values.
> - [categoryGroup] is of type integer => it means 4 bytes => 32 bits => 32 places available to hold binary '0' or binary '1' values.
> - [categoryGroup] is the result of an OR bitwise operation among a predefined set of variables [variableCategory]. We suppose that [variableCategory] is of type integer (= 32 bits) and each binary value of [variableCategory] may only hold a single binary '1'.
>
> Ex:
> variableCategory1 = 00000010
> variableCategory2 = 00001000
> variableCategory3 = 00100000
> If [categoryGroup] = variableCategory1 | variableCategory2 | variableCategory3
> => [categoryGroup] = 00101010
>
> Question:
> I have an SQL request similar to:
> SELECT ... FROM TBL1, TBL2 WHERE <inner join between TBL1 and TBL2 is True> AND TBL1.CATEGORY & TBL2.CATEGORY <> 0 // where & is the AND bitwise operator
>
> Qst:
> 1/ Will the above SQL request use the index [categoryGroup] defined on TBL1 and TBL2?
> 2/ What should I do, or how should I modify my SQL request, in order to force the query engine to use an index? (the already defined index or another useful index)
> Thx a lot
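The OR-combination of single-bit categories and the `&`-based overlap test in the question above can be checked directly; the flag values here are illustrative:

```python
# category flags, one bit each (values are illustrative)
CAT_A = 0b00000010
CAT_B = 0b00001000
CAT_C = 0b00100000

# OR-combining the single-bit flags builds the category group bitmap
group = CAT_A | CAT_B | CAT_C
print(format(group, "08b"))  # -> 00101010, as in the example above

def categories_overlap(g1: int, g2: int) -> bool:
    """The join condition TBL1.CATEGORY & TBL2.CATEGORY <> 0 is simply a
    test for at least one shared bit between the two bitmaps."""
    return (g1 & g2) != 0

print(categories_overlap(group, CAT_B))       # True: shares a bit
print(categories_overlap(group, 0b00000001))  # False: disjoint
```

This is exactly the predicate a plain b-tree index on the integer column cannot serve, which is why the thread turns to per-bit partial indexes or a functional GIN index.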
Re: [PERFORM] Index usage when bitwise operator is used
> What about saying:
> TBL1.CATEGORY = TBL2.CATEGORY

Are you sure you understood what the question was? Is TBL1.CATEGORY = TBL2.CATEGORY the same as TBL1.CATEGORY & TBL2.CATEGORY <> 0?
Re: [PERFORM] select count(*) performance
On Aug 11, 5:54 pm, Detlef Rudolph [EMAIL PROTECTED] wrote:
> Hello Group,
> I've tried the VACUUM ANALYSE; that doesn't help much, but VACUUM FULL improves performance down from about 40 secs to 8. I think in future I would use the reltuples value from pg_class for the table.
> Thanks a lot for your answers and a good Sunday, Det

Just do not forget that reltuples is counted and updated in pg_class only during the vacuuming or analyzing of a table... so the value is only an APPROXIMATE.

-- Valentine
Re: [PERFORM] multicolumn index column order
On Jul 25, 2:14 am, Lew [EMAIL PROTECTED] wrote:
> How about two indexes, one on each column? Then the indexes will cooperate when combined in a WHERE clause. http://www.postgresql.org/docs/8.2/interactive/indexes-bitmap-scans.html
> I don't believe the index makes a semantic difference with regard to ascending or descending. An index is used to locate records in the selection phase of a query or modification command.
> -- Lew

Ordered indexes (b-tree in this case) are also used to get the needed record order, and it is absolutely not necessary to have a WHERE clause in your select statement for them to be used when you are using ORDER BY.

-- Valentine
Re: [PERFORM] disable archiving
On Jul 23, 7:24 pm, [EMAIL PROTECTED] (Paul van den Bogaard) wrote:
> The manual somewhere states "... if archiving is enabled ...". To me this implies that archiving can be disabled. However, I cannot find the parameter to use to get this result. Or should I enable archiving and use a backup script like
>
> #!/usr/bin/bash
> exit 0
>
> Would appreciate a hint. And yes, I know I put my database in danger etc. This is for some benchmarks where I do not want the overhead of archiving. Just a file system that will not fill with zillions of these 16MB WAL files ;^)
> Thanks, Paul.

Is it normal to spoil other threads? Or is it a bug? If it is not a bug, please change the subject of the topic back to what it was!

With best regards,

Valentine Gogichashvili
[PERFORM] multicolumn index column order
Hello all,

how do I build a multicolumn index with one column ordered ASCENDING and another column ordered DESCENDING?

The use case I have is a 2-column index where the first column is a kind of flag and the second column is the actual ordering column. The flag should always be ordered DESCENDING, but the second column is ordered DESCENDING when it is a numeric column and ASCENDING when it is a text column.

CREATE TABLE storage (id int, flag int, numeric_data int, text_data text);

SELECT * FROM storage ORDER BY flag DESC, numeric_data DESC LIMIT 20 OFFSET 0;
SELECT * FROM storage ORDER BY flag DESC, text_data ASC LIMIT 20 OFFSET 0;

The multicolumn index on (flag, numeric_data) is definitely being used. But how do I create an index on (flag, text_data DESC)? I will try to index by ((-flag), text_data) and sort by (-flag) ASC, but, to tell the truth, it does not really look like a nice solution.
Re: [PERFORM] multicolumn index column order
On Jul 23, 7:00 pm, [EMAIL PROTECTED] (Tom Lane) wrote:
> valgog [EMAIL PROTECTED] writes:
>> how do I build a multicolumn index with one column ordered ASCENDING and another column ordered DESCENDING?
>
> Use 8.3 ;-)
> In existing releases you could fake it with a custom reverse-sorting operator class, but it's a pain in the neck to create one.
>
> regards, tom lane

OK, thanks for the rapid answer. I can live with the ((-flag), text_data) functional multicolumn index for now. Waiting for 8.3 :-)
[PERFORM] [PERFORMANCE] is it possible to force an index to be held in memory?
Hi,

I have found some discussions about that issue, but did not actually find the answer. Is there a way to be sure that some indexes are always held in memory? My tests bringing them onto a memory-based file system (ramfs) tablespace showed a really very significant performance gain. But the prospect of having a corrupted database after each machine restart does not really make me feel good.

With best regards,

-- Valentine Gogichashvili
[PERFORM] Key/Value reference table generation: INSERT/UPDATE performance
I found several posts about INSERT/UPDATE performance in this group, but none was really what I am searching an answer for...

I have a simple reference table WORD_COUNTS that contains the count of words that appear in a word-array storage in another table.

CREATE TABLE WORD_COUNTS
(
  word text NOT NULL,
  count integer,
  CONSTRAINT PK_WORD_COUNTS PRIMARY KEY (word)
) WITHOUT OIDS;

I have some PL/pgSQL code in a stored procedure like:

FOR r IN select id, array_of_words from word_storage LOOP
  begin
    -- insert the missing words
    insert into WORD_COUNTS ( word, count )
    ( select d_word, 0
        from ( select distinct (r.array_of_words)[s.index] as d_word
                 from generate_series(1, array_upper( r.array_of_words, 1 ) ) as s(index) ) as distinct_words
       where d_word not in ( select word from WORD_COUNTS ) );
    -- update the counts
    update WORD_COUNTS
       set count = COALESCE( count, 0 ) + 1
     where word in ( select distinct (r.array_of_words)[s.index] as word
                       from generate_series(1, array_upper( r.array_of_words, 1 ) ) as s(index) );
  exception when others then
    error_count := error_count + 1;
  end;
  record_count := record_count + 1;
END LOOP;

This code runs extremely slowly: it takes about 10 minutes to process 1 records, and the word storage has more than 2 million records to be processed.

Does anybody have know-how about populating such reference tables, and what can be optimized in this situation? Maybe the generate_series() call used to unnest the array is the place where I lose the performance? Are set-based updates/inserts more efficient than single inserts/updates run in smaller loops?

Thanks for your help,

Valentine Gogichashvili
Re: [PERFORM] Key/Value reference table generation: INSERT/UPDATE performance
I have rewritten the code like:

existing_words_array := ARRAY( select word
                                 from WORD_COUNTS
                                where word = ANY ( array_of_words ) );
not_existing_words_array := ARRAY( select distinct_word
                                     from ( select distinct (array_of_words)[s.index] as distinct_word
                                              from generate_series(1, array_upper( array_of_words, 1 ) ) as s(index) ) as distinct_words
                                    where distinct_word <> ALL ( existing_words_array ) );

-- insert the missing words
if not_existing_words_array is not null then
  insert into WORD_COUNTS ( word, count )
  ( select word, 1
      from ( select not_existing_words_array[s.index] as word
               from generate_series( 1, array_upper( not_existing_words_array, 1 ) ) as s(index) ) as distinct_words );
end if;

-- update the counts
if existing_words_array is not null then
  update WORD_COUNTS
     set count = COALESCE( count, 0 ) + 1
   where word = ANY ( existing_words_array );
end if;

Now it processes a million records in 14 seconds... so it was probably the per-word lookup with NOT IN WORD_COUNTS that was way too expensive.
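The core of this rewrite is a single membership pass that splits each batch of words into "to update" and "to insert" sets instead of probing WORD_COUNTS once per word. In Python terms (names are mine):

```python
def split_new_and_existing(array_of_words, known_words):
    """One membership pass splits the batch into words that need an
    UPDATE (already counted) and words that need an INSERT (new)."""
    distinct = set(array_of_words)
    existing = distinct & known_words   # word = ANY(existing_words_array)
    new = distinct - known_words        # distinct_word <> ALL(existing_words_array)
    return existing, new

known = {"cat", "dog"}
existing, new = split_new_and_existing(["cat", "fish", "cat"], known)
print(sorted(existing), sorted(new))  # -> ['cat'] ['fish']
```

Two set operations against an in-memory array replace thousands of NOT IN subquery probes, which matches the drop from 10 minutes to 14 seconds reported here.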
Re: [PERFORM] Key/Value reference table generation: INSERT/UPDATE performance
On May 22, 12:14 pm, [EMAIL PROTECTED] (PFC) wrote:
> On Tue, 22 May 2007 10:23:03 +0200, valgog [EMAIL PROTECTED] wrote:
>> I found several posts about INSERT/UPDATE performance in this group, but none was really what I am searching an answer for... I have a simple reference table WORD_COUNTS that contains the count of words that appear in a word-array storage in another table.
>
> Mmm. If I were you, I would:
> - create a procedure that flattens all the arrays and returns all the words:
>
> PROCEDURE flatten_arrays RETURNS SETOF TEXT
> FOR word_array IN SELECT word_array FROM your_table LOOP
>   FOR i IN 1...array_upper( word_array ) LOOP
>     RETURN NEXT tolower( word_array[ i ] )
>
> So, SELECT * FROM flatten_arrays() returns all the words in all the arrays. To get the counts quickly I'd do this:
>
> SELECT word, count(*) FROM flatten_arrays() AS word GROUP BY word
>
> You can then populate your counts table very easily and quickly, since it's just a seq scan and hash aggregate. One second for 10,000 rows would be slow.

Good idea indeed! Will try this approach.
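PFC's flatten-then-aggregate idea boils down to one pass over all the arrays with a hash aggregate. A Python sketch of the equivalent, with Counter standing in for the GROUP BY hash aggregate:

```python
from collections import Counter

def count_words(word_arrays):
    """Flatten all word arrays (lower-cased, as in PFC's tolower call)
    and count every word in a single pass."""
    counts = Counter()
    for words in word_arrays:
        counts.update(w.lower() for w in words)
    return counts

rows = [["a", "b"], ["b", "c"], ["A"]]
print(count_words(rows))  # -> Counter({'a': 2, 'b': 2, 'c': 1})
```

Building the whole counts table in one scan like this avoids the incremental insert-or-update bookkeeping entirely, at the cost of recomputing from scratch.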
Re: [PERFORM] Key/Value reference table generation: INSERT/UPDATE performance
On May 22, 12:00 pm, valgog [EMAIL PROTECTED] wrote:
> I have rewritten the code like:
>
> existing_words_array := ARRAY( select word
>                                  from WORD_COUNTS
>                                 where word = ANY ( array_of_words ) );
> not_existing_words_array := ARRAY( select distinct_word
>                                      from ( select distinct (array_of_words)[s.index] as distinct_word
>                                               from generate_series(1, array_upper( array_of_words, 1 ) ) as s(index) ) as distinct_words
>                                     where distinct_word <> ALL ( existing_words_array ) );
>
> -- insert the missing words
> if not_existing_words_array is not null then
>   insert into WORD_COUNTS ( word, count )
>   ( select word, 1
>       from ( select not_existing_words_array[s.index] as word
>                from generate_series( 1, array_upper( not_existing_words_array, 1 ) ) as s(index) ) as distinct_words );
> end if;
>
> -- update the counts
> if existing_words_array is not null then
>   update WORD_COUNTS
>      set count = COALESCE( count, 0 ) + 1
>    where word = ANY ( existing_words_array );
> end if;
>
> Now it processes a million records in 14 seconds... so it was probably the lookup with NOT IN WORD_COUNTS that was way too expensive.

Sorry... this code did not update anything at all, as I forgot about the NULL values... I had to COALESCE practically everything and use array_upper()... I do not have the performance numbers of the inserts and updates yet...