Re: [SQL] Converting from MS Access field aliases

2007-08-08 Thread Enrico Weigelt
* Tom Lane <[EMAIL PROTECTED]> wrote:

Hi,

> This is not an "extension", it is *directly* contrary to both 
> the letter and spirit of the SQL standard. 

At which point does this break the specification?
What would happen if Postgres allowed this?

IMHO, supporting aliases in WHERE clauses would make some
queries easier to read. Think of cases where a column
comes from a complex calculation (i.e. many nested
CASE expressions, etc.) and you need that calculated column
in the WHERE clause. Of course it's good style to encapsulate
those things in views, but does that always perform well?
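
For illustration (made-up table/column names): the alias cannot be
referenced in the WHERE clause, so the expression either has to be
repeated or pushed into a derived table:

SELECT CASE WHEN a.current_price > 100
            THEN 'expensive' ELSE 'cheap' END AS price_class
  FROM articles a
 WHERE price_class = 'expensive';
-- ERROR:  column "price_class" does not exist

-- portable workaround: wrap the expression in a derived table so the
-- alias becomes a real column
SELECT s.price_class
  FROM (SELECT CASE WHEN a.current_price > 100
                    THEN 'expensive' ELSE 'cheap' END AS price_class
          FROM articles a) AS s
 WHERE s.price_class = 'expensive';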


cu
-- 
-
 Enrico Weigelt==   metux IT service - http://www.metux.de/
-
 Please visit the OpenSource QM Taskforce:
http://wiki.metux.de/public/OpenSource_QM_Taskforce
 Patches / Fixes for dozens of packages in dozens of versions:
http://patches.metux.de/
-

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[SQL] Implementing a regex filter

2007-08-08 Thread Enrico Weigelt

Hi folks,


I've coded an auction crawler which queries several auction
platforms for user-defined keywords. The results are then
put into queues and sorted out from there. For example, each
auction query can be given a maxprice value, which is copied
into the result records. Every few minutes a filter moves
those result records whose article has reached the maxprice
to another queue.
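
That filter is basically just an UPDATE (a simplified sketch; it
assumes user_results also carries the per-query maxprice column
mentioned above, which I left out of the reduced schema below):

UPDATE base.user_results
   SET queue = 'maxprice_reached'
  FROM base.articles
 WHERE base.user_results.queue = 'incoming'
   AND base.user_results.article_id = base.articles.inode_id
   AND base.articles.current_price >= base.user_results.maxprice;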

The blacklist filter (which gives me headaches) moves those
records where the article's title matches certain regexes to
another queue. One of the users has more than 630 blacklist
entries, which concatenated into a single regex come to about
24 kB. I've tried different approaches: comparing against each
single blacklist entry (joining the blacklist table), and
comparing against one huge regex (storing the concatenated
regexes per user in a separate table). Both run very, very
long (> 15 min) and cause heavy load.
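
For the one-huge-regex approach, the per-user pattern is assembled
roughly like this (a sketch only; the individual patterns are assumed
to live in user_db_list as shown below and are OR'ed together):

INSERT INTO base.heap (name, data)
SELECT DISTINCT 'blacklist.title::' || d.username,
       array_to_string(ARRAY(
           SELECT d2.data
             FROM base.user_db_list d2
            WHERE d2.username = d.username
              AND d2.dbname   = 'BLACKLIST.TITLE'
            ORDER BY d2.data), '|')
  FROM base.user_db_list d
 WHERE d.dbname = 'BLACKLIST.TITLE';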

My current scheme (reduced to required stuff):

* article table:  article_id(oid) 
  title(text)
  current_price(float)
  ... 

* user table: user_id(oid)
  username(text)
  ...
 
* user_results table: article_id(oid)
  user_id(oid)  
  username(oid)
  queue(oid)   <-- only scanning 'incoming'
  seen(boolean)<-- seen results are skipped
  ...
  
* user_db_list:   username(text)
  dbname(text)
  data(text)
  ...

* heap:   name(text)
  data(text)


This is the explain analyze output of the compiled-regex approach:
(the compiled regex is stored in the "heap" table)

auctionwatch=> explain analyze update base.user_results set queue='FOO' 
   WHERE queue = 'incoming' AND 
 NOT seen AND 
 base.user_results.article_id = base.articles.inode_id AND 
 base.articles.end_time > current_timestamp AND 
 base.articles.title ~ (
SELECT data FROM base.heap WHERE name = 
'blacklist.title::'||base.user_results.username);

 Hash Join  (cost=2131.38..7622.69 rows=22 width=56) (actual time=1040416.087..1128977.760 rows=1 loops=1)
   Hash Cond: ("outer".article_id = "inner".inode_id)
   Join Filter: ("inner".title ~ (subplan))
   ->  Seq Scan on user_results  (cost=0.00..593.08 rows=11724 width=56) (actual time=0.104..518.036 rows=11189 loops=1)
         Filter: ((queue = 'incoming'::text) AND (NOT seen))
   ->  Hash  (cost=2014.41..2014.41 rows=8787 width=57) (actual time=250.946..250.946 rows=0 loops=1)
         ->  Seq Scan on articles  (cost=0.00..2014.41 rows=8787 width=57) (actual time=0.702..232.754 rows=8663 loops=1)
               Filter: (end_time > ('now'::text)::timestamp(6) with time zone)
   SubPlan
     ->  Seq Scan on heap  (cost=0.00..1.01 rows=1 width=32) (actual time=0.070..0.072 rows=1 loops=5998)
           Filter: (name = ('blacklist.title::'::text || $0))
 Total runtime: 1129938.362 ms
  

And the approach via joining the regex table:

auctionwatch=> explain analyze update base.user_results set queue = 'FOO' 
   WHERE queue = 'incoming' AND 
 NOT seen AND 
 base.user_results.article_id = base.articles.inode_id AND 
 base.articles.end_time > current_timestamp AND 
 base.articles.title ~ base.user_db_list.data AND 
 base.user_db_list.username = base.user_results.username 
AND 
 base.user_db_list.dbname = 'BLACKLIST.TITLE' ;

 Hash Join  (cost=3457.12..11044097.45 rows=3619812 width=56) (actual time=90458.408..126119.167 rows=2 loops=1)
   Hash Cond: ("outer".username = "inner".username)
   Join Filter: ("inner".title ~ "outer".data)
   ->  Seq Scan on user_db_list  (cost=0.00..5268.16 rows=186333 width=51) (actual time=512.939..514.394 rows=634 loops=1)
         Filter: (dbname = 'BLACKLIST.TITLE'::text)
   ->  Hash  (cost=3373.49..3373.49 rows=4254 width=109) (actual time=466.177..466.177 rows=0 loops=1)
         ->  Hash Join  (cost=2221.01..3373.49 rows=4254 width=109) (actual time=225.006..439.334 rows=6023 loops=1)
               Hash Cond: ("outer".article_id = "inner".inode_id)
               ->  Seq Scan on user_results  (cost=0.00..593.08 rows=11724 width=56) (actual time=0.155..85.865 rows=11223 loops=1)
                     Filter: ((queue = 'incoming'::text) AND (NOT seen))
               ->  Hash  (cost=2099.20..2099.20 rows=9127 width=57) (actual time=205.996..205.996 rows=0 loops=1)
                     ->  Seq Scan on articles  (cost=0.00..2099.20 rows=9127 width=57) (actual time=0.373..187.468 rows=8662 loops=1)
                           Filter:

[SQL] Performance on writable views

2007-08-08 Thread Enrico Weigelt

Hi folks,


I often use writable views as interfaces to clients, so that
they only see "virtual" objects and never have to cope with
the actual storage, e.g. to give a client a totally
denormalized view of certain things, containing only the
information required for certain kinds of operations.

This method is nice for creating simple and robust client
interfaces - internal schema changes are not visible to
the client. In situations where many, many clients - often
coded/maintained by different people - have to access a
database which is still under development (typical for
many in-house applications), it helps to circumvent interface
instabilities.
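
A stripped-down sketch of what I mean (made-up table/column names,
made writable via a simple rewrite rule):

CREATE TABLE customer (customer_id integer PRIMARY KEY, name text);
CREATE TABLE invoice  (invoice_id  integer PRIMARY KEY,
                       customer_id integer REFERENCES customer,
                       amount      numeric);

-- denormalized "virtual" object the clients see
CREATE VIEW client_invoice AS
    SELECT i.invoice_id, c.name AS customer_name, i.amount
      FROM invoice i
      JOIN customer c ON c.customer_id = i.customer_id;

-- make the view writable: the client just updates client_invoice
CREATE RULE client_invoice_upd AS
    ON UPDATE TO client_invoice DO INSTEAD
    UPDATE invoice
       SET amount = NEW.amount
     WHERE invoice_id = OLD.invoice_id;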

Now I have the nagging feeling that this makes updates
slow, since the whole view query always has to be run to
fetch the record to be updated (i.e. to get OLD.*).

Could anyone with some deeper insight please give me some
details about this issue?


cu 
-- 
-
 Enrico Weigelt==   metux IT service - http://www.metux.de/
-
 Please visit the OpenSource QM Taskforce:
http://wiki.metux.de/public/OpenSource_QM_Taskforce
 Patches / Fixes for dozens of packages in dozens of versions:
http://patches.metux.de/
-

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [SQL] Database synchronization

2007-08-08 Thread Enrico Weigelt
* Jyoti Seth <[EMAIL PROTECTED]> wrote:

Hi,

if you can live with a lag of a few seconds, you can use a separate
transfer process which watches the source of your prices and
updates the second DB whenever a price changes.

Depending on how often prices change, there are two good options
(both sketched below):

a) add a timestamp field to the original price table, which is
   maintained by a trigger. The transfer process regularly
   (i.e. every few seconds) fetches all records which have
   changed since the last poll (via the new timestamp field).

b) create a new table as a journal of the price changes.
   This journal is filled by rules on the original table and
   contains exactly what's needed to reproduce the price changes
   in the other DB. The transfer process regularly fetches
   the journal and replays it in the second DB.
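
Rough sketches of both (table/column names are made up; assume the
price table is prices(price_id, price) and plpgsql is installed):

-- a) trigger-maintained change timestamp, polled by the transfer process
ALTER TABLE prices ADD COLUMN last_changed timestamptz;

CREATE FUNCTION prices_touch() RETURNS trigger AS $$
BEGIN
    NEW.last_changed := now();   -- stamp the row on every change
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER prices_touch_trg
    BEFORE INSERT OR UPDATE ON prices
    FOR EACH ROW EXECUTE PROCEDURE prices_touch();

-- the transfer process then polls every few seconds:
--   SELECT * FROM prices WHERE last_changed > <timestamp of last poll>;

-- b) journal table filled by a rule on the original table
CREATE TABLE price_journal (
    journal_id  serial PRIMARY KEY,
    price_id    integer,
    new_price   numeric,
    changed_at  timestamptz DEFAULT now()
);

CREATE RULE prices_journal_upd AS
    ON UPDATE TO prices DO ALSO
    INSERT INTO price_journal (price_id, new_price)
         VALUES (NEW.price_id, NEW.price);

-- the transfer process fetches the journal rows, applies them in the
-- second DB and deletes (or marks) them afterwards.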
   
I've implemented method b) in a realtime stock (FOREX)
trading application where trading robots automatically execute
transactions on the customers' broker accounts based on signals
provided by a team of professional traders. (The client can
subscribe to several trading signal channels and define in
which quantities transactions from which channels should be
executed on their broker accounts, and the broking platform
does all the time-critical work.) The robots work entirely on
separate databases (on separate hosts); several transfer
processes feed in robot commands and fetch back results to the
primary database. So the time-critical things are kept on
their own servers. It works quite fine :)


cu
-- 
-
 Enrico Weigelt==   metux IT service - http://www.metux.de/
-
 Please visit the OpenSource QM Taskforce:
http://wiki.metux.de/public/OpenSource_QM_Taskforce
 Patches / Fixes for dozens of packages in dozens of versions:
http://patches.metux.de/
-

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [SQL] Implementing a regex filter

2007-08-08 Thread Paul Lambert

Enrico Weigelt wrote:
> Hi folks,
>
> Any hints for further optimization appreciated :)
>
> thx


It doesn't look like you have any indexes - I'd add one to at least 
articles.title and blacklist.title to start with and probably also 
user_results.article_id and articles.inode_id.
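
For instance, something along these lines (a sketch only, using the
table names from the earlier post; note that a plain btree index on
the title columns won't be used for the unanchored ~ regex match
itself, but the join and filter columns should benefit):

CREATE INDEX user_results_article_id_idx ON base.user_results (article_id);
CREATE INDEX user_results_queue_idx      ON base.user_results (queue) WHERE NOT seen;
CREATE INDEX articles_inode_id_idx       ON base.articles (inode_id);
CREATE INDEX articles_end_time_idx       ON base.articles (end_time);
CREATE INDEX user_db_list_dbname_idx     ON base.user_db_list (dbname, username);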


--
Paul Lambert
Database Administrator
AutoLedgers


---(end of broadcast)---
TIP 6: explain analyze is your friend