[SQL] Quickest way to insert unique records?
Hi, I've got a number of files containing generic log data, and some of the lines may or may not be duplicated across the files, which I feed into a database using Perl DBI. I'm just ignoring any duplicate-record errors. This is fine for day-to-day running when the data feeds in at a sensible rate; however, if I wanted to feed in a load of old data in a short space of time, this solution simply is not quick enough.

I can modify the feeder script to generate formatted CSV files that I can then COPY into a temporary table in the database. However, I'll then need to select each record from the temporary table and insert it into the main table, omitting duplicates. I guess I'd need something like this:

    INSERT INTO messages (host, messageid, body, and, loads, more)
    SELECT host, messageid, body, and, loads, more
    FROM messages_tmp;

However, when that hit a duplicate, it would fail, wouldn't it? Also, would this actually be any quicker than direct insertion from Perl DBI?

--
Ian Cass
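A sketch of a duplicate-skipping variant, assuming (host, messageid) is the unique constraint on messages (the column list here is illustrative; extend it to the real one):

    INSERT INTO messages (host, messageid, body)
    SELECT t.host, t.messageid, t.body
    FROM messages_tmp t
    WHERE NOT EXISTS (
        SELECT 1
        FROM messages m
        WHERE m.host = t.host
          AND m.messageid = t.messageid
    );

Since the whole INSERT ... SELECT runs as a single statement, an individual duplicate no longer aborts the batch. Note that duplicates within messages_tmp itself would still collide, so the SELECT may also need a DISTINCT or GROUP BY over the key columns.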
[SQL] 2 Selects 1 is faster, why?
If I perform the following two selects, the first one is EXTREMELY slow and the second one is very fast.

(1) Slow:

    SELECT o.orderid, ol.itemcode, ol.itemname
    FROM orders o, orlines ol
    WHERE o.orderid = '1234'
      AND ol.orderid = o.orderid;

(2) VERY FAST:

    SELECT o.orderid, ol.itemcode, ol.itemname
    FROM orders o, orlines ol
    WHERE o.orderid = '1234'
      AND ol.orderid = '1234';

Why would (2) be so much faster? I have run EXPLAIN on these, and index scans are being used.

NOTE: The actual queries return more information than this, but the fundamental change shown above seems to give me the instant response I am looking for. (1) takes about 60 seconds to run and (2) takes 3-5 seconds.

Thanks, Eric
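One workaround sketch: keep the join from (1) but also hand the planner the constant for the second table, as in (2). The extra predicate is logically redundant (it is implied by the other two), so the result set is unchanged:

    SELECT o.orderid, ol.itemcode, ol.itemname
    FROM orders o, orlines ol
    WHERE o.orderid = '1234'
      AND ol.orderid = '1234'
      AND ol.orderid = o.orderid;

The planner of this era does not always propagate a constant through a join condition on its own; spelling it out lets it index-scan orlines on the literal directly.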
[SQL] Performance Ideas
I have a query that uses a function for one of the returned columns. This stored function does calculations that are expensive and slow. I am looking for ways to speed up this query but am having no luck. Any SQL geniuses out there able to help me with this?

    SELECT
        o.orderid,
        ol.itemcode,
        ol.itemname,
        ol.uom,
        qty_onhand( ol.itemcode, ol.uom ) AS "Qty On Hand"
    FROM
        orders o,
        orderlines ol
    WHERE
        o.status = 'OPEN' AND
        ol.orderid = o.orderid AND
        qty_onhand( ol.itemcode, ol.uom ) > 0;

The function, qty_onhand, calculates the quantity on hand and returns a value in the units of measure passed (ol.uom). It is an expensive function to use and degrades performance. Without the function in the WHERE or SELECT clause, performance is acceptable. I get marginally better performance if I SELECT INTO a temporary table without the function and then run a query on the temporary table that includes the qty_onhand function.

I am trying to present the user with a list of open orders that are "READY" to be fulfilled, which requires me to do a stock-level check. My fallback solution is to make the user enter some pre-query information, like the orderid she is trying to ship against, but my customer really likes the current view, which shows all open orders that are READY to be fulfilled. Any ideas? Tricks of the trade?

Also, as a side note, I tried creating views, assuming PostgreSQL would optimize the view after a vacuum, but it does not. And why does the function seem faster against the temporary table? Why wouldn't the function only evaluate values that match the first two criteria (OPEN and ol.orderid = o.orderid)? It's as if qty_onhand is evaluating ALL records in the orderlines (ol) table.

Thanks, Eric
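A sketch of the temporary-table variant described above (ready_lines is a hypothetical name; everything else is from the original query). The point is that qty_onhand only ever sees rows that have already passed the cheap criteria:

    SELECT o.orderid, ol.itemcode, ol.itemname, ol.uom
    INTO TEMP ready_lines
    FROM orders o, orderlines ol
    WHERE o.status = 'OPEN'
      AND ol.orderid = o.orderid;

    SELECT orderid, itemcode, itemname, uom,
           qty_onhand( itemcode, uom ) AS "Qty On Hand"
    FROM ready_lines
    WHERE qty_onhand( itemcode, uom ) > 0;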
Re: [SQL] Limiting database size
I like the idea of putting it on a hard disk or partition of fixed size and waiting for the DB to simply crash. hahaha

"Josh Berkus" <[EMAIL PROTECTED]> wrote:

> Mauricio,
>
> > Hi, is there any way to limit the database size?
>
> First, this is a question for PGSQL-NOVICE or PGSQL-GENERAL, not this
> mailing list.
>
> Second -- sort of. You can put the database on its own hard drive
> partition. Then the database will crash when it runs out of space -- so
> I'm not sure that helps you.
>
> Of course, with other RDBMSs, which have administrative settings for size
> limits, the database shuts down when you're out of space. So I'm not sure
> how useful the whole idea is.
>
> -Josh Berkus
Re: [SQL] what is the difference between default 0 vs default '0'
In article <000c01c21cab$c2f35ef0$6901a8c0@bethvizx>, "Beth Gatewood" wrote:

> all-
> Could somebody tell me why I would use default 0 vs default '0' in the
> following?
>
> CREATE TABLE foo (col1 INTEGER default 0)  <-- or default '0'

0 is an integer; '0' is a string. default '0' might work (through type coercion), but I think default 0 is clearer.
[SQL] sequence changes after firing update
hi all

In PostgreSQL 7.1.3, I am updating a row; it is the 4th record. After the update, if I run a select query, it comes back as the last record. What should I do to avoid that? Any help appreciated.

thanks in advance

regards,
subha
Re: [SQL] Performance Ideas
"Eric" <[EMAIL PROTECTED]> writes: > select > o.orderid, > ol.itemcode, > ol.itemname, > ol.uom, > qty_available( ol.itemcode, ol.uom ) as "Qty On Hand" > from > orders o, > orderlines ol, > where > o.status = 'OPEN' and > ol.orderid = o.orderid and > qty_onhand( ol.itemcode, ol.uom ) > 0; > It's as if the qty_onhand is evaluating ALL records in the orderlines > (ol) table. Yeah, it probably is. Given that WHERE condition the planner will try to use the "qty_onhand( ol.itemcode, ol.uom ) > 0" clause as a restriction on "ol" in advance of the join. Since the planner has no idea that qty_onhand() is an expensive function, this is a reasonable choice. Can you restructure things so that the qty_onhand clause uses some value from "o" as well as "ol"? A really grotty way would be to just give qty_onhand a dummy third parameter and write qty_onhand( ol.itemcode, ol.uom, o.whatever ) > 0; but maybe you have a less obscure alternative available. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [SQL] sequence changes after firing update
On Wed, 26 Jun 2002, Subhashini Karthikeyan wrote:

> In PostgreSQL 7.1.3, I am updating a row; it is the 4th record. After the
> update, if I run a select query, it comes back as the last record. What
> should I do to avoid that? Any help appreciated.

If I understand the complaint, use an ORDER BY to force an order on the select. You can't guarantee the order rows are returned in unless you do (different plans might give you rows in different orders anyway).
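For example (table and column names hypothetical):

    SELECT * FROM mytable ORDER BY id;

With the ORDER BY, the rows come back in id order no matter which of them were updated most recently.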
Re: [SQL] sequence changes after firing update
dear subha,

Use an explicit ORDER BY if you want to order the records by some column; otherwise the order of output from a select statement is undefined. But generally it is found that the last-updated record comes last.

On Wednesday 26 June 2002 17:17, Subhashini Karthikeyan wrote:

> In PostgreSQL 7.1.3, I am updating a row; it is the 4th record. After the
> update, if I run a select query, it comes back as the last record. What
> should I do to avoid that? Any help appreciated.

--
Rajesh Kumar Mallah,
Project Manager (Development)
Infocom Network Limited, New Delhi
[SQL] Efficient Query For Mapping IP Addresses To Country Code.
Hi folks, the problem is to update one table by querying another.

I have a table where I store Apache access logs; one of the fields is the host IP address. I need to find the corresponding country for all the IP addresses. For this I have another table that contains the APNIC, ARIN, and RIPE databases in the form:

    Table "ip_country_map"
      Column  |     Type     | Modifiers
    ----------+--------------+-----------
     start_ip | inet         |
     end_ip   | inet         |
     country  | character(2) |
    Indexes: end_ip_idx, start_ip_idx

I need to update the access log's country field by searching for each IP in ip_country_map. I have the following three alternatives; all seem to be slow.

1st (based on an implicit join):

    explain UPDATE access_log_2002_06_25
    SET country = ip_country_map.country
    WHERE host_ip BETWEEN ip_country_map.start_ip AND ip_country_map.end_ip;
    NOTICE:  QUERY PLAN:
    Nested Loop  (cost=0.00..1711037.55 rows=5428333 width=563)
      ->  Seq Scan on ip_country_map  (cost=0.00..1112.55 rows=48855 width=70)
      ->  Seq Scan on access_log_2002_06_25  (cost=0.00..20.00 rows=1000 width=493)

2nd (based on a subselect):

    explain UPDATE access_log_2002_06_25
    SET country = (SELECT country FROM ip_country_map
                   WHERE access_log_2002_06_25.host_ip BETWEEN start_ip AND end_ip);
    NOTICE:  QUERY PLAN:
    Seq Scan on access_log_2002_06_25  (cost=0.00..20.00 rows=1000 width=493)
      SubPlan
        ->  Seq Scan on ip_country_map  (cost=0.00..1356.83 rows=5428 width=6)

3rd (don't update the country field at all; just join the two tables):

    explain SELECT host_ip, ip_country_map.country
    FROM access_log_2002_06_25
    JOIN ip_country_map ON (host_ip BETWEEN start_ip AND end_ip);
    NOTICE:  QUERY PLAN:
    Nested Loop  (cost=0.00..1711037.55 rows=5428333 width=102)
      ->  Seq Scan on ip_country_map  (cost=0.00..1112.55 rows=48855 width=70)
      ->  Seq Scan on access_log_2002_06_25  (cost=0.00..20.00 rows=1000 width=32)

Yet another option: while loading the access log from file into the database, do a select on ip_country_map per row. But even a simple query like the following does not use the indexes:

    access_log=# explain SELECT country FROM ip_country_map
                 WHERE start_ip <= '203.196.129.1' AND end_ip >= '203.196.129.1';
    NOTICE:  QUERY PLAN:
    Seq Scan on ip_country_map  (cost=0.00..1356.83 rows=5428 width=6)

    access_log=# explain SELECT country FROM ip_country_map
                 WHERE '203.196.129.1' BETWEEN start_ip AND end_ip;
    NOTICE:  QUERY PLAN:
    Seq Scan on ip_country_map  (cost=0.00..1356.83 rows=5428 width=6)

Is there anything wrong with my database schema? How should I be storing the IP-range and country data for efficient use in this problem?

regds
Rajesh Kumar Mallah,
Project Manager (Development)
Infocom Network Limited, New Delhi
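One sketch that can get the per-address lookup to use the start_ip index, assuming the ranges in ip_country_map do not overlap: locate the nearest range start at or below the address and verify its end, instead of asking the planner to resolve the open-ended BETWEEN:

    SELECT country
    FROM ip_country_map
    WHERE start_ip <= '203.196.129.1'
      AND end_ip >= '203.196.129.1'
    ORDER BY start_ip DESC
    LIMIT 1;

With a btree index on start_ip, the ORDER BY ... DESC LIMIT 1 lets the executor walk the index backwards from the address and stop almost immediately, rather than estimating that a large fraction of the table satisfies the inequality and falling back to a sequential scan.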
Re: [SQL] 2 Selects 1 is faster, why?
Hi,

Eric <[EMAIL PROTECTED]> wrote:

> If I perform the following 2 selects, the first one is EXTREMELY slow
> where the 2nd one is very fast.
[...]
> Why would 2 be so much faster? I have run EXPLAIN on this and index
> scans are being used.

I guess the first query has to search, for every ol.orderid, the matching o.orderid; the second variant only has to search for '1234' in each orderid column, which is much faster. Explicit joins should speed this up!

> NOTE: The actual queries return more information than this, but the
> fundamental change shown above seems to give me the instant response I am
> looking for. (1) takes about 60 seconds to run and (2) takes 3-5 seconds
> to run.

3-5 seconds still seems very long to me, if indexes are used and the result is not a set of thousands of rows; are you sure?

Ciao, Alvar
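A sketch of the explicit-join form (same tables and columns as the original query):

    SELECT o.orderid, ol.itemcode, ol.itemname
    FROM orders o
    JOIN orlines ol ON ol.orderid = o.orderid
    WHERE o.orderid = '1234';

In PostgreSQL of this era, explicit JOIN syntax also fixes the join order the planner considers, which may or may not help here; duplicating the constant on both tables, as in query (2), remains the surest fix.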
[SQL] Possibility of Index-Only access in PostgreSQL?
Hello,

we have a simple query here that uses a table such as Foo(id*, bar, baz), where id is a primary key, and we have an additional index on Foo(bar, baz). Now, we have a simple query such as this:

    SELECT bar, baz FROM Foo WHERE bar = 'HIT';

The Foo table has about 25 million rows, and the above query selects 35000 of them. It takes 10 minutes; that's the problem. Yes, it's doing an index scan according to EXPLAIN, and yes, VACUUM ANALYZE is done. The machine is decent (dual 1 GHz i*86 processors, 2 GB RAM, software RAID 0-1 under Linux). The shared memory buffer is 512 MB and there is no significant swapping activity. The index and the table are on the same spindle, yes, but that's nothing I can change right now.

I found that the above query runs in about 8 seconds on a rerun; presumably our large cache then contains most of the index, if not also the data rows. Apparently I am stuck in an I/O bottleneck. So, what can I do? Again, I can't touch the hardware for now. Besides, we are running this same query on Oracle (on an already aging Alpha mini) and it consistently runs in 17 seconds. Maybe there is also caching involved, but there is something Oracle does better about it. Two things come to mind:

- Oracle is content with data read from an index if that is all that's needed. Indeed, I only need bar and baz, and those are in the index. The benefit of not having to seek the scattered rows in the data table saves 35000 back-and-forth head movements / seeks.

- Even if we had to go to the data rows, it could help if a good big chunk of the index were read ahead and buffered in one piece, and then iterated over to retrieve the data rows, without making the r/w head jump back and forth between the index and the data. Is PostgreSQL doing this index read-ahead? Is there a tuning variable I failed to set to make it read ahead more?

I presume that PostgreSQL does good read-ahead of the index, right? But I heard it could not do index-only reads. Now, since I have several of these problems (each time with different indexes) and the Oracle side of our project uses index-only reads a lot, I would be very eager to have index-only reads in PostgreSQL. Naively, it doesn't sound like a hard problem to implement, does it? All we would need to do is take the data from the index entry we already read and skip the row lookup. Probably the optimizer has to make different plans here, so the main work would be in the optimizer code. In addition, there might be some lower-level constraint in the executor that needs to go to the rows for some reason. In the worst case, an index does not contain the complete data but only just enough of a portion to distinguish rows; that would then require rewriting all the index code to keep complete data in indexes. But I'm only naively conjecturing.

So, please advise: what are the real barriers to this kind of optimization? Is there anything I could do to help progress on this issue? (I'm generally capable of hacking certain features into existing code if there isn't some subtle logic involved that I don't understand because it isn't documented.)

I appreciate your help, thank you,
-Gunther

--
Gunther Schadow, M.D., Ph.D.        [EMAIL PROTECTED]
Medical Information Scientist       Regenstrief Institute for Health Care
Adjunct Assistant Professor         Indiana University School of Medicine
tel:1(317)630-7960                  http://aurora.regenstrief.org
Re: [SQL] Limiting database size
Eric,

> I like the idea of putting it on a hard disk or partition of fixed size
> and waiting for the DB to simply crash.
> hahaha

Yeah. It's what MS SQL Server does, though. As I said, I think the whole concept of limiting database size in MB is fundamentally flawed. I mean, what's the database supposed to do when it runs out of space?

-Josh
Re: [SQL] Limiting database size
>> Yeah. It's what MS SQL Server does, though. As I said, I think the
>> whole concept of limiting database size in MB is fundamentally flawed.
>> I mean, what's the database supposed to do when it runs out of space?

Display a window that says: "Management has determined that this database shall not exceed XXX MB. If your business function requires more disk space, take it up with your manager."

Limiting database size is not a problem for developers, unless they are unable or unwilling to explain to their management that disks are a whole lot less expensive than terminating a database application that depends on them. But then, M$ never takes any responsibility for the amount of disk space it wastes.

terry
Re: [SQL] Possibility of Index-Only access in PostgreSQL?
Gunther Schadow <[EMAIL PROTECTED]> writes:

> I would be very eager to have index-only reads in PostgreSQL.

It won't happen. See the many, many prior discussions of this point in the archives.

Something we perhaps *could* do is to "batch" index reads: pull a bunch of TIDs from the index, sort them by page number, and then access the corresponding heap tuples in page-number order. Unfortunately, you'd probably need to batch some thousands of TIDs to really improve the locality of reference this way, and that would mean holding pins on quite a few index pages, which would complicate the bookkeeping and probably create a severe concurrency hit (if not actually introduce a risk of deadlock; I haven't thought hard enough about it to be sure). We could avoid these potential downsides by only sorting one index page's worth of TIDs at a time, but most likely that doesn't buy enough locality of reference to be worth the trouble.

Still, it seems like a useful avenue to investigate. If you want to pursue it further, this is not the list to be discussing it on; pgsql-hackers is the place for such discussions.

regards, tom lane
Re: [SQL] Limiting database size
Josh Berkus wrote:

> Eric,
>
> > I like the idea of putting it on a hard disk or partition of fixed size
> > and waiting for the DB to simply crash. hahaha
>
> Yeah. It's what MS SQL Server does, though. As I said, I think the whole
> concept of limiting database size in MB is fundamentally flawed. I mean,
> what's the database supposed to do when it runs out of space?

Refuse updates, deletes, and inserts (from anything other than a DB superuser); you should still be able to select data. Once the database has exceeded its bounds, it would revert to read-only status and wait for administrative intervention, such as enlarging the quota (allocatable size) or altering the database to reduce its size.

Adding the checks wouldn't break existing databases if you defaulted them to off. This could also significantly help in hosting situations by preventing a client from exceeding a specified quota and taking over the area allocated for the postgres database. Thinking about it, it seems like a good safety feature (aside from good partitioning and system layout to begin with).
[SQL] pg_restore cannot restore function
I use:

    pg_dump -Fc mydb > dbf

Then I create another db with:

    createdb mydb2

Then:

    pg_restore -P myfunction -d mydb2 dbf

cannot restore myfunction into mydb2. Why?

Jie Liang
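One thing worth checking (a hedged guess, not a confirmed diagnosis): pg_restore's -P switch matches the archive's TOC entry for the function, and that entry typically records the name together with its argument types. Listing the archive shows the exact spelling to pass:

    pg_restore -l dbf | grep -i myfunction
    pg_restore -P 'myfunction(integer)' -d mydb2 dbf

Here 'integer' stands in for whatever argument types the listing actually shows for myfunction.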
Re: [SQL] Possibility of Index-Only access in PostgreSQL?
> - Oracle is content with data read from an index if that is all that's
>   needed. Indeed, I only need bar and baz and those are in the index.
>   The benefit of not having to seek the scattered rows in the data
>   table saves 35000 back-and-forth head movements / seeks.

Postgres can't return data straight from its indexes, because of the MVCC mechanism. Maybe you should try CLUSTERing your table to improve performance?

Chris
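A sketch of the CLUSTER suggestion, using the syntax current at the time (foo_bar_baz_idx is a hypothetical name for the index on Foo(bar, baz)):

    CLUSTER foo_bar_baz_idx ON foo;

Physically ordering the table by (bar, baz) puts the 35000 matching rows on far fewer heap pages, cutting the random seeks even though the data still has to come from the heap. Note that CLUSTER rewrites the table once and does not maintain the ordering as rows change, so it has to be re-run periodically.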