[SQL] Quickest way to insert unique records?
Hi, I've got a number of files containing generic log data, and some of the lines may or may not be duplicated across the files, which I feed into a database using Perl DBI. I'm just ignoring any duplicate-record errors. This is fine for day-to-day running when the data feeds in at a sensible rate; however, if I wanted to feed in a load of old data in a short space of time, this solution simply is not quick enough.

I can modify the feeder script to generate formatted CSV files that I can then COPY into a temporary table in the database. However, I'll then need to select each record from the temporary table and insert it into the main table, omitting duplicates. I guess I'd need something like this:

    INSERT INTO messages (host, messageid, body, and, loads, more)
    SELECT host, messageid, body, and, loads, more
    FROM messages_tmp;

However, when that hit a duplicate, it would fail, wouldn't it? Also, would this actually be any quicker than direct insertion from Perl DBI?

--
Ian Cass
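A sketch of a duplicate-skipping variant, assuming (host, messageid) is the unique constraint on messages (the column list here is illustrative; extend it to the real one):

    INSERT INTO messages (host, messageid, body)
    SELECT t.host, t.messageid, t.body
    FROM messages_tmp t
    WHERE NOT EXISTS (
        SELECT 1
        FROM messages m
        WHERE m.host = t.host
          AND m.messageid = t.messageid
    );

Since the whole INSERT ... SELECT runs as a single statement, an individual duplicate no longer aborts the batch. Note that duplicates within messages_tmp itself would still collide, so the SELECT may also need a DISTINCT or GROUP BY over the key columns.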
[SQL] 2 Selects 1 is faster, why?
If I perform the following two selects, the first one is EXTREMELY slow and the second one is very fast.

(1) Slow:

    SELECT o.orderid, ol.itemcode, ol.itemname
    FROM orders o, orlines ol
    WHERE o.orderid = '1234'
      AND ol.orderid = o.orderid;

(2) VERY FAST:

    SELECT o.orderid, ol.itemcode, ol.itemname
    FROM orders o, orlines ol
    WHERE o.orderid = '1234'
      AND ol.orderid = '1234';

Why would (2) be so much faster? I have run EXPLAIN on these, and index scans are being used.

NOTE: The actual queries return more information than this, but the fundamental change shown above seems to give me the instant response I am looking for. (1) takes about 60 seconds to run and (2) takes 3-5 seconds.

Thanks, Eric
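One workaround sketch: keep the join from (1) but also hand the planner the constant for the second table, as in (2). The extra predicate is logically redundant (it is implied by the other two), so the result set is unchanged:

    SELECT o.orderid, ol.itemcode, ol.itemname
    FROM orders o, orlines ol
    WHERE o.orderid = '1234'
      AND ol.orderid = '1234'
      AND ol.orderid = o.orderid;

The planner of this era does not always propagate a constant through a join condition on its own; spelling it out lets it index-scan orlines on the literal directly.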
[SQL] Performance Ideas
I have a query that uses a function for one of the returned columns. This stored function does calculations that are expensive and slow. I am looking for ways to speed up this query but am having no luck. Any SQL geniuses out there able to help me with this?

    SELECT
        o.orderid,
        ol.itemcode,
        ol.itemname,
        ol.uom,
        qty_onhand( ol.itemcode, ol.uom ) AS "Qty On Hand"
    FROM
        orders o,
        orderlines ol
    WHERE
        o.status = 'OPEN' AND
        ol.orderid = o.orderid AND
        qty_onhand( ol.itemcode, ol.uom ) > 0;

The function, qty_onhand, calculates the quantity on hand and returns a value in the units of measure passed (ol.uom). It is an expensive function to use and degrades performance. Without the function in the WHERE or SELECT clause, performance is acceptable. I get marginally better performance if I SELECT INTO a temporary table without the function and then run a query on the temporary table that includes the qty_onhand function.

I am trying to present the user with a list of open orders that are "READY" to be fulfilled, which requires me to do a stock-level check. My fallback solution is to make the user enter some pre-query information, like the orderid she is trying to ship against, but my customer really likes the current view, which shows all open orders that are READY to be fulfilled. Any ideas? Tricks of the trade?

Also, as a side note, I tried creating views, assuming PostgreSQL would optimize the view after a vacuum, but it does not. And why does the function seem faster against the temporary table? Why wouldn't the function only evaluate values that match the first two criteria (OPEN and ol.orderid = o.orderid)? It's as if qty_onhand is evaluating ALL records in the orderlines (ol) table.

Thanks, Eric
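A sketch of the temporary-table variant described above (ready_lines is a hypothetical name; everything else is from the original query). The point is that qty_onhand only ever sees rows that have already passed the cheap criteria:

    SELECT o.orderid, ol.itemcode, ol.itemname, ol.uom
    INTO TEMP ready_lines
    FROM orders o, orderlines ol
    WHERE o.status = 'OPEN'
      AND ol.orderid = o.orderid;

    SELECT orderid, itemcode, itemname, uom,
           qty_onhand( itemcode, uom ) AS "Qty On Hand"
    FROM ready_lines
    WHERE qty_onhand( itemcode, uom ) > 0;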
Re: [SQL] Limiting database size
I like the idea of putting it on a hard disk or partition of fixed size and waiting for the DB to simply crash. hahaha

"Josh Berkus" <[EMAIL PROTECTED]> wrote:

> Mauricio,
>
> > Hi, is there any way to limit the database size?
>
> First, this is a question for PGSQL-NOVICE or PGSQL-GENERAL, not this
> mailing list.
>
> Second -- sort of. You can put the database on its own hard drive
> partition. Then the database will crash when it runs out of space -- so
> I'm not sure that helps you.
>
> Of course, with other RDBMSs, which have administrative settings for size
> limits, the database shuts down when you're out of space. So I'm not sure
> how useful the whole idea is.
>
> -Josh Berkus
Re: [SQL] what is the difference between default 0 vs default '0'
In article <000c01c21cab$c2f35ef0$6901a8c0@bethvizx>, "Beth Gatewood" wrote:

> all-
> Could somebody tell me why I would use default 0 vs default '0' in the
> following?
>
> CREATE TABLE foo (col1 INTEGER default 0)  <-- or default '0'

0 is an integer; '0' is a string. default '0' might work (through type coercion), but I think default 0 is clearer.
[SQL] sequence changes after firing update
hi all

In PostgreSQL 7.1.3, I am updating a row; it is the 4th record. After the update, if I run a select query, it comes back as the last record. What should I do to avoid that? Any help appreciated.

thanks in advance

regards,
subha
Re: [SQL] Performance Ideas
"Eric" <[EMAIL PROTECTED]> writes: > select > o.orderid, > ol.itemcode, > ol.itemname, > ol.uom, > qty_available( ol.itemcode, ol.uom ) as "Qty On Hand" > from > orders o, > orderlines ol, > where > o.status = 'OPEN' and > ol.orderid = o.orderid and > qty_onhand( ol.itemcode, ol.uom ) > 0; > It's as if the qty_onhand is evaluating ALL records in the orderlines > (ol) table. Yeah, it probably is. Given that WHERE condition the planner will try to use the "qty_onhand( ol.itemcode, ol.uom ) > 0" clause as a restriction on "ol" in advance of the join. Since the planner has no idea that qty_onhand() is an expensive function, this is a reasonable choice. Can you restructure things so that the qty_onhand clause uses some value from "o" as well as "ol"? A really grotty way would be to just give qty_onhand a dummy third parameter and write qty_onhand( ol.itemcode, ol.uom, o.whatever ) > 0; but maybe you have a less obscure alternative available. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [SQL] sequence changes after firing update
On Wed, 26 Jun 2002, Subhashini Karthikeyan wrote:

> In PostgreSQL 7.1.3, I am updating a row; it is the 4th record. After the
> update, if I run a select query, it comes back as the last record. What
> should I do to avoid that? Any help appreciated.

If I understand the complaint, use an ORDER BY to force an order on the select. You can't guarantee the order rows are returned in unless you do (different plans might give you rows in different orders anyway).
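For example (table and column names hypothetical):

    SELECT * FROM mytable ORDER BY id;

With the ORDER BY, the rows come back in id order no matter which of them were updated most recently.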
Re: [SQL] sequence changes after firing update
dear subha,

Use an explicit ORDER BY if you want to order the records by some column; otherwise the order of output from a select statement is undefined. But generally it is found that the last-updated record comes last.

On Wednesday 26 June 2002 17:17, Subhashini Karthikeyan wrote:

> In PostgreSQL 7.1.3, I am updating a row; it is the 4th record. After the
> update, if I run a select query, it comes back as the last record. What
> should I do to avoid that? Any help appreciated.

--
Rajesh Kumar Mallah,
Project Manager (Development)
Infocom Network Limited, New Delhi
[SQL] Efficient Query For Mapping IP Addresses To Country Code.
Hi folks, the problem is to update one table by querying another.

I have a table where I store Apache access logs; one of the fields is the host IP address. I need to find the corresponding country for all the IP addresses. For this I have another table that contains the APNIC, ARIN, and RIPE databases in the form:

    Table "ip_country_map"
      Column  |     Type     | Modifiers
    ----------+--------------+-----------
     start_ip | inet         |
     end_ip   | inet         |
     country  | character(2) |
    Indexes: end_ip_idx, start_ip_idx

I need to update the access log's country field by searching for each IP in ip_country_map. I have the following three alternatives; all seem to be slow.

1st (based on an implicit join):

    explain UPDATE access_log_2002_06_25
    SET country = ip_country_map.country
    WHERE host_ip BETWEEN ip_country_map.start_ip AND ip_country_map.end_ip;
    NOTICE:  QUERY PLAN:
    Nested Loop  (cost=0.00..1711037.55 rows=5428333 width=563)
      ->  Seq Scan on ip_country_map  (cost=0.00..1112.55 rows=48855 width=70)
      ->  Seq Scan on access_log_2002_06_25  (cost=0.00..20.00 rows=1000 width=493)

2nd (based on a subselect):

    explain UPDATE access_log_2002_06_25
    SET country = (SELECT country FROM ip_country_map
                   WHERE access_log_2002_06_25.host_ip BETWEEN start_ip AND end_ip);
    NOTICE:  QUERY PLAN:
    Seq Scan on access_log_2002_06_25  (cost=0.00..20.00 rows=1000 width=493)
      SubPlan
        ->  Seq Scan on ip_country_map  (cost=0.00..1356.83 rows=5428 width=6)

3rd (don't update the country field at all; just join the two tables):

    explain SELECT host_ip, ip_country_map.country
    FROM access_log_2002_06_25
    JOIN ip_country_map ON (host_ip BETWEEN start_ip AND end_ip);
    NOTICE:  QUERY PLAN:
    Nested Loop  (cost=0.00..1711037.55 rows=5428333 width=102)
      ->  Seq Scan on ip_country_map  (cost=0.00..1112.55 rows=48855 width=70)
      ->  Seq Scan on access_log_2002_06_25  (cost=0.00..20.00 rows=1000 width=32)

Yet another option: while loading the access log from file into the database, do a select on ip_country_map per row. But even a simple query like the following does not use the indexes:

    access_log=# explain SELECT country FROM ip_country_map
                 WHERE start_ip <= '203.196.129.1' AND end_ip >= '203.196.129.1';
    NOTICE:  QUERY PLAN:
    Seq Scan on ip_country_map  (cost=0.00..1356.83 rows=5428 width=6)

    access_log=# explain SELECT country FROM ip_country_map
                 WHERE '203.196.129.1' BETWEEN start_ip AND end_ip;
    NOTICE:  QUERY PLAN:
    Seq Scan on ip_country_map  (cost=0.00..1356.83 rows=5428 width=6)

Is there anything wrong with my database schema? How should I be storing the IP-range and country data for efficient use in this problem?

regds
Rajesh Kumar Mallah,
Project Manager (Development)
Infocom Network Limited, New Delhi
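One sketch that can get the per-address lookup to use the start_ip index, assuming the ranges in ip_country_map do not overlap: locate the nearest range start at or below the address and verify its end, instead of asking the planner to resolve the open-ended BETWEEN:

    SELECT country
    FROM ip_country_map
    WHERE start_ip <= '203.196.129.1'
      AND end_ip >= '203.196.129.1'
    ORDER BY start_ip DESC
    LIMIT 1;

With a btree index on start_ip, the ORDER BY ... DESC LIMIT 1 lets the executor walk the index backwards from the address and stop almost immediately, rather than estimating that a large fraction of the table satisfies the inequality and falling back to a sequential scan.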
Re: [SQL] 2 Selects 1 is faster, why?
Hi,

Eric <[EMAIL PROTECTED]> wrote:

> If I perform the following 2 selects, the first one is EXTREMELY slow
> where the 2nd one is very fast.
[...]
> Why would 2 be so much faster? I have run EXPLAIN on this and index
> scans are being used.

I guess the first query has to search, for every ol.orderid, the matching o.orderid; the second variant only has to search for '1234' in each orderid column, which is much faster. Explicit joins should speed this up!

> NOTE: The actual queries return more information than this, but the
> fundamental change shown above seems to give me the instant response I am
> looking for. (1) takes about 60 seconds to run and (2) takes 3-5 seconds
> to run.

3-5 seconds still seems very long to me, if indexes are used and the result is not a set of thousands of rows; are you sure?

Ciao, Alvar
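A sketch of the explicit-join form (same tables and columns as the original query):

    SELECT o.orderid, ol.itemcode, ol.itemname
    FROM orders o
    JOIN orlines ol ON ol.orderid = o.orderid
    WHERE o.orderid = '1234';

In PostgreSQL of this era, explicit JOIN syntax also fixes the join order the planner considers, which may or may not help here; duplicating the constant on both tables, as in query (2), remains the surest fix.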
[SQL] Possibility of Index-Only access in PostgreSQL?
Hello,

we have a simple query here that uses a table such as Foo(id*, bar, baz), where id is a primary key, and we have an additional index on Foo(bar, baz). Now, we have a simple query such as this:

    SELECT bar, baz FROM Foo WHERE bar = 'HIT';

The Foo table has about 25 million rows, and the above query selects 35000 of them. It takes 10 minutes; that's the problem. Yes, it's doing an index scan according to EXPLAIN, and yes, VACUUM ANALYZE is done. The machine is decent (dual 1 GHz i*86 processors, 2 GB RAM, software RAID 0-1 under Linux). The shared memory buffer is 512 MB and there is no significant swapping activity. The index and the table are on the same spindle, yes, but that's nothing I can change right now.

I found that the above query runs in about 8 seconds on a rerun; presumably our large cache then contains most of the index, if not also the data rows. Apparently I am stuck in an I/O bottleneck. So, what can I do? Again, I can't touch the hardware for now. Besides, we are running this same query on Oracle (on an already aging Alpha mini) and it consistently runs in 17 seconds. Maybe there is also caching involved, but there is something Oracle does better about it. Two things come to mind:

- Oracle is content with data read from an index if that is all that's needed. Indeed, I only need bar and baz, and those are in the index. The benefit of not having to seek the scattered rows in the data table saves 35000 back-and-forth head movements / seeks.

- Even if we had to go to the data rows, it could help if a good big chunk of the index were read ahead and buffered in one piece, and then iterated over to retrieve the data rows, without making the r/w head jump back and forth between the index and the data. Is PostgreSQL doing this index read-ahead? Is there a tuning variable I failed to set to make it read ahead more?

I presume that PostgreSQL does good read-ahead of the index, right? But I heard it could not do index-only reads. Now, since I have several of these problems (each time with different indexes) and the Oracle side of our project uses index-only reads a lot, I would be very eager to have index-only reads in PostgreSQL. Naively, it doesn't sound like a hard problem to implement, does it? All we would need to do is take the data from the index entry we already read and skip the row lookup. Probably the optimizer has to make different plans here, so the main work would be in the optimizer code. In addition, there might be some lower-level constraint in the executor that needs to go to the rows for some reason. In the worst case, an index does not contain the complete data but only just enough of a portion to distinguish rows; that would then require rewriting all the index code to keep complete data in indexes. But I'm only naively conjecturing.

So, please advise: what are the real barriers to this kind of optimization? Is there anything I could do to help progress on this issue? (I'm generally capable of hacking certain features into existing code if there isn't some subtle logic involved that I don't understand because it isn't documented.)

I appreciate your help, thank you,
-Gunther

--
Gunther Schadow, M.D., Ph.D.        [EMAIL PROTECTED]
Medical Information Scientist       Regenstrief Institute for Health Care
Adjunct Assistant Professor         Indiana University School of Medicine
tel:1(317)630-7960                  http://aurora.regenstrief.org
Re: [SQL] Limiting database size
Eric,

> I like the idea of putting it on a hard disk or partition of fixed size
> and waiting for the DB to simply crash.
> hahaha

Yeah. It's what MS SQL Server does, though. As I said, I think the whole concept of limiting database size in MB is fundamentally flawed. I mean, what's the database supposed to do when it runs out of space?

-Josh
Re: [SQL] Limiting database size
>> Yeah. It's what MS SQL Server does, though. As I said, I think the
>> whole concept of limiting database size in MB is fundamentally flawed.
>> I mean, what's the database supposed to do when it runs out of space?

Display a window that says: "Management has determined that this database shall not exceed XXX MB. If your business function requires more disk space, take it up with your manager."

Limiting database size is not a problem for developers, unless they are unable or unwilling to explain to their management that disks are a whole lot less expensive than terminating a database application that depends on them. But then, M$ never takes any responsibility for the amount of disk space it wastes.

terry
Re: [SQL] Possibility of Index-Only access in PostgreSQL?
Gunther Schadow <[EMAIL PROTECTED]> writes:

> I would be very eager to have index-only reads in PostgreSQL.

It won't happen. See the many, many prior discussions of this point in the archives.

Something we perhaps *could* do is to "batch" index reads: pull a bunch of TIDs from the index, sort them by page number, and then access the corresponding heap tuples in page-number order. Unfortunately, you'd probably need to batch some thousands of TIDs to really improve the locality of reference this way, and that would mean holding pins on quite a few index pages, which would complicate the bookkeeping and probably create a severe concurrency hit (if not actually introduce a risk of deadlock; I haven't thought hard enough about it to be sure). We could avoid these potential downsides by only sorting one index page's worth of TIDs at a time, but most likely that doesn't buy enough locality of reference to be worth the trouble.

Still, it seems like a useful avenue to investigate. If you want to pursue it further, this is not the list to be discussing it on; pgsql-hackers is the place for such discussions.

regards, tom lane
Re: [SQL] Limiting database size
Josh Berkus wrote:

> Eric,
>
> > I like the idea of putting it on a hard disk or partition of fixed size
> > and waiting for the DB to simply crash. hahaha
>
> Yeah. It's what MS SQL Server does, though. As I said, I think the whole
> concept of limiting database size in MB is fundamentally flawed. I mean,
> what's the database supposed to do when it runs out of space?

Refuse updates, deletes, and inserts (from anything other than a DB superuser); you should still be able to select data. Once the database has exceeded its bounds, it would revert to read-only status and wait for administrative intervention, such as enlarging the quota (allocatable size) or altering the database to reduce its size.

Adding the checks wouldn't break existing databases if you defaulted them to off. This could also significantly help in hosting situations by preventing a client from exceeding a specified quota and taking over the area allocated for the postgres database. Thinking about it, it seems like a good safety feature (aside from good partitioning and system layout to begin with).
[SQL] pg_restore cannot restore function
I use:

    pg_dump -Fc mydb > dbf

Then I create another db with:

    createdb mydb2

Then:

    pg_restore -P myfunction -d mydb2 dbf

cannot restore myfunction into mydb2. Why?

Jie Liang
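One thing worth checking (a hedged guess, not a confirmed diagnosis): pg_restore's -P switch matches the archive's TOC entry for the function, and that entry typically records the name together with its argument types. Listing the archive shows the exact spelling to pass:

    pg_restore -l dbf | grep -i myfunction
    pg_restore -P 'myfunction(integer)' -d mydb2 dbf

Here 'integer' stands in for whatever argument types the listing actually shows for myfunction.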
Re: [SQL] Possibility of Index-Only access in PostgreSQL?
> - Oracle is content with data read from an index if that is all that's
>   needed. Indeed, I only need bar and baz and those are in the index.
>   The benefit of not having to seek the scattered rows in the data
>   table saves 35000 back-and-forth head movements / seeks.

Postgres can't return data straight from its indexes, because of the MVCC mechanism. Maybe you should try CLUSTERing your table to improve performance?

Chris
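A sketch of the CLUSTER suggestion, using the syntax current at the time (foo_bar_baz_idx is a hypothetical name for the index on Foo(bar, baz)):

    CLUSTER foo_bar_baz_idx ON foo;

Physically ordering the table by (bar, baz) puts the 35000 matching rows on far fewer heap pages, cutting the random seeks even though the data still has to come from the heap. Note that CLUSTER rewrites the table once and does not maintain the ordering as rows change, so it has to be re-run periodically.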