Re: [SQL] Outer join statement ?
Hello,
There are 2 ways of doing outer joins, both are relatively expensive
from the CPU time point of view
1. Create a function that returns the ojined value, or null if the value
does not exists in the secod table
2. Use a union beetwen a left join and a straight select with a where
condition. Next there's an example inspired from the Adison Westley book (I
suggest you grab a copy from the postgresql site. (/docs)
select customer.name,salesorder.order_id from customer join salesorder
on customer.cust_id=salesorder.fk_cust_id union all select
customer.name,null::int4 from customer where customer.cust_id not in (select
fk_cust_id from salesorder);
Probably an improvement here is to use exists on the second select, but
I didn't have the time to document.
Anyway, this will not work embedded in a view ("union all" is not
supported in views).
Alexander
"Nasdaq" <[EMAIL PROTECTED]> wrote in message
[EMAIL PROTECTED]">news:[EMAIL PROTECTED]...
> The statement "Outer Join" is not available in Postgres ?
> Is there an extension provided by a third company that implement this
> functionnality ?
>
> If not do you know when this functionnality will be available in
> Postgres ?
>
> Thank
>
> Fred
>
> --
> TechBourse : le premier site en FRANCAIS dédié le NASDAQ
> http://www.techbourse.com
[SQL] optimal performance for inserts
Thanks for your suggestions, though I've already considered most of them. (I have a detailed reply below, interleaved with your mail). I am considering an option but would need help from somebody who knows how the backend works to be able to figure out if any of the following options would help. Consider the scenario of a database with say 3 tables, and atleast 3 concurrent writers to all the tables inserting different records. Which of the three options would be expected to perform better ? (I am using JDBC, I dont know if that is relevant) 1. Having a different Connection per writer 2. Having a different Connection per table 3. Having a single Connection which performs the 3 transactions sequentially. I was trying out some tests to decide between option 1 & option 2 , but did not get any conclusive results. Would be helpful to get some suggestions on the same. Thanks, Rini --- Mitch Vincent <[EMAIL PROTECTED]> wrote: > Removing indexes will speed up the INSERT portion > but slow down the SELECT > portion. I cannot remove indexes since there may be other queries to these tables at the same time when I am doing the inserts. > Just an FYI, you can INSERT into table (select > whatever from another > table) -- you could probably do what you need in a > single query (but would > also probably still have the speed problem). I have not spent time on it but I could not figure out how to have an insert statement such that one of the attributes (only) is a result of a select from another table. I would be interested in knowing if there is a way to do that. > Have you EXPLAINed the SELECT query to see if index > scans are being used > where possible? Yes, the index scans are being used > -Mitch > > - Original Message - > From: "Rini Dutta" <[EMAIL PROTECTED]> > To: <[EMAIL PROTECTED]> > Cc: <[EMAIL PROTECTED]> > Sent: Friday, August 25, 2000 12:20 PM > Subject: [SQL] queries and inserts > > > > Hi, > > > > I am interested in how to speed up storage. About > 1000 > > or more inserts may need to be performed at a time > , > > and before each insert I need to look up its key > from > > the reference table. So each insert is actually a > > query followed by an insert. > > > > The tables concerned are : > > CREATE TABLE referencetable(idx serial, rcol1 int4 > NOT > > NULL, rcol2 int4 NOT NULL, rcol3 varchar(20) NOT > > NULL, rcol4 varchar(20), PRIMARY KEY(idx) ... > > CREATE INDEX index_referencetable on > > referencetable(rcol1, rcol2, rcol3, rcol4); > > > > CREATE TABLE datatable ( ref_idx int4, > > start_date_offset int4 NOT NULL, stop_date_offset > int4 > > NOT NULL, dcol4 float NOT NULL, dcol5 float NOT > NULL, > > PRIMARY KEY(ref_idx, start_date_offset), > CONSTRAINT c1 > > FOREIGN KEY(ref_idx) REFERENCES > referencetable(idx) ); > > > > I need to do the following sequence n number of > times > > - > > 1. select idx (as key) from referencetable where > > col1=c1 and col2=c2 and col3=c3 and col4=c4; > (Would an > > initial 'select into temptable' help here since > for a > > large number of these queries 'c1' and 'c2' > > comnbinations would remain constant ?) > > 2. insert into datatable values(key, ); > > > > I am using JDBC interface of postgresql-7.0.2 on > > Linux. 'referencetable' has about 1000 records, it > can > > keep growing. 'datatable' has about 3 million > records, > > it would grow at a very fast rate. Storing 2000 > > records takes around 75 seconds after I vacuum > > analyze. (before that it took around 40 seconds - > ???) > > . I am performing all the inserts ( including the > > lookup) as one transaction. > > > > Thanks, > > Rini > > > > > > __ > > Do You Yahoo!? > > Yahoo! Mail - Free email you can access from > anywhere! > > http://mail.yahoo.com/ > > > __ Do You Yahoo!? Yahoo! Mail - Free email you can access from anywhere! http://mail.yahoo.com/
Re: [SQL] RE: Create table in functions
Andreas Tille wrote: > On Thu, 24 Aug 2000, Hiroshi Inoue wrote: > > > Hmm,Andreas's original function seems to contain other statements. > > If the function contains DML statements for the table Temp_Num_Table, > > it wouldn't work properly. i.e 1st call would work but 2nd call woudn't. > That's the problem. I'm in the process of porting a set of about > 30 Stroed Procedures from MS-SQL server to PostgreSQL and have > just managed only 3 :-( because of some "This is not possible with > PostgreSQL". Could you try out the current CVS sources? I've added an EXECUTE keyword to PL/pgSQL earlier - er - yesterday. It should give you the power to at least port the proc below to PostgreSQL. > To make clear what I want to do, I just copy the original stored > procedure, which might be not so clever so that it would be possible > to go without the table creation: Looks like the only purpose of the (temp) table is to hold exactly one single row of one attribute - the result of the count(*). From the code I guess that MS-SQL's only way to build up a dynamic query is to put it as a string into a variable and call Exec(@var). Maybe it's impossible to get a return value out of that at the same time, so the only solution is to throw it into a temp table and pick it up again. But anyway, the dynamic part of building the qualification (WHERE clause) was the initial problem. And that should be solved in CURRENT PostgreSQL sources (and get shipped with 7.1). Read on below... > > > CREATE Procedure TilleA.pHelpCountNames > ( @Arbeitsgruppeshort varchar(255) , > @Condition varchar(255) > ) > /* Count names in table Mitarbeiter which belong to the group >Arbeitsgruppeshort and match the condition @Condition > */ > As > Declare @Query varchar(1024) > Declare @num int > Select @num = 0 > > if @ArbeitsgruppeShort is NULL or @ArbeitsGruppeShort = '' begin > Select @ArbeitsgruppeShort = '' > end else begin > Select @ArbeitsgruppeShort = ' AND a.ArbeitsgruppeShort = ' + + >@ArbeitsgruppeShort + > end > > Create Table #NumTable > ( >Num integer > ) > > Select @Query = > 'Insert Into #NumTable (Num) ' + > 'SELECT Count (*) ' + > 'FROM Mitarbeiter m ' + > 'INNER JOIN tm2nMitarbeiterArbeitsgruppe t ON m.IdMitarbeiter = >t.IdMitarbeiter ' + > 'INNER JOIN Arbeitsgruppe aON t.IdArbeitsgruppe = >a.IdArbeitsgruppe ' + > 'WHERE ' + @Condition + @ArbeitsgruppeShort > Exec (@Query) > > Select @num=(Select Max(Num) From #NumTable) > > Drop Table #NumTable > > return @num > > > May be there are some clever ways to avoid the temporary table. > I really wonder if my solution is in fact very clever because I'm unsure > if it is thread-safe. > > Any hints? Let me give it a try: CREATE FUNCTION pHelpCountNames (text, text) RETURNS integer AS ' DECLARE ag_short ALIAS FOR $1; condition ALIAS FOR $2; query text; rec record; resultinteger; BEGIN query := ''SELECT count(*) AS n FROM Mitarbeiter m, tm2nMistarbeiterArbeitsgruppe t, Arbeitsgruppe a WHERE m.IdMitarbeiter = t.IdMitarbeiter AND t.IdArbeitsgruppe = a.IdArbeitsgruppe''; IF condition != THEN query := query || '' AND '' || condition; END IF; IF ag_short != THEN query := query || '' AND a.ArbeitsgruppeShort = '' || ag_short || ; END IF; FOR rec IN EXECUTE query LOOP result = rec.n; END LOOP; RETURN result; END; ' LANGUAGE 'plpgsql'; Just written down. Too lazy to test it right now. BTW: It is not strictly neccessary to build up the querystring for EXECUTE in a variable. Any expression resulting in a string (a function call or a concatenation of strings, variables and functions) is allowed. EXECUTE outside of a FOR statement simply executes the query and that should not be a SELECT. This can be a utility statement as well, so feel free to create and drop "temp" tables as needed. Used as above (inside the FOR) it must be a SELECT statement. I haven't checked if rows found, because that count(*) query is guaranteed to return either one row, or bomb out with a transaction abort. Usually someone should check. Enjoy. Jan -- #==# # It's easier to get forgiveness for being wrong than for being ri
