[PERFORM] How many views is ok?

2005-08-14 Thread Petr Kavan
I have a database of company data, and part of it is a table of information
about employees. I need each employee to have access only to his own row.
Postgres cannot do this through its privilege system alone, because
privileges can be granted only on whole tables.


One possibility is to create a view for each employee that selects only his
data, and grant the employee privileges on that view. But I am not sure
whether such a number of views has performance drawbacks, or even whether
Postgres can support it (I expect it can). I would need several tables
protected like this, and it could result in, say, 1000 views at most.
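
To be concrete, a minimal sketch of what I mean (all names here are made up):

   CREATE VIEW emp_row_1001 AS
       SELECT * FROM employees WHERE emp_id = 1001;
   GRANT SELECT ON emp_row_1001 TO emp_user_1001;
   -- ...one view per employee, repeated for each protected table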


Because access to the DB will go through a PHP information system, another
possibility to protect the data is to let the IS connect with more
privileges than the user really has, but have it retrieve only that user's
data.


The view approach seems far cleaner than this, but I'm not sure whether
Postgres can handle it without problems.


Thanks for any reply :-)

---
Petr Kavan
Database Development





[PERFORM] Performance pb vs SQLServer.

2005-08-14 Thread Stéphane COEZ
Hi,

I have a performance issue:

I run PG (8.0.3) and SQLServer 2000 on a Windows 2000 Server (P4 1.5GHz, 512MB).
I have a table (3.2 million rows) and I run this single query:

select cod from mytable group by cod
I have an index on cod (char(4) - 88 different values)
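
The table and index are essentially (reconstructed; details may differ):

   CREATE TABLE mytable (cod char(4) /* plus other columns */);
   CREATE INDEX myindex ON mytable (cod);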

PG = ~ 20 sec.
SQLServer = < 8 sec


the explain is:

HashAggregate  (cost=64410.09..64410.09 rows=55 width=8)
  ->  Seq Scan on mytable  (cost=0.00..56325.27 rows=3233927 width=8)


if I switch to "enable_hashagg = false" (just for a try...)
the planner will choose my index:

Group  (cost=0.00..76514.01 rows=55 width=8)
  ->  Index Scan using myindex on mytable  (cost=0.00..68429.20 rows=3233927
width=8)

but performance is comparable to the previous test.

So with or without the index I get the same result.


Thanks for help.
 
Stéphane COEZ






Re: [Fwd: [PERFORM] PG8 Tuning]

2005-08-14 Thread Jignesh Shah
Hi Paul,

I was passed your message regarding DSS workloads with Postgres on Solaris
(I am not on the alias).

Performance is relative to your workload. Can you send us what you are
actually doing in your queries, updates, etc.?

I have been running a few tests myself and here are my rules of thumb; your
mileage may vary:

http://blogs.sun.com/roller/page/jkshah?entry=tuning_postgresql_8_0_2

* Increasing checkpoint_segments certainly helps. (I went as far as
increasing the WAL segment size from 16MB to 256MB and recompiling, then
using a correspondingly lower number of checkpoint segments; frequent log
file rotation also decreases performance.)
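
For a stock (non-recompiled) build, that advice translates to something like
this in postgresql.conf -- the values are illustrative, not a recommendation:

   checkpoint_segments = 64   # default is 3; each segment is 16MB unless recompiled
   checkpoint_timeout = 600   # seconds between forced checkpoints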

* Moving pg_xlog to a different file system and mounting that file system
with "forcedirectio" also helps a lot (this increases the throughput by
another 2x to 5x or more). This can be done either by adding forcedirectio
to the mount options in /etc/vfstab, or for existing mounts as follows:
mount -o remount,forcedirectio /filesystem
(Note: the database files themselves should not use forcedirectio, otherwise
the file system cache will not be used for them.)
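
For example, a hypothetical /etc/vfstab entry for a dedicated pg_xlog file
system (device names made up; the fields are: device to mount, device to
fsck, mount point, FS type, fsck pass, mount at boot, mount options):

   /dev/dsk/c1t1d0s0  /dev/rdsk/c1t1d0s0  /pg_xlog  ufs  2  yes  forcedirectio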

* I actually reduced the PG buffer pool to 1GB or less, since performance
seemed to decrease as I increased the buffer pool size beyond that
(depending on your workload).
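
With the default 8K page size, ~1GB works out to a setting like this
(131,072 x 8KB = 1GB; the value is illustrative):

   shared_buffers = 131072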

* If you are using SPARC, then the following /etc/system settings will help:

set segmap_percent=60
set ufs:freebehind=0


These allocate 60% of RAM to the file system cache (for the database files)
and disable freebehind, so that large sequentially-accessed files stay
cached (PostgreSQL data files are segmented at 1GB by default).

This helps repeat queries significantly.

Other things depend on what queries you are running. If you send me a few
samples, I can send you appropriate DTrace scripts (Solaris 10 or higher) to
run to figure out what's happening.

Regards,
Jignesh





Jignesh K. Shah, MTS Software Engineer
Sun Microsystems, Inc., MDE-Horizontal Technologies
Email: [EMAIL PROTECTED]   Phone: (781) 442 3052
http://blogs.sun.com/jkshah


- Original Message -
From: Paul Johnson <[EMAIL PROTECTED]>
Date: Thu, 11 Aug 2005 13:23:21 +0100 (BST)
To: pgsql-performance@postgresql.org
Subject: [PERFORM] PG8 Tuning
Hi all, we're running PG8 on a Sun V250 with 8GB RAM and 2*1.3GHz SPARC
CPUs running Solaris 10. The DB cluster is on an external fibre-attached
Sun T3 array that has 9*36GB drives configured as a single RAID5 LUN.

The system is for the sole use of a couple of data warehouse developers,
hence we are keen to use 'aggressive' tuning options to maximise
performance.

So far we have made the following changes and measured the impact on our
test suite:

1) Increase checkpoint_segments from 3 to 64. This made a 10x improvement
in some cases.

2) Increase work_mem from 1,024 to 524,288 (KB, i.e. from 1MB to 512MB).

3) Increase shared_buffers from 1,000 to 262,143 (2 GB). This required
setting SHMMAX=4294967295 (4 GB) in /etc/system and re-booting the box.

Question - can Postgres only use 2GB RAM, given that shared_buffers can
only be set as high as 262,143 (8K pages)?
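
(Doing the arithmetic: 262,144 x 8,192 bytes = 2,147,483,648 = 2^31, so
262,143 pages is the largest setting whose byte count still fits in a signed
32-bit integer -- which makes me suspect a 32-bit limit in the shared memory
sizing rather than a fundamental PostgreSQL limit.)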

So far so good...

4) Move /pg_xlog to an internal disk within the V250. This has had a
severe *negative* impact on performance. The copy job has gone from 2 mins
to 12 mins, and a simple SQL job from 1 min to 7 mins. We haven't even run
the long SQL jobs yet.

I'm guessing that this is because pg_xlog has gone from a 9 spindle LUN to
a single spindle disk?

In cases such as this, where an external storage array with a hardware
RAID controller is used, the normal advice to separate the data from
pg_xlog seems to come unstuck. Or are we missing something?

Cheers,

Paul Johnson.



Re: [PERFORM] How many views is ok?

2005-08-14 Thread John Arbash Meinel
Petr Kavan wrote:

> I have a database of company data, and part of it is a table of
> information about employees. I need each employee to have access only
> to his own row. Postgres cannot do this through its privilege system
> alone, because privileges can be granted only on whole tables.
>
> One possibility is to create a view for each employee that selects only
> his data, and grant the employee privileges on that view. But I am not
> sure whether such a number of views has performance drawbacks, or even
> whether Postgres can support it (I expect it can). I would need several
> tables protected like this, and it could result in, say, 1000 views at
> most.
>
> Because access to the DB will go through a PHP information system,
> another possibility to protect the data is to let the IS connect with
> more privileges than the user really has, but have it retrieve only
> that user's data.
>
> The view approach seems far cleaner than this, but I'm not sure whether
> Postgres can handle it without problems.

We do a similar thing, tying users to per-row permissions. We have one view
per table, and it works fine.
I would recommend that you do something similar. Basically, just make
the view:

CREATE VIEW just_me AS
    SELECT * FROM user_table WHERE username = session_user;
REVOKE ALL ON user_table FROM PUBLIC;
GRANT SELECT ON just_me TO PUBLIC;

There is no SECURITY DEFINER clause for views, and none is needed: a view's
table references are checked against the privileges of the view's owner
(here, the superuser), much like a SECURITY DEFINER function.
The REVOKE ALL prevents users from querying the user table directly.
The 'session_user' makes the view filter on the name of the actual
logged-in user (unlike current_user, it is unaffected by SET ROLE or
SECURITY DEFINER contexts).
This should allow a user to see only their own rows in the table
(whichever rows have a username matching the connected user).
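
A quick way to sanity-check it from psql, assuming a hypothetical login
role 'alice' with a matching username value in user_table:

SET SESSION AUTHORIZATION alice;
SELECT * FROM just_me;      -- returns only alice's row(s)
SELECT * FROM user_table;   -- fails: permission denied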

Now, this only works if the PHP front end connects specifically as the
given user (our system is set up to do this).

If you don't do it this way, you'll always be stuck with the IS layer
doing the restriction. Even if you create a view per user, if your PHP
layer has the right to look at other tables/views, it doesn't really help.

Good luck,
John
=:->

>
> Thanks for any reply :-)
>
> ---
> Petr Kavan
> Database Development
>





Re: [PERFORM] Performance pb vs SQLServer.

2005-08-14 Thread John Arbash Meinel
Stéphane COEZ wrote:

>Hi,
>
>I have a performance issue:
>
>I run PG (8.0.3) and SQLServer 2000 on a Windows 2000 Server (P4 1.5GHz, 512MB).
>I have a table (3.2 million rows) and I run this single query:
>
>select cod from mytable group by cod
>I have an index on cod (char(4) - 88 different values)
>
>PG = ~ 20 sec.
>SQLServer = < 8 sec
>
>
>the explain is:
>
>HashAggregate  (cost=64410.09..64410.09 rows=55 width=8)
>  ->  Seq Scan on mytable  (cost=0.00..56325.27 rows=3233927 width=8)
>
>
>if I switch to "enable_hashagg = false" (just for a try...)
>the planner will choose my index:
>
>Group  (cost=0.00..76514.01 rows=55 width=8)
>  ->  Index Scan using myindex on mytable  (cost=0.00..68429.20 rows=3233927
>width=8)
>
>but performance is comparable to the previous test.
>
>So with or without the index I get the same result.
>  
>

My guess is that this is part of a larger query. There isn't really much
you can do. If you want all 3.2M rows, then you have to wait for them to
be pulled in.

What you generally can do for performance is to restructure things so
that you *don't* have to touch all 3.2M rows.
If you are just trying to determine the unique entries for cod, you are
probably better off doing some normalization and keeping a separate
table of cod values.
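
A sketch of that normalization (the lookup table name is made up, and you
would have to keep it current from the application or a trigger on mytable):

CREATE TABLE cod_values (cod char(4) PRIMARY KEY);
INSERT INTO cod_values SELECT DISTINCT cod FROM mytable;

Then the original question becomes a scan of ~88 rows:

SELECT cod FROM cod_values;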

I'm guessing the reason your query is faster with SQLServer is because
of how Postgres handles MVCC. Basically, Postgres still has to fetch the
main table page to determine whether a row is visible, while SQL Server
doesn't do MVCC and can answer this from the index alone.

You might also try a different query, something like:

SELECT DISTINCT cod FROM mytable ORDER BY cod;

(You may or may not want the ORDER BY, or a GROUP BY cod instead of the
DISTINCT; try the different combinations.)
It might be possible to have the planner realize that all you want is the
unique rows; just doing a GROUP BY doesn't give you that.

John
=:->

>
>Thanks for help.
> 
>Stéphane COEZ
>






Re: [PERFORM] How many views is ok?

2005-08-14 Thread Tom Lane
"Petr Kavan" <[EMAIL PROTECTED]> writes:
> One possibility is to create a view for each employee that selects only his
> data, and grant the employee privileges on that view. But I am not sure
> whether such a number of views has performance drawbacks, or even whether
> Postgres can support it (I expect it can).

Do you really need more than one view?  I'd consider something like

create view emp_view as select * from emp where name = current_user;

This requires that your Postgres usernames match up with something in
the underlying table, of course.
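
You would then put the privileges on the view rather than the table, along
the lines of:

revoke all on emp from public;
grant select on emp_view to public;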

regards, tom lane



Re: [PERFORM] Performance pb vs SQLServer.

2005-08-14 Thread Steinar H. Gunderson
On Sun, Aug 14, 2005 at 07:27:38PM -0500, John Arbash Meinel wrote:
> My guess is that this is part of a larger query. There isn't really much
> you can do. If you want all 3.2M rows, then you have to wait for them to
> be pulled in.

To me, it looks like he'll get 88 rows, not 3.2M. Surely we must be able to
do something better than a full sequential scan in this case?

test=# create table foo ( bar char(4) );
CREATE TABLE
test=# insert into foo values ('aaaa');
INSERT 24773320 1
test=# insert into foo values ('aaaa');
INSERT 24773321 1
test=# insert into foo values ('bbbb');
INSERT 24773322 1
test=# select * from foo group by bar;
 bar
------
 aaaa
 bbbb
(2 rows)

I considered doing some odd magic with generate_series() and subqueries with
LIMIT 1, but it was a bit too weird in the end :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] Performance pb vs SQLServer.

2005-08-14 Thread Steinar H. Gunderson
On Sun, Aug 14, 2005 at 07:27:38PM -0500, John Arbash Meinel wrote:
> If you are just trying to determine what the unique entries are for cod,
> you probably are better off doing some normalization, and keeping a
> separate table of cod values.

Pah, I missed this part of the e-mail -- you can ignore most of my (other)
reply, then :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] Performance pb vs SQLServer.

2005-08-14 Thread John Arbash Meinel
Steinar H. Gunderson wrote:

>On Sun, Aug 14, 2005 at 07:27:38PM -0500, John Arbash Meinel wrote:
>  
>
>>My guess is that this is part of a larger query. There isn't really much
>>you can do. If you want all 3.2M rows, then you have to wait for them to
>>be pulled in.
>>
>>
>
>To me, it looks like he'll get 88 rows, not 3.2M. Surely we must be able to
>do something better than a full sequential scan in this case?
>
>test=# create table foo ( bar char(4) );
>CREATE TABLE
>test=# insert into foo values ('aaaa');
>INSERT 24773320 1
>test=# insert into foo values ('aaaa');
>INSERT 24773321 1
>test=# insert into foo values ('bbbb');
>INSERT 24773322 1
>test=# select * from foo group by bar;
> bar
>------
> aaaa
> bbbb
>(2 rows)
>
>I considered doing some odd magic with generate_series() and subqueries with
>LIMIT 1, but it was a bit too weird in the end :-)
>
>/* Steinar */
>  
>
I think a plain GROUP BY is not smart enough to detect that it doesn't need
all the rows (since it is generally used because you want aggregate values
of other columns).
I think you would want something like SELECT DISTINCT, possibly with an
ORDER BY, rather than a GROUP BY (which was my final suggestion).

John
=:->





Re: [PERFORM] Performance pb vs SQLServer.

2005-08-14 Thread Tom Lane
"Steinar H. Gunderson" <[EMAIL PROTECTED]> writes:
> To me, it looks like he'll get 88 rows, not 3.2M. Surely we must be able to
> do something better than a full sequential scan in this case?

Not really.  There's been some speculation about implementing index
"skip search" --- once you've verified there's at least one visible
row of a given index value, tell the index to skip to the next different
value instead of handing back any of the remaining entries of the
current value.  But it'd be a lot of work and AFAICS not useful for
very many kinds of queries besides this.

regards, tom lane



Re: [PERFORM] Performance pb vs SQLServer.

2005-08-14 Thread Steinar H. Gunderson
On Sun, Aug 14, 2005 at 09:18:45PM -0400, Tom Lane wrote:
> Not really.  There's been some speculation about implementing index
> "skip search" --- once you've verified there's at least one visible
> row of a given index value, tell the index to skip to the next different
> value instead of handing back any of the remaining entries of the
> current value.  But it'd be a lot of work and AFAICS not useful for
> very many kinds of queries besides this.

This is probably a completely wrong way of handling it all, but could it be
done in a PL/pgSQL function along these lines? (I'm not very well versed in
the actual syntax, but I'd guess you get the idea.)

CREATE FUNCTION distinct_foo() RETURNS SETOF text AS $$
DECLARE x text;
BEGIN
    SELECT INTO x foo FROM mytable ORDER BY foo LIMIT 1;
    WHILE x IS NOT NULL LOOP
        RETURN NEXT x;
        SELECT INTO x foo FROM mytable WHERE foo > x ORDER BY foo LIMIT 1;
    END LOOP;
    RETURN;
END; $$ LANGUAGE plpgsql;
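
which you would call as (the function name and mytable.foo being just
placeholders for the sketch):

SELECT * FROM distinct_foo();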

(Replace the ORDER BY ... LIMIT 1 queries with min() and max() for 8.1, of
course, where those can use indexes.)

/* Steinar */
- fond of horrible hacks :-)
-- 
Homepage: http://www.sesse.net/



Re: [PERFORM] Performance pb vs SQLServer.

2005-08-14 Thread dario_d_s
One little thing: did you shut down SQL Server 2000 while testing
PostgreSQL? Remember that PostgreSQL relies on the system cache, while SQL
Server 2000 takes a large part of memory as its own buffer, making it
unavailable to the operating system. The results will probably be the same,
but it would be a fairer test.

> I'm guessing the reason your query is faster with SQLServer is because
> of how Postgres handles MVCC. Basically, Postgres still has to fetch the
> main table page to determine whether a row is visible, while SQL Server
> doesn't do MVCC and can answer this from the index alone.

Another thing [almost offtopic]:
I would like to add something to illustrate what MVCC means and what its
consequences are.
MVCC: multiversion concurrency control. (ehhh...)

Just do this.

Open two psql sessions and do this:
Session 1:
   begin;
   update any_table set any_column = 'value_a' where other_column = 'value_b';
   -- do not commit
Session 2:
   select * from any_table where other_column = 'value_b';
   Watch the result.
Session 1:
   commit;
Session 2:
   select * from any_table where other_column = 'value_b';
   Watch the result.

Now open two sessions in Query Analyzer and do the same thing:
Session 1:
   begin tran
   update any_table set any_column = 'value_a' where other_column = 'value_b'
   -- do not commit
Session 2:
   select * from any_table where other_column = 'value_b'
   Wait for the result.
   Wait... wait... (Oh, a lock! OK, when you get tired, go back to session 1.)
Session 1:
   commit
Session 2:
   Then watch the result.

Which one was faster?

["very, very offtopic"]
Ok. This comparition is just as useless as the other one, because it's 
comparing oranges with apples (It's funny anyway). I was just choosing an 
example in which you can see the best of postgresql against 'not so nice' 
behavior of mssql2000 (no service pack, it's my desktop system, I'll do the 
same test later with SP4 and different isolation levels and I'll check 
results). Furthermore, MSSQL2000 is 5 years old now. Does anybody has the same 
cellular phone, or computer? (I don't want to know :-) ). The big question is 
'What do you need?'. No system can give you all. That's marketing 'sarasa'.

Sorry for my english and the noise. [End of offtopic]

Long life, little spam and prosperity.



Re: [PERFORM] How many views is ok?

2005-08-14 Thread Petr Kavan
Hey, that trick with session_user is great! :-) Thank you all very much;
this will certainly help.



---
Petr Kavan
Database Development


- Original Message -
From: "John Arbash Meinel" <[EMAIL PROTECTED]>
To: "Petr Kavan" <[EMAIL PROTECTED]>
Cc:
Sent: Monday, August 15, 2005 2:22 AM
Subject: Re: [PERFORM] How many views is ok?


