Re: [PERFORM] Partitioning / Clustering

2005-05-14 Thread PFC

If you make the assertion that you are transferring no more
session data between your session server (let's say an RDBMS) and the
app server than you are between the app server and the client, an out
of band 100 Mbit network for session information is plenty of bandwidth.
	So if you count on a mean page size of 6-8 kB gzipped, that will
prevent you from caching the N first results of the Big Slow Search Query
in a native object in the user session state (say, a list of integers
indicating which rows match), so you will have to redo the Big Slow Search
Query every time the user clicks on Next Page instead of grabbing a set of
cached row IDs and doing a fast SELECT WHERE id IN ...
	This is the worst case... I'd gzip() the row IDs and stuff them in the
session; that's always better than hammering the database with the Big
Slow Search Query every time someone does Next Page...
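
	A minimal sketch of that trick in rough Python (the items table, the
dict-like session object, and the psycopg2-style cursor are illustrative
assumptions, not anything from this thread):

    import struct, zlib

    def cache_matching_ids(session, ids):
        # Pack the matching row IDs as 32-bit ints and compress them;
        # a few thousand IDs shrink to a couple of kilobytes.
        session['search_ids'] = zlib.compress(struct.pack('%di' % len(ids), *ids))

    def fetch_page(session, cur, page, per_page=20):
        blob = zlib.decompress(session['search_ids'])
        ids = struct.unpack('%di' % (len(blob) // 4), blob)
        wanted = tuple(ids[page * per_page:(page + 1) * per_page])
        # Cheap indexed lookup (assumes wanted is non-empty) instead of
        # redoing the Big Slow Search Query.
        cur.execute("SELECT * FROM items WHERE id IN %s", (wanted,))
        return cur.fetchall()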

This also represents OLTP-style traffic, which PostgreSQL is pretty
good at.  You should easily be able to get over 100 TPS.  100 hits per
second is an awful lot of traffic, more than any website I've managed
will ever see.
	In the latest AnandTech benchmarks, 100 hits per second on blog/forum
software means a big dual-Opteron server running .NET at 99% load... it's a
lot if you count only dynamic page hits.



Re: [PERFORM] Partitioning / Clustering

2005-05-12 Thread Alex Stapleton
On 11 May 2005, at 23:35, PFC wrote:


However, memcached (and for us, pg_memcached) is an excellent way  
to improve
horizontal scalability by taking disposable data (like session  
information)
out of the database and putting it in protected RAM.

So, what is the advantage of such a system versus, say, a
sticky-sessions system where each session is assigned to ONE
application server (not PHP then) which keeps it in RAM as native
objects instead of serializing and deserializing it on each request?
I'd say the sticky sessions should perform a lot better, and if
one machine dies, only the sessions on this one are lost.
But of course you can't do it with PHP, as you need an app
server which can manage sessions. Potentially the savings are huge,
though.
There's no reason it couldn't be done with PHP, to be fair, as long as
you could ensure that the client was always routed back to the same
machine. Which has its own set of issues entirely. I am not
entirely sure that memcached actually serializes data when it's
committed into memcached either, although I could be wrong; I have not
looked at the source. Certainly if you can ensure that a client
always goes back to the same machine you can simplify the whole thing
hugely. It's generally not that easy though: you need a proxy server
of some description capable of understanding the HTTP traffic and
maintaining a central session lookup table to redirect with. Which
isn't really solving the problem so much as moving it somewhere else.
Instead of needing huge memcached pools, you need hardcore
load balancers, and load balancers tend to cost $ in comparison.
Distributed sticky sessions are a rather nice idea; I would like to
hear a way of implementing them cheaply (and on PHP) as well. I may
have to give that some thought in fact. Oh yeah, and load-balancer
software often sucks in annoying (if not always important) ways.

On Google, their distributed system spans a huge number of PCs  
and it has redundancy, ie. individual PC failure is a normal thing  
and is a part of the system, it is handled gracefully. I read a  
paper on this matter, it's pretty impressive. The google filesystem  
has nothing to do with databases though, it's more a massive data  
store / streaming storage.

Since when did Massive Data stores have nothing to do with DBs? Isn't  
Oracle Cluster entirely based on forming an enormous scalable disk  
array to store your DB on?



Re: [PERFORM] Partitioning / Clustering

2005-05-12 Thread PFC

machine. Which has its own set of issues entirely. I am not entirely
sure that memcached actually serializes data when it's committed into
	I think it does, i.e. it's a simple mapping of [string key] = [string
value].

memcached either, although I could be wrong, I have not looked at the  
source. Certainly if you can ensure that a client always goes back to  
the same machine you can simplify the whole thing hugely. It's generally  
not that easy though, you need a proxy server of some description  
capable of understanding the HTTP traffic and maintaining a central
	Yes...
	You could implement it by mapping servers to the hash of the user session
id.
	Statistically, each server would get roughly the same number of sessions,
but you have to trust statistics...
	It does eliminate the lookup table, though.
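
	A sketch of that mapping (the server names are made up):

    import hashlib

    APP_SERVERS = ['app1.example.com', 'app2.example.com', 'app3.example.com']

    def server_for(session_id):
        # Hash the session id onto the server list: no central lookup
        # table, and each server gets a statistically even share.
        h = int(hashlib.md5(session_id.encode()).hexdigest(), 16)
        return APP_SERVERS[h % len(APP_SERVERS)]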

idea, I would like to hear a way of implementing them cheaply (and on  
PHP) as well. I may have to give that some thought in fact. Oh yeah, and  
load balancers software often sucks in annoying (if not always  
important) ways.
	You can use lighttpd as a load balancer; I believe it has a sticky
sessions plugin (or you could code one in, it's open source after all). It
definitely supports simple round-robin load balancing, acting as a proxy to
any number of independent servers.


matter, it's pretty impressive. The google filesystem has nothing to do  
with databases though, it's more a massive data store / streaming  
storage.
Since when did Massive Data stores have nothing to do with DBs? Isn't  
Oracle Cluster entirely based on forming an enormous scalable disk array  
to store your DB on?
	Um, well, the Google File System is (as its name implies) a filesystem
designed to store huge files in a distributed and redundant manner. Files
are structured as a stream of records (which are themselves big in size)
and it's designed to support appending records to these stream files
efficiently and without worrying about locking.

	It has no querying features, however, which is why I said it was not a
database.

	I wish I could find the whitepaper; I think the URL was on this list some
day. Maybe it's on Google's site?



Re: [PERFORM] Partitioning / Clustering

2005-05-12 Thread Alex Turner
Having local sessions is unnecessary, and here is my logic:

Generally most people have less than 100 Mbit of bandwidth to the internet.

If you make the assertion that you are transferring no more
session data between your session server (let's say an RDBMS) and the
app server than you are between the app server and the client, an out
of band 100 Mbit network for session information is plenty of bandwidth.
This also represents OLTP-style traffic, which PostgreSQL is pretty
good at.  You should easily be able to get over 100 TPS.  100 hits per
second is an awful lot of traffic, more than any website I've managed
will ever see.
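
A back-of-the-envelope check of that bandwidth claim (the 10 kB session
payload is an assumed figure, not a measurement):

    # 100 requests/sec, session state read once and written once per request
    requests_per_sec = 100
    session_kb = 10
    mbit = requests_per_sec * session_kb * 2 * 8 / 1000.0
    print("%.1f Mbit/s of a 100 Mbit link" % mbit)   # -> 16.0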

Why solve the complicated clustered sessions problem, when you don't
really need to?

Alex Turner
netEconomist

On 5/11/05, PFC [EMAIL PROTECTED] wrote:
 
 
  However, memcached (and for us, pg_memcached) is an excellent way to
  improve
  horizontal scalability by taking disposable data (like session
  information)
  out of the database and putting it in protected RAM.
 
 So, what is the advantage of such a system versus, say, a sticky
 sessions system where each session is assigned to ONE application server
 (not PHP then) which keeps it in RAM as native objects instead of
 serializing and deserializing it on each request ?
 I'd say the sticky sessions should perform a lot better, and if one
 machine dies, only the sessions on this one are lost.
 But of course you can't do it with PHP as you need an app server which
 can manage sessions. Potentially the savings are huge, though.
 
 On Google, their distributed system spans a huge number of PCs and it 
 has
 redundancy, ie. individual PC failure is a normal thing and is a part of
 the system, it is handled gracefully. I read a paper on this matter, it's
 pretty impressive. The google filesystem has nothing to do with databases
 though, it's more a massive data store / streaming storage.
 




Re: [PERFORM] Partitioning / Clustering

2005-05-12 Thread Alex Stapleton
On 12 May 2005, at 15:08, Alex Turner wrote:
Having local sessions is unnecessary, and here is my logic:
Generally most people have less than 100 Mbit of bandwidth to the
internet.

If you make the assertion that you are transferring no more
session data between your session server (let's say an RDBMS) and the
app server than you are between the app server and the client, an out
of band 100 Mbit network for session information is plenty of bandwidth.
This also represents OLTP-style traffic, which PostgreSQL is pretty
good at.  You should easily be able to get over 100 TPS.  100 hits per
second is an awful lot of traffic, more than any website I've managed
will ever see.
Why solve the complicated clustered sessions problem, when you don't
really need to?
100 hits a second = 8,640,000 hits a day. I work on a site which does
100 million dynamic pages a day. In comparison, Yahoo probably does
100,000,000,000 (100 billion) views a day,
if I am interpreting Alexa's charts correctly. Which is about
1,150,000 a second.

Now considering the site I work on is not even in the top 1000 on
Alexa, there's a lot of sites out there which need to solve this
problem, I would assume.

There are also only so many hash table lookups a single machine can
do, even if it's a quad-Opteron behemoth.


Alex Turner
netEconomist
On 5/11/05, PFC [EMAIL PROTECTED] wrote:


However, memcached (and for us, pg_memcached) is an excellent way to
improve
horizontal scalability by taking disposable data (like session
information)
out of the database and putting it in protected RAM.
So, what is the advantage of such a system versus, say, a  
sticky
sessions system where each session is assigned to ONE application  
server
(not PHP then) which keeps it in RAM as native objects instead of
serializing and deserializing it on each request ?
I'd say the sticky sessions should perform a lot better,  
and if one
machine dies, only the sessions on this one are lost.
But of course you can't do it with PHP as you need an app  
server which
can manage sessions. Potentially the savings are huge, though.

On Google, their distributed system spans a huge number of  
PCs and it has
redundancy, ie. individual PC failure is a normal thing and is a  
part of
the system, it is handled gracefully. I read a paper on this  
matter, it's
pretty impressive. The google filesystem has nothing to do with  
databases
though, it's more a massive data store / streaming storage.







Re: [PERFORM] Partitioning / Clustering

2005-05-12 Thread Alex Turner
Ok - my common sense alarm is going off here...

There are only 6.446 billion people worldwide.  100 Billion page views
would require every person in the world to view 18 pages of yahoo
every day.  Not very likely.

http://www.internetworldstats.com/stats.htm
suggests that there are around 1 billion people actually on the internet.

That means each and every person on the internet has to view 100 pages
per day of yahoo.

pretty unlikely IMHO.  I for one don't even use Yahoo ;)

100 million page views per day suggests that 1 in 100 people on the
internet each viewed 10 pages of a site.  That's a pretty high
percentage if you ask me.

If I visit 20 web sites in a day and see an average of 10 pages per
site, that means only about 2000 or so sites generate 100 million page
views in a day or better.

100 million pageviews averages to 1157/sec, which we'll double for
peak load to 2314.

I can easily see a system doing 2314 hash lookups per second.  Hell, I
wrote a system that could do a thousand times that four years ago on a
single 1 GHz Athlon.  Heck - you can get 2314 lookups/sec on a 486 ;)

Given that session information doesn't _have_ to persist to storage
and can be kept in RAM, a single server could readily manage session
information for even very large sites (of course, over a million
concurrent users could really start chewing into RAM, but if you are
Yahoo, you can probably afford a box with 100GB of RAM ;).
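
A single-box RAM session store of the kind argued for here is almost
trivial; a minimal sketch (the TTL and structure are illustrative):

    import time

    class SessionStore:
        def __init__(self, ttl=1800):
            self.ttl = ttl
            self.data = {}               # session_id -> (expires_at, payload)

        def get(self, sid):
            entry = self.data.get(sid)
            if entry and entry[0] > time.time():
                return entry[1]
            self.data.pop(sid, None)     # expired or missing
            return None

        def put(self, sid, payload):
            self.data[sid] = (time.time() + self.ttl, payload)

At a few kB per session, even a million concurrent sessions needs only a
few GB of RAM, which is exactly the point above.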

We get over 1000 TPS on a dual Opteron with a couple of mid-size RAID
arrays on 10k disks with fsync on for small transactions.  I'm sure
that could easily be bettered with a few more dollars.

Maybe my numbers are off, but somehow it doesn't seem like that many
people need a highly complex session solution to me.

Alex Turner
netEconomist

On 5/12/05, Alex Stapleton [EMAIL PROTECTED] wrote:
 
 On 12 May 2005, at 15:08, Alex Turner wrote:
 
  Having local sessions is unnecessary, and here is my logic:
 
  Generally most people have less than 100 Mbit of bandwidth to the
  internet.
 
  If you make the assertion that you are transferring no more
  session data between your session server (let's say an RDBMS) and the
  app server than you are between the app server and the client, an out
  of band 100 Mbit network for session information is plenty of bandwidth.
  This also represents OLTP-style traffic, which PostgreSQL is pretty
  good at.  You should easily be able to get over 100 TPS.  100 hits per
  second is an awful lot of traffic, more than any website I've managed
  will ever see.
 
  Why solve the complicated clustered sessions problem, when you don't
  really need to?
 
 100 hits a second = 8,640,000 hits a day. I work on a site which does
   100 million dynamic pages a day. In comparison, Yahoo probably does
   100,000,000,000 (100 billion) views a day,
   if I am interpreting Alexa's charts correctly. Which is about
 1,150,000 a second.
 
 Now considering the site I work on is not even in the top 1000 on
 Alexa, there's a lot of sites out there which need to solve this
 problem, I would assume.
 
 There are also only so many hash table lookups a single machine can
 do, even if it's a quad-Opteron behemoth.
 
 
  Alex Turner
  netEconomist
 
  On 5/11/05, PFC [EMAIL PROTECTED] wrote:
 
 
 
 
  However, memcached (and for us, pg_memcached) is an excellent way to
  improve
  horizontal scalability by taking disposable data (like session
  information)
  out of the database and putting it in protected RAM.
 
 
  So, what is the advantage of such a system versus, say, a
  sticky
  sessions system where each session is assigned to ONE application
  server
  (not PHP then) which keeps it in RAM as native objects instead of
  serializing and deserializing it on each request ?
  I'd say the sticky sessions should perform a lot better,
  and if one
  machine dies, only the sessions on this one are lost.
  But of course you can't do it with PHP as you need an app
  server which
  can manage sessions. Potentially the savings are huge, though.
 
  On Google, their distributed system spans a huge number of
  PCs and it has
  redundancy, ie. individual PC failure is a normal thing and is a
  part of
  the system, it is handled gracefully. I read a paper on this
  matter, it's
  pretty impressive. The google filesystem has nothing to do with
  databases
  though, it's more a massive data store / streaming storage.
 
 
 
 
 
 




Re: [PERFORM] Partitioning / Clustering

2005-05-12 Thread John A Meinel
Alex Turner wrote:
Ok - my common sense alarm is going off here...
There are only 6.446 billion people worldwide.  100 Billion page views
would require every person in the world to view 18 pages of yahoo
every day.  Not very likely.
http://www.internetworldstats.com/stats.htm
suggests that there are around 1 billion people actually on the internet.
That means each and every person on the internet has to view 100 pages
per day of yahoo.
pretty unlikely IMHO.  I for one don't even use Yahoo ;)
100 million page views per day suggests that 1 in 100 people on the
internet each viewed 10 pages of a site.  That's a pretty high
percentage if you ask me.
In general I think your point is valid. Just remember that it probably
also matters how you count page views. Because technically each image is a
separate request (and this thread did discuss serving up images). So if
there are 20 graphics on a specific page, that is 20 server hits just
for that one page.
I could easily see an image-heavy site getting 100 hits/page. Which
starts meaning that if 1M users hit 10 pages, then you get 1M*10*100 = 1G.
I still think 100G views on a single website is a lot, but 100M is
certainly possible.
John
=:-




Re: [PERFORM] Partitioning / Clustering

2005-05-12 Thread PFC

100 hits a second = 8,640,000 hits a day. I work on a site which does
100 million dynamic pages a day. In comparison Yahoo probably does
100,000,000,000 (100 billion) views a day
  if I am interpreting Alexa's charts correctly. Which is about  
1,150,000 a second.

Read the help on Alexa's site... ;)


Re: [PERFORM] Partitioning / Clustering

2005-05-12 Thread Josh Berkus
People,

 In general I think your point is valid. Just remember that it probably
 also matters how you count page views. Because technically images are a
 separate page (and this thread did discuss serving up images). So if
 there are 20 graphics on a specific page, that is 20 server hits just
 for that one page.

Also, there are bots and screen-scrapers and RSS, web e-mails, and web services
and many other things which create hits but are not people.  I'm currently
working on clickstream analysis for a site which is nowhere in the top 100, and is
getting 3 million real hits a day ... and we know for a fact that at least
1/4 of that is bots.

Regardless, the strategy you should be employing for a high-traffic site is
that if your users hit the database for anything other than direct
interaction (like filling out a webform), then you're lost. Use memcached,
squid, lighttpd caching, ASP.NET caching, pools, etc. Keep the load off the
database except for the stuff that only the database can do.
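
The memcached half of that strategy is the classic cache-aside pattern; a
minimal sketch with the python-memcached client (the query and the
60-second TTL are illustrative):

    import memcache

    mc = memcache.Client(['127.0.0.1:11211'])

    def front_page_items(cur):
        items = mc.get('front_page_items')
        if items is None:
            # Only a cache miss touches the database.
            cur.execute("SELECT id, title FROM items"
                        " ORDER BY posted DESC LIMIT 20")
            items = cur.fetchall()
            mc.set('front_page_items', items, time=60)
        return items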

-- 
Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Partitioning / Clustering

2005-05-12 Thread Alex Stapleton
On 12 May 2005, at 18:33, Josh Berkus wrote:
People,

In general I think your point is valid. Just remember that it  
probably
also matters how you count page views. Because technically images  
are a
separate page (and this thread did discuss serving up images). So if
there are 20 graphics on a specific page, that is 20 server hits just
for that one page.

Also, there's bots and screen-scrapers and RSS, web e-mails, and  
web services
and many other things which create hits but are not people.  I'm  
currently
working on clickstream for a site which is nowhere in the top 100,  
and is
getting 3 million real hits a day ... and we know for a fact that  
at least
1/4 of that is bots.
I doubt bots are generally Alexa toolbar enabled.
Regardless, the strategy you should be employing for a high-traffic
site is
that if your users hit the database for anything other than direct
interaction (like filling out a webform), then you're lost. Use
memcached,
squid, lighttpd caching, ASP.NET caching, pools, etc. Keep the
load off the
database except for the stuff that only the database can do.
This is the approach I would take as well. There is no point storing
stuff in a DB if you're only doing direct lookups on it and it isn't
the sort of data whose integrity you care that much about.


--
Josh Berkus
Aglio Database Solutions
San Francisco





Re: [PERFORM] Partitioning / Clustering

2005-05-12 Thread Josh Berkus
Ross,

 Memcached is a PG memory store, I gather,

Nope.  It's a hyperfast resident-in-memory hash that allows you to stash stuff 
like user session information and even materialized query set results.  
Thanks to SeanC, we even have a plugin, pgmemcached.

 but...what is squid, lighttpd? 
 anything directly PG-related?

No.  These are all related to making the web server do more.  The idea is
NOT to hit the database every time you have to serve up a web page, and
possibly not to hit the web server either.  For example, you can use Squid 3
for reverse caching in front of your web server, and serve far more page
views than you could with Apache alone.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Partitioning / Clustering

2005-05-11 Thread Simon Riggs
On Tue, 2005-05-10 at 11:03 +0100, Alex Stapleton wrote:
 So, when/is PG meant to be getting a decent partitioning system?  

ISTM that your question confuses where code comes from. Without
meaning to pick on you, or reply rudely, I'd like to explore that
question. Perhaps it should be a FAQ entry.

All code is written by someone, and those people need to eat. Some
people are fully or partly funded to perform their tasks on this project
(coding, patching, etc). Others contribute their time for a variety of
reasons where involvement has a positive benefit.

You should ask these questions:
- Is anyone currently working on (Feature X)?
- If not, Can I do it myself?
- If not, and I still want it, can I fund someone else to build it for
me?

Asking "when is Feature X going to happen?" is otherwise almost certain
to get the answer "never", if the initial development is large
and complex. There are many TODO items that have lain untouched for
years, even though adding the feature has been discussed and agreed.

Best Regards, Simon Riggs





Re: [PERFORM] Partitioning / Clustering

2005-05-11 Thread David Roussel
For an interesting look at scalability, clustering, caching, etc for a
large site have a look at how livejournal did it.
http://www.danga.com/words/2004_lisa/lisa04.pdf

They have 2.6 Million active users, posting 200 new blog entries per
minute, plus many comments and countless page views.

Although this system is of a different sort to the type I work on it's
interesting to see how they've made it scale.

They use MySQL on Dell hardware! And found single-master replication did
not scale.  There's a section on multi-master replication, not sure if
they use it.  The main approach they use is to partition users into
specific database clusters.  Caching is done using memcached at the
application level to avoid hitting the db for rendered pageviews.
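
That user-partitioning scheme boils down to a directory lookup; a hedged
sketch, assuming psycopg2 and with made-up DSNs and table names:

    import psycopg2

    CLUSTER_DSN = {1: "host=db1 dbname=lj", 2: "host=db2 dbname=lj"}

    def connect_for_user(global_cur, user_id):
        # A small "global" database records which cluster owns each user.
        global_cur.execute("SELECT clusterid FROM usermap WHERE userid = %s",
                           (user_id,))
        (clusterid,) = global_cur.fetchone()   # assumes the user exists
        return psycopg2.connect(CLUSTER_DSN[clusterid])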

It's interesting that the solution LiveJournal have arrived at is quite
similar in ways to the way Google is set up.

David



Re: [PERFORM] Partitioning / Clustering

2005-05-11 Thread Alex Stapleton
On 11 May 2005, at 08:57, David Roussel wrote:
For an interesting look at scalability, clustering, caching, etc for a
large site have a look at how livejournal did it.
http://www.danga.com/words/2004_lisa/lisa04.pdf
I have implemented similar systems in the past; it's a pretty good
technique. Unfortunately it's not very plug-and-play, as you have to
base most of your API on memcached (I imagine MySQL's NDB tables might
work as well, actually) for it to work well.

They have 2.6 Million active users, posting 200 new blog entries per
minute, plus many comments and countless page views.
Although this system is of a different sort to the type I work on it's
interesting to see how they've made it scale.
They use MySQL on Dell hardware! And found single-master
replication did
not scale.  There's a section on multi-master replication, not sure if
they use it.  The main approach they use is to partition users into
specific database clusters.  Caching is done using memcached at the
application level to avoid hitting the db for rendered pageviews
I don't think they are storing pre-rendered pages (or bits of them) in
memcached, but are principally storing the data for the pages in it.
Gluing pages together is not a hugely intensive process usually :)
The only problem with memcached is that the client's clustering/
partitioning system will probably break if a node dies, and will probably
get confused if you add new nodes onto it as well. Easily extensible
clustering (no complete redistribution of data required when you add/
remove nodes) with the data distributed across nodes seems to be
nothing but a pipe dream right now.
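
For what it's worth, the usual answer to that remapping problem is
consistent hashing; a toy ring in Python to illustrate the idea:

    import bisect, hashlib

    class Ring:
        # Adding or removing a node remaps only the keys that landed on
        # it, rather than reshuffling the whole key space.
        def __init__(self, nodes, replicas=100):
            self.ring = sorted((self._h('%s:%d' % (n, i)), n)
                               for n in nodes for i in range(replicas))
            self.hashes = [h for h, _ in self.ring]

        @staticmethod
        def _h(s):
            return int(hashlib.md5(s.encode()).hexdigest(), 16)

        def node_for(self, key):
            i = bisect.bisect(self.hashes, self._h(key)) % len(self.ring)
            return self.ring[i][1]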

It's interesting that the solution livejournal have arrived at is  
quite
similar in ways to the way google is set up.
Don't Google use indexing servers which keep track of where data is?
So that you only need to update them when you add or move data;
deletes don't even have to be propagated among indexes immediately,
really, because you'll find out that data isn't there when you visit
where it should be. Or am I talking crap?

David





Re: [PERFORM] Partitioning / Clustering

2005-05-11 Thread Alex Stapleton
On 11 May 2005, at 09:50, Alex Stapleton wrote:
On 11 May 2005, at 08:57, David Roussel wrote:

For an interesting look at scalability, clustering, caching, etc  
for a
large site have a look at how livejournal did it.
http://www.danga.com/words/2004_lisa/lisa04.pdf

I have implemented similar systems in the past; it's a pretty good
technique, unfortunately it's not very plug-and-play as you have
to base most of your API on memcached (I imagine MySQL's NDB tables
might work as well actually) for it to work well.


They have 2.6 Million active users, posting 200 new blog entries per
minute, plus many comments and countless page views.
Although this system is of a different sort to the type I work on  
it's
interesting to see how they've made it scale.

They use mysql on dell hardware! And found single master  
replication did
not scale.  There's a section on multimaster replication, not sure if
they use it.  The main approach they use is to partition users into
specific database clusters.  Caching is done using memcached at the
application level to avoid hitting the db for rendered pageviews

I don't think they are storing pre-rendered pages (or bits of) in  
memcached, but are principally storing the data for the pages in  
it. Gluing pages together is not a hugely intensive process usually :)
The only problem with memcached is that the client's clustering/
partitioning system will probably break if a node dies, and  
probably get confused if you add new nodes onto it as well. Easily  
extensible clustering (no complete redistribution of data required  
when you add/remove nodes) with the data distributed across nodes  
seems to be nothing but a pipe dream right now.


It's interesting that the solution livejournal have arrived at is  
quite
similar in ways to the way google is set up.

Don't Google use indexing servers which keep track of where data  
is? So that you only need to update them when you add or move data,  
deletes don't even have to be propagated among indexes immediately  
really because you'll find out if data isn't there when you visit  
where it should be. Or am I talking crap?
That will teach me to RTFA first ;) OK, so LJ maintain an index of
which cluster each user is on, kind of like Google do :)


David








Re: [PERFORM] Partitioning / Clustering

2005-05-11 Thread Christopher Kings-Lynne
Acceptable Answers to 'So, when/is PG meant to be getting a decent  
partitioning system?':

1. Person X is working on it I believe.
2. It's on the list, but nobody has done anything about it yet
3. You're welcome to take a stab at it; I expect the community would
support your efforts as well.
4. If you have a huge pile of money you could probably buy the  
Moon. Thinking along those lines, you can probably pay someone to  write 
it for you.
5. It's a stupid idea, and it's never going to work, and here's
why...

Unacceptable Answers to the same question:
1. Yours.
Be more helpful and less arrogant, please. Everyone else who has
contributed to this thread has been very helpful in clarifying the
state of affairs and pointing out what work is and isn't being done,
and alternatives to just waiting for PG to do it for you.
Please YOU be more helpful and less arrogant.  I thought your initial
email was arrogant, demanding and insulting.  Your follow-up email has
done nothing to dispel my impression.  Simon (one of PostgreSQL's major 
contributors AND one of the very few people working on partitioning in 
PostgreSQL, as you requested) told you all the reasons clearly and politely.

Chris


Re: [PERFORM] Partitioning / Clustering

2005-05-11 Thread Tom Lane
Mischa Sandberg [EMAIL PROTECTED] writes:
 So, simplicity dictates something like:

 table pg_remote(schemaname text, connectby text, remoteschema text)

Previous discussion of this sort of thing concluded that we wanted to
follow the SQL-MED standard.

regards, tom lane



Re: [PERFORM] Partitioning / Clustering

2005-05-11 Thread Josh Berkus
David,

 It's interesting that the solution livejournal have arrived at is quite
 similar in ways to the way google is set up.

Yes, although again, they're using memcached as pseudo-clustering software,
and as a result are limited to what fits in RAM (RAM on 27 machines, but it's
still RAM).  And due to limitations in memcached, the whole thing blows up
whenever a server goes out (the memcached project is working on this).  But
any LJ user could tell you that it's a low-availability system.

However, memcached (and for us, pg_memcached) is an excellent way to improve
horizontal scalability by taking disposable data (like session information)
out of the database and putting it in protected RAM.  On some websites,
adding memcached can result in as much as a 60% decrease in database traffic.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Partitioning / Clustering

2005-05-11 Thread Simon Riggs
On Wed, 2005-05-11 at 17:13 +0800, Christopher Kings-Lynne wrote:
  Alex Stapleton wrote
  Be more helpful, and less arrogant please. 
 
 Simon told you all the reasons clearly and politely.

Thanks Chris for your comments.

PostgreSQL can always do with one more developer and my sole intent was
to encourage Alex and other readers to act themselves. If my words seem
arrogant, then I apologise to any and all that think so.

Best Regards, Simon Riggs 





Re: [PERFORM] Partitioning / Clustering

2005-05-11 Thread Jim C. Nasby
On Wed, May 11, 2005 at 08:57:57AM +0100, David Roussel wrote:
 For an interesting look at scalability, clustering, caching, etc for a
 large site have a look at how livejournal did it.
 http://www.danga.com/words/2004_lisa/lisa04.pdf
 
 They have 2.6 Million active users, posting 200 new blog entries per
 minute, plus many comments and countless page views.

Neither of which is that horribly impressive. 200 TPM is less than 4 TPS.
While I haven't run high-transaction-rate databases under PostgreSQL, I
suspect others who have will say that 4 TPS isn't that big of a deal.

 Although this system is of a different sort to the type I work on it's
 interesting to see how they've made it scale.
 
 They use MySQL on Dell hardware! And found single-master replication did
 not scale.  There's a section on multi-master replication, not sure if
Probably didn't scale because they used to use MyISAM.

 they use it.  The main approach they use is to partition users into
 specific database clusters.  Caching is done using memcached at the
Which means they've got a huge amount of additional code complexity, not
to mention how many times you can't post something because 'that cluster
is down for maintenance'.

 application level to avoid hitting the db for rendered pageviews.
Memcached is about the only good thing I've seen come out of
livejournal.

 It's interesting that the solution livejournal have arrived at is quite
 similar in ways to the way google is set up.

Except that unlike LJ, Google stays up and it's fast. Though granted, LJ
is quite a bit faster than it was 6 months ago.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?



Re: [PERFORM] Partitioning / Clustering

2005-05-11 Thread PFC

However, memcached (and for us, pg_memcached) is an excellent way to  
improve
horizontal scalability by taking disposable data (like session  
information)
out of the database and putting it in protected RAM.
	So, what is the advantage of such a system versus, say, a sticky
sessions system where each session is assigned to ONE application server
(not PHP then) which keeps it in RAM as native objects instead of
serializing and deserializing it on each request?
	I'd say the sticky sessions should perform a lot better, and if one
machine dies, only the sessions on this one are lost.
	But of course you can't do it with PHP, as you need an app server which
can manage sessions. Potentially the savings are huge, though.

	On Google, their distributed system spans a huge number of PCs and it has
redundancy, i.e. individual PC failure is a normal thing and is a part of
the system, and it is handled gracefully. I read a paper on this matter; it's
pretty impressive. The Google filesystem has nothing to do with databases
though, it's more a massive data store / streaming storage.



Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread John A Meinel
Alex Stapleton wrote:
What is the status of Postgres support for any sort of multi-machine
scaling? What are you meant to do once you've upgraded your box
and tuned the conf files as much as you can, but your query load is
just too high for a single machine?
Upgrading stock Dell boxes (I know we could be using better machines,
but I am trying to tackle the real issue) is not a hugely price-efficient
way of getting extra performance, nor particularly scalable
in the long term.
Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is
far away from Big Iron. I don't know what performance you are looking
for, but you can easily get into inserting 10M rows/day with quality
hardware.
But actually, is it your SELECT load that is too high, or your INSERT
load, or something in between?
Because Slony is around if it is a SELECT problem.
http://gborg.postgresql.org/project/slony1/projdisplay.php
Basically, Slony is a Master/Slave replication system. So if you have
INSERT going into the Master, you can have as many replicated slaves,
which can handle your SELECT load.
Slony is an asynchronous replicator, so there is a time delay from the
INSERT until it will show up on a slave, but that time could be pretty
small.
This would require some application-level support, since an INSERT goes
to a different place than a SELECT. But there has been some discussion
about pg_pool being able to spread the query load, and having it be
aware of the difference between a SELECT and an INSERT and have it route
the query to the correct host. The biggest problem is that functions
could cause a SELECT func() to actually insert a row, which pg_pool
wouldn't know about. There are two possible solutions: a) don't do that
when you are using this system; b) add some sort of comment hint so that
pg_pool can understand that the select is actually an INSERT, and needs
to be done on the master.
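
A sketch of that routing logic (the /*MASTER*/ hint comment is purely
illustrative, not an actual pg_pool feature):

    def route(sql, master_cur, slave_cur):
        # Reads go to a replicated slave; writes, and any statement
        # carrying the hint comment, go to the master.
        s = sql.lstrip().upper()
        if s.startswith('SELECT') and '/*MASTER*/' not in s:
            cur = slave_cur
        else:
            cur = master_cur
        cur.execute(sql)
        return cur
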
So, when/is PG meant to be getting a decent partitioning system?  MySQL
is getting one (eventually) which is apparently meant to be similar to
Oracle's according to the docs. Clusgres does not appear to be
widely (or at all) used, and info on it seems pretty thin on the ground,
so I am not too keen on going with that. Is the real solution to
multi-machine partitioning (as in, not like MySQL's MERGE tables) on
PostgreSQL actually doing it in our application API? This seems like a
less than perfect solution once we want to add redundancy and things
into the mix.
There is also PGCluster
http://pgfoundry.org/projects/pgcluster/
Which is trying to be more of a Synchronous multi-master system. I
haven't heard of Clusgres, so I'm guessing it is an older attempt, which
has been overtaken by pgcluster.
Just realize that clusters don't necessarily scale like you would want
them to. Because at some point you have to insert into the same table,
which means you need to hold a lock which prevents the other machine
from doing anything. And with synchronous replication, you have to wait
for all of the machines to get a copy of the data before you can say it
has been committed, which does *not* scale well with the number of machines.
If you can make it work, I think having a powerful master server, who
can finish an INSERT quickly, and then having a bunch of Slony slaves
with a middleman (like pg_pool) to do load balancing among them, is the
best way to scale up. There are still some requirements, like not having
to see the results of an INSERT instantly (though if you are using
hinting to pg_pool, you could hint that this query must be done on the
master, realizing that the more you do it, the more you slow everything
down).
John
=:-
PS I don't know what functionality has been actually implemented in
pg_pool, just that it was discussed in the past. Slony-II is also in the
works.




Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Adam Haberlach

I think that perhaps he was trying to avoid having to buy Big Iron at all.

With all the Opteron v. Xeon around here, and talk of $30,000 machines,
perhaps it would be worth exploring the option of buying 10 cheapass
machines for $300 each.  At the moment, that $300 buys you, from Dell, a
2.5 GHz Pentium 4 w/ 256 MB of RAM, a 40 GB hard drive, and gigabit ethernet.
The aggregate CPU and bandwidth is pretty stupendous, but not as easy to
harness as a single machine.

For those of us looking at batch and data warehousing applications, it would
be really handy to be able to partition databases, tables, and processing
load across banks of cheap hardware.

Yes, clustering solutions can distribute the data, and can even do it on a
per-table basis in some cases.  This still leaves it up to the application's
logic to handle reunification of the data.

Ideas:
1. Create a table/storage type that consists of a select statement
on another machine.  While I don't think the current executor is capable of
working on multiple nodes of an execution tree at the same time, it would be
great if it could offload a select of tuples from a remote table to an
entirely different server and merge the resulting data into the current
execution.  I believe MySQL has this, and Oracle may implement it in another
way.

2. There is no #2 at this time, but I'm sure one can be
hypothesized.
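
Idea #1 can at least be approximated in the client today; a rough sketch
assuming psycopg2 and servers that each hold a partition of the same table:

    import psycopg2

    def merged_select(dsns, sql, params=()):
        # Run the same SELECT against each partition server and merge the
        # tuples client-side, since the executor can't offload plan nodes
        # to a remote machine itself.
        rows = []
        for dsn in dsns:
            conn = psycopg2.connect(dsn)
            try:
                cur = conn.cursor()
                cur.execute(sql, params)
                rows.extend(cur.fetchall())
            finally:
                conn.close()
        return rows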

...Google and other companies have definitely proved that one can harness
huge clusters of cheap hardware.  It can't be _that_ hard, can it?  :)


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of John A Meinel
Sent: Tuesday, May 10, 2005 7:41 AM
To: Alex Stapleton
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Partitioning / Clustering

Alex Stapleton wrote:
 What is the status of Postgres support for any sort of multi-machine 
 scaling support? What are you meant to do once you've upgraded your  
 box and tuned the conf files as much as you can? But your query load  
 is just too high for a single machine?

 Upgrading stock Dell boxes (I know we could be using better machines, 
 but I am trying to tackle the real issue) is not a hugely price 
 efficient way of getting extra performance, nor particularly scalable 
 in the long term.

Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is far
away from Big Iron. I don't know what performance you are looking for, but
you can easily get into inserting 10M rows/day with quality hardware.

But actually is it your SELECT load that is too high, or your INSERT load,
or something in between?

Because Slony is around if it is a SELECT problem.
http://gborg.postgresql.org/project/slony1/projdisplay.php

Basically, Slony is a Master/Slave replication system. So if you have INSERT
going into the Master, you can have as many replicated slaves, which can
handle your SELECT load.
Slony is an asynchronous replicator, so there is a time delay from the
INSERT until it will show up on a slave, but that time could be pretty
small.

This would require some application level support, since an INSERT goes to a
different place than a SELECT. But there has been some discussion about
pg_pool being able to spread the query load, and having it be aware of the
difference between a SELECT and an INSERT and have it route the query to the
correct host. The biggest problem being that functions could cause a SELECT
func() to actually insert a row, which pg_pool wouldn't know about. There
are 2 possible solutions, a) don't do that when you are using this system,
b) add some sort of comment hint so that pg_pool can understand that the
select is actually an INSERT, and needs to be done on the master.


 So, when/is PG meant to be getting a decent partitioning system?  
 MySQL is getting one (eventually) which is apparently meant to be  
  similar to Oracle's according to the docs. Clusgres does not appear
 to be widely/or at all used, and info on it seems pretty thin on the  
 ground, so I am not too keen on going with that. Is the real solution 
 to multi- machine partitioning (as in, not like MySQLs MERGE tables) 
 on  PostgreSQL actually doing it in our application API? This seems 
 like  a less than perfect solution once we want to add redundancy and  
 things into the mix.

There is also PGCluster
http://pgfoundry.org/projects/pgcluster/

Which is trying to be more of a Synchronous multi-master system. I haven't
heard of Clusgres, so I'm guessing it is an older attempt, which has been
overtaken by pgcluster.

Just realize that clusters don't necessarily scale like you would want them
too. Because at some point you have to insert into the same table, which
means you need to hold a lock which prevents the other machine from doing
anything. And with synchronous replication, you have to wait for all of the
machines to get a copy of the data before you can say it has been committed,
which does *not* scale well with the number of machines.

If you can

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Alex Stapleton
On 10 May 2005, at 15:41, John A Meinel wrote:
Alex Stapleton wrote:
What is the status of Postgres support for any sort of multi-machine
scaling support? What are you meant to do once you've upgraded  
your  box
and tuned the conf files as much as you can? But your query load  is
just too high for a single machine?

Upgrading stock Dell boxes (I know we could be using better machines,
but I am trying to tackle the real issue) is not a hugely price
efficient way of getting extra performance, nor particularly scalable
in the long term.
Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is
far away from Big Iron. I don't know what performance you are looking
for, but you can easily get into inserting 10M rows/day with quality
hardware.
Better hardware = More Efficient != More Scalable
But actually is it your SELECT load that is too high, or your INSERT
load, or something in between?
Because Slony is around if it is a SELECT problem.
http://gborg.postgresql.org/project/slony1/projdisplay.php
Basically, Slony is a Master/Slave replication system. So if you have
INSERT going into the Master, you can have as many replicated slaves,
which can handle your SELECT load.
Slony is an asynchronous replicator, so there is a time delay from the
INSERT until it will show up on a slave, but that time could be pretty
small.
snip

So, when/is PG meant to be getting a decent partitioning system?   
MySQL
is getting one (eventually) which is apparently meant to be   
similar to
Oracle's according to the docs. Clusgres does not appear  to be
widely/or at all used, and info on it seems pretty thin on the   
ground,
so I am
not too keen on going with that. Is the real solution to multi-  
machine
partitioning (as in, not like MySQLs MERGE tables) on  PostgreSQL
actually doing it in our application API? This seems like  a less  
than
perfect solution once we want to add redundancy and  things into  
the mix.

There is also PGCluster
http://pgfoundry.org/projects/pgcluster/
Which is trying to be more of a Synchronous multi-master system. I
haven't heard of Clusgres, so I'm guessing it is an older attempt,  
which
has been overtaken by pgcluster.

Just realize that clusters don't necessarily scale like you would want
them too. Because at some point you have to insert into the same  
table,
which means you need to hold a lock which prevents the other machine
from doing anything. And with synchronous replication, you have to  
wait
for all of the machines to get a copy of the data before you can  
say it
has been committed, which does *not* scale well with the number of  
machines.
This is why I mention partitioning. It solves this issue by storing
different data sets on different machines under the same schema.
These separate chunks of the table can then be replicated as well for
data redundancy and so on. MySQL are working on these things, but PG
just has a bunch of third-party extensions; I wonder why these are
not being integrated into the main trunk :/ Thanks for pointing me to
PGCluster though. It looks like it should be better than Slony at least.



Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Alex Stapleton
On 10 May 2005, at 16:02, Adam Haberlach wrote:
I think that perhaps he was trying to avoid having to buy Big  
Iron at all.
You would be right. Although we are not against paying a bit more  
than $300 for a server ;)

With all the Opteron v. Xeon around here, and talk of $30,000  
machines,
perhaps it would be worth exploring the option of buying 10 cheapass
machines for $300 each.  At the moment, that $300 buys you, from  
Dell, a
2.5Ghz Pentium 4 w/ 256mb of RAM and a 40Gb hard drive and gigabit  
ethernet.
The aggregate CPU and bandwidth is pretty stupendous, but not as  
easy to
harness as a single machine.
snip
Yes, clustering solutions can distribute the data, and can even do  
it on a
per-table basis in some cases.  This still leaves it up to the  
application's
logic to handle reunification of the data.
If you're going to be programming that sort of logic into your API in
the beginning, it's not too much more work to add basic replication,
load balancing and partitioning into it either. But the DB should be
able to do it for you; adding that stuff in later is often more
difficult and less likely to get done.

Ideas:
1. Create a table/storage type that consists of a select statement
on another machine.  While I don't think the current executor is  
capable of
working on multiple nodes of an execution tree at the same time, it  
would be
great if it could offload a select of tuples from a remote table to an
entirely different server and merge the resulting data into the  
current
execution.  I believe MySQL has this, and Oracle may implement it  
in another
way.
MySQL sort of has this; it's not as good as Oracle's, though.
Apparently there is a much better version of it in 5.1, which
should make it to stable sometime next year, I imagine.

2. There is no #2 at this time, but I'm sure one can be
hypothesized.
I would have thought a particularly smart version of pg_pool could do
it. It could partition data to different servers if it knew which
columns to key by on each table.
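
Something along these lines inside the pool, say (the DSNs and the choice
of user_id as the key column are assumptions):

    PARTITIONS = ["host=db0 dbname=app", "host=db1 dbname=app"]

    def dsn_for(user_id):
        # The pool picks the backend from the partitioning column's value;
        # every query keyed on user_id routes deterministically.
        return PARTITIONS[user_id % len(PARTITIONS)]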

...Google and other companies have definitely proved that one can  
harness
huge clusters of cheap hardware.  It can't be _that_ hard, can it.  :)
I shudder to think how much the Big Iron equivalent of a Google
data-center would cost.

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of John A  
Meinel
Sent: Tuesday, May 10, 2005 7:41 AM
To: Alex Stapleton
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Partitioning / Clustering

Alex Stapleton wrote:
What is the status of Postgres support for any sort of multi-machine
scaling support? What are you meant to do once you've upgraded your
box and tuned the conf files as much as you can? But your query load
is just too high for a single machine?
Upgrading stock Dell boxes (I know we could be using better machines,
but I am trying to tackle the real issue) is not a hugely price
efficient way of getting extra performance, nor particularly scalable
in the long term.
Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell  
is far
away from Big Iron. I don't know what performance you are looking  
for, but
you can easily get into inserting 10M rows/day with quality hardware.

But actually is it your SELECT load that is too high, or your  
INSERT load,
or something inbetween.

Because Slony is around if it is a SELECT problem.
http://gborg.postgresql.org/project/slony1/projdisplay.php
Basically, Slony is a Master/Slave replication system. So if you  
have INSERT
going into the Master, you can have as many replicated slaves,  
which can
handle your SELECT load.
Slony is an asynchronous replicator, so there is a time delay from the
INSERT until it will show up on a slave, but that time could be pretty
small.

This would require some application level support, since an INSERT  
goes to a
different place than a SELECT. But there has been some discussion  
about
pg_pool being able to spread the query load, and having it be aware  
of the
difference between a SELECT and an INSERT and have it route the  
query to the
correct host. The biggest problem being that functions could cause  
a SELECT
func() to actually insert a row, which pg_pool wouldn't know about.  
There
are 2 possible solutions, a) don't do that when you are using this  
system,
b) add some sort of comment hint so that pg_pool can understand  
that the
select is actually an INSERT, and needs to be done on the master.


So, when/is PG meant to be getting a decent partitioning system?
MySQL is getting one (eventually) which is apparently meant to be
similiar to Oracle's according to the docs. Clusgres does not appear
to be widely/or at all used, and info on it seems pretty thin on the
ground, so I am not too keen on going with that. Is the real solution
to multi- machine partitioning (as in, not like MySQLs MERGE tables)
on  PostgreSQL actually doing it in our application API? This seems
like  a less than perfect solution once we want to add

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Richard_D_Levine
 exploring the option of buying 10 cheapass
 machines for $300 each.  At the moment, that $300 buys you, from Dell, a
 2.5Ghz Pentium 4

Buy cheaper-ass Dells with an AMD 64 3000+.  Beats the crap out of the 2.5
GHz Pentium, especially for PostgreSQL.

See the thread "Whence the Opterons" for more.

Rick

[EMAIL PROTECTED] wrote on 05/10/2005 10:02:50 AM:


 I think that perhaps he was trying to avoid having to buy Big Iron at
all.

 With all the Opteron v. Xeon around here, and talk of $30,000 machines,
 perhaps it would be worth exploring the option of buying 10 cheapass
 machines for $300 each.  At the moment, that $300 buys you, from Dell, a
 2.5Ghz Pentium 4 w/ 256mb of RAM and a 40Gb hard drive and gigabit
ethernet.
 The aggregate CPU and bandwidth is pretty stupendous, but not as easy to
 harness as a single machine.

 For those of us looking at batch and data warehousing applications, it
would
 be really handy to be able to partition databases, tables, and processing
 load across banks of cheap hardware.

 Yes, clustering solutions can distribute the data, and can even do it on
a
 per-table basis in some cases.  This still leaves it up to the
application's
 logic to handle reunification of the data.

 Ideas:
1. Create a table/storage type that consists of a select statement
 on another machine.  While I don't think the current executor is capable
of
 working on multiple nodes of an execution tree at the same time, it would
be
 great if it could offload a select of tuples from a remote table to an
 entirely different server and merge the resulting data into the current
 execution.  I believe MySQL has this, and Oracle may implement it in
another
 way.

2. There is no #2 at this time, but I'm sure one can be
 hypothesized.

 ...Google and other companies have definitely proved that one can harness
 huge clusters of cheap hardware.  It can't be _that_ hard, can it.  :)


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of John A
Meinel
 Sent: Tuesday, May 10, 2005 7:41 AM
 To: Alex Stapleton
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Partitioning / Clustering

 Alex Stapleton wrote:
  What is the status of Postgres support for any sort of multi-machine
  scaling support? What are you meant to do once you've upgraded your
  box and tuned the conf files as much as you can? But your query load
  is just too high for a single machine?
 
  Upgrading stock Dell boxes (I know we could be using better machines,
  but I am trying to tackle the real issue) is not a hugely price
  efficient way of getting extra performance, nor particularly scalable
  in the long term.

 Switch from Dell Xeon boxes, and go to Opterons. :) Seriously, Dell is
far
 away from Big Iron. I don't know what performance you are looking for,
but
 you can easily get into inserting 10M rows/day with quality hardware.

 But actually is it your SELECT load that is too high, or your INSERT
load,
 or something in between?

 Because Slony is around if it is a SELECT problem.
 http://gborg.postgresql.org/project/slony1/projdisplay.php

 Basically, Slony is a Master/Slave replication system. So if you have
INSERT
 going into the Master, you can have as many replicated slaves, which can
 handle your SELECT load.
 Slony is an asynchronous replicator, so there is a time delay from the
 INSERT until it will show up on a slave, but that time could be pretty
 small.

 This would require some application level support, since an INSERT goes
 to a different place than a SELECT. But there has been some discussion
 about pg_pool being able to spread the query load, and having it be aware
 of the difference between a SELECT and an INSERT and have it route the
 query to the correct host. The biggest problem being that functions could
 cause a SELECT func() to actually insert a row, which pg_pool wouldn't
 know about. There are 2 possible solutions: a) don't do that when you are
 using this system; b) add some sort of comment hint so that pg_pool can
 understand that the select is actually an INSERT, and needs to be done on
 the master.
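
 A minimal sketch of what such a comment hint might look like -- this is
 hypothetical, not an existing pg_pool feature, and log_pageview() is an
 invented function -- the pool would just inspect a leading comment before
 routing the statement:

 /* MASTER */ SELECT log_pageview(42);  -- the function writes a row, so the
                                        -- hint forces it to the master
 SELECT name FROM users WHERE id = 42;  -- no hint: any slave can serve it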

 
  So, when/is PG meant to be getting a decent partitioning system?
  MySQL is getting one (eventually) which is apparently meant to be
  similar to Oracle's according to the docs. Clusgres does not appear
  to be widely (or at all) used, and info on it seems pretty thin on the
  ground, so I am not too keen on going with that. Is the real solution
  to multi-machine partitioning (as in, not like MySQL's MERGE tables)
  on PostgreSQL actually doing it in our application API? This seems
  like a less than perfect solution once we want to add redundancy and
  things into the mix.

 There is also PGCluster
 http://pgfoundry.org/projects/pgcluster/

 Which is trying to be more of a Synchronous multi-master system. I
 haven't heard of Clusgres, so I'm guessing it is an older attempt, which
 has been overtaken by pgcluster.

 Just realize that clusters don't

Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread John A Meinel
Adam Haberlach wrote:
I think that perhaps he was trying to avoid having to buy Big Iron at all.
With all the Opteron v. Xeon around here, and talk of $30,000 machines,
perhaps it would be worth exploring the option of buying 10 cheapass
machines for $300 each.  At the moment, that $300 buys you, from Dell, a
2.5GHz Pentium 4 w/ 256MB of RAM, a 40GB hard drive, and gigabit ethernet.
The aggregate CPU and bandwidth is pretty stupendous, but not as easy to
harness as a single machine.
For those of us looking at batch and data warehousing applications, it would
be really handy to be able to partition databases, tables, and processing
load across banks of cheap hardware.
Yes, clustering solutions can distribute the data, and can even do it on a
per-table basis in some cases.  This still leaves it up to the application's
logic to handle reunification of the data.
Sure. A lot of this is application dependent, though. For instance
foreign key constraints. In a general cluster solution, you would allow
foreign keys across partitions. I have a feeling this would be extra
slow, and hard to do correctly. Multi-machine transactions are also a
difficulty, since WAL now has to take into account all machines, and you
have to wait for fsync on all of them.
I'm not sure how Oracle does it, but these things seem like they prevent
clustering from really scaling very well.
Ideas:
1. Create a table/storage type that consists of a select statement
on another machine.  While I don't think the current executor is capable of
working on multiple nodes of an execution tree at the same time, it would be
great if it could offload a select of tuples from a remote table to an
entirely different server and merge the resulting data into the current
execution.  I believe MySQL has this, and Oracle may implement it in another
way.
2. There is no #2 at this time, but I'm sure one can be
hypothesized.
...Google and other companies have definitely proved that one can harness
huge clusters of cheap hardware.  It can't be _that_ hard, can it?  :)
Again, it depends on the application. A generic database with lots of
cross reference integrity checking does not work on a cluster very well.
A very distributed db where you don't worry about cross references does
scale. Google made a point of making their application work in a
distributed manner.
In the other post he mentions that pg_pool could naturally split out the
rows into different machines based on partitioning, etc. I would argue
that it is more of a custom pool daemon based on the overall
application. Because you have to start dealing with things like
cross-machine joins. Who handles that? The pool daemon has to, since it
is the only thing that talks to both tables. I think you could certainly
write a reasonably simple application specific daemon where all of the
clients send their queries to, and it figures out where they need to go,
and aggregates them as necessary. But a fully generic one is *not*
simple at all, and I think is far out of the scope of something like
pg_pool.
I'm guessing that PGCluster is looking at working on that, and it might
be true that pg_pool is thinking about it. But just thinking about the
very simple query:
SELECT row1, row2 FROM table1_on_machine_a NATURAL JOIN table2_on_machine_b
WHERE restrict_table_1 AND restrict_table_2
AND restrict_1_based_on_2;
This needs to be broken into something like:
SELECT row1 FROM table1_on_machine_a
WHERE restrict_table_1
ORDER BY join_column;
SELECT row2 FROM table2_on_machine_b
WHERE restrict_table_2
ORDER BY join_column;
Then these rows need to be merge_joined, and the restrict_1_based_on_2
needs to be applied.
This is in no way trivial, and I think it is outside the scope of
pg_pool. Now maybe if you restrict yourself so that each query stays
within one machine you can make it work. Or write your own app to handle
some of this transparently for the clients. But I would expect to make
the problem feasible, it would not be a generic solution.
Maybe I'm off base; I don't really keep track of pg_pool/PGCluster/etc.
But I can see that the problem is very difficult. And note that this is
just a simple query; it doesn't even touch optimization. You might
actually prefer the above to be done with a Nestloop style, where
table_1 is selected, and then for each row you do a single index select
on table_2. But how is your app going to know that? It has to have the
statistics from the backend databases. And if it has to place an extra
query to get those statistics, you just hurt your scalability even more.
Whereas big-iron already has all the statistics, and can optimize the
query plan.
Perhaps PGCluster will handle this, by maintaining full statistics
across the cluster on each machine, so that more optimal queries can be
performed. I don't really know.
John
=:-




Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread PFC

SELECT row1, row2
FROM table1_on_machine_a NATURAL JOIN table2_on_machine_b
WHERE restrict_table_1 AND restrict_table_2
  AND restrict_1_based_on_2;
	I don't think that's ever going to be efficient...
	What would be efficient would be, for instance, a join of part of one
table against part of another table, where both parts happen to be on the
same machine because the partitioning was done with this in mind (i.e.,
partitioning on client_id and keeping the information for each client on
the same machine).

	You could build your smart pool daemon in pl/pgsql and use dblink! At
least you have the query parser built-in.
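
	As a rough sketch -- connection strings, table and column names here
are made up for illustration -- contrib/dblink can already pull a filtered
stream from each machine and let the local server do the join:

SELECT a.row1, b.row2
FROM dblink('host=machine_a dbname=app',
            'SELECT row1, join_col FROM table1 WHERE client_id = 42')
       AS a(row1 integer, join_col integer)
JOIN dblink('host=machine_b dbname=app',
            'SELECT join_col, row2 FROM table2 WHERE client_id = 42')
       AS b(join_col integer, row2 integer)
     USING (join_col);

	The column definition lists are needed because dblink returns an
anonymous record set; once both streams are local, the planner joins them
like any other pair of subselects.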

I wonder how Oracle does it ;)
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Josh Berkus
Alex,

 This is why I mention partitioning. It solves this issue by storing  
 different data sets on different machines under the same schema.  

That's clustering, actually.  Partitioning is simply dividing up a table
into chunks and using the chunks intelligently.  Putting those chunks on
separate machines is another thing entirely.
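
As a concrete single-machine sketch of the chunking half (table and column
names invented for illustration; the planner won't skip chunks
automatically yet, so this is just the inheritance mechanics):

CREATE TABLE sales (sale_date date, amount numeric);
CREATE TABLE sales_2005_04 (CHECK (sale_date >= '2005-04-01'
    AND sale_date < '2005-05-01')) INHERITS (sales);
CREATE TABLE sales_2005_05 (CHECK (sale_date >= '2005-05-01'
    AND sale_date < '2005-06-01')) INHERITS (sales);
-- a query against the parent transparently scans all the chunks:
SELECT sum(amount) FROM sales WHERE sale_date >= '2005-05-01';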

We're working on partitioning through the Bizgres sub-project:
www.bizgres.org  / http://pgfoundry.org/projects/bizgres/
... and will be pushing it to the main PostgreSQL when we have something.

I invite you to join the mailing list.

 These separate chunks of the table can then be replicated as well for
 data redundancy and so on. MySQL are working on these things,

Don't hold your breath.   MySQL, to judge by their first clustering 
implementation, has a *long* way to go before they have anything usable.  In 
fact, at OSCON their engineers were asking Jan Wieck for advice.

If you have $$$ to shell out, my employer (GreenPlum) has a multi-machine 
distributed version of PostgreSQL.  It's proprietary, though.  
www.greenplum.com.

If you have more time than money, I understand that Stanford is working on 
this problem:
http://www-db.stanford.edu/~bawa/

But, overall, some people on this list are very mistaken in thinking it's an 
easy problem.   GP has devoted something like 5 engineers for 3 years to 
develop their system.  Oracle spent over $100 million to develop RAC.  

 but PG   
 just has a bunch of third party extensions, I wonder why these are  
 not being integrated into the main trunk :/ 

Because it represents a host of complex functionality which is not applicable 
to most users?  Because there are 4 types of replication and 3 kinds of 
clustering, and not all users want the same kind?

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


RE: [PERFORM] Partitioning / Clustering

2005-05-10 Thread tdrayton

Hi Alex,

Actually, our product can partition data among several clustered nodes
running PostgreSQL, if that is what you are looking for. Data is
distributed based on a designated column. Other tables can be
replicated to all nodes.

For SELECTs, it also knows when it can join locally or when it needs to
ship rows as part of the query plan. For FK constraints (discussed here),
it also knows when it can enforce them locally or not.

Please let me know if you would like some more information.

Regards,

Tom Drayton
ExtenDB
http://www.extendb.com



 This is why I mention partitioning. It solves this issue by storing
 different data sets on different machines under the same schema.
 These separate chunks of the table can then be replicated as well for
 data redundancy and so on. MySQL are working on these things, but PG
 just has a bunch of third party extensions, I wonder why these are
 not being integrated into the main trunk :/ Thanks for pointing me to
 PGCluster though. It looks like it should be better than Slony at least.


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Mischa Sandberg
Quoting [EMAIL PROTECTED]:

  exploring the option of buying 10 cheapass
  machines for $300 each.  At the moment, that $300 buys you, from Dell, a
  2.5GHz Pentium 4
 
 Buy cheaper-ass Dells with an AMD 64 3000+.  Beats the crap out of the
 2.5GHz Pentium, especially for PostgreSQL.

Whence Dells with an AMD 64?? Perhaps you skimmed:

  http://www.thestreet.com/tech/kcswanson/10150604.html
or
  http://www.eweek.com/article2/0,1759,1553822,00.asp





---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Jim C. Nasby
On Tue, May 10, 2005 at 07:29:59PM +0200, PFC wrote:
   I wonder how Oracle does it ;)

Oracle *clustering* demands shared storage. So you've shifted your money
from big-iron CPUs to big-iron disk arrays.

Oracle replication works similarly to Slony, though it supports a lot more
modes (e.g., synchronous).
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Mischa Sandberg
Quoting Alex Stapleton [EMAIL PROTECTED]:

 This is why I mention partitioning. It solves this issue by storing 
 different data sets on different machines under the same schema.  
 These separate chunks of the table can then be replicated as well for
 data redundancy and so on. MySQL are working on these things, but PG 
 just has a bunch of third party extensions, I wonder why these are  
 not being integrated into the main trunk :/ Thanks for pointing me to
 PGCluster though. It looks like it should be better than Slony at
 least.

Across a decade or two of projects, including creating a federated
database engine for Simba, I've become rather dubious of horizontal
partitions (across disks or servers), either to improve performance, or
just to scale up and not lose performance. [[The one exception is for
*non-time-critical, read-only* systems, with Slony-style replication.]]

The most successful high-volume systems I've seen have broken up
databases functionally, like a pipeline, where different applications
use different sections of the pipe. 

The highest-volume system I've worked on is Acxiom's gigantic
data-cleansing system. This is the central clearinghouse for every scrap
of demographic data that can be associated with some North American,
somewhere. Think of a DB for 300M people (some dead). The volumes are
just beyond belief, for both updates and queries. At Acxiom, the
datasets are so large, even after partitioning, that they just
constantly cycle them through memory, and commands are executed in
convoys --- sort of like riding a paternoster.
..
Anybody been tracking what Mr. Stonebraker's been up to lately?
Datastream management. Check it out. Like replication, everybody
hand-rolled their own datastream systems until finally somebody else
generalized it well enough that it didn't have to be built from scratch
every time.

Datastream systems require practically no locking, let alone distributed
transactions. They give you some really strong guarantees on transaction
elapsed-time and throughput. 
...
Where is this all leading? Well, for scaling data like this, the one
feature that you need is the ability of procedures/rules on one server
to perform queries/procedures on another. MSSQL has linked servers and
(blech) OpenQuery. This lets you do reasonably-efficient work when you
only deal with one table at a time. Do NOT try anything fancy with
multi-table joins; timeouts are unavoidable, and painful.

Postgres has a natural advantage in such a distributed server system:
all table/index stats are openly available through the SQL interface,
for one server to make rational query plans involving another server's
resources. God! I would have killed for that when I was writing a
federated SQL engine; the kluges you need to do this at arm's length from
that information are true pain.
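
For instance (schema and table names here are only an example), the
per-column statistics the planner works from are one SELECT away for any
client -- including, potentially, a peer server:

SELECT attname, null_frac, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public' AND tablename = 'orders';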

So where should I go look, to see what's been done so far, on a Postgres
that can treat another PG server as a new table type?



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Jim C. Nasby
On Tue, May 10, 2005 at 02:55:55PM -0700, Mischa Sandberg wrote:
 just beyond belief, for both updates and queries. At Acxiom, the
 datasets are so large, even after partitioning, that they just
 constantly cycle them through memory, and commands are executes in
 convoys --- sort of like riding a paternoster.

Speaking of which... what's the status of the patch that allows seqscans
to piggyback on already running seqscans on the same table?

 So where should I go look, to see what's been done so far, on a Postgres
 that can treat another PG server as a new table type?

To the best of my knowledge no such work has been done. There is a
project (whose name escapes me) that lets you run queries against a
remote postgresql server from a postgresql connection to a different
server, which could serve as the basis for what you're proposing.

BTW, given your experience, you might want to check out Bizgres.
(http://pgfoundry.org/projects/bizgres/) I'm sure your insights would be
most welcome.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Christopher Kings-Lynne
This is why I mention partitioning. It solves this issue by storing 
different data sets on different machines under the same schema.  These 
separate chunks of the table can then be replicated as well for data 
redundancy and so on. MySQL are working on these things
*laff*
Yeah, like they've been working on views for the last 5 years, and still 
haven't released them :D :D :D

Chris
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Mischa Sandberg
Quoting Christopher Kings-Lynne [EMAIL PROTECTED]:

  This is why I mention partitioning. It solves this issue by storing
  different data sets on different machines under the same schema. 
  These separate chunks of the table can then be replicated as well for 
  data redundancy and so on. MySQL are working on these things
 *laff*
 Yeah, like they've been working on views for the last 5 years, and
 still haven't released them :D :D :D

? 
http://dev.mysql.com/doc/mysql/en/create-view.html
...for MySQL 5.0.1+ ?


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Joshua D. Drake
Mischa Sandberg wrote:
Quoting Christopher Kings-Lynne [EMAIL PROTECTED]:

This is why I mention partitioning. It solves this issue by storing
different data sets on different machines under the same schema. 
These separate chunks of the table can then be replicated as well for 
data redundancy and so on. MySQL are working on these things
*laff*
Yeah, like they've been working on views for the last 5 years, and
still haven't released them :D :D :D

? 
http://dev.mysql.com/doc/mysql/en/create-view.html
...for MySQL 5.0.1+ ?
Yes, but MySQL 5 isn't out yet (i.e., not yet considered stable).
Sincerely,
Joshua D. Drake


---(end of broadcast)---
TIP 6: Have you searched our list archives?
   http://archives.postgresql.org



Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Christopher Kings-Lynne

*laff*
Yeah, like they've been working on views for the last 5 years, and
still haven't released them :D :D :D

? 
http://dev.mysql.com/doc/mysql/en/create-view.html
...for MySQL 5.0.1+ ?
Give me a call when it's RELEASED.
Chris
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Mischa Sandberg
Quoting Christopher Kings-Lynne [EMAIL PROTECTED]:

 
 *laff*
 Yeah, like they've been working on views for the last 5 years, and
 still haven't released them :D :D :D
  
  ? 
  http://dev.mysql.com/doc/mysql/en/create-view.html
  ...for MySQL 5.0.1+ ?
 
 Give me a call when it's RELEASED.


:-) Touché



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Mischa Sandberg
Quoting Jim C. Nasby [EMAIL PROTECTED]:

 To the best of my knowledge no such work has been done. There is a
 project (who's name escapes me) that lets you run queries against a
 remote postgresql server from a postgresql connection to a different
 server, which could serve as the basis for what you're proposing.

Okay, if the following looks right to the powers that be, I'd like to start
a project. Here's the proposition:

servername.dbname.schema.object would change RangeVar, which would
affect much code. dbname.schema.object itself is not implemented in
8.0. So, simplicity dictates something like:

table pg_remote(schemaname text, connectby text, remoteschema text)
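
Spelled out as DDL -- a sketch only, and the example row with its connect
string is hypothetical:

CREATE TABLE pg_remote (
    schemaname   text,  -- local pseudo-schema standing in for remote objects
    connectby    text,  -- libpq connection string for the remote server
    remoteschema text   -- schema to map on that server
);
INSERT INTO pg_remote VALUES ('acct', 'host=db2 dbname=finance', 'public');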

The pg_statistic info from a remote server cannot be cached in local
pg_statistic, without inventing pseudo reloids as well as a
pseudoschema. Probably cleaner to cache it somewhere else. I'm still
reading down the path that puts pg_statistic data where costsize can get
at it.

First step: find out whether one can link libpq.so to postmaster :-)


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Neil Conway
Josh Berkus wrote:
Don't hold your breath.   MySQL, to judge by their first clustering 
implementation, has a *long* way to go before they have anything usable.
Oh? What's wrong with MySQL's clustering implementation?
-Neil
---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Joshua D. Drake
Neil Conway wrote:
Josh Berkus wrote:
Don't hold your breath.   MySQL, to judge by their first clustering 
implementation, has a *long* way to go before they have anything usable.

Oh? What's wrong with MySQL's clustering implementation?
RAM-only tables :)

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Bruno Wolff III
On Tue, May 10, 2005 at 08:02:50 -0700,
  Adam Haberlach [EMAIL PROTECTED] wrote:
 
 
 With all the Opteron v. Xeon around here, and talk of $30,000 machines,
 perhaps it would be worth exploring the option of buying 10 cheapass
 machines for $300 each.  At the moment, that $300 buys you, from Dell, a
 2.5GHz Pentium 4 w/ 256MB of RAM, a 40GB hard drive, and gigabit ethernet.
 The aggregate CPU and bandwidth is pretty stupendous, but not as easy to
 harness as a single machine.

That isn't going to be ECC RAM. I don't think you really want to use
non-ECC RAM in a critical database.

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Neil Conway
Joshua D. Drake wrote:
Neil Conway wrote:
Oh? What's wrong with MySQL's clustering implementation?
RAM-only tables :)
Sure, but that hardly makes it not usable. Considering the price of 
RAM these days, having enough RAM to hold the database (distributed over 
the entire cluster) is perfectly acceptable for quite a few people.

(Another deficiency: in 4.0, predicates in queries would not be pushed 
down to storage nodes -- so you had to stream the *entire* table over 
the network, and then apply the WHERE clause at the frontend query node. 
That is fixed in 5.0, though.)

-Neil
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faq


Re: [PERFORM] Partitioning / Clustering

2005-05-10 Thread Josh Berkus
Neil,

 Sure, but that hardly makes it not usable. Considering the price of
 RAM these days, having enough RAM to hold the database (distributed over
 the entire cluster) is perfectly acceptable for quite a few people.

The other problem, as I was told at OSCON, was that these were not 
high-availability clusters; it's impossible to add a server to an existing 
cluster, and a server going down is liable to take the whole cluster down.  
Mind you, I've not tried that aspect of it myself; once I saw the RAM-only 
rule, we switched to something else.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org