Re: [HACKERS] Raw device I/O for large objects

2007-09-20 Thread Georgi Chulkov
Thank you, everyone, for your valuable input! I will have a look at some other 
part of PostgreSQL, and maybe find something else to do instead.

Best,
Georgi

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Raw device I/O for large objects

2007-09-18 Thread Markus Schiltknecht

Hi,

Georgi Chulkov wrote:

> Please allow me to ask then:
> 1. In your opinion, would the above scenario indeed benefit from a raw-device 
> interface for large objects?


No, because file systems also try to do what you outline above. They 
certainly don't split sequential data up into blocks and distribute them 
randomly over the device, at least not without having a pretty good 
reason to do so (with which you'd also have to fight).


The achievable gain is pretty minimal, especially in 
conjunction with a (hopefully battery-backed) write cache.


> 2. How feasible is it to decouple general table storage from large object 
> storage?


I think that would be the easiest part. I would go for a pluggable 
storage implementation, selectable per tablespace. But then again, I 
wouldn't do it at all. After all, this is what MySQL is doing, and we 
certainly don't want to repeat their mistakes! Or do you know anybody 
who goes like: "Yippee, multiple storage engines to choose from for my 
(un)valuable data, let's put some here and others there"?


Let's optimize the *one* storage engine we have and try to make that 
work well together with the various filesystems it uses, because 
filesystems are already very good at what they are used for. (And we are 
glad we can use a filesystem and don't need to implement one ourselves.)


Regards

Markus




Re: [HACKERS] Raw device I/O for large objects

2007-09-18 Thread Tom Lane
Georgi Chulkov [EMAIL PROTECTED] writes:
> Here's the reason why I'm looking at raw device storage for large objects only 
> (as opposed to all tables): with raw device I/O I can control, to an extent, 
> spatial locality. So, if I have an application that wants to store N large 
> objects (totaling several gigabytes) and read them back in some order that is 
> well-known in advance, I could store my large objects in that order on the 
> raw device.* Sequentially reading them back would then be very efficient. 
> With a file system underneath, I don't have that freedom. (Such a scenario 
> occurs with raster databases, for example.)

Not sure I buy that argument.  If you have loaded these large objects in
the desired order, then the data will be consecutively located in
pg_largeobject, and if the underlying filesystem is at all sane about
where it extends a growing file, the data will be pretty much
consecutive on disk too.  You could probably get marginal improvements
by cutting out the middleman but I'm not sure there's reason to think
there'd be spectacular improvements.
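Tom's point about load order can be illustrated with a toy model (not PostgreSQL code). It assumes the default LOBLKSIZE of 2048 bytes (BLCKSZ/4 with 8 kB blocks), a freshly created pg_largeobject with no reuse of freed space, and a filesystem that extends the file contiguously; the helpers chunk_rows, load_in_order and is_sequential_scan are hypothetical names. Objects loaded in the desired order end up as consecutive rows, so reading them back in that order is a single front-to-back pass over the table.

```python
LOBLKSIZE = 2048  # assumed: BLCKSZ / 4 with the default 8 kB block size


def chunk_rows(loid, size):
    """Split a large object of `size` bytes into (loid, pageno, nbytes) rows,
    mirroring how pg_largeobject stores one row per LOBLKSIZE-sized page."""
    return [
        (loid, pageno, min(LOBLKSIZE, size - offset))
        for pageno, offset in enumerate(range(0, size, LOBLKSIZE))
    ]


def load_in_order(objects):
    """Append the rows of each (loid, size) object to an empty, append-only heap.

    Assumes no free space is ever reused (no earlier deletes or updates), so
    every new row lands at the end of the table. Returns the heap and a map
    from loid to the heap positions holding its rows.
    """
    heap, positions = [], {}
    for loid, size in objects:
        positions[loid] = []
        for row in chunk_rows(loid, size):
            positions[loid].append(len(heap))
            heap.append(row)
    return heap, positions


def is_sequential_scan(positions, read_order):
    """True if reading the objects in `read_order` visits heap positions
    strictly one after another, i.e. one front-to-back sweep."""
    visited = [p for loid in read_order for p in positions[loid]]
    return all(b == a + 1 for a, b in zip(visited, visited[1:]))
```

Loading objects 101, 102 and 103 in that order and reading them back in the same order yields one sweep; reading 103 first does not. The model says nothing about where the filesystem puts the file's blocks, which is exactly the part Tom argues a sane filesystem already handles.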

> Please allow me to ask then:
> 1. In your opinion, would the above scenario indeed benefit from a raw-device 
> interface for large objects?

I don't say it wouldn't benefit.  What I'm questioning is the size of
the benefit compared to the amount of work required to get it.
Supporting raw I/O is not some trivial bit of work --- you essentially
have to reimplement your own filesystem, because like it or not you
*do* have to think about space management.  If we went in this direction
we'd be buying into a lot of work, not to mention a lot of ongoing
portability headaches.  So far no one's been able to make a case that
it's worth that level of effort.
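To make the space-management point concrete, here is a hypothetical first-fit extent allocator over a raw device addressed in fixed-size blocks. It is only a sketch: a real one would also need a crash-safe, persistent free list and concurrency control, none of which is shown. Even this toy version shows fragmentation: after some objects are freed there can be plenty of free space in total but no contiguous run large enough for a new object.

```python
class ExtentAllocator:
    """Toy first-fit allocator for a raw device of `nblocks` fixed-size blocks.

    The free list lives in memory only; a real implementation would also
    have to persist it crash-safely and handle concurrent callers.
    """

    def __init__(self, nblocks):
        self.free = [(0, nblocks)]  # sorted (start, length) free extents

    def alloc(self, n):
        """Return the start of the first free run of at least n blocks, or None."""
        for i, (start, length) in enumerate(self.free):
            if length >= n:
                if length == n:
                    del self.free[i]
                else:
                    self.free[i] = (start + n, length - n)
                return start
        return None

    def release(self, start, n):
        """Return blocks [start, start + n) to the free list, merging neighbours."""
        merged = []
        for s, length in sorted(self.free + [(start, n)]):
            if merged and merged[-1][0] + merged[-1][1] == s:
                merged[-1] = (merged[-1][0], merged[-1][1] + length)
            else:
                merged.append((s, length))
        self.free = merged

    def total_free(self):
        return sum(length for _, length in self.free)
```

For example, on a 100-block device, allocating 40, 20 and 40 blocks and then freeing the first and last extents leaves 80 blocks free, yet alloc(50) returns None because the largest free run is only 40 blocks.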

> 2. How feasible is it to decouple general table storage from large object 
> storage?

You might try digging into the original POSTGRES sources --- at one time
there were several different large-object APIs.  I'm not sure if they
exposed them just as different sets of access functions or if there was
something more elegant.  My own feeling though is that you probably
don't want to go that way, because with outside-the-database storage you
lose transactional behavior (unless you're up for reinventing that
wheel too).  I'd try replacing md.c, or maybe resurrecting smgr.c as
something that can really switch between more than one underlying
storage manager.
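A rough Python analogue of the switch Tom describes: several storage managers behind one block-level interface, with each relation routed to one of them. All names here (StorageManager, InMemoryManager, SmgrSwitch) are hypothetical and do not correspond to PostgreSQL's actual C identifiers; InMemoryManager merely stands in for a real backend such as a file-based or raw-device one, and BLCKSZ = 8192 is an assumed block size.

```python
from abc import ABC, abstractmethod

BLCKSZ = 8192  # assumed block size in bytes


class StorageManager(ABC):
    """Hypothetical block-level interface each storage manager implements."""

    @abstractmethod
    def read(self, rel, blockno): ...

    @abstractmethod
    def write(self, rel, blockno, data): ...

    @abstractmethod
    def nblocks(self, rel): ...


class InMemoryManager(StorageManager):
    """Stand-in for a real backend (file per relation, raw device, ...)."""

    def __init__(self):
        self.blocks = {}  # (relation, blockno) -> block contents

    def read(self, rel, blockno):
        return self.blocks[(rel, blockno)]

    def write(self, rel, blockno, data):
        if len(data) != BLCKSZ:
            raise ValueError("block must be exactly BLCKSZ bytes")
        self.blocks[(rel, blockno)] = bytes(data)

    def nblocks(self, rel):
        # Relation length = highest written block number + 1.
        return max((b + 1 for (r, b) in self.blocks if r == rel), default=0)


class SmgrSwitch:
    """Routes every block request to the manager assigned to that relation."""

    def __init__(self, managers, default):
        self.managers = managers  # manager name -> StorageManager
        self.default = default
        self.assignment = {}  # relation -> manager name

    def assign(self, rel, name):
        if name not in self.managers:
            raise KeyError(name)
        self.assignment[rel] = name

    def _mgr(self, rel):
        return self.managers[self.assignment.get(rel, self.default)]

    def read(self, rel, blockno):
        return self._mgr(rel).read(rel, blockno)

    def write(self, rel, blockno, data):
        self._mgr(rel).write(rel, blockno, data)

    def nblocks(self, rel):
        return self._mgr(rel).nblocks(rel)
```

Because a switch like this sits below the code that interprets block contents, everything above it would presumably keep its transactional behavior; only where the blocks physically live changes, which is the advantage over storing large objects outside the database.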

regards, tom lane



[HACKERS] Raw device I/O for large objects

2007-09-17 Thread Georgi Chulkov
Hello,

I am a graduate student of computer science and I have been looking at 
PostgreSQL for my master's thesis work.

I am looking into implementing raw device I/O for large objects into 
PostgreSQL (maybe for all storage, I'm not sure which would be 
easier/better). I am extremely new to the codebase, however.

Could someone please point me to the right places to look at, and how/where to 
get started? Would such a development be useful at all? Is anyone working on 
anything related?

Any feedback / information would be highly appreciated!

Thanks,
Georgi



Re: [HACKERS] Raw device I/O for large objects

2007-09-17 Thread Sibte Abbas
On 9/17/07, Georgi Chulkov [EMAIL PROTECTED] wrote:

> Could someone please point me to the right places to look at, and how/where to
> get started? Would such a development be useful at all? Is anyone working on
> anything related?
>
> Any feedback / information would be highly appreciated!


http://www.postgresql.org/docs/techdocs
http://www.postgresql.org/docs/faq/

The postgresql documentation:
http://www.postgresql.org/docs/8.2/interactive/index.html

Also, if you have the source, the src/tools/backend directory has some
useful material for starters.

regards,
--
Sibte Abbas



Re: [HACKERS] Raw device I/O for large objects

2007-09-17 Thread Tom Lane
Georgi Chulkov [EMAIL PROTECTED] writes:
> I am looking into implementing raw device I/O for large objects into 
> PostgreSQL (maybe for all storage, I'm not sure which would be 
> easier/better).

We've heard this idea proposed before, and it's been shot down as a poor
use of development effort every time.  Check the archives for previous
threads, but the basic argument goes like this: when Oracle et al did
that twenty years ago, it was a good idea because (1) operating systems
tended to have sucky filesystems, (2) performance and reliability
properties of same were not very consistent across platforms, and (3)
being large commercial software vendors they could afford to throw lots
of warm bodies at anything that seemed like a bottleneck.  None of those
arguments holds up well for us today however.  If you think you want to
reimplement a filesystem you need to have some pretty concrete reasons
why you can outsmart all the smart folks who have worked on
your-favorite-OS's filesystems for lo these many years.  There's also
the fact that on any reasonably modern disk hardware, raw I/O is
anything but.

My opinion is that there is lots of lower-hanging fruit elsewhere.
You can find some ideas on our TODO list, or troll the pghackers
list archives for other discussions.

regards, tom lane



Re: [HACKERS] Raw device I/O for large objects

2007-09-17 Thread Georgi Chulkov
Hi,

> We've heard this idea proposed before, and it's been shot down as a poor
> use of development effort every time.  Check the archives for previous
> threads, but the basic argument goes like this: when Oracle et al did
> that twenty years ago, it was a good idea because (1) operating systems
> tended to have sucky filesystems, (2) performance and reliability
> properties of same were not very consistent across platforms, and (3)
> being large commercial software vendors they could afford to throw lots
> of warm bodies at anything that seemed like a bottleneck.  None of those
> arguments holds up well for us today however.  If you think you want to
> reimplement a filesystem you need to have some pretty concrete reasons
> why you can outsmart all the smart folks who have worked on
> your-favorite-OS's filesystems for lo these many years.  There's also
> the fact that on any reasonably modern disk hardware, raw I/O is
> anything but.

Thanks, I agree with all your arguments.

Here's the reason why I'm looking at raw device storage for large objects only 
(as opposed to all tables): with raw device I/O I can control, to an extent, 
spatial locality. So, if I have an application that wants to store N large 
objects (totaling several gigabytes) and read them back in some order that is 
well-known in advance, I could store my large objects in that order on the 
raw device.* Sequentially reading them back would then be very efficient. 
With a file system underneath, I don't have that freedom. (Such a scenario 
occurs with raster databases, for example.)

* assuming I have a way to communicate these requirements; that's a whole new 
problem
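Georgi's scenario can be sketched as a placement function: given object sizes and the known read order, lay each object out as a contiguous extent, back to back in that order, then count how often a reader has to reposition. This is a hypothetical illustration: how the read order gets communicated (the footnote's open problem) is sidestepped by passing it in directly, and it assumes consecutive block numbers are physically adjacent, which is the very assumption Tom questions when he says raw I/O on modern disks "is anything but".

```python
def layout_in_read_order(sizes, read_order, block_size=8192):
    """Place each object in a contiguous extent, back to back in read order.

    `sizes` maps object id -> size in bytes; block_size is an assumed device
    block size. Returns object id -> (start_block, nblocks).
    """
    layout, next_block = {}, 0
    for obj in read_order:
        nblocks = -(-sizes[obj] // block_size)  # ceiling division
        layout[obj] = (next_block, nblocks)
        next_block += nblocks
    return layout


def count_seeks(layout, read_order):
    """Count head repositionings when reading objects in `read_order`,
    treating the initial positioning as one seek and any jump to a
    non-adjacent extent as another."""
    seeks, pos = 0, None
    for obj in read_order:
        start, nblocks = layout[obj]
        if start != pos:
            seeks += 1
        pos = start + nblocks
    return seeks
```

Reading in the order used for placement costs a single seek; any other order costs more, which is the benefit Georgi is after and also what Tom and Markus argue a filesystem already delivers when objects are loaded in order.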

Please allow me to ask then:
1. In your opinion, would the above scenario indeed benefit from a raw-device 
interface for large objects?
2. How feasible is it to decouple general table storage from large object 
storage?

Thank you for your time,

Georgi



Re: [HACKERS] Raw device I/O for large objects

2007-09-17 Thread Luke Lonergan
Index organized tables would do this and it would be a generic capability.
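Luke's suggestion, sketched: in an index-organized table the rows themselves are kept in key order, so if the key encodes the intended read order, a key-range scan reads a contiguous run of storage. The class below is a toy in-memory stand-in (a sorted Python list rather than a B-tree) and the tile-style keys are hypothetical; PostgreSQL did not offer index-organized tables at the time of this thread. In a real B-tree, page splits mean leaf pages are only approximately in physical key order.

```python
import bisect


class IndexOrganizedTable:
    """Toy index-organized table: rows are stored physically sorted by key,
    so a key-range scan touches one contiguous run of positions."""

    def __init__(self):
        self.keys = []
        self.rows = []

    def insert(self, key, row):
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            raise KeyError(f"duplicate key {key!r}")
        self.keys.insert(i, key)
        self.rows.insert(i, row)

    def range_scan(self, lo, hi):
        """Return (positions, rows) for lo <= key < hi; positions are contiguous."""
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_left(self.keys, hi)
        return list(range(i, j)), self.rows[i:j]
```

Rows can arrive in any order; the table keeps them sorted, so the "store them in read order" requirement becomes a property of the key rather than of the load sequence.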

- Luke

Msg is shrt cuz m on ma treo

-----Original Message-----
From:   Georgi Chulkov [mailto:[EMAIL PROTECTED]
Sent:   Monday, September 17, 2007 11:50 PM Eastern Standard Time
To:     Tom Lane
Cc:     pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Raw device I/O for large objects