Re: [HACKERS] Raw device I/O for large objects
Thank you everyone for your valuable input! I will have a look at some other part of PostgreSQL, and maybe find something else to do instead.

Best,
Georgi

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [HACKERS] Raw device I/O for large objects
Hi,

Georgi Chulkov wrote:
> Please allow me to ask then:
> 1. In your opinion, would the above scenario indeed benefit from a
> raw-device interface for large objects?

No, because file systems already try to do what you outline above. They certainly don't split sequential data up into blocks and distribute them randomly over the device, at least not without a pretty good reason to do so (a reason you would also have to fight against). The possible gain is pretty minimal, especially in conjunction with a (hopefully battery-backed) write cache.

> 2. How feasible is it to decouple general table storage from large
> object storage?

I think that would be the easiest part. I would go for a pluggable storage implementation, selectable per tablespace. But then again, I wouldn't do it at all. After all, this is what MySQL is doing, and we certainly don't want to repeat their mistakes! Or do you know anybody who goes: "Yippee, multiple storage engines to choose from for my (un)valuable data, let's put some here and others there"?

Let's optimize the *one* storage engine we have and try to make it work well together with the various filesystems it uses, because filesystems are already very good at what they are used for. (And we are glad we can use a filesystem and don't need to implement one ourselves.)

Regards

Markus
Re: [HACKERS] Raw device I/O for large objects
Georgi Chulkov [EMAIL PROTECTED] writes:
> Here's the reason why I'm looking at raw device storage for large
> objects only (as opposed to all tables): with raw device I/O I can
> control, to an extent, spatial locality. So, if I have an application
> that wants to store N large objects (totaling several gigabytes) and
> read them back in some order that is well-known in advance, I could
> store my large objects in that order on the raw device.* Sequentially
> reading them back would then be very efficient. With a file system
> underneath, I don't have that freedom. (Such a scenario occurs with
> raster databases, for example.)

Not sure I buy that argument. If you have loaded these large objects in the desired order, then the data will be consecutively located in pg_largeobject, and if the underlying filesystem is at all sane about where it extends a growing file, the data will be pretty much consecutive on disk too. You could probably get marginal improvements by cutting out the middleman, but I'm not sure there's reason to think there'd be spectacular improvements.

> Please allow me to ask then:
> 1. In your opinion, would the above scenario indeed benefit from a
> raw-device interface for large objects?

I don't say it wouldn't benefit. What I'm questioning is the size of the benefit compared to the amount of work required to get it. Supporting raw I/O is not some trivial bit of work --- you essentially have to reimplement your own filesystem, because like it or not you *do* have to think about space management. If we went in this direction we'd be buying into a lot of work, not to mention a lot of ongoing portability headaches. So far no one's been able to make a case that it's worth that level of effort.

> 2. How feasible is it to decouple general table storage from large
> object storage?

You might try digging into the original POSTGRES sources --- at one time there were several different large-object APIs. I'm not sure if they exposed them just as different sets of access functions or if there was something more elegant. My own feeling, though, is that you probably don't want to go that way, because with outside-the-database storage you lose transactional behavior (unless you're up for reinventing that wheel too). I'd try replacing md.c, or maybe resurrecting smgr.c as something that can really switch between more than one underlying storage manager.

regards, tom lane
[HACKERS] Raw device I/O for large objects
Hello,

I am a graduate student of computer science and I have been looking at PostgreSQL for my master's thesis work. I am looking into implementing raw device I/O for large objects in PostgreSQL (maybe for all storage; I'm not sure which would be easier/better). I am extremely new to the codebase, however. Could someone please point me to the right places to look at, and how/where to get started? Would such a development be useful at all? Is anyone working on anything related?

Any feedback / information would be highly appreciated!

Thanks,
Georgi
Re: [HACKERS] Raw device I/O for large objects
On 9/17/07, Georgi Chulkov [EMAIL PROTECTED] wrote:
> Could someone please point me to the right places to look at, and
> how/where to get started? Would such a development be useful at all?
> Is anyone working on anything related? Any feedback / information
> would be highly appreciated!

http://www.postgresql.org/docs/techdocs
http://www.postgresql.org/docs/faq/

The PostgreSQL documentation: http://www.postgresql.org/docs/8.2/interactive/index.html

Also, if you have the source, the src/tools/backend directory has some useful material for starters.

regards,
--
Sibte Abbas
Re: [HACKERS] Raw device I/O for large objects
Georgi Chulkov [EMAIL PROTECTED] writes:
> I am looking into implementing raw device I/O for large objects into
> PostgreSQL (maybe for all storage, I'm not sure which would be
> easier/better).

We've heard this idea proposed before, and it's been shot down as a poor use of development effort every time. Check the archives for previous threads, but the basic argument goes like this: when Oracle et al did that twenty years ago, it was a good idea because (1) operating systems tended to have sucky filesystems, (2) performance and reliability properties of same were not very consistent across platforms, and (3) being large commercial software vendors, they could afford to throw lots of warm bodies at anything that seemed like a bottleneck. None of those arguments holds up well for us today, however.

If you think you want to reimplement a filesystem, you need to have some pretty concrete reasons why you can outsmart all the smart folks who have worked on your-favorite-OS's filesystems for lo these many years. There's also the fact that on any reasonably modern disk hardware, raw I/O is anything but.

My opinion is that there is lots of lower-hanging fruit elsewhere. You can find some ideas on our TODO list, or troll the pghackers list archives for other discussions.

regards, tom lane
Re: [HACKERS] Raw device I/O for large objects
Hi,

> We've heard this idea proposed before, and it's been shot down as a
> poor use of development effort every time. Check the archives for
> previous threads, but the basic argument goes like this: when Oracle
> et al did that twenty years ago, it was a good idea because (1)
> operating systems tended to have sucky filesystems, (2) performance
> and reliability properties of same were not very consistent across
> platforms, and (3) being large commercial software vendors they could
> afford to throw lots of warm bodies at anything that seemed like a
> bottleneck. None of those arguments holds up well for us today
> however. If you think you want to reimplement a filesystem you need
> to have some pretty concrete reasons why you can outsmart all the
> smart folks who have worked on your-favorite-OS's filesystems for lo
> these many years. There's also the fact that on any reasonably modern
> disk hardware, raw I/O is anything but.

Thanks, I agree with all your arguments. Here's the reason why I'm looking at raw device storage for large objects only (as opposed to all tables): with raw device I/O I can control, to an extent, spatial locality. So, if I have an application that wants to store N large objects (totaling several gigabytes) and read them back in some order that is well-known in advance, I could store my large objects in that order on the raw device.* Sequentially reading them back would then be very efficient. With a file system underneath, I don't have that freedom. (Such a scenario occurs with raster databases, for example.)

* assuming I have a way to communicate these requirements; that's a whole new problem

Please allow me to ask then:
1. In your opinion, would the above scenario indeed benefit from a raw-device interface for large objects?
2. How feasible is it to decouple general table storage from large object storage?

Thank you for your time,
Georgi
Re: [HACKERS] Raw device I/O for large objects
Index organized tables would do this, and it would be a generic capability.

- Luke

Msg is shrt cuz m on ma treo

-----Original Message-----
From: Georgi Chulkov [mailto:[EMAIL PROTECTED]]
Sent: Monday, September 17, 2007 11:50 PM Eastern Standard Time
To: Tom Lane
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Raw device I/O for large objects

> Please allow me to ask then:
> 1. In your opinion, would the above scenario indeed benefit from a
> raw-device interface for large objects?
> 2. How feasible is it to decouple general table storage from large
> object storage?
>
> Thank you for your time,
> Georgi