Re: [HACKERS] Hadoop backend?

2009-07-21 Thread Ron Mayer
Paul Sheer wrote:
> Hadoop backend for PostGreSQL

Resurrecting an old thread: it seems some guys at Yale implemented something very similar to what this thread was discussing. http://dbmsmusings.blogspot.com/2009/07/announcing-release-of-hadoopdb-longer.html It's an open-source stack that

Re: [HACKERS] Hadoop backend?

2009-02-24 Thread Paul Sheer
> As far as I can tell, the PG storage manager API is at the wrong level of abstraction for pretty much everything. These days, everything we do is atop the Unix filesystem API, and anything that smgr might have been [...]

Is there a complete list of filesystem API calls somewhere that I can get

Re: [HACKERS] Hadoop backend?

2009-02-24 Thread Hans-Jürgen Schönig
why not just stream it in via set-returning functions and make sure that we can mark a set-returning function as STREAMABLE or so (to prevent joins, whatever). it is the easiest way to get it right and it helps in many other cases. i think that the storage manager is definitely the wrong

Re: [HACKERS] Hadoop backend?

2009-02-24 Thread Peter Eisentraut
Tom Lane wrote:
> It's interesting to speculate about where we could draw an abstraction boundary that would be more useful. I don't think the MySQL guys got it right either...

The supposed smgr abstraction of PostgreSQL, which tells more or less how to get a byte to the disk, is quite far

Re: [HACKERS] Hadoop backend?

2009-02-24 Thread Josh Berkus
> With a distributed data store, the data would become a logical object: no adding or removal of machines would affect the data. This is an ideal that would remove a tremendous maintenance burden from many sites (well, at least the ones I have worked at, as far as I can see).

Two things:

Re: [HACKERS] Hadoop backend?

2009-02-23 Thread Paul Sheer
> It would only be possible to have the actual PostgreSQL backends running on a single node anyway, because they use shared memory to [...]

This is not a problem: Performance is a secondary consideration (at least as far as the problem I was referring to is concerned). The primary usefulness is to have the data be a

Re: [HACKERS] Hadoop backend?

2009-02-23 Thread Robert Haas
On Mon, Feb 23, 2009 at 9:08 AM, Paul Sheer paulsh...@gmail.com wrote:
>> It would only be possible to have the actual PostgreSQL backends running on a single node anyway, because they use shared memory to [...]
>
> This is not a problem: Performance is a secondary consideration (at least as far as the

Re: [HACKERS] Hadoop backend?

2009-02-23 Thread Andrew Chernow
Paul Sheer wrote:
> I have also found it's no use having RAID or ZFS. Each of these ties the data to an OS installation. If the OS needs to be reinstalled, all the data has to be manually moved in a way that is, well... dangerous.

How about network storage, fiber attach? If you move the db you

Re: [HACKERS] Hadoop backend?

2009-02-23 Thread Markus Wanner
Hi,

Paul Sheer wrote:
> This is not a problem: Performance is a secondary consideration (at least as far as the problem I was referring to).

Well, if you don't mind your database running... ehm... creeping several orders of magnitude slower, you might also be interested in Single-System Image

Re: [HACKERS] Hadoop backend?

2009-02-23 Thread Jonah H. Harris
On Sun, Feb 22, 2009 at 3:47 PM, Robert Haas robertmh...@gmail.com wrote:
> In theory, I think you could make postgres work on any type of underlying storage you like by writing a second smgr implementation that would exist alongside md.c. The fly in the ointment is that you'd need a more
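To picture what "a second smgr implementation alongside md.c" would have to provide, here is a minimal Python sketch of a block-level storage interface with a toy in-memory backend. It is an illustration under stated assumptions, not PostgreSQL's actual smgr API: `BlockStorage`, `InMemoryStorage` and their methods are hypothetical names, loosely modeled on the kind of per-relation read/write/extend/count-blocks operations smgr deals in. The point it makes is the one Tom raises in the thread: any replacement backend must support random in-place overwrites of fixed-size pages.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

BLOCK_SIZE = 8192  # PostgreSQL's default page size (BLCKSZ), in bytes


class BlockStorage(ABC):
    """Hypothetical block-level storage interface (not PostgreSQL's real smgr API)."""

    @abstractmethod
    def nblocks(self, relation: str) -> int:
        """Return the number of blocks currently in the relation."""

    @abstractmethod
    def read_block(self, relation: str, blockno: int) -> bytes:
        """Return one BLOCK_SIZE page."""

    @abstractmethod
    def write_block(self, relation: str, blockno: int, data: bytes) -> None:
        """Overwrite an existing page in place."""

    @abstractmethod
    def extend(self, relation: str, data: bytes) -> int:
        """Append a new page and return its block number."""


class InMemoryStorage(BlockStorage):
    """Toy backend; a distributed store would have to offer the same operations."""

    def __init__(self) -> None:
        self._relations: Dict[str, List[bytes]] = {}

    @staticmethod
    def _check_size(data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError(f"page must be {BLOCK_SIZE} bytes, got {len(data)}")

    def nblocks(self, relation: str) -> int:
        return len(self._relations.get(relation, []))

    def read_block(self, relation: str, blockno: int) -> bytes:
        return self._relations[relation][blockno]

    def write_block(self, relation: str, blockno: int, data: bytes) -> None:
        self._check_size(data)
        blocks = self._relations[relation]
        if not 0 <= blockno < len(blocks):
            raise IndexError(f"block {blockno} does not exist in {relation}")
        blocks[blockno] = bytes(data)  # random in-place overwrite

    def extend(self, relation: str, data: bytes) -> int:
        self._check_size(data)
        blocks = self._relations.setdefault(relation, [])
        blocks.append(bytes(data))
        return len(blocks) - 1


if __name__ == "__main__":
    storage = InMemoryStorage()
    blk = storage.extend("t1", bytes(BLOCK_SIZE))   # new all-zero page
    page = bytearray(storage.read_block("t1", blk))
    page[0] = 42                                     # modify the page...
    storage.write_block("t1", blk, bytes(page))      # ...and write it back in place
    print(storage.nblocks("t1"), storage.read_block("t1", 0)[0])  # 1 42
```

A backend for a store that cannot overwrite or append (as described for HDFS elsewhere in this thread) would have no natural way to implement `write_block` or `extend`, which is one reading of why the abstraction sits at an awkward level.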

Re: [HACKERS] Hadoop backend?

2009-02-23 Thread Tom Lane
Jonah H. Harris jonah.har...@gmail.com writes:
> I believe there is more than that which would need to be done nowadays. I seem to recall that the storage manager abstraction has slowly been dedicated/optimized for md over the past 6 years or so.

As far as I can tell, the PG storage manager API

Re: [HACKERS] Hadoop backend?

2009-02-23 Thread pi song
> I believe there is more than that which would need to be done nowadays. I seem to recall that the storage manager
> abstraction has slowly been dedicated/optimized for md over the past 6 years or so. It may even be easier/preferred
> to write a hadoop specific access method

Re: [HACKERS] Hadoop backend?

2009-02-22 Thread Hans-Jürgen Schönig
hi ... i think the easiest way to do this is to simply add a mechanism to functions which allows a function to stream data through. it would basically mean losing join support as you cannot read data again in a way which is good enough for joining with the function providing
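Why streaming costs join support comes down to re-reading: a nested-loop join scans its inner input once per outer row, and a one-shot stream can only be read once. The following Python sketch is a generic illustration of that effect, not PostgreSQL executor code; `stream_rows` and `nested_loop_join` are hypothetical names chosen for the example.

```python
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[int, str]  # (join key, payload)


def stream_rows(n: int) -> Iterator[Row]:
    """Hypothetical streaming source: rows arrive once and cannot be re-read."""
    for i in range(n):
        yield (i, f"remote-{i}")


def nested_loop_join(outer: Iterable[Row], inner: Iterable[Row]) -> List[Tuple[str, str]]:
    """Naive nested-loop join on the first column; re-scans `inner` once per outer row."""
    result = []
    for o_key, o_val in outer:
        for i_key, i_val in inner:
            if o_key == i_key:
                result.append((o_val, i_val))
    return result


local = [(0, "a"), (1, "b"), (2, "c")]

# Streamed inner input: exhausted after the first outer row, so later rows silently find no matches.
print(nested_loop_join(local, stream_rows(3)))        # [('a', 'remote-0')]

# Materialized inner input, so it can be re-scanned: every match is found.
print(nested_loop_join(local, list(stream_rows(3))))  # [('a', 'remote-0'), ('b', 'remote-1'), ('c', 'remote-2')]
```

Marking such a function as STREAMABLE, as suggested in the thread, would presumably tell the planner to avoid plans that need a rescan, or to materialize the stream first at the cost of holding it locally.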

Re: [HACKERS] Hadoop backend?

2009-02-22 Thread Robert Haas
On Sat, Feb 21, 2009 at 9:37 PM, pi song pi.so...@gmail.com wrote:
> 1) Hadoop file system is very optimized for mostly read operations
> 2) As of a few months ago, hdfs doesn't support file appending.
> There might be a bit of impedance to make them go together. However, I think it would be a very

Re: [HACKERS] Hadoop backend?

2009-02-22 Thread pi song
One more problem is that data placement on HDFS is inherent, meaning you have no explicit control. Thus, you cannot place two sets of data that are likely to be joined together on the same node, which means uncontrollable latency during query processing.

Pi Song

On Mon, Feb 23, 2009 at 7:47 AM, Robert Haas
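The co-location point can be made concrete with a toy placement model. In this Python sketch, `place_by_key` derives the node from the join key (so matching rows of both tables land together), while `place_round_robin` ignores the key and stands in for placement the application cannot influence. All names are hypothetical, and round-robin is only a stand-in: it is not how HDFS actually chooses where blocks go.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str]              # (join key, payload)
Placement = Dict[int, List[Row]]   # node id -> rows stored on that node


def place_by_key(rows: List[Row], num_nodes: int) -> Placement:
    """Co-located placement: the node is derived from the join key."""
    nodes: Placement = defaultdict(list)
    for key, payload in rows:
        nodes[key % num_nodes].append((key, payload))
    return nodes


def place_round_robin(rows: List[Row], num_nodes: int) -> Placement:
    """Key-agnostic placement, standing in for placement the application does not control."""
    nodes: Placement = defaultdict(list)
    for i, row in enumerate(rows):
        nodes[i % num_nodes].append(row)
    return nodes


def local_matches(left: Placement, right: Placement) -> int:
    """Count join matches that can be found without shipping rows between nodes."""
    count = 0
    for node, left_rows in left.items():
        right_keys = [key for key, _ in right.get(node, [])]
        for key, _ in left_rows:
            count += right_keys.count(key)
    return count


orders = [(k, f"order-{k}") for k in range(6)]
customers = [(k, f"cust-{k}") for k in reversed(range(6))]

print(local_matches(place_by_key(orders, 3), place_by_key(customers, 3)))            # 6
print(local_matches(place_round_robin(orders, 3), place_round_robin(customers, 3)))  # 2
```

With three nodes and six matching pairs, key-based placement finds all six matches locally, while the key-agnostic placement finds only two; the other four pairs would have to cross the network at query time, which is the source of the latency described above.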

Re: [HACKERS] Hadoop backend?

2009-02-22 Thread Robert Haas
On Sun, Feb 22, 2009 at 5:18 PM, pi song pi.so...@gmail.com wrote:
> One more problem is that data placement on HDFS is inherent, meaning you have no explicit control. Thus, you cannot place two sets of data that are likely to be joined together on the same node, which means uncontrollable latency during

Re: [HACKERS] Hadoop backend?

2009-02-22 Thread pi song
On Mon, Feb 23, 2009 at 3:56 PM, pi song pi.so...@gmail.com wrote:
> I think the point that you can access more system cache is right, but that doesn't mean it will be more efficient than accessing from your local disk. Take Hadoop for example: your request for file content will have to go to

[HACKERS] Hadoop backend?

2009-02-21 Thread Paul Sheer
Hadoop backend for PostGreSQL

A problem that my client has, and one that I come across often, is that a database always seems to be associated with a particular physical machine, a physical machine that has to be upgraded, replaced, or otherwise maintained. Even if the database is

Re: [HACKERS] Hadoop backend?

2009-02-21 Thread pi song
1) Hadoop file system is very optimized for mostly read operations
2) As of a few months ago, hdfs doesn't support file appending.
There might be a bit of impedance to make them go together. However, I think it would be a very good initiative to come up with ideas to be able to run postgres on
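The "impedance" can be put in rough numbers. PostgreSQL writes modified 8 KB pages back in place, inside relation files that are split into 1 GB segments by default. The Python arithmetic below assumes a worst case that is only hypothetical: a file system allowing neither in-place overwrites nor appends, where changing one page means rewriting the whole segment file. `bytes_written_per_page_update` is an illustrative name, not part of any real API.

```python
BLOCK_SIZE = 8 * 1024        # PostgreSQL's default page size (BLCKSZ), in bytes
SEGMENT_SIZE = 1024 ** 3     # PostgreSQL's default relation segment file size (1 GB), in bytes


def bytes_written_per_page_update(can_overwrite_in_place: bool) -> int:
    """Bytes that must be written to change a single page.

    Assumes (hypothetically) that a store without in-place writes or appends
    can only change a file by rewriting the whole segment file.
    """
    return BLOCK_SIZE if can_overwrite_in_place else SEGMENT_SIZE


amplification = bytes_written_per_page_update(False) // bytes_written_per_page_update(True)
print(amplification)  # 131072
```

Under that assumption, every single-page update would cost roughly 131,072 times the I/O of an in-place write, so a practical integration would need a storage layout designed around the file system's write model rather than md.c's.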