Re: [HACKERS] Opportunity for a Radical Changes in Database Software

2007-10-31 Thread J. Andrew Rogers


On Oct 28, 2007, at 2:54 PM, Josh Berkus wrote:

> I'd actually be curious what incremental changes you could see making
> to PostgreSQL for better in-memory operation.  Ideas?



It would be difficult to make PostgreSQL really competitive for in-memory operation, primarily because a contrary assumption pervades the entire design.  You would need to rip out a lot of its guts.  I was not suggesting that adapting PostgreSQL to in-memory operation would be a good idea or trivial; rather, since I am at least somewhat familiar with the research, I thought I'd offer a useful link detailing the kinds of considerations involved.  That said, I have seriously considered the idea, since I have a major project that requires that kind of capability and there is some utility in reusing parts of PostgreSQL if possible, particularly since it was used to prototype the project.  In my specific case I also need to shoehorn in a new type of access method for which there is no conceptual support, so it will probably be easier to build a (mostly) new database engine altogether.


Personally, if I were designing a distributed in-memory database, I would use a somewhat more conservative set of assumptions than Stonebraker does, so that it would have more general applicability.  For example, his assumption of extremely short CPU time per transaction (1 millisecond) is not valid even for some types of OLTP loads, never mind the numerous uses that are not strictly OLTP-like but which are nonetheless built on relatively short transactions; in the Stonebraker design, that much latency would be a pathology.  Unfortunately, if you remove that assumption, the design starts to unravel noticeably.  Nonetheless, there are other viable design paths that, while not over-fitted to OLTP, could still offer large gains.


I think the market is ripe for a well-designed distributed, in-memory database, but making incremental changes to a solid disk-based engine means starting with an architecture that is inferior for the purpose and hard to get away from.  It seems short-term expedient but long-term bad engineering -- think MySQL.



Cheers,

J. Andrew Rogers




Re: [HACKERS] Opportunity for a Radical Changes in Database Software

2007-10-27 Thread J. Andrew Rogers


On Oct 27, 2007, at 2:20 PM, Florian Weimer wrote:

> * J. Andrew Rogers:
>
>> Everything you are looking for is here:
>>
>> http://web.mit.edu/dna/www/vldb07hstore.pdf
>>
>> It is the latest Stonebraker et al. on massively distributed in-memory
>> OLTP architectures.
>
> "Ruby-on-Rails compiles into standard JDBC, but hides all the
> complexity of that interface.  Hence, H-Store plans to move from C++ to
> Ruby-on-Rails as our stored procedure language."  This reads a bit
> strange.



Yeah, that's a bit of a "WTF?".  Okay, a giant "WTF?".  I could see using Ruby as a stored procedure language, but Ruby-on-Rails seems like an exercise in buzzword compliance.  And Ruby is just about the slowest language in its class, which, given the rest of the paper (serializing all transactions, doing all transactions strictly in memory), means that you would be bottlenecking your database node on the procedural language rather than on the usual I/O considerations.


Most of the architectural material made a considerable amount of sense, though I had quibbles with bits of it (I think the long history of the design makes some decisions look silly in a world that is now multi-core by default).  The Ruby-on-Rails part is obviously fungible.  Nonetheless, the paper is a good starting point for massively distributed in-memory OLTP architectures and a good analysis of many aspects of database design from that perspective; at least, I have not really seen anything better.  Personally, I prefer a slightly more conservative approach that generalizes better in that space than what the paper suggests.


Cheers,

J. Andrew Rogers




Re: [HACKERS] Opportunity for a Radical Changes in Database Software

2007-10-25 Thread J. Andrew Rogers


On Oct 25, 2007, at 8:05 AM, Dan wrote:

> In looking at current developments in computers, it seems we're nearing
> a point where a fundamental change may be possible in databases...
> Namely in-memory databases, which could lead to huge performance
> improvements.
> ...
> The sites that use it see incredible performance increases, but often
> at the cost of not being able to provide versioned results that are
> guaranteed to be accurate.
>
> The big questions are then: how would you create a distributed
> in-memory database?



Everything you are looking for is here:

http://web.mit.edu/dna/www/vldb07hstore.pdf

It is the latest Stonebraker et al. on massively distributed in-memory OLTP architectures.



J. Andrew Rogers




Re: [HACKERS] COMMIT NOWAIT Performance Option

2007-02-28 Thread J. Andrew Rogers


On Feb 28, 2007, at 4:40 PM, Jonah H. Harris wrote:

> Oracle, Microsoft, IBM, Sybase, Teradata, MySQL, and Firebird have a
> clever feature called page checksumming which I think we should copy
> because it's simple and effective at detecting page-level corruption
> due to torn pages and/or faulty storage hardware.



Any system that does enough I/O will occasionally generate corrupted pages in the complete absence of any other detectable fault or hardware failure.  A fair amount has been written about this with respect to large-scale computing, and it happens often enough once systems start getting large that just about everyone implements software I/O checksumming eventually to deal with the problem.  I simply assumed that PostgreSQL was doing the same, since it is definitely used for systems that are large enough for this to become a statistically significant issue.


A popular alternative to CRC32 for this purpose is the significantly cheaper and almost as effective Adler32 algorithm.  I know Google used this algorithm when they added checksumming to their database to tame inexplicable transient corruption.
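
As a concrete illustration, here is a minimal sketch of what software page checksumming looks like, using Python's zlib.adler32; the page layout, sizes, and function names are illustrative assumptions, not PostgreSQL's actual on-disk format:

    import zlib

    PAGE_SIZE = 8192     # assumed page size
    CSUM_BYTES = 4       # checksum stored in the first 4 bytes of each page

    def write_page(payload):
        """Prepend an Adler-32 checksum of the payload to the page image."""
        assert len(payload) == PAGE_SIZE - CSUM_BYTES
        csum = zlib.adler32(payload) & 0xffffffff
        return csum.to_bytes(CSUM_BYTES, "big") + payload

    def read_page(page):
        """Recompute and verify the checksum; raise on torn/corrupt pages."""
        stored = int.from_bytes(page[:CSUM_BYTES], "big")
        payload = page[CSUM_BYTES:]
        if zlib.adler32(payload) & 0xffffffff != stored:
            raise IOError("page checksum mismatch: torn page or bad storage")
        return payload

The appeal of Adler32 here is that it is mostly additions rather than the table lookups CRC32 needs, so the verification cost paid on every read stays small.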


Cheers,

J. Andrew Rogers




Re: [HACKERS] Acclerating INSERT/UPDATE using UPS

2007-02-10 Thread J. Andrew Rogers


On Feb 10, 2007, at 9:33 PM, Christopher Browne wrote:

> The fundamental trouble with this mechanism is that a power outage can
> instantly turn a database into crud.
>
> One may try to mitigate that problem by supporting the memory device
> with multiple power supplies *and* multiple UPSes.



Ask me about the time, a year ago, when a 24x7 database of mine -- with two power supplies connected to two UPSes on independent mains circuits -- dropped dead because one UPS was overloaded (more than one server connected to it, apparently too much load) and the other UPS was simply dead (an undetected zombie UPS), when a catastrophic power failure killed both of the generator-backed mains circuits.


I wasn't pleased, but it happened nonetheless.  A UPS is not a 100% guarantee of anything.  They fail more often than they should.  No amount of paranoia guarantees uptime.


That said, I see plenty of use for loosening restrictions on  
databases where the contents do not matter and a little loss is  
acceptable.


Cheers,

J. Andrew Rogers
[EMAIL PROTECTED]




Re: [HACKERS] Proposal: Commit timestamp

2007-02-09 Thread J. Andrew Rogers

On Feb 8, 2007, at 8:41 PM, Richard Troy wrote:

> It'd be great if Jan considers the blending of replication; any given
> DB instance shouldn't be only a master/originator or only a
> slave/subscriber.  A solution that lets you blend replication
> strategies in a single db is, from my point of view, very important.



It might be constructive to define what a minimal complete set of replication primitives actually is, in addition to deciding which ones should be implemented.  Beyond master/slave models, the literature has Paxos algorithms and dynamic reconfiguration models that can utilize many of the same primitives but are very different in implementation.  I see the value of Jan's proposal, but perhaps it would be better to step back and make some assertions about the nature of the core capabilities that will be supported in some broader picture.  Having a theoretically (mostly) complete set of usable primitives would be an incredibly powerful feature set.
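
To make that a little more concrete, here is one hypothetical way such a primitive set might be written down.  This is purely an interface sketch in Python; the names and granularity are my own assumptions, not Jan's proposal:

    from abc import ABC, abstractmethod

    class ReplicationPrimitives(ABC):
        """A hypothetical minimal primitive set that master/slave,
        multi-master, and Paxos-style schemes could all build on."""

        @abstractmethod
        def capture(self, txn_id):
            """Serialize a committed transaction's changes for shipping."""

        @abstractmethod
        def apply(self, change_set):
            """Apply a shipped change set on a subscriber node."""

        @abstractmethod
        def commit_order(self, txn_id):
            """Expose a stable commit ordering (what a commit timestamp
            would provide)."""

        @abstractmethod
        def resolve(self, local_change, remote_change):
            """Deterministically pick a winner when concurrent
            multi-master writes collide."""

The point is not this particular decomposition, but that each replication strategy in the literature should be expressible as a composition of whatever primitives the core agrees to support.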


Cheers,

J. Andrew Rogers
[EMAIL PROTECTED]




Re: [HACKERS] pgsql: Fix for plpython functions; return true/false for boolean,

2007-01-30 Thread J. Andrew Rogers


On Jan 30, 2007, at 2:43 AM, Guido Goldstein wrote:

> Is it possible to tell me which python versions you want to support?
>
> Just as a hint: 2.5 is the current stable version.



I support a lot of Python on several platforms.  For broad compatibility with the pre-installed Python versions on recent OS releases, Python 2.3 support is essentially mandatory, and there are few good reasons not to support it.  I occasionally see Python 2.2 by default on really old systems, but it takes significantly more effort to support versions that old; the solution in my case is to upgrade Python to 2.3 or 2.4.


Python 2.5 may be the current stable version, but vanilla source builds segfault on some Python code that runs fine in 2.3 and 2.4, which strongly suggests it is not yet mature enough that I would put it anywhere near anything important (like a database).



J. Andrew Rogers
[EMAIL PROTECTED]



Re: [HACKERS] PostgreSQL Data Loss

2007-01-26 Thread J. Andrew Rogers


On Jan 26, 2007, at 2:22 AM, BluDes wrote:

> I have a problem with one of my customers.
> I made a program that uses a PostgreSQL (win32) database to save its
> data.  My customer claims that he lost lots of data regarding his own
> clients and that those data had surely been saved on the database.
> My first guess is that he is the one who deleted the data but wants to
> blame someone else; obviously I can't prove it.
>
> Could it be possible for PostgreSQL to lose its data?  Maybe with a
> file corruption?  Could it be possible to restore these data?
>
> My program does not modify or delete data, since it's more like a log
> that only adds information.  It is obviously possible to delete these
> logs, but it requires answering yes to 2 different warnings, so the
> data can't be deleted accidentally.
>
> I have other customers with even 10 times the amount of data of the
> one who claimed the loss, but no problems with them.
> He obviously made no backups (and claims we never told him to make
> them, so we are responsible even for this) even though the program has
> a dedicated Backup section.



I have seen this data loss pattern many, many times, and on Oracle too.  The most frequent culprits in my experience:


1.)  The customer screwed up big time and does not want to admit that they made a mistake, hoping you can somehow pull their butt out of the fire for free.


2.)  Someone else sabotaged or messed up the customer's database, and the customer is not aware of it.


3.)  The customer deleted their own data and is oblivious to the fact that they are responsible.


4.)  There is some rare edge case in your application that generates SQL that deletes all the data.



There is always the possibility that there is in fact some data loss due to a failure of the database, but it would be a rare kind of corruption that deletes one person's data while leaving everything else intact, with no error messages, warnings, or other indications that something is not right.  Given the description of the problem, I find an internal failure of the database to be a low-probability explanation for the data loss.



Having run many database systems with various levels of pervasive internal change auditing/versioning, often unbeknownst to the casual user, I can say that virtually all of the data loss cases I've seen with a description like the above clearly fit cases #1-3 once we went into the audit logs, i.e., someone explicitly did the deleting.  I cannot tell you how many times people have tried to pretend that the database lost or messed up their data, and then been embarrassed when they discovered that I could step through every single action they took to destroy their own data.  I've never seen a single case like the one described above that was due to an internal database failure; when there is an internal database failure, it is usually ugly and obvious.
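
The sort of auditing that settles these disputes can be as simple as a delete trigger that copies rows away before they vanish.  A sketch in PL/PythonU follows; the audit_log schema and names are hypothetical, and a real deployment would capture more context (application user, client address, and so on):

    CREATE FUNCTION audit_deletes() RETURNS trigger AS $$
        # TD is the trigger dictionary PL/Python exposes; TD["old"] holds
        # the row being deleted.  Stash a printable copy plus who and when.
        plan = plpy.prepare(
            "INSERT INTO audit_log (actor, action, old_row, happened_at) "
            "VALUES (current_user, 'DELETE', $1, now())", ["text"])
        plpy.execute(plan, [str(TD["old"])])
        return None   # let the delete proceed
    $$ LANGUAGE plpythonu;

    CREATE TRIGGER client_data_audit BEFORE DELETE ON client_data
        FOR EACH ROW EXECUTE PROCEDURE audit_deletes();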


Cheers,

J. Andrew Rogers
[EMAIL PROTECTED]






Re: [HACKERS] Interface of the R-tree in order to work with postgresql

2006-10-09 Thread J. Andrew Rogers


On Oct 9, 2006, at 7:21 AM, Martijn van Oosterhout wrote:

> On Mon, Oct 09, 2006 at 09:15:58AM -0500, jorge alberto wrote:
>
>> I want to know how I can add a new spatial access method into
>> postgresql (I'm doing research on spatial access methods -- reading a
>> lot of papers and programming a lot too), but also I want to know how
>> I can add my new data structure (if someday I get it, of course =) )
>> into postgresql.  I mean, where can I find the .h that describes the
>> interface that a spatial access method, like the R-tree, must
>> implement in order to work with postgresql?
>
> I would start by reading the documentation, specifically the part
> about GiST.  Once you have understood the concept and code there, go
> to reading the rtree code in the backend.  That should answer almost
> all your questions.



The Index Access Methods chapter (Ch. 48), which documents the more general interface, may also be useful, since he implies that the R-tree is not the only spatial access method he is interested in.  While some spatial access methods, such as the R-tree, are well suited to a GiST implementation, GiST is not a particularly appropriate access method (assumption mismatch) for some other interesting spatial index types that a researcher might want to implement.



Cheers,

J. Andrew Rogers (who is also implementing new spatial indexes...)
[EMAIL PROTECTED]




Re: [HACKERS] Fixed length datatypes.

2006-06-29 Thread J. Andrew Rogers


On Jun 28, 2006, at 10:14 AM, [EMAIL PROTECTED] wrote:

> All the geometric types that I'll never use in core, with few or no
> uses, including functions to operate on these types, and no UUID
> type...  Hehe...  To me, that's irony...  :-)



Interestingly, in my own experience the superior geometry capability is driving a lot of recent migration from MySQL to PostgreSQL, especially with PostGIS.  The geometry parts may not get as much love as other parts, but they still get to leverage the very solid foundation they are built on.  The geometry capability of MySQL is basically a checklist feature: it lacks the more sophisticated indexing and query execution that are really required to get passable performance from queries involving geometry.  In theory, MySQL has geometry capability similar to PostgreSQL's if you don't look too closely, but in practice the engine is not up to the more rigorous demands of that kind of work.


With the nascent rise of the geospatial web, it is going to become a  
lot more important than it has been.



J. Andrew Rogers
[EMAIL PROTECTED]







Re: [HACKERS] Table clustering idea

2006-06-27 Thread J. Andrew Rogers


On Jun 27, 2006, at 9:39 AM, Jim C. Nasby wrote:

> I think one of the issues might have been: how will you handle other
> indexes on the table when you can no longer point them at an item
> (since items will need to move to maintain an IOT)?



There are clean ways to handle this.  The table is organized on the primary key, a typical requirement for IOTs.  Any index you add to an IOT references the primary key rather than a physical tuple location.  Since the heap and the PK index are the same structure, external indexes use the PK as the tuple identifier.


The only caveat is that this creates performance asymmetries.  IOTs have significantly faster access through their primary keys but slower external index access, since two B-Trees have to be traversed.  An IOT is typically used only for tables that are accessed exclusively through their primary key.  Not supporting external indexes on IOTs at all would be a functional implementation (and arguably recommended practice), though most real implementations allow external indexes, if not always in their first version.
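
A toy illustration of the indirection being described -- secondary index entries hold primary keys instead of physical row pointers, so non-PK lookups traverse two structures.  This is illustrative Python, not PostgreSQL internals, and the email column is a made-up example:

    import bisect

    class IndexOrganizedTable:
        """Rows live in a structure ordered by primary key; a secondary
        index maps its key to a primary key, never to a row location."""

        def __init__(self):
            self.pks = []        # sorted PKs (stands in for the PK B-Tree)
            self.rows = {}       # pk -> row payload
            self.by_email = {}   # secondary index: email -> pk

        def insert(self, pk, email, data):
            bisect.insort(self.pks, pk)  # rows can shift; only the PK tree cares
            self.rows[pk] = (email, data)
            self.by_email[email] = pk

        def get_by_pk(self, pk):
            # Fast path: one traversal of the primary structure.
            return self.rows[pk]

        def get_by_email(self, email):
            # Slow path: the secondary index yields a PK, then the primary
            # structure is traversed again -- the asymmetry described above.
            return self.rows[self.by_email[email]]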



J. Andrew Rogers
[EMAIL PROTECTED]




Re: [HACKERS] New project launched : PostgreSQL GUI Installer for

2006-01-30 Thread J. Andrew Rogers

On Jan 30, 2006, at 7:52 PM, Joshua D. Drake wrote:

> There is a reason that even Oracle has a graphical installer on Linux:
> most people installing the software
>
> A. Don't know how to use it
> B. Probably don't know how to use Linux
> C. Don't want to.



Except that the Oracle graphical installer usually requires a non-trivial amount of command-line kung-fu that alone is more complex than the entire command-line installation of PostgreSQL.  Oracle installation is an unpleasant and painful process even under the best of circumstances, and I've never had one that required less effort than a vanilla Postgres install.  And I always install Postgres from source.  If "./configure; make; make install" scares people away, sorting out the dependency hell of getting the Oracle installer to even run on nominally supported platforms will definitely scare them away.


A graphical installer for Unix is fine, but please, do not make it anything like Oracle's graphical installer.  Oracle's graphical install process gives command-line installs a good name for ease of use.



J. Andrew Rogers




Re: [HACKERS] Replication on the backend

2005-12-07 Thread J. Andrew Rogers


On Dec 6, 2005, at 9:09 PM, Gregory Maxwell wrote:

> Eh, why would light limited delay be any slower than a disk on FC the
> same distance away? :)
>
> In any case, performance of PG on iscsi is just fine. You can't blame
> the network... Doing multimaster replication is hard because the
> locking primitives that are fine on a simple multiprocessor system
> (with a VERY high bandwidth very low latency interconnect between
> processors) just don't work across a network, so you're left finding
> other methods and making them work...



Speed-of-light latency shows up pretty damn often in real networks, even relatively local ones.  The number of people who wonder why a transcontinental SLA of 10 ms is not possible is astonishing.  The silicon fabrics are sufficiently fast that most well-designed networks are limited by how fast one can push photons through a fiber, which is significantly slower than photons through a vacuum.  Silicon switch fabrics add latency measured in nanoseconds, which is effectively zero for many networks that leave the system board.
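
The arithmetic behind that 10 ms observation is easy to check.  A back-of-envelope sketch in Python, with round numbers assumed for fiber speed (~0.67c) and route length:

    C_VACUUM = 299792.0        # km/s, light in vacuum
    C_FIBER = 0.67 * C_VACUUM  # km/s, light in fiber (refractive index ~1.5)

    def best_case_rtt_ms(route_km):
        """Round-trip propagation delay alone, ignoring all switching."""
        return 2.0 * route_km / C_FIBER * 1000.0

    # A transcontinental fiber route runs well past the great-circle
    # distance; 4500 km is generous for the continental US:
    print(best_case_rtt_ms(4500))   # ~44.8 ms, before any equipment delay

So a 10 ms transcontinental round trip loses to physics by more than a factor of four before the first switch is even counted.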


Compared to simple single-system SMP, a local cluster built on a first-rate fabric will have about an order of magnitude higher latency but very similar bandwidth.  On the other hand, at those latencies you can increase the number of addressable processors with that kind of bandwidth by an order of magnitude, so it is a bit of a trade-off.  Latency matters enough, however, that one would have to be much smarter about partitioning synchronization across that fabric, even though one would lose nothing in the bandwidth department.




> But again, multimaster isn't hard because of some inherently slow
> property of networks.



Eh?  As far as I know, the difficulty of multi-master is almost entirely a product of the latency of real networks, which are too slow for scalable distributed locks.  SMP is little more than a distributed lock manager implemented in silicon.  Multi-master is therefore hard in practice because we cannot drive networks fast enough.  That said, current state-of-the-art network fabrics are within an order of magnitude of SMP fabrics, close enough that they could be real contenders, particularly once you get north of 8-16 processors.


The really sweet potential is in Opteron system boards with  
Infiniband directly attached to HyperTransport.  At that level of  
bandwidth and latency, both per node and per switch fabric, the  
architecture possibilities start to become intriguing.



J. Andrew Rogers





Re: [HACKERS] Replication on the backend

2005-12-07 Thread J. Andrew Rogers


On Dec 6, 2005, at 11:42 PM, Markus Schiltknecht wrote:

> Does anybody have latency / roundtrip measurements for current
> hardware?
>
> I'm interested in:
> 1Gb Ethernet,
> 10 Gb Ethernet,
> InfiniBand,
> probably even p2p usb2 or firewire links?



In another secret life, I know a bit about supercomputing fabrics.  The latency metrics have to be thoroughly qualified.


First, most of the RTT latency numbers quoted for network fabrics are for 0-byte packet sizes, which really does not apply to anyone shuffling real data around.  For small packets, high-performance fabrics (HTX Infiniband, Quadrics, etc.) have approximately an order of magnitude less latency than vanilla Ethernet, though the performance specifics depend greatly on actual usage.  For large packet sizes, the differences in latency become far less obvious.  However, for real packets a performant fabric will still look very good compared to disk systems.  Switched fiber fabrics now have enough relatively inexpensive throughput to saturate most disk systems and CPU I/O buses; only platforms like HyperTransport can really keep up.  It is worth pointing out that the latency of high-end network fabrics is similar to that of large NUMA fabrics, which exposes some of the limits of SMP scalability.  As a point of reference, an organization that knows what it is doing should have no problem getting 500-microsecond RTT on a vanilla metropolitan-area GigE fiber network -- a few network operators actually do deliver this on a regional scale.  For a local cluster, a competent design can best this by orders of magnitude.
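
A crude way to see the packet-size qualification for yourself is to time a UDP echo round trip at several payload sizes.  The sketch below assumes an echo service is listening (the classic inetd echo service on port 7, or anything equivalent); a real benchmark would pin CPUs, take many more samples, and report minimums:

    import socket
    import time

    def udp_rtt_us(host, port, size, trials=200):
        """Median round-trip time, in microseconds, for a size-byte
        UDP echo against host:port."""
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.settimeout(1.0)
        payload = b"x" * max(size, 1)
        samples = []
        for _ in range(trials):
            t0 = time.perf_counter()
            s.sendto(payload, (host, port))
            s.recvfrom(65536)
            samples.append(time.perf_counter() - t0)
        samples.sort()
        return samples[len(samples) // 2] * 1e6

    for size in (1, 64, 1024, 8192):
        print(size, "bytes:", round(udp_rtt_us("127.0.0.1", 7, size)), "us")

On a real fabric the 1-byte number is the one vendors quote; the 8 KB number is the one a database shipping pages around will actually live with.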


There are a number of silicon limitations, but a system that connects  
the fabric directly to HyperTransport can drive several GB/s with  
very respectable microsecond latencies if the rest of the system is  
up to it.  There are Opteron system boards now that will drive  
Infiniband directly from HyperTransport.  I know Arima/Rioworks makes  
some (great server boards generally), and several other companies are  
either making them or have announced them in the pipeline.  These  
Opteron boards get pretty damn close to Big Iron SMP fabric  
performance in a cheap package.  Given how many companies have  
announced plans to produce Opteron server boards with Infiniband  
fabrics directly integrated into HyperTransport, I would say that  
this is the future of server boards.


And if postgres could actually use an infiniband fabric for  
clustering a single database instance across Opteron servers, that  
would be very impressive...


J. Andrew Rogers





Re: [HACKERS] Reducing the overhead of NUMERIC data

2005-11-01 Thread J. Andrew Rogers
On 11/1/05 2:38 PM, Jim C. Nasby [EMAIL PROTECTED] wrote:

> FWIW, most databases I've used limit NUMERIC to 38 digits, presumably
> to fit length info into 1 or 2 bytes. So there's something to be said
> for a small numeric type that has less overhead and a large numeric
> (what we have today).


The 38-digit limit is the decimal capacity of a 128-bit signed integer.  The optimization has less to do with the size of the length info and more to do with fast math and a fixed structure size.
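
The correspondence is easy to verify (a quick check in Python):

    # Largest signed 128-bit integer:
    max128 = 2**127 - 1
    print(max128)             # 170141183460469231731687303715884105727
    print(len(str(max128)))   # 39 digits -- but not every 39-digit value
                              # fits, so 38 is the largest digit count
                              # guaranteed to fit in a signed 128-bit word

Any NUMERIC(38) value can therefore be stored and operated on as a native 128-bit integer plus a scale, which is where the fast-math benefit comes from.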


J. Andrew Rogers





Re: [HACKERS] Alternative variable length structure

2005-09-08 Thread J. Andrew Rogers
On 9/8/05 9:53 AM, Josh Berkus josh@agliodbs.com wrote:

> Hmmm.  Seems like these would be different data types from the
> standard ones we deal with.  I can see the value for data warehousing,
> for example.
>
> Wouldn't this require creating, for example, a SHORTTEXT type?  Or
> were you planning this to handle VARCHAR(6) and the like?  If so, how
> would we deal with users who change the length via ALTER TABLE?


If I am reading the patch correctly (and I may not be), changing the length via ALTER TABLE would not affect anything, because the header length is variable per type instance in the table.  This appears to be an alternate encoding for VARCHAR(n) and similar types.

It is not dissimilar to how ASN.1 BER prefix-encodes type lengths, an encoding widely used in wire protocols.  This can save quite a few bytes in many cases, particularly when storing short byte strings (in ASN.1, even integers and the like are encoded this way to shave bytes off them as well, though that seems excessive).  It would be useful to see how the encoding/decoding cost balances against the more compact representation in terms of performance.
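
For reference, a sketch of the BER-style length prefix being alluded to -- short form for lengths under 128, long form above that.  This is illustrative Python showing textbook BER, not the actual encoding in the patch:

    def encode_length(n):
        """ASN.1 BER definite-length encoding of a content length."""
        if n < 0x80:
            return bytes([n])                    # short form: one byte
        body = n.to_bytes((n.bit_length() + 7) // 8, "big")
        return bytes([0x80 | len(body)]) + body  # long form: 1 + len(body)

    print(encode_length(6).hex())      # '06'       -- a 6-byte string
                                       #               costs 1 header byte
    print(encode_length(65536).hex())  # '83010000' -- a 64 KB value
                                       #               costs 4 header bytes

Against a fixed 4-byte header, the short form saves three bytes on every small value, which is exactly the data-warehousing win Josh mentions.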


J. Andrew Rogers





Re: [HACKERS] Call for 7.5 feature completion

2005-08-25 Thread J. Andrew Rogers
On 8/25/05 4:13 PM, Alvaro Herrera [EMAIL PROTECTED] wrote:

> We have gone a long way now, even though it was only a year ago.  My
> question for everyone on this list is:  What are the few remaining big
> features that you see missing from PostgreSQL?
>
> Or, slightly different: what are people's most wanted features?


Table partitioning would be huge, but it looks like that is on its way.

Index-organized tables are the most significant thing at the top of my wish
list, as that would generate a huge performance increase for much of what I
spend my time on with PostgreSQL.  There is currently no good way to
approximate it.


J. Andrew Rogers
[EMAIL PROTECTED]





Re: [HACKERS] US Patents vs Non-US software ...

2005-01-18 Thread J. Andrew Rogers
On Tue, 18 Jan 2005 09:22:58 +0200:

> Many countries do not grant software patents, so it is not likely that
> IBM applied through PCT, since a refusal in one country may cause the
> patent to be refused in all countries.

Contrary to popular misconception, virtually all countries grant software patents.  The problem is that people have applied the term "software patent" to USPTO-specific lameness like one-click shopping, which really is outside the scope of traditional software patents.  While most countries do not grant patents for this flavor of frivolousness, they do grant hard-theory algorithm design patents across virtually all types of machinery (including virtual machinery).

Traditional software design patents are structurally and functionally indistinguishable from chemical process patents, which are generally recognized as valid in most countries.  Software patents have to have novelty that survives reduction to general process design (and the ARC algorithm looks like it qualifies) if you want most countries to grant them.  The problem with the USPTO and so-called software patents is that it allows people to patent what is essentially prior art with renamed variables.  Chemical process patents are a good analogy because literally every argument used against software patents could be used against chemical process patents, which no one apparently finds controversial.  What often passes for material novel-ness in software processes at the USPTO would never fly for chemical processes with the same USPTO.  If someone invents a better pipe alloy for carrying chemical fluids, you cannot re-patent all chemical processes with the novelty being that you use a better type of pipe -- that change is not material to the chemical process, even if it improves the economics of it in some fashion.  The only thing patentable would be the superior alloy design in the abstract.

Most of the lame software patents are lame because reduction to machine process design gives you something that is decidedly non-novel.  In other words, the novel-ness is the semantic dressing-up of a non-novel engineering process.

cheers,
j. andrew rogers


Re: [HACKERS] PostgreSQL as an application server

2004-08-06 Thread J. Andrew Rogers
On Thu, 2004-08-05 at 22:27, Jonathan M. Gardner wrote:

> A few nights ago, I implemented some of my application logic in
> PostgreSQL via PL/PythonU. I was simply amazed at what I was able to
> do. My question becomes: Why not get rid of the middle layer and move
> it into the database entirely?


In the mid-90s, I was a lead on just such an app, using Oracle PL/SQL.  IIRC, it was about a half million lines of code.  At that time, Oracle hadn't seen anything quite like it and was pretty interested in what we'd done.  I'll add that if we had known what it would be like when we started, we probably would have done it somewhat differently.

The advantage is that you have very tight integration with the database and there are few or no external dependencies to worry about.  For database-heavy applications that require some level of portability, this isn't a bad way to do development.

The major disadvantage is that the development environment and tools for in-database languages aren't nearly as rich as a typical standalone environment, which makes programming a pain in the ass for many types of code.  I might have missed something in the intervening years, but I don't think anyone has really bridged that gap.  The database guys generally don't like running application code in their database, mostly because it creates new failure modes and problems that they have to manage.  For example, at least in older versions of Oracle, if you accidentally programmed an infinite loop or some other busy non-functioning state, it took a DBA to kill it.  In the course of application development, this could happen many, many times as code was being debugged, much to the annoyance of the DBAs.

That said, I don't see any obvious reason why it couldn't be done well with a moderate amount of effort.
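
For anyone who has not tried it, the kind of in-database logic being discussed looks roughly like this in PL/PythonU; the table and function here are hypothetical, a sketch rather than production code:

    CREATE FUNCTION order_total(integer) RETURNS numeric AS $$
        # args[0] is the order id.  Application logic runs next to the
        # data: no network round trips, no middle-layer dependencies.
        rows = plpy.execute(
            "SELECT price, qty FROM order_lines WHERE order_id = %d" % args[0])
        return sum([r["price"] * r["qty"] for r in rows])
    $$ LANGUAGE plpythonu;

    -- and then from any client:
    -- SELECT order_total(42);

The debugging caveat above applies in full: put an infinite loop in that function body and you will be asking a DBA to kill the backend for you.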


j. andrew rogers





Re: [HACKERS] PostgreSQL as an application server

2004-08-06 Thread J. Andrew Rogers
On Fri, 2004-08-06 at 10:53, Joshua D. Drake wrote:

> Although the gap still exists within the environment itself, one
> significant advantage with PostgreSQL is you can use a language more
> native (to the programmer anyway) to generate your logic.
>
> With PostgreSQL alone you can use plPerl, plPython and plPHP. The
> language itself hasn't changed in its implementation of the PL. You
> just have to remember to make all ' a '' :) (at least for the most
> part).


One of the things I very much like about PostgreSQL is that it feels like more of a programmer's RDBMS than Oracle, in the sense that the needs and preferences of programmers were obviously given a higher priority in its design.  I find this to be a very attractive feature and a good thing.

This is a case where a focus on the needs and preferences of hackers, in software developed by hackers, has worked out pretty well, at least for me.  People say that is a bad thing about a lot of OSS, but I actually think it was needed in RDBMS software.


j. andrew rogers




Re: [HACKERS] LinuxTag wrapup

2004-07-06 Thread J. Andrew Rogers
On Sun, 2004-07-04 at 19:57, Tom Lane wrote:

> Anyone who needs this has always been able to make it trivially
> (though you once had to invent a random column name for the one
> required column).


In Oracle, DUAL is treated specially internally for performance reasons,
since it is so heavily used.  Making a table with the same name would
probably be a serviceable but under-performing migration mechanism.


> Does anyone have the foggiest idea why they named it DUAL?  Doesn't
> seem a very mnemonic choice to me...


There is no authoritative answer to this; it has long been a mystery.  One semi-official version of the story is that DUAL was originally an internal table with two rows used for certain operations.  How that became a single-row scratch-pad table is unclear, since even the Oracle old-timers I know have no recollection of it ever being anything but what it currently is.  Others claim it is a reference to 1x1 matrix operations.  There are a number of different stories in circulation -- I've heard three or four completely unrelated explanations from long-time Oracle folks -- and most of them are plausible.

It is one of those things we will probably never know.  Whatever its
historical purpose, DUAL has been so pervasively used in the Oracle
universe for so long that giving it a better name would break virtually
every Oracle application in existence.  It is an institution unto
itself.


j. andrew rogers





Re: [HACKERS] What can we learn from MySQL?

2004-04-23 Thread J. Andrew Rogers
solvable problems.  I think improved default product packaging would remove 80% of the impediment to more widespread adoption.

There is no *technical* reason things should be done this way and it
might even go against the sensibilities of many current users.  But it
would go a long way toward widening the audience.


j. andrew rogers







[HACKERS] PITR for replication?

2004-04-01 Thread J. Andrew Rogers

I may be completely missing the point here, but it looks to me as though the PITR archival mechanism is also most of a native replication facility.  Is there any reason this couldn't be extended to replication, and if so, is anyone planning to use it as such?

My memory is fuzzy on this point, but I seem to recall that this is (was?) more or less how replication is done in many of the big commercial RDBMSs.
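
To make the idea concrete: assuming the archival hook lets you run an arbitrary command for each completed WAL segment (which is how the PITR work is shaped), log shipping to a warm standby is little more than a copy script.  A hypothetical sketch; the spool path and invocation are assumptions:

    #!/usr/bin/env python
    # Invoked by the archiver as: ship_wal.py <path-to-segment> <segment-name>
    # Copies each completed WAL segment to a standby host, which replays
    # segments as they arrive -- PITR archiving doubling as crude
    # asynchronous replication.
    import shutil
    import sys

    STANDBY_SPOOL = "/mnt/standby_wal"   # e.g., an NFS export from the standby

    def main():
        path, name = sys.argv[1], sys.argv[2]
        tmp = "%s/%s.tmp" % (STANDBY_SPOOL, name)
        shutil.copy(path, tmp)
        # Rename only after the copy completes, so the standby never
        # replays a half-written segment.
        shutil.move(tmp, "%s/%s" % (STANDBY_SPOOL, name))

    if __name__ == "__main__":
        main()

A standby would simply stay in recovery, applying segments in order; failover is letting recovery finish.  That is essentially the log-shipping replication the big commercial engines offer.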


[EMAIL PROTECTED]

