[HACKERS] [feature] cached index to speed up specific queries on extremely large data sets

2014-04-11 Thread lkcl .
hi folks, please cc me direct on responses as i am subscribed on digest.

i've been asked to look at how to deal with around 7 billion records
(approx. 30 columns, approx. 1k total data size per record), and this
might have to be in a single system (i will need to Have Words with the
client about that).  the data is read-only, and an arbitrary number of
additional tables may be created to manage it.  records come in at a
rate of around 25 million per day; the 7 billion figure is based on the
assumption of keeping one month's worth of data around.

analysis of this data needs to be done across the entire set: i.e. it
may not be subdivided into isolated tables (by day for example).  i am
therefore um rather concerned about efficiency, even just from the
perspective of using 2nd normalised form and not doing JOINs against
other tables.

so i had an idea.  there already exists the concept of indexes.  there
already exists the concept of cached queries.  question: would it be
practical to *merge* those two concepts such that specific queries
could be *updated* as new records are added, such that when the query
is called again it answers basically pretty much immediately? let us
assume that performance degradation on update (given that indexes
already exist and are required to be updated) is acceptable.

the only practical way to do this at a higher level (without digging
into postgresql's c code) would in effect be to abandon the advantages
of the postgresql query optimisation engine and *reimplement* it in a
high-level language: subdividing the data into smaller (more
manageable) tables, using yet more tables to store intermediate
results of a previous query, then somehow stitching together a new
response based on the newer records.  it would be a complete nightmare
to both implement and maintain.

second question then based on whether the first is practical: is there
anyone who would be willing (assuming it can be arranged) to engage in
a contract to implement the required functionality?

thanks,

l.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [feature] cached index to speed up specific queries on extremely large data sets

2014-04-11 Thread Heikki Linnakangas

On 04/11/2014 03:20 PM, lkcl . wrote:
> so i had an idea.  there already exists the concept of indexes.  there
> already exists the concept of cached queries.  question: would it be
> practical to *merge* those two concepts such that specific queries
> could be *updated* as new records are added, such that when the query
> is called again it answers basically pretty much immediately? let us
> assume that performance degradation on update (given that indexes
> already exist and are required to be updated) is acceptable.


I think you just described materialized views. The built-in materialized 
views in PostgreSQL are not updated immediately as the tables are 
modified, but it's entirely possible to roll your own using views and 
triggers. There are a few links on the PostgreSQL wiki, in the "Versions 
before 9.3" section: https://wiki.postgresql.org/wiki/Materialized_Views.
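[Editor's note: a minimal sketch of the roll-your-own "eager" approach Heikki describes, using a trigger to maintain a summary table. All table, column, and function names here are hypothetical, not from the thread.]

```sql
-- An append-only events table feeds a small summary table that
-- answers aggregate queries immediately, without scanning events.
CREATE TABLE events (
    event_day date    NOT NULL,
    amount    numeric NOT NULL
);

CREATE TABLE events_daily_summary (
    event_day date PRIMARY KEY,
    n_events  bigint  NOT NULL,
    total     numeric NOT NULL
);

-- Fold each inserted row into the summary as it arrives.
CREATE FUNCTION events_summary_update() RETURNS trigger AS $$
BEGIN
    UPDATE events_daily_summary
       SET n_events = n_events + 1,
           total    = total + NEW.amount
     WHERE event_day = NEW.event_day;
    IF NOT FOUND THEN
        INSERT INTO events_daily_summary (event_day, n_events, total)
        VALUES (NEW.event_day, 1, NEW.amount);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_summary_trg
    AFTER INSERT ON events
    FOR EACH ROW EXECUTE PROCEDURE events_summary_update();
```

Note the UPDATE-then-INSERT can race when two sessions insert the first row for a new day concurrently; on PostgreSQL 9.5 and later, `INSERT ... ON CONFLICT DO UPDATE` closes that gap. The per-row trigger overhead on inserts is the price paid for near-instant query answers.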


- Heikki




Re: [HACKERS] [feature] cached index to speed up specific queries on extremely large data sets

2014-04-11 Thread lkcl .
On Fri, Apr 11, 2014 at 1:26 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
> On 04/11/2014 03:20 PM, lkcl . wrote:
>> so i had an idea.  there already exists the concept of indexes.  there
>> already exists the concept of cached queries.  question: would it be
>> practical to *merge* those two concepts such that specific queries
>> could be *updated* as new records are added, such that when the query
>> is called again it answers basically pretty much immediately? let us
>> assume that performance degradation on update (given that indexes
>> already exist and are required to be updated) is acceptable.


> I think you just described materialized views.

 ... well... dang :)

 http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views

 ok so definitely not the snapshot materialised views, but yes!  the
eager materialised views, definitely.

> The built-in materialized
> views in PostgreSQL are not updated immediately as the tables are modified,

 ... but that would probably be enough.

> but it's entirely possible to roll your own using views and triggers. There
> are a few links on the PostgreSQL wiki, in the Versions before 9.3
> section: https://wiki.postgresql.org/wiki/Materialized_Views.

 awesome.  uhhh, well that was easy *lol*.  once i am paid, whom do i
send the payment to for the fast response and incredibly valuable
information? :)  [this is a serious question!]

l.




Re: [HACKERS] [feature] cached index to speed up specific queries on extremely large data sets

2014-04-11 Thread Michael Paquier
On Fri, Apr 11, 2014 at 9:53 PM, lkcl . luke.leigh...@gmail.com wrote:
> On Fri, Apr 11, 2014 at 1:26 PM, Heikki Linnakangas
> hlinnakan...@vmware.com wrote:
>> On 04/11/2014 03:20 PM, lkcl . wrote:
>>> so i had an idea.  there already exists the concept of indexes.  there
>>> already exists the concept of cached queries.  question: would it be
>>> practical to *merge* those two concepts such that specific queries
>>> could be *updated* as new records are added, such that when the query
>>> is called again it answers basically pretty much immediately? let us
>>> assume that performance degradation on update (given that indexes
>>> already exist and are required to be updated) is acceptable.
>>
>> I think you just described materialized views.
>
>  ... well... dang :)
>
>  http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views
>
>  ok so definitely not the snapshot materialised views, but yes!  the
> eager materialised views, definitely.
>
>> The built-in materialized
>> views in PostgreSQL are not updated immediately as the tables are modified,
>
>  ... but that would probably be enough.
>
>> but it's entirely possible to roll your own using views and triggers. There
>> are a few links on the PostgreSQL wiki, in the Versions before 9.3
>> section: https://wiki.postgresql.org/wiki/Materialized_Views.
When updating or refreshing a materialized view, you also need to be
aware that in 9.3 an exclusive lock is taken on it during the refresh,
so the materialized view cannot be accessed for read queries.
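
[Editor's note: Michael's caveat as the commands behave in practice; `daily_summary` is a hypothetical materialized-view name.]

```sql
-- In 9.3 this takes an ACCESS EXCLUSIVE lock on the matview:
-- read queries against it block until the refresh completes.
REFRESH MATERIALIZED VIEW daily_summary;

-- From 9.4 onwards, CONCURRENTLY allows reads to continue during
-- the refresh; it requires a unique index on the materialized view
-- and the refresh itself is slower.
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_summary;
```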

>  awesome.  uhhh, well that was easy *lol*.  once i am paid, whom do i
> send the payment to for the fast response and incredibly valuable
> information? :)  [this is a serious question!]
This can be helpful:
http://www.postgresql.org/about/donate/
-- 
Michael




Re: [HACKERS] [feature] cached index to speed up specific queries on extremely large data sets

2014-04-11 Thread lkcl .
On Fri, Apr 11, 2014 at 2:12 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
> On Fri, Apr 11, 2014 at 9:53 PM, lkcl . luke.leigh...@gmail.com wrote:
>> section: https://wiki.postgresql.org/wiki/Materialized_Views.
> When updating or refreshing a materialized view, you also need to be
> aware that in 9.3 an exclusive lock is taken on it during the refresh,
> so the materialized view cannot be accessed for read queries.

 ok.  as long as the storage of data (in the underlying table) is not
adversely affected, that would be fine.  as this is the hackers list
and this turns out to be more a user question i'll leave it for now.

>>  awesome.  uhhh, well that was easy *lol*.  once i am paid, whom do i
>> send the payment to for the fast response and incredibly valuable
>> information? :)  [this is a serious question!]
> This can be helpful:
> http://www.postgresql.org/about/donate/

 thank you michael.

