Re: [GENERAL] [PERFORM] Partitioning a tsearch2 table into chunks and accessing via views

2007-08-25 Thread Joshua D. Drake

Benjamin Arai wrote:
 This is kind of disappointing; I was hoping there was more that could be done.
 
 There has to be another way to do incremental indexing without losing
 that much performance.

What makes you think you are losing performance by using partitioning?

Joshua D. Drake

 
 Benjamin
 
 On Aug 24, 2007, at 6:29 PM, Joshua D. Drake wrote:
 
 
 Brandon Shalton wrote:
 Benjamin,



 In order to avoid the re-indexing I was thinking of instead creating
 a new
 table each month (building its indexes and etc) and accessing them all
 through a view. This way I only have to index the new data each month.


 Take a look at bizgres.org (based on postgres).

 They have a parent-child structure.

 The way I use it: I have about 30M records a day that are inserted
 into the database.

 Each day is a child table to the parent.

 so example:

 the parent table is called  logfile

 each day is a child, with a structure like  logfile_YYMMDD

 the child inherits the table structure of the parent, such that you
 can query the child table directly, or run the query against
 the parent (i.e., the logfile table) and get all the data.

 the indexes are per-table, so new data that comes in is a smaller
 amount, and doesn't require re-indexing everything.
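The parent/child layout described above can be sketched in plain SQL (table and column names are illustrative, not from the thread; syntax per PostgreSQL 8.x inheritance):

```sql
-- Parent table: defines the schema; typically holds no rows itself.
CREATE TABLE logfile (
    logtime   timestamp NOT NULL,
    datafield text
);

-- One child per day inherits the parent's columns.
CREATE TABLE logfile_070825 (
    CHECK (logtime >= '2007-08-25' AND logtime < '2007-08-26')
) INHERITS (logfile);

-- Indexes are per child, so only the new (small) day's table
-- ever needs to be indexed.
CREATE INDEX logfile_070825_datafield_idx
    ON logfile_070825 (datafield);
```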
 
 
 PostgreSQL can do all of this too.
 
 Sincerely,
 
 Joshua D. Drake
 


 example:

 select * from logfile_070825 where datafield = 'foo'

 if i knew i wanted to specifically go into that child, or:

 select * from logfile where datafield = 'foo'

 and all child tables are searched and the results merged.  You can perform
 any kind of SQL query and use field structures as you normally do.

 the downside is that the queries are run sequentially.

 so if you had 100 child tables, each table is queried via its indexes, then
 the results are merged.

 but, this approach does allow me to dump a lot of data in, without having
 the re-indexing issues you are facing.

 at some point, you could roll up the days into weekly child tables,
 then monthly tables, etc.
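The roll-up step can be sketched the same way (names are illustrative; doing it in one transaction keeps the parent queryable throughout):

```sql
BEGIN;

-- New weekly child covering the seven days being rolled up.
CREATE TABLE logfile_w070820 (
    CHECK (logtime >= '2007-08-20' AND logtime < '2007-08-27')
) INHERITS (logfile);

-- Move each day's rows in, then drop the daily child.
INSERT INTO logfile_w070820 SELECT * FROM logfile_070820;
DROP TABLE logfile_070820;
-- ... repeat for the remaining days of the week ...

-- Index the consolidated table once.
CREATE INDEX logfile_w070820_datafield_idx
    ON logfile_w070820 (datafield);

COMMIT;
```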

 I believe Bizgres has a new version of their system that does parallel
 queries which would certainly speed things up.

 For your documents, you can do it by the day it was checked in, or maybe
 you have some other way of logically grouping, but the parent/child
 table structure really helped to solve my problem of adding in millions
 of records each day.

 The closest thing in MySQL is merge tables, which are not really
 practical when it comes time to join against the tables.

 -brandon

 http://www.t3report.com - marketing intelligence for online marketing
 and affiliate programs










 ---(end of broadcast)---
 TIP 6: explain analyze is your friend

 
 
 - --
 
   === The PostgreSQL Company: Command Prompt, Inc. ===
 Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
 PostgreSQL solutions since 1997  http://www.commandprompt.com/
 UNIQUE NOT NULL
 Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
 PostgreSQL Replication: http://www.commandprompt.com/products/
 
 









Re: [GENERAL] [PERFORM] Partitioning a tsearch2 table into chunks and accessing via views

2007-08-25 Thread Benjamin Arai


As stated in the previous email, if I use partitioning then queries
will be executed sequentially - i.e., instead of log(n) it would be
(# partitions) * log(n).  Right?


Benjamin

On Aug 25, 2007, at 9:18 AM, Joshua D. Drake wrote:

 What makes you think you are losing performance by using partitioning?

[snip]
HBFn4KiSRGx7hwlrss4rjqJ8BoJKbtvGxyNSiwZkrAOke+gqEML6pPdvlAj3Dif8

Re: [GENERAL] [PERFORM] Partitioning a tsearch2 table into chunks and accessing via views

2007-08-25 Thread Brandon Shalton


As stated in the previous email, if I use partitioning then queries will
be executed sequentially - i.e., instead of log(n) it would be
(# partitions) * log(n).  Right?




Depends.  Since the indexes are hit for each child table, the query time
depends on the amount of data indexed in each table.

Querying the parent is still pretty quick given a dual processor and a
fast array filestorage device.

Given your situation, I would give the parent/child approach a shot.  I
haven't checked whether postgres has it, but as Joshua replied it does; I
do know bizgres does, as I have been running this configuration for the
last 3 years and it solved my problem of importing 30-60M records a day
while still being able to query the database.


-brandon

http://www.t3report.com - marketing intelligence for online marketing and 
affiliate programs



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [GENERAL] [PERFORM] Partitioning a tsearch2 table into chunks and accessing via views

2007-08-25 Thread Joshua D. Drake

Benjamin Arai wrote:
 As stated in the previous email if I use partitioning then queries will
 be executed sequentially - i.e., instead of log(n) it would be (#
 partitions) * log(n).  Right?

The planner will consider every relevant partition during execution.
That may or may not be a performance hit; it depends on many factors.
In general, however, partitioning done correctly is both a performance
benefit and a maintenance benefit.
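Which partitions count as "relevant" can be narrowed with CHECK constraints plus constraint exclusion (a configuration setting available since 8.1); a sketch with illustrative names:

```sql
-- With constraint exclusion enabled, the planner skips any child
-- whose CHECK constraint contradicts the query's WHERE clause.
SET constraint_exclusion = on;

-- Assuming children were declared like:
--   CREATE TABLE logfile_070825 (
--       CHECK (logtime >= '2007-08-25' AND logtime < '2007-08-26')
--   ) INHERITS (logfile);

-- this plan should touch only the one matching child:
EXPLAIN SELECT *
FROM logfile
WHERE logtime >= '2007-08-25' AND logtime < '2007-08-26';
```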

Sincerely,

Joshua D. Drake


 

Re: [GENERAL] [PERFORM] Partitioning a tsearch2 table into chunks and accessing via views

2007-08-25 Thread Benjamin Arai


Since I am using tsearch2 on the table, I think there is going to be a
significant performance hit - e.g., I partition by batch (batches are
not separated by date; they are essentially random subsets of a much
larger data-set).  I am querying this database using tsearch2, so I am
assuming all tables are going to be queried each time, since the text
is not partitioned by any specific constraint - e.g., values < R go to
table 1 and values >= R go to table 2.
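For concreteness, the per-batch full-text setup would look roughly like this (a sketch assuming the tsearch2 contrib module is installed; table and column names are hypothetical).  Since batches are random subsets, no CHECK constraint can exclude children, so a search on the parent probes every child's index:

```sql
-- Hypothetical parent with a tsvector column:
--   CREATE TABLE docs (body text, tsv tsvector);

-- Each batch becomes a child with its own full-text index,
-- so only the newest batch ever needs (re)indexing.
CREATE TABLE docs_batch42 () INHERITS (docs);

-- ... load the batch into docs_batch42 (e.g. via COPY) ...

UPDATE docs_batch42 SET tsv = to_tsvector(body);

CREATE INDEX docs_batch42_tsv_idx
    ON docs_batch42 USING gist (tsv);

-- With no constraints to exclude batches, a search on the
-- parent scans each child's index in turn:
SELECT * FROM docs WHERE tsv @@ to_tsquery('foo');
```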


Benjamin

On Aug 25, 2007, at 11:18 AM, Joshua D. Drake wrote:

 The planner will consider every relevant partition during execution.
 That may or may not be a performance hit; it depends on many factors.

[snip]

Re: [GENERAL] [PERFORM] Partitioning a tsearch2 table into chunks and accessing via views

2007-08-25 Thread Erik Jones


On Aug 25, 2007, at 2:58 PM, Erik Jones wrote:



On Aug 24, 2007, at 7:41 PM, Benjamin Arai wrote:


Hi,

I have an application which loads millions of NEW documents each month
into a PostgreSQL tsearch2 table.  I have the initial version completed
and searching performance is great, but my problem is that each time a
new month rolls around I have to drop all the indexes, do a COPY, and
re-index the entire table.  This is problematic considering that each
month takes longer than the previous to rebuild the indexes, and the
application is unavailable during the rebuilding process.

In order to avoid the re-indexing I was thinking of instead creating a
new table each month (building its indexes and etc) and accessing them
all through a view.  This way I only have to index the new data each
month.

Does this work?  Does a view with N tables make it N times slower for
tsearch2 queries?  Is there a better solution?



You can use Postgres's inheritance mechanism for your partitioning
mechanism and combine it with constraint exclusion to avoid the N^2
issues.  See:

http://www.postgresql.org/docs/8.2/interactive/ddl-inherit.html

and

http://www.postgresql.org/docs/8.2/interactive/ddl-partitioning.html

Basically, create a table from which all of your partitioned tables
inherit.  Partition in such a way that you can use constraint exclusion,
and then you can treat the parent table like the view you were
suggesting.


Erik Jones



Sorry, I didn't see that you had crossposted and carried the  
conversation on another list.  Please, don't do that.  Avoid the top  
posting, as well.   They both make it difficult for others to join in  
or follow the conversations and issues.


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com





Re: [GENERAL] [PERFORM] Partitioning a tsearch2 table into chunks and accessing via views

2007-08-24 Thread Joshua D. Drake

Brandon Shalton wrote:

 [snip]

 the indexes are done on a per table basis, so new data that comes in is
 a lesser amount, and doesn't require re-indexing.

PostgreSQL can do all of this too.

Sincerely,

Joshua D. Drake

 
 
 




---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [GENERAL] [PERFORM] Partitioning a tsearch2 table into chunks and accessing via views

2007-08-24 Thread Benjamin Arai


This is kind of disappointing; I was hoping there was more that could
be done.

There has to be another way to do incremental indexing without losing
that much performance.


Benjamin

On Aug 24, 2007, at 6:29 PM, Joshua D. Drake wrote:

 PostgreSQL can do all of this too.

[snip]
