Re: [PERFORM] TableSpace Design issues on Postgres 8.0 beta 5
Additional update: after spending some time searching for more options, I found that both PostgreSQL and MySQL document limits in this area:

1. PostgreSQL: http://www.postgresql.org/users-lounge/limitations.html
2. MySQL (drawbacks of creating large numbers of tables):
   http://forums.devshed.com/t27778/s.html

This might sound like a crazy idea; I am just experimenting with the options that are easy to try before developing. After auto-generating 12 tables in each of 1600 tablespaces, here are some statistics:

1. Every 12 tables created in a tablespace produce 35 files in that
   tablespace's directory.
2. Every 12 tables per tablespace also create roughly 17 additional new
   files in the default database directory, even though the tables have
   their own tablespace.
3. The above scenario creates 19,200 tables in the database.
4. pgAdmin III takes approximately 2 hours just to load all the table
   info when it first starts. The good news: it did not hang, it handled
   all 19,200 tables once loaded, and I could easily browse all the
   table info.

I have not yet tried to actually query the database from the web. It seems the whole "large number of tables" approach is not good (as my college database design course said :)). By creating 12 tables for each of 7000 tablespaces, at least the files in each folder are not as numerous as when everything is in one tablespace.

Any opinion is welcome...

"Rosny" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> Hi,
>
> This is kind of a long email.
>
> After searching the mailing list, I have not found a good answer
> about tablespaces, so I am posting this question.
>
> Questions:
> 1. Which option from the scenario below will be good in terms of
>    performance and future scalability?
> 2. Is option B1 below the right approach?
> 3. Will PostgreSQL have problems if I have 7000 tablespaces?
> [...]
>
> Thank you,
> Rosny
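For reference, a minimal sketch (in plain SQL) of what the auto-generation described above amounts to; the directory path and the dept_0001 naming are made up for illustration, not taken from the actual generator:

    -- One tablespace per department, pointing at its own directory
    -- (the directory must already exist and be writable by the server).
    CREATE TABLESPACE dept_0001 LOCATION 'C:/pgdata/tablespaces/dept_0001';

    -- A department table placed in that tablespace.  Each table, each of
    -- its indexes, and any TOAST relation is stored as a separate file
    -- under the tablespace directory, which is why 12 tables show up as
    -- roughly 35 files on disk.
    CREATE TABLE dept_0001_orders (
        order_id integer PRIMARY KEY,
        content  text
    ) TABLESPACE dept_0001;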
[PERFORM] TableSpace Design issues on Postgres 8.0 beta 5
Hi,

This is kind of a long email.

After searching the mailing list, I have not found a good answer about tablespaces, so I am posting this question here.

Questions:
1. Which option from the scenario below will be good in terms of
   performance and future scalability?
2. Is option B1 below the right approach?
3. Will PostgreSQL have problems if I have 7000 tablespaces?

---
Environment:
- Windows 2003
- PostgreSQL 8.0 beta 5

Scenario:
Original design, 40 tables in total:
- 20 tables are main tables responsible for the other 20 tables.
- The other 20 tables are specific to each department.
- Of these 20 department tables, 4-5 tables will contain approximately
  20 million records (these tables are hit on every access to the website).

Referring to the 20 tables that can be partitioned:

A. All departments' data is kept in the same 20 tables,
   with some queries running over the 20 million records.

B. Create a tablespace for each department. (Which means, if there
   are 7000 departments, there will be 7000 tablespaces, each containing
   20 tables.)

Question: which option will be good in terms of performance and future scalability?

A1. Use option A.
    As tables become huge, partition the tables that are hit often
    and have large files (usually when bigger than 2-3 GB)
    into separate tablespaces.

    Problems with the A1 approach:
    1. Queries take a very long time. This might be resolved with
       indexing and better-written SQL statements.

    Advantage: the total number of files is small, around 1000 in one directory.

B1. Use option B.
    Create 7000 tablespaces for the departments:
    - one department has one tablespace;
    - each department has 20 tables.

    Advantages:
    - Each table is small and queries are very fast.
    - Scalability: as the site grows and content grows, this will not
      affect future scalability as much as A1. In A1 the queries have
      already maxed out what partitioning can give; in B1 the queries
      have not maxed out yet because the data is already distributed
      across thousands of tables.

    Disadvantages:
    - The total number of files is huge. (After creating 7000 tablespaces
      and starting an automatic table generator to create 20 tables for
      each of the 7000 tablespaces, and running up to the 1500th tablespace,
      each tablespace had 35 files. Surprisingly, the default tablespace
      already had 2 files as well.)
    - Need to use dynamic table-name queries. (This is OK, since the SQL
      statement logic is not very complex; see the sketch after this message.)

I am leaning toward option B1, as it is good for future scalability.

Questions:
1. Is B1 the right approach?
2. Will PostgreSQL have problems if I have 7000 tablespaces?

Thank you,
Rosny

note: previously posted on cygwin, but I think it is more appropriate for this group
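The "dynamic table name query" mentioned under B1 could look roughly like the PL/pgSQL sketch below; the dept_NNNN_orders naming, the function name, and the columns are hypothetical, and building the query as a string only stays safe here because both inputs are integers:

    -- Look up one record in the per-department table chosen at run time.
    CREATE OR REPLACE FUNCTION get_order_content(integer, integer)
    RETURNS text AS $$
    DECLARE
        dept ALIAS FOR $1;   -- department number, picks the table
        ord  ALIAS FOR $2;   -- order id within that department
        rec  record;
    BEGIN
        -- Build the table name from the department id and run the query
        -- dynamically with FOR ... IN EXECUTE.
        FOR rec IN EXECUTE 'SELECT content FROM dept_'
                || to_char(dept, 'FM0000') || '_orders'
                || ' WHERE order_id = ' || ord::text
        LOOP
            RETURN rec.content;
        END LOOP;
        RETURN NULL;   -- no such order
    END;
    $$ LANGUAGE plpgsql;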
Re: [PERFORM] TableSpace Design issues on Postgres 8.0 beta 5
Thanks for the response. I am just starting to get a feel for where tablespaces should be used. You are right, I do not have 7000 logical filesystems. I was using tablespaces as a way to organize files into folders in a Windows 2003 environment, so each tablespace would represent one folder (directory) on a drive.

But this does not work too well for my design, since the pgAdmin III GUI takes 2 hours just to load approximately 14,000 tables. I am not using the tablespace approach anymore; I am using a multiple-database approach instead.

Anyway, thanks for the response. Today's testing, taking several options to their limit, gave me some ideas about future scalability.

Rosny

"Tom Lane" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> "Rosny" <[EMAIL PROTECTED]> writes:
> > B. For each department create tablespace. (Which means, if there
> > are 7000 departments, there will be 7000 tablespace each contains
> > 20 tables).
>
> If your system has seven thousand separate logical filesystems attached
> to it, there might be some value in having seven thousand tablespaces.
> But I will bet a great deal that it does not and there isn't.
>
> regards, tom lane
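A rough sketch of what the "multiple database" alternative looks like; the database names are made up. Each database keeps its files in its own subdirectory under the cluster's data directory, so the per-folder file count stays bounded without creating any tablespaces:

    -- One database per department instead of one tablespace per department;
    -- the application connects to the department's own database, and each
    -- database carries its own copy of the 20-table schema.
    CREATE DATABASE dept_0001;
    CREATE DATABASE dept_0002;
    -- ... and so on, one per department.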
[PERFORM] Which one is faster: one-way reading = "single pass reading"
Hi,

Which one is faster for one-way ("single pass") reading?

Assumptions:
a. There are 3 million records.
b. I need to fetch 10 or 20 records repeatedly (so for the database that
   would be 10 connections, each returning one record, or it can be
   fancier: 1 connection call returning a set of 10 records).

1. Reading from a flat file.
   Assume the file name is already given and I just need to read the file
   (since it is a flat file, each record is represented by a filename,
   under multiple directory categories).

2. Reading from an XML file.
   Assume the schema is already given and I just need to read the file
   (since it is an XML file, each record is represented by an XML filename,
   under multiple directory categories).

3. Reading from PostgreSQL.
   Assume the primary key is already indexed and I just need to look up
   the number and grab the text content (assume 3 million records: search
   by number, read the content field; see the sketch below).

I am trying to recreate WebDBReader (from Nutch) using C#:
http://nutch.sourceforge.net/docs/api/net/nutch/db/WebDBReader.html

Thank you in advance,
Rosny
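For the PostgreSQL option, the "fancy 1 connection call" can be a single indexed lookup that returns all 10-20 records at once; the table and column names below are made up for illustration:

    -- The 3 million records, keyed by the number being searched.
    CREATE TABLE pages (
        page_id integer PRIMARY KEY,   -- indexed automatically via the PK
        content text                   -- the text content to grab
    );

    -- One round trip that fetches 10 records instead of 10 separate
    -- single-record connections.
    SELECT page_id, content
    FROM pages
    WHERE page_id IN (101, 102, 103, 104, 105, 106, 107, 108, 109, 110);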