Re: [PERFORM] Backup taking long time !!!

2017-01-23 Thread julyanto SUTANDANG
Dear Jeff,

Thanks for the correction, and with this email we hope that myth is gone
forever :)
We will pass this on to inform others about the matter.

And we all agree here that using pg_basebackup is the best approach, rather
than doing it manually through pg_start_backup, right?
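
For reference, a minimal pg_basebackup invocation along those lines might
look like this sketch; host, user, and target directory are placeholders,
and -X stream assumes a server and client new enough to stream the WAL
generated during the backup over a replication-capable connection:

    # base backup plus the WAL produced while it runs
    pg_basebackup -h dbhost -U replication_user -D /backup/base -X stream -P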

Thanks and Regards,

Jul.



Julyanto SUTANDANG

Equnix Business Solutions, PT
(An Open Source and Open Mind Company)
www.equnix.co.id
Pusat Niaga ITC Roxy Mas Blok C2/42.  Jl. KH Hasyim Ashari 125, Jakarta
Pusat
T: +6221 2282 F: +62216315281 M: +628164858028



On Tue, Jan 24, 2017 at 12:12 AM, Jeff Janes <jeff.ja...@gmail.com> wrote:

> On Sun, Jan 22, 2017 at 6:57 AM, Stephen Frost <sfr...@snowman.net> wrote:
>
>> Greetings,
>>
>> * julyanto SUTANDANG (julya...@equnix.co.id) wrote:
>> > CORRECTION:
>> >
>> > "you might you pg_start_backup to tell the server not to write into the
>> > DATADIR"
>> >
>> > become
>> >
>> > "you might *use* pg_start_backup to tell the server not to write into
>> the
>> > *BASEDIR*, actually server still writes but only to XLOGDIR "
>>
>> Just to make sure anyone reading the mailing list archives isn't
>> confused, running pg_start_backup does *not* make PG stop writing to
>> BASEDIR (or DATADIR, or anything, really).  PG *will* continue to write
>> data into BASEDIR after pg_start_backup has been called.
>>
>
>
> Correct.  Unfortunately it is a very common myth that it does cause
> PostgreSQL to stop writing to the base dir.
>
>
>>
>> The only thing that pg_start_backup does is identify an entry in the WAL
>> stream, from which point all WAL must be replayed when restoring the
>> backup.  All WAL generated from that point (pg_start_backup point) until
>> the pg_stop_backup point *must* be replayed when restoring the backup or
>> the database will not be consistent.
>>
>
> pg_start_backup also forces full_page_writes to be effectively 'on' for
> the duration of the backup, if it is not already explicitly on (which it
> usually will already be).  This affects pg_xlog, of course, not base.  But
> it is an essential step for people who run with full_page_writes=off, as it
> ensures that anything in base which got changed mid-copy will be fixed up
> during replay of the WAL.
>
>
> Cheers,
>
> Jeff
>


Re: [PERFORM] Backup taking long time !!!

2017-01-22 Thread julyanto SUTANDANG
Hi All,

Especially to Stephen Frost: thank you very much for your thorough
explanation and elaboration!
Everything is clear now; I do not disagree with Stephen, and I am lucky to
be corrected by an expert like you.
In short: please use pg_basebackup for getting the snapshot, and don't
forget to archive the WAL as well, so that we have a complete full and
incremental backup. (That is better than only an occasional backup, right?)

So this is what we should do when backing up PostgreSQL. This way we can
ensure the "D" (Durability) of the data in the database across disasters and
across locations, not only within a single instance.
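
A minimal sketch of the WAL-archiving side mentioned above, either pushing
from the server or pulling with pg_receivexlog; directory, host, and user
names are placeholders, and ALTER SYSTEM assumes a 9.4+ server (older
releases set the same parameters in postgresql.conf):

    # push: have the server archive each completed WAL segment
    psql -c "ALTER SYSTEM SET wal_level = 'replica';"   # 'archive'/'hot_standby' before 9.6
    psql -c "ALTER SYSTEM SET archive_mode = on;"       # needs a server restart
    psql -c "ALTER SYSTEM SET archive_command = 'test ! -f /backup/wal/%f && cp %p /backup/wal/%f';"

    # or pull: stream WAL continuously from a client machine
    pg_receivexlog -h dbhost -U replication_user -D /backup/wal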

Thanks,



Julyanto SUTANDANG

Equnix Business Solutions, PT
(An Open Source and Open Mind Company)
www.equnix.co.id
Pusat Niaga ITC Roxy Mas Blok C2/42.  Jl. KH Hasyim Ashari 125, Jakarta
Pusat
T: +6221 2282 F: +62216315281 M: +628164858028



On Mon, Jan 23, 2017 at 12:32 AM, Stephen Frost <sfr...@snowman.net> wrote:

> Greetings,
>
> * julyanto SUTANDANG (julya...@equnix.co.id) wrote:
> > Thanks for elaborating on this information; this is new to me, so either
> > way the procedure is *correct and workable*.
>
> Backups are extremely important, so I get quite concerned when people
> provide incorrect information regarding them.
>
> > > With all of the WAL
> > > which was created during the backup, PG will be able to recover from
> the
> > > changes made during the backup to the data directory, but you *must*
> > > have all of that WAL, or the backup will be inconsistent because of
> >
> > That is rather out of question, because all we are discussing here is
> > just a full/snapshot backup.
>
> It's unclear what you mean by 'out of question' or why you believe that
> it matters if it's a full backup or not.
>
> Any backup of PG *must* include all of the WAL that was created during
> the backup.
>
> > The backup is a full backup or snapshot, and it will work whenever needed.
> > We are not talking about incremental backup yet.
> > By also collecting the XLOG files, you can have incremental backups and a
> > complete, continuous data backup.
> > For this, Stephen suggests using pg_receivexlog or archive_command
> > (everything here is actually explained well in the docs).
>
> No, that is not correct.  You must have the WAL for a full backup as
> well.  If I understand what you're suggesting, it's that WAL is only for
> point-in-time-recovery, but that is *not* correct, WAL is required for
> restoring a full backup to a consistent state.
>
> > those changes that were made to the data directory after
> > > pg_start_backup() was called.
> > >
> > > In other words, if you aren't using pg_receivexlog or archive_command,
> > > your backups are invalid.
> > >
> > I doubt that *invalid* is the right word here.
> > In terms of a snapshot backup, as long as the snapshot can be run, it is
> > valid, isn't it?
>
> It's absolutely correct, you must have the WAL generated during your
> backup or the backup is invalid.
>
> If, what you mean by 'snapshot' is a *full-system atomic snapshot*,
> provided by some layer lower than PostgreSQL that is *exactly* as if the
> machine was physically turned off all at once, then, and *only* then,
> can you be guaranteed that PG will be able to recover, but the reason
> for that is because PG will go back to the last checkpoint that
> happened, as recorded in pg_control, and replay all of the WAL in the
> pg_xlog/pg_wal directory, which must all exist and be complete for all
> committed transaction because the WAL was sync'd to disk before the
> commit was acknowledged and the WAL is not removed until after a
> checkpoint has completed which has sync'd the data in the data directory
> out to the filesystem.
>
> That's also known as 'cra

Re: [PERFORM] Backup taking long time !!!

2017-01-22 Thread julyanto SUTANDANG
Hi Stephen,

> > When PostgreSQL in the mode of Start Backup, PostgreSQL only writes to
> the
> > XLOG, then you can safely rsync / copy the base data (snapshot) then
> later
> > you can have full copy of snapshot backup data.
>
> You are confusing two things.
>
> After calling pg_start_backup, you can safely copy the contents of the
> data directory, that is correct.


> However, PostgreSQL *will* continue to write to the data directory.
> That, however, is ok, because those changes will *also* be written into
> the WAL and, after calling pg_start_backup(), you collect all of the
> WAL using archive_command or pg_receivexlog.

Thanks for elaborating on this information; this is new to me, so either
way the procedure is *correct and workable*.


> With all of the WAL
> which was created during the backup, PG will be able to recover from the
> changes made during the backup to the data directory, but you *must*
> have all of that WAL, or the backup will be inconsistent because of
>

That is rather out of question, because all we are discussing here is just
a full/snapshot backup.
The backup is a full backup or snapshot, and it will work whenever needed.
We are not talking about incremental backup yet.
By also collecting the XLOG files, you can have incremental backups and a
complete, continuous data backup.
For this, Stephen suggests using pg_receivexlog or archive_command
(everything here is actually explained well in the docs).


those changes that were made to the data directory after
> pg_start_backup() was called.
>
> In other words, if you aren't using pg_receivexlog or archive_command,
> your backups are invalid.
>
I doubt that *invalid* is the right word here.
In terms of a snapshot backup, as long as the snapshot can be run, it is
valid, isn't it?

> if you wanted to backup in later day, you can use rsync then it will copy
> > faster because rsync only copy the difference, rather than copy all the
> > data.
>
> This is *also* incorrect.  rsync, by itself, is *not* safe to use for
> doing that kind of incremental backup, unless you enable checksums.  The
> reason for this is that rsync has only a 1-second level granularity and
> it is possible (unlikely, though it has been demonstrated) to miss
> changes made to a file within that 1-second window.
>
As long as that is not an XLOG file, anyway; as you say, that would not be a
problem, since we can replay the XLOG for recovery.
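
If you do refresh an earlier copy with rsync, the caveat Stephen gives above
is the important part: make rsync compare file contents instead of trusting
timestamps. A minimal sketch, with paths as placeholders:

    # re-sync an existing base copy; --checksum avoids the 1-second mtime window
    rsync -a --checksum --delete --exclude 'pg_xlog/*' \
        /var/lib/pgsql/9.1/data/ /backup/base/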


>
> > My other suggestion is: use pg_basebackup, which will do all of this for
> > you automatically.
>
> Yes, if you are unsure about how to perform a safe backup properly,
> using pg_basebackup or one of the existing backup tools is, by far, the
> best approach.  Attempting to roll your own backup system based on rsync
> is not something I am comfortable recommending any more because it is
> *not* simple to do correctly.
>
OK, that is fine, and that is actually what we are using.
The reason I explained pg_start_backup and pg_stop_backup was to build a
gradual understanding, hoping that people would grasp the mechanism working
behind the scenes.


>
> Thanks!
>
Thanks for your great explanation!


> Stephen
>


Re: [PERFORM] Backup taking long time !!!

2017-01-22 Thread julyanto SUTANDANG
Hi Stephen,

Please elaborate on what you are saying. What I am saying is based on the
official docs, the forums, and our own tests. This is what we had to do to
save time, both in backing up and in restoring.

https://www.postgresql.org/docs/9.6/static/functions-admin.html

When PostgreSQL in the mode of Start Backup, PostgreSQL only writes to the
XLOG, then you can safely rsync / copy the base data (snapshot) then later
you can have full copy of snapshot backup data.
if you wanted to backup in later day, you can use rsync then it will copy
faster because rsync only copy the difference, rather than copy all the
data.
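
A minimal sketch of that manual, low-level procedure on a 9.x server (label
and paths are placeholders). Note that, as clarified elsewhere in this
thread, the server keeps writing to the data directory during this window,
so the WAL generated between the two calls must also be archived for the
copy to be restorable:

    psql -c "SELECT pg_start_backup('nightly', true);"
    # copy the cluster, skipping the WAL directory contents
    rsync -a --exclude 'pg_xlog/*' /var/lib/pgsql/9.1/data/ /backup/base/
    psql -c "SELECT pg_stop_backup();"
    # plus archive_command or pg_receivexlog collecting WAL the whole time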

My other suggestion is: use pg_basebackup, which will do all of this for
you automatically.

CMIIW,




Julyanto SUTANDANG

Equnix Business Solutions, PT
(An Open Source and Open Mind Company)
www.equnix.co.id
Pusat Niaga ITC Roxy Mas Blok C2/42.  Jl. KH Hasyim Ashari 125, Jakarta
Pusat
T: +6221 2282 F: +62216315281 M: +628164858028



On Sun, Jan 22, 2017 at 9:55 PM, Stephen Frost <sfr...@snowman.net> wrote:

> Greetings,
>
> * julyanto SUTANDANG (julya...@equnix.co.id) wrote:
> > Best practice in doing full backup is using RSYNC, but before you can
> copy
> > the DATADIR, you might you pg_start_backup to tell the server not to
> write
> > into the DATADIR, because you are copying that data. After finished copy
> > all the data in DATADIR, you can ask server to continue flushing the data
> > from logs, by commanding pg_stop_backup. Remember, not to copy the XLOG
> > dir.
>
> Whoah.  That is not, at all, correct, if I'm understanding what you're
> suggesting.
>
> PG most certainly *does* continue to write into the data directory even
> after pg_start_backup() has been run.  You *must* use archive_command or
> pg_receivexlog to capture all of the WAL during the backup to have a
> consistent backup.
>
> > There are another way more simpler, which is applying command
> > pg_basebackup, which actually did that way in simpler version.
>
> pg_basebackup has options to stream the WAL during the backup to capture
> it, which is how it handles that.
>
> > if you did pg_dump, you wont get the exact copy of your data, and you
> will
> > take longer downtime to recover the backup data. By that way, recovering
> is
> > only starting up the postgres with that copy.
>
> pg_dump will generally take longer to do a restore, yes.  Recovering
> from a backup does require that a recovery.conf exists with a
> restore_command that PG can use to get the WAL files it needs, or that
> all of the WAL from the backup is in pg_xlog/pg_wal.
>
> Please do not claim that PG stops writing to the DATADIR or BASEDIR
> after a pg_start_backup(), that is not correct and could lead to invalid
> backups.
>
> Thanks!
>
> Stephen
>
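
As a concrete illustration of the recovery.conf / restore_command that
Stephen mentions above, a minimal sketch for the restore side; the archive
path is a placeholder, and on PostgreSQL 12+ the same setting moves into
postgresql.conf together with a recovery.signal file:

    # on the restored copy, before starting it (pre-12 layout)
    echo "restore_command = 'cp /backup/wal/%f \"%p\"'" >> "$PGDATA/recovery.conf"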


Re: [PERFORM] Backup taking long time !!!

2017-01-22 Thread julyanto SUTANDANG
CORRECTION:

"you might you pg_start_backup to tell the server not to write into the
DATADIR"

become

"you might *use* pg_start_backup to tell the server not to write into the
*BASEDIR*, actually server still writes but only to XLOGDIR "


Regards,





Julyanto SUTANDANG

Equnix Business Solutions, PT
(An Open Source and Open Mind Company)
www.equnix.co.id
Pusat Niaga ITC Roxy Mas Blok C2/42.  Jl. KH Hasyim Ashari 125, Jakarta
Pusat
T: +6221 2282 F: +62216315281 M: +628164858028



On Sun, Jan 22, 2017 at 8:20 PM, julyanto SUTANDANG <julya...@equnix.co.id>
wrote:

> Hi Dinesh,
>
> Best practice in doing full backup is using RSYNC, but before you can copy
> the DATADIR, you might you pg_start_backup to tell the server not to write
> into the DATADIR, because you are copying that data. After finished copy
> all the data in DATADIR, you can ask server to continue flushing the data
> from logs, by commanding pg_stop_backup. Remember, not to copy the XLOG
> dir.
>
> There are another way more simpler, which is applying command
> pg_basebackup, which actually did that way in simpler version.
>
> if you did pg_dump, you wont get the exact copy of your data, and you will
> take longer downtime to recover the backup data. By that way, recovering is
> only starting up the postgres with that copy.
>
>
> Good luck!
>
>
>
> Julyanto SUTANDANG
>
> Equnix Business Solutions, PT
> (An Open Source and Open Mind Company)
> www.equnix.co.id
> Pusat Niaga ITC Roxy Mas Blok C2/42.  Jl. KH Hasyim Ashari 125, Jakarta
> Pusat
> T: +6221 2282 F: +62216315281 M: +628164858028
>
>
>
> On Fri, Jan 20, 2017 at 6:24 PM, Dinesh Chandra 12108 <
> dinesh.chan...@cyient.com> wrote:
>
>> Hi Expert,
>>
>>
>>
>> I have a database having size around 1350 GB, created in PostgreSQL-9.1
>> in Linux platform.
>>
>> I am using pg_dump to take backup which takes around 12 hours to complete.
>>
>> Could you please suggest me how I can make my backup fast so that it
>> complete in less hours?
>>
>>
>>
>> Thanks in advance.
>>
>>
>>
>> *Regards,*
>>
>> *Dinesh Chandra*
>>
>> *|Database administrator (Oracle/PostgreSQL)| Cyient Ltd. Noida.*
>>
>> *--*
>>
>> Mobile: +91-9953975849 | Ext 1078
>> |dinesh.chan...@cyient.com
>>
>> Plot No. 7, NSEZ, Phase-II ,Noida-Dadri Road, Noida - 201 305,India.
>>
>>
>>
>> --
>>
>>
>
>


Re: [PERFORM] Backup taking long time !!!

2017-01-22 Thread julyanto SUTANDANG
Hi Dinesh,

Best practice in doing full backup is using RSYNC, but before you can copy
the DATADIR, you might you pg_start_backup to tell the server not to write
into the DATADIR, because you are copying that data. After finished copy
all the data in DATADIR, you can ask server to continue flushing the data
from logs, by commanding pg_stop_backup. Remember, not to copy the XLOG
dir.

There are another way more simpler, which is applying command
pg_basebackup, which actually did that way in simpler version.

if you did pg_dump, you wont get the exact copy of your data, and you will
take longer downtime to recover the backup data. By that way, recovering is
only starting up the postgres with that copy.
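
If a logical dump is still wanted despite that, the dump itself can often be
made much faster with the directory format and parallel jobs; a sketch,
assuming a pg_dump/pg_restore new enough for -j, with host, database, and
paths as placeholders:

    pg_dump    -h dbhost -Fd -j 8 -f /backup/dump_dir mydb
    pg_restore -h dbhost -d mydb -j 8 /backup/dump_dir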


Good luck!



Julyanto SUTANDANG

Equnix Business Solutions, PT
(An Open Source and Open Mind Company)
www.equnix.co.id
Pusat Niaga ITC Roxy Mas Blok C2/42.  Jl. KH Hasyim Ashari 125, Jakarta
Pusat
T: +6221 2282 F: +62216315281 M: +628164858028



On Fri, Jan 20, 2017 at 6:24 PM, Dinesh Chandra 12108 <
dinesh.chan...@cyient.com> wrote:

> Hi Expert,
>
>
>
> I have a database having size around 1350 GB, created in PostgreSQL-9.1 in
> Linux platform.
>
> I am using pg_dump to take backup which takes around 12 hours to complete.
>
> Could you please suggest me how I can make my backup fast so that it
> complete in less hours?
>
>
>
> Thanks in advance.
>
>
>
> *Regards,*
>
> *Dinesh Chandra*
>
> *|Database administrator (Oracle/PostgreSQL)| Cyient Ltd. Noida.*
>
> *--*
>
> Mobile: +91-9953975849 | Ext 1078
> |dinesh.chan...@cyient.com
>
> Plot No. 7, NSEZ, Phase-II ,Noida-Dadri Road, Noida - 201 305,India.
>
>
>
> --
>
>


Re: [PERFORM] Millions of tables

2016-09-26 Thread julyanto SUTANDANG
-sorry for my last email, which also was not bottom-posted-

Hi Greg,
On Mon, Sep 26, 2016 at 11:19 AM, Greg Spiegelberg wrote:

> I did look at PostgresXL and CitusDB.  Both are admirable however neither
> could support the need to read a random record consistently under 30ms.
> It's a similar problem Cassandra and others have: network latency.  At this
> scale, to provide the ability to access any given record amongst trillions
> it is imperative to know precisely where it is stored (system & database)
> and read a relatively small index.  I have other requirements that prohibit
> use of any technology that is eventually consistent.
>
Then you can get below 30 ms, but how many processes would you have to run
concurrently?
That is something to consider: a single machine has fewer than 50 hardware
threads on Intel, or 192 on POWER8, which is still far below millions
compared with the number of tables (8 million).
If you use indexes correctly, you will not need a sequential scan, since the
scan runs in memory (the index is loaded into memory).
Do you plan to query through the master table of the partition set? That is
actually quite slow, considering the millions of constraints to check for
every query.

With 8 million tables you will need very large storage, and it will not fit
mounted in a single machine unless you plan to use IBM z machines.
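
A small sketch of the difference being described, with made-up table names:
hitting a child table directly touches one small index, while going through
the inheritance parent forces the planner to consider every child:

    # direct access to one known child table
    psql -c "SELECT * FROM sensor_data_000123 WHERE id = 42;"
    # through the parent: constraint checks across all children
    psql -c "SELECT * FROM sensor_data WHERE id = 42;"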


> I liken the problem to fishing.  To find a particular fish of length,
> size, color  in a data lake you must accept the possibility of scanning
> the entire lake.  However, if all fish were in barrels where each barrel
> had a particular kind of fish of specific length, size, color  then the
> problem is far simpler.
>
>


Re: [PERFORM] Millions of tables

2016-09-25 Thread julyanto SUTANDANG
Dear Greg,

Have you checked PostgresXL?
With millions of tables, how does the application choose which table is
appropriate? In my opinion, at that scale it should go with parallel query
and data sharding, which is what PostgresXL does.

Thanks,


Julyanto SUTANDANG

Equnix Business Solutions, PT
(An Open Source and Open Mind Company)
www.equnix.co.id
Pusat Niaga ITC Roxy Mas Blok C2/42.  Jl. KH Hasyim Ashari 125, Jakarta Pusat
T: +6221 2282 F: +62216315281 M: +628164858028




On Mon, Sep 26, 2016 at 9:50 AM, Greg Spiegelberg
<gspiegelb...@gmail.com> wrote:
> Hey all,
>
> Obviously everyone who's been in PostgreSQL or almost any RDBMS for a time
> has said not to have millions of tables.  I too have long believed it until
> recently.
>
> AWS d2.8xlarge instance with 9.5 is my test rig using XFS on EBS (io1) for
> PGDATA.  Over the weekend, I created 8M tables with 16M indexes on those
> tables.  Table creation initially took 0.018031 secs, average 0.027467 and
> after tossing out outliers (qty 5) the maximum creation time found was
> 0.66139 seconds.  Total time 30 hours, 31 minutes and 8.435049 seconds.
> Tables were created by a single process.  Do note that table creation is
> done via plpgsql function as there are other housekeeping tasks necessary
> though minimal.
>
> No system tuning but here is a list of PostgreSQL knobs and switches:
> shared_buffers = 2GB
> work_mem = 48 MB
> max_stack_depth = 4 MB
> synchronous_commit = off
> effective_cache_size = 200 GB
> pg_xlog is on its own file system
>
> There are some still obvious problems.  General DBA functions such as VACUUM
> and ANALYZE should not be done.  Each will run forever and cause much grief.
> Backups are problematic in the traditional pg_dump and PITR space.  Large
> JOIN's by VIEW, SELECT or via table inheritance (I am abusing it in my test
> case) are no-no's.  A system or database crash could take potentially hours
> to days to recover.  There are likely other issues ahead.
>
> You may wonder, "why is Greg attempting such a thing?"  I looked at
> DynamoDB, BigTable, and Cassandra.  I like Greenplum but, let's face it,
> it's antiquated and don't get me started on "Hadoop".  I looked at many
> others and ultimately the recommended use of each vendor was to have one
> table for all data.  That overcomes the millions of tables problem, right?
>
> Problem with the "one big table" solution is I anticipate 1,200 trillion
> records.  Random access is expected and the customer expects <30ms reads for
> a single record fetch.
>
> No data is loaded... yet  Table and index creation only.  I am interested in
> the opinions of all including tests I may perform.  If you had this setup,
> what would you capture / analyze?  I have a job running preparing data.  I
> did this on a much smaller scale (50k tables) and data load via function
> allowed close to 6,000 records/second.  The schema has been simplified since
> and last test reach just over 20,000 records/second with 300k tables.
>
> I'm not looking for alternatives yet but input to my test.  Takers?
>
> I can't promise immediate feedback but will do my best to respond with
> results.
>
> TIA,
> -Greg


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgresql 8.4 optimize for heavy select load

2016-09-19 Thread julyanto SUTANDANG
You might want to upgrade to the newer 9.5 release; it takes little effort
using pg_upgrade. We have done such an upgrade and achieved more than 20x
better performance going from 8.4 to 9.5 (it depends on the type of SQL
statement, actually).
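
A minimal sketch of such an in-place upgrade with pg_upgrade; binary and
data directory paths are placeholders, --link avoids copying the data files,
and a --check dry run (plus a real backup) should always come first:

    pg_upgrade --check --link \
        -b /usr/pgsql-8.4/bin -B /usr/pgsql-9.5/bin \
        -d /var/lib/pgsql/8.4/data -D /var/lib/pgsql/9.5/data
    # rerun without --check to actually perform the upgrade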

Jul.



Julyanto SUTANDANG

Equnix Business Solutions, PT
(An Open Source and Open Mind Company)
www.equnix.co.id
Pusat Niaga ITC Roxy Mas Blok C2/42.  Jl. KH Hasyim Ashari 125, Jakarta
Pusat
T: +6221 2282 F: +62216315281 M: +628164858028



On Mon, Sep 19, 2016 at 3:23 PM, Mark Kirkwood <
mark.kirkw...@catalyst.net.nz> wrote:

> On 19/09/16 19:40, Job wrote:
>
> Hello,
>>
>> i would please like to have some suggestions to optimize Postgres 8.4 for
>> a very heavy number of select (with join) queries.
>> The queries read data, very rarely they write.
>>
>>
> We probably need to see schema and query examples to help you (with
> EXPLAIN ANALYZE output). Also - err 8.4 - I (and others probably) will
> recommend you upgrade to a more recent (and supported for that matter)
> version - currently 9.5/9.6 - lots of performance improvements you are
> missing out on!
>
> Best wishes
>
> Mark
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] Indexes for hashes

2016-06-17 Thread julyanto SUTANDANG
CRC32 is great because it is supported in Intel hardware; unfortunately,
you have to code something like this:

http://stackoverflow.com/questions/31184201/how-to-implement-crc32-taking-advantage-of-intel-specific-instructions


#include <stdint.h>
#include <stddef.h>
#include <nmmintrin.h>  /* SSE4.2 intrinsics; compile with -msse4.2 */

/* CRC32C of a byte buffer using the SSE4.2 hardware crc32 instruction. */
uint32_t sse42_crc32(const uint8_t *bytes, size_t len)
{
    uint32_t hash = 0;

    for (size_t i = 0; i < len; i++)
        hash = _mm_crc32_u8(hash, bytes[i]);

    return hash;
}

It is supported by GCC and compiles down to a hardware instruction, which is
really fast.
You can use a 64-bit variant to get a more precise hash, so don't worry
about uniformity.

Btw: CRC32 is not a cryptographic function; it is a checksum used for
hashing or integrity signatures.

Regards,


On Fri, Jun 17, 2016 at 3:32 PM, Ivan Voras <ivo...@gmail.com> wrote:

> And in any case, there's no crc32 in the built-in pgcrypto module.
>
>
> On 17 June 2016 at 06:18, Claudio Freire <klaussfre...@gmail.com> wrote:
>
>> On Fri, Jun 17, 2016 at 1:09 AM, julyanto SUTANDANG
>> <julya...@equnix.co.id> wrote:
>> > This way, using crc32(data), is faster than hashtext, since CRC32 is
>> > hardware accelerated on Intel (and perhaps others).
>> > This way (crc32) is not the same as a hash at all; it is much faster
>> > than the others...
>> >
>> > Regards,
>>
>> Sure, but I've had uniformity issues with crc32.
>>
>
>


-- 


Julyanto SUTANDANG

Equnix Business Solutions, PT
(An Open Source an Open Mind Company)

Pusat Niaga ITC Roxy Mas Blok C2/42.  Jl. KH Hasyim Ashari 125, Jakarta
Pusat
T: +6221 2282 F: +62216315281 M: +628164858028




Re: [PERFORM] Indexes for hashes

2016-06-16 Thread julyanto SUTANDANG
This way, using crc32(data), is faster than hashtext, since CRC32 is
hardware accelerated on Intel (and perhaps others).
This way (crc32) is not the same as a hash at all; it is much faster than
the others...

Regards,


On Fri, Jun 17, 2016 at 10:51 AM, Claudio Freire <klaussfre...@gmail.com>
wrote:

> On Wed, Jun 15, 2016 at 6:34 AM, Ivan Voras <ivo...@gmail.com> wrote:
> >
> > I have an application which stores a large amounts of hex-encoded hash
> > strings (nearly 100 GB of them), which means:
> >
> > The number of distinct characters (alphabet) is limited to 16
> > Each string is of the same length, 64 characters
> > The strings are essentially random
> >
> > Creating a B-Tree index on this results in the index size being larger
> than
> > the table itself, and there are disk space constraints.
> >
> > I've found the SP-GIST radix tree index, and thought it could be a good
> > match for the data because of the above constraints. An attempt to
> create it
> > (as in CREATE INDEX ON t USING spgist(field_name)) apparently takes more
> > than 12 hours (while a similar B-tree index takes a few hours at most),
> so
> > I've interrupted it because "it probably is not going to finish in a
> > reasonable time". Some slides I found on the spgist index allude that
> both
> > build time and size are not really suitable for this purpose.
>
>
> I've found that hash btree indexes tend to perform well in these
> situations:
>
> CREATE INDEX ON t USING btree (hashtext(fieldname));
>
> However, you'll have to modify your queries to query for both, the
> hashtext and the text itself:
>
> SELECT * FROM t WHERE hashtext(fieldname) = hashtext('blabla') AND
> fieldname = 'blabla';
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>



-- 


Julyanto SUTANDANG

Equnix Business Solutions, PT
(An Open Source an Open Mind Company)

Pusat Niaga ITC Roxy Mas Blok C2/42.  Jl. KH Hasyim Ashari 125, Jakarta
Pusat
T: +6221 2282 F: +62216315281 M: +628164858028




Re: [PERFORM] Indexes for hashes

2016-06-15 Thread julyanto SUTANDANG
Hi Ivan,

How about using CRC32, and then indexing the integer result of the crc32
function? You could split the hash into two parts and run crc32 twice, then
create a composite index on both integers (the crc32 results).
Instead of using 64 characters, you only employ two integers as the index key.
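
A sketch of that idea, assuming a user-defined, immutable crc32(text)
function is available in the database (PostgreSQL itself does not ship one
in these releases); the table and column names are made up:

    # composite expression index on the two halves of the 64-char hash
    psql -c "CREATE INDEX t_hash_crc_idx ON t (crc32(substr(hash_col, 1, 32)), crc32(substr(hash_col, 33, 32)));"
    # queries repeat the same expressions and still recheck the full value
    psql -c "SELECT * FROM t WHERE crc32(substr(hash_col, 1, 32)) = crc32(substr('<hash>', 1, 32)) AND crc32(substr(hash_col, 33, 32)) = crc32(substr('<hash>', 33, 32)) AND hash_col = '<hash>';"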

Regards,

Jul

On Wed, Jun 15, 2016 at 8:54 PM, Ivan Voras <ivo...@gmail.com> wrote:

> Hi,
>
> I understand your idea, and have also been thinking about it. Basically,
> existing applications would need to be modified, however slightly, and that
> wouldn't be good.
>
>
>
>
> On 15 June 2016 at 15:38, hubert depesz lubaczewski <dep...@depesz.com>
> wrote:
>
>> On Wed, Jun 15, 2016 at 11:34:18AM +0200, Ivan Voras wrote:
>> > I have an application which stores a large amounts of hex-encoded hash
>> > strings (nearly 100 GB of them), which means:
>>
>> Why do you keep them hex encoded, and not use bytea?
>>
>> I made a sample table with 1 million rows, looking like this:
>>
>>  Table "public.new"
>>  Column  | Type  | Modifiers
>> ---------+-------+-----------
>>  texthex | text  |
>>  a_bytea | bytea |
>>
>> values are like:
>>
>> $ select * from new limit 10;
>>  texthex  |
>> a_bytea
>>
>> --+
>>  c968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f |
>> \xc968f64426b941bc9a8f6d4e87fc151c7a7192679837618570b7989c67c31e2f
>>  61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db |
>> \x61dffbf002d7fc3db9df5953aca7f2e434a78d4c5fdd0db6f90f43ee8c4371db
>>  757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033 |
>> \x757acf228adf2357356fd38d03b80529771f211e0ad3b35b66a23d6de53e5033
>>  fba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15 |
>> \xfba35b8b33a7fccc2ac7d96389d43be509ff17636fe3c0f8a33af6d009f84f15
>>  ecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a |
>> \xecd8587a8b9acae650760cea8683f8e1c131c4054c0d64b1d7de0ff269ccc61a
>>  11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea |
>> \x11782c73bb3fc9f281b41d3eff8c1e7907b3494b3abe7b6982c1e88f49dad2ea
>>  5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852 |
>> \x5862bd8d645e4d44997a485c616bc18f1acabeaec5df3c3b09b9d4c08643e852
>>  2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c |
>> \x2d09a5cca2c03153a55faa3aff13df0f0593f4355a1b2cfcf9237c2931b4918c
>>  2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7 |
>> \x2186eb2bcc12319ee6e00f7e08a1d61e379a75c01c579c29d0338693bc31c7c7
>>  2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa |
>> \x2061bd05049c51bd1162e4d77f72a37f06d2397fc522ef587ed172a5ad8d57aa
>> (10 rows)
>>
>> created two indexes:
>> create index i1 on new (texthex);
>> create index i2 on new (a_bytea);
>>
>> i1 is 91MB, and i2 is 56MB.
>>
>> Index creation was also much faster - best out of 3 runs for i1 was
>> 4928.982 ms, best out of 3 runs for i2 was 2047.648 ms
>>
>> Best regards,
>>
>> depesz
>>
>>
>


-- 


Julyanto SUTANDANG

Equnix Business Solutions, PT
(An Open Source an Open Mind Company)

Pusat Niaga ITC Roxy Mas Blok C2/42.  Jl. KH Hasyim Ashari 125, Jakarta
Pusat
T: +6221 2282 F: +62216315281 M: +628164858028

