Re: [SPAM] [GENERAL] COPY command & binary format
Well the job is done. The talend component is working ( https://github.com/parisni/talend/tree/master/tPostgresqlOutputBulkAPHP). It allows creating a file (binary or csv) locally, and then use the COPY function with "FROM STDIN" that does not need to push the file on a remote database server. I have made a little comparison test: column1: character varying column2: integer column3: boolean 10 000 000 tuples Type| Create file time | Bulk load time | Total Time | File size Binary | 11137 milliseconds | 21661 milliseconds | 32798 milliseconds | 250 MO CSV | 23226 milliseconds | 22192 milliseconds | 45418 milliseconds | 179 MO Binary format is definitely faster and safer - faster because writing binary is faster than text file. I guess the bulk load time bottleneck is the network, then this is equivalent for both format. It is two time faster to load a binary when the file is on the database server. - safer thanks to the format (each value is preceded by its lenght) more robust thant CSV and separators (that can be present in the text). Code has been based on : - https://github.com/uwescience/myria/blob/master/src/edu/washington/escience/myria/PostgresBinaryTupleWriter.java - https://github.com/bytefish/PgBulkInsert/tree/master/PgBulkInsert/src/main/de/bytefish/pgbulkinsert/pgsql/handlers Thanks, 2016-05-10 15:08 GMT+02:00 Cat: > On Tue, May 10, 2016 at 03:00:55PM +0200, Nicolas Paris wrote: > > > The way I want is : > > > csv -> binary -> postgresql > > > > > > Is this just to be quicker or are you going to add some business logic > > > while converting CSV data? > > > As you mentioned ETL, I assume the second, as I don't think that > > > converting CSV to binary and then loading it to PostgreSQL will be more > > > convenient than loading directly from CSV... as quicker as it can be, > you > > > have anyway to load data from CSV. > > > > > > > Right, ETL process means huge business logic. > > get the data (csv or other) -> transform it -> produce a binary -> copy > > from binary from stdin > > > > Producing 100GO CSVs, is a waste of time. > > Ah. You need to fiddle with the data. Then you need to weigh the pros of > something agnostic to Postgres's internals to something that needs to be > aware of them. > > You will need to delve into the source code for data types more complex > than INTEGER, TEXT and BYTEA (which was the majority of my data when I > was just looking into it). > > -- > "A search of his car uncovered pornography, a homemade sex aid, women's > stockings and a Jack Russell terrier." > - > http://www.dailytelegraph.com.au/news/wacky/indeed/story-e6frev20-118083480 >
Re: [SPAM] [GENERAL] COPY command & binary format
On Tue, May 10, 2016 at 03:00:55PM +0200, Nicolas Paris wrote: > > The way I want is : > > csv -> binary -> postgresql > > > > Is this just to be quicker or are you going to add some business logic > > while converting CSV data? > > As you mentioned ETL, I assume the second, as I don't think that > > converting CSV to binary and then loading it to PostgreSQL will be more > > convenient than loading directly from CSV... as quicker as it can be, you > > have anyway to load data from CSV. > > > > Right, ETL process means huge business logic. > get the data (csv or other) -> transform it -> produce a binary -> copy > from binary from stdin > > Producing 100GO CSVs, is a waste of time. Ah. You need to fiddle with the data. Then you need to weigh the pros of something agnostic to Postgres's internals to something that needs to be aware of them. You will need to delve into the source code for data types more complex than INTEGER, TEXT and BYTEA (which was the majority of my data when I was just looking into it). -- "A search of his car uncovered pornography, a homemade sex aid, women's stockings and a Jack Russell terrier." - http://www.dailytelegraph.com.au/news/wacky/indeed/story-e6frev20-118083480 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [SPAM] [GENERAL] COPY command & binary format
On Tue, May 10, 2016 at 01:38:12PM +0200, Nicolas Paris wrote: > The way I want is : > csv -> binary -> postgresql > > And if possible, transforming csv to binary throught java. > > Use case is ETL process. Not sure what the point would be tbh if the data is already in CSV. You might aswell submit the CSV to postgres and let it deal with it. It'll probably be faster. It'll also be more portable. The BINARY format is what Postgres uses internally (more or less). I had to look at the source code to figure out how to insert a timestamp (FYI: Postgres stores timestamps as epoch based off the year 2000 not 1970 amongst other fun things). -- "A search of his car uncovered pornography, a homemade sex aid, women's stockings and a Jack Russell terrier." - http://www.dailytelegraph.com.au/news/wacky/indeed/story-e6frev20-118083480 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [SPAM] [GENERAL] COPY command & binary format
2016-05-10 14:47 GMT+02:00 Moreno Andreo: > Il 10/05/2016 13:38, Nicolas Paris ha scritto: > > 2016-05-10 13:04 GMT+02:00 Moreno Andreo : > >> Il 10/05/2016 12:56, Nicolas Paris ha scritto: >> >> Hello, >> >> What is the way to build a binary format (instead of a csv) ? Is there >> specification for this file ? >> http://www.postgresql.org/docs/9.5/static/sql-copy.html >> >> I always create binary files with >> COPY table TO 'path/to/file' WITH BINARY >> >> > Fine, this works in this way : > postgresql -> binary > binary -> postgresql > > The way I want is : > csv -> binary -> postgresql > > Is this just to be quicker or are you going to add some business logic > while converting CSV data? > As you mentioned ETL, I assume the second, as I don't think that > converting CSV to binary and then loading it to PostgreSQL will be more > convenient than loading directly from CSV... as quicker as it can be, you > have anyway to load data from CSV. > Right, ETL process means huge business logic. get the data (csv or other) -> transform it -> produce a binary -> copy from binary from stdin Producing 100GO CSVs, is a waste of time. > Binary file format is briefly described in the last part of the doc you > linked, under "Binary format", and there's also reference to source files. > > > And if possible, transforming csv to binary throught java. > > This is beyond my knowledge, ATM. I'm just starting with Java and JDBC is > still in the TODO list, sorry... :-) > > Cheers > Moreno.- > Documentation explains a bit. Moreover, I have found a detailled answer here : http://stackoverflow.com/questions/14242117/java-library-to-write-binary-format-for-postgres-copy My ultimate goal is to encapsulate it in a Talend component. (talend is an open-source java based ETL software). Thanks, I ll keep you aware.
Re: [SPAM] [GENERAL] COPY command & binary format
Il 10/05/2016 13:38, Nicolas Paris ha scritto: 2016-05-10 13:04 GMT+02:00 Moreno Andreo: Il 10/05/2016 12:56, Nicolas Paris ha scritto: Hello, What is the way to build a binary format (instead of a csv) ? Is there specification for this file ? http://www.postgresql.org/docs/9.5/static/sql-copy.html I always create binary files with COPY table TO 'path/to/file' WITH BINARY Fine, this works in this way : postgresql -> binary binary -> postgresql The way I want is : csv -> binary -> postgresql Is this just to be quicker or are you going to add some business logic while converting CSV data? As you mentioned ETL, I assume the second, as I don't think that converting CSV to binary and then loading it to PostgreSQL will be more convenient than loading directly from CSV... as quicker as it can be, you have anyway to load data from CSV. Binary file format is briefly described in the last part of the doc you linked, under "Binary format", and there's also reference to source files. And if possible, transforming csv to binary throught java. This is beyond my knowledge, ATM. I'm just starting with Java and JDBC is still in the TODO list, sorry... :-) Cheers Moreno.-
Re: [GENERAL] COPY command & binary format
Le 10/05/2016 à 12:56, Nicolas Paris a écrit : Hello, What is the way to build a binary format (instead of a csv) ? Is there specification for this file ? http://www.postgresql.org/docs/9.5/static/sql-copy.html Could I create such format from java ? I guess this would be far faster, and maybe safer than CSVs Thanks by advance, Hi Making a driver that do what you want is not difficult. You will achieve better performances than than loading data from CSV, and you also will have better precision for floating values (there is no text conversion). In the link you provide there is a description of the file format in section Binary Format. Mathieu Pujol
Re: [SPAM] [GENERAL] COPY command & binary format
2016-05-10 13:04 GMT+02:00 Moreno Andreo: > Il 10/05/2016 12:56, Nicolas Paris ha scritto: > > Hello, > > What is the way to build a binary format (instead of a csv) ? Is there > specification for this file ? > http://www.postgresql.org/docs/9.5/static/sql-copy.html > > I always create binary files with > COPY table TO 'path/to/file' WITH BINARY > > Fine, this works in this way : postgresql -> binary binary -> postgresql The way I want is : csv -> binary -> postgresql And if possible, transforming csv to binary throught java. Use case is ETL process.
Re: [GENERAL] COPY command & binary format
On Tue, May 10, 2016 at 4:36 PM Sameer Kumarwrote: > On Tue, May 10, 2016 at 4:26 PM Nicolas Paris wrote: > >> Hello, >> >> What is the way to build a binary format (instead of a csv) ? Is there >> specification for this file ? >> http://www.postgresql.org/docs/9.5/static/sql-copy.html >> > >> >> Could I create such format from java ? >> > > You can use COPY JDBC API to copy to STDOUT and then compress it before > you use usual Java file operations to write it to a file. You will have to > follow the reverse process while reading from this file and LOADING to a > table. > > But why would you want to do that? > > >> >> I guess this would be far faster, and maybe safer than CSVs >> > > I don't think assumption is right. COPY is not meant for backup, it is for > LOAD and UN-LOAD. > > What you probably need is pg_dump with -Fc format. > http://www.postgresql.org/docs/current/static/app-pgdump.html > > Like someone else suggested upthread you can use Binary format in COPY command (default is text) > >> Thanks by advance, >> > -- > -- > Best Regards > Sameer Kumar | DB Solution Architect > *ASHNIK PTE. LTD.* > > 101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533 > > T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com > -- -- Best Regards Sameer Kumar | DB Solution Architect *ASHNIK PTE. LTD.* 101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533 T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com
Re: [GENERAL] COPY command & binary format
On Tue, May 10, 2016 at 4:26 PM Nicolas Pariswrote: > Hello, > > What is the way to build a binary format (instead of a csv) ? Is there > specification for this file ? > http://www.postgresql.org/docs/9.5/static/sql-copy.html > > > Could I create such format from java ? > You can use COPY JDBC API to copy to STDOUT and then compress it before you use usual Java file operations to write it to a file. You will have to follow the reverse process while reading from this file and LOADING to a table. But why would you want to do that? > > I guess this would be far faster, and maybe safer than CSVs > I don't think assumption is right. COPY is not meant for backup, it is for LOAD and UN-LOAD. What you probably need is pg_dump with -Fc format. http://www.postgresql.org/docs/current/static/app-pgdump.html > > Thanks by advance, > -- -- Best Regards Sameer Kumar | DB Solution Architect *ASHNIK PTE. LTD.* 101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533 T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com
Re: [SPAM] [GENERAL] COPY command & binary format
Il 10/05/2016 12:56, Nicolas Paris ha scritto: Hello, What is the way to build a binary format (instead of a csv) ? Is there specification for this file ? http://www.postgresql.org/docs/9.5/static/sql-copy.html I always create binary files with COPY table TO 'path/to/file' WITH BINARY Cheers Moreno.-
[GENERAL] COPY command & binary format
Hello, What is the way to build a binary format (instead of a csv) ? Is there specification for this file ? http://www.postgresql.org/docs/9.5/static/sql-copy.html Could I create such format from java ? I guess this would be far faster, and maybe safer than CSVs Thanks by advance,