Re: [SPAM] [GENERAL] COPY command & binary format

2016-05-14 Thread Nicolas Paris
Well, the job is done. The Talend component is working (
https://github.com/parisni/talend/tree/master/tPostgresqlOutputBulkAPHP).
It creates a file (binary or CSV) locally and then loads it with COPY
using "FROM STDIN", so the file does not need to be pushed to the remote
database server.

I have made a little comparison test:

column1: character varying
column2: integer
column3: boolean
10 000 000 tuples

Type   | Create file time | Bulk load time | Total time | File size
Binary | 11137 ms         | 21661 ms       | 32798 ms   | 250 MB
CSV    | 23226 ms         | 22192 ms       | 45418 ms   | 179 MB
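
The larger binary file is consistent with the framing described in the
"Binary Format" section of the COPY documentation: a 16-bit field count per
tuple and a 32-bit length word in front of every value. A rough sketch of
that arithmetic in Java follows. The class name is made up, and the 6-byte
string length is a hypothetical value picked only because it lands on 25
bytes per row; it was not measured in the test above.

```java
/** Back-of-the-envelope size of a binary COPY file for (varchar, int4, bool) rows. */
final class CopySizeEstimate {
    static final int HEADER_BYTES = 11 + 4 + 4; // signature + flags + extension length
    static final int TRAILER_BYTES = 2;         // 16-bit -1 at end of file

    /** Bytes for one tuple with no NULLs; textBytes is the encoded string length. */
    static long binaryRowBytes(int textBytes) {
        return 2               // 16-bit field count
             + 4 + textBytes   // length word + character varying payload
             + 4 + 4           // length word + int4 payload
             + 4 + 1;          // length word + boolean payload
    }

    /** Whole-file size for a given number of identically sized rows. */
    static long binaryFileBytes(long rows, int textBytes) {
        return HEADER_BYTES + rows * binaryRowBytes(textBytes) + TRAILER_BYTES;
    }
}
```

With these assumptions, binaryFileBytes(10_000_000L, 6) gives 250,000,021
bytes, of which 14 bytes per row are pure framing (field count plus three
length words). A CSV row only pays for two separators and a newline, which
is one plausible reason the CSV file comes out smaller.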


The binary format is definitely faster and safer:
- faster, because writing a binary file is faster than writing a text file.
I guess the bottleneck for the bulk load time is the network, which makes
it equivalent for both formats. Loading a binary file is two times faster
when the file is on the database server.
- safer, thanks to the format: each value is preceded by its length, which
is more robust than CSV, where separators can also be present in the text.
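
The length-prefixed layout can be sketched as follows. This is a minimal
Java writer for the three columns used in the test (character varying,
integer, boolean), following the "Binary Format" section of the COPY
documentation. It is not the code of the Talend component; the class and
method names are made up for illustration, and it assumes a UTF-8 server
encoding and a file without OIDs.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Illustrative writer for (character varying, integer, boolean) rows. */
final class PgBinaryCopySketch {
    // 11-byte signature "PGCOPY\n\377\r\n\0" from the COPY documentation.
    private static final byte[] SIGNATURE = {
        'P', 'G', 'C', 'O', 'P', 'Y', '\n', (byte) 0xFF, '\r', '\n', 0
    };

    private final DataOutputStream out; // DataOutputStream writes big-endian

    PgBinaryCopySketch(OutputStream target) throws IOException {
        out = new DataOutputStream(target);
        out.write(SIGNATURE);
        out.writeInt(0); // flags field: no OIDs (assumption)
        out.writeInt(0); // header extension area length: none
    }

    /** Writes one tuple; a null argument is written as SQL NULL. */
    void writeRow(String text, Integer number, Boolean flag) throws IOException {
        out.writeShort(3); // number of fields in this tuple
        if (text == null) {
            out.writeInt(-1); // length -1 marks NULL, no payload follows
        } else {
            byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
            out.writeInt(utf8.length);
            out.write(utf8); // separators or quotes need no escaping
        }
        if (number == null) {
            out.writeInt(-1);
        } else {
            out.writeInt(4);      // int4 payload is always 4 bytes
            out.writeInt(number);
        }
        if (flag == null) {
            out.writeInt(-1);
        } else {
            out.writeInt(1);      // boolean payload is a single byte
            out.writeByte(flag ? 1 : 0);
        }
    }

    /** Writes the file trailer (16-bit -1) and flushes. */
    void finish() throws IOException {
        out.writeShort(-1);
        out.flush();
    }

    /** Convenience for experiments: encodes a single row into a byte array. */
    static byte[] encodeSingleRow(String text, Integer number, Boolean flag) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            PgBinaryCopySketch writer = new PgBinaryCopySketch(buffer);
            writer.writeRow(text, number, flag);
            writer.finish();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The bytes produced this way are what would be streamed to the server with
COPY ... FROM STDIN WITH BINARY. A value such as "a;b" is written as-is,
because the reader uses the length word and never scans for a delimiter.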


The code is based on:

- https://github.com/uwescience/myria/blob/master/src/edu/washington/escience/myria/PostgresBinaryTupleWriter.java
- https://github.com/bytefish/PgBulkInsert/tree/master/PgBulkInsert/src/main/de/bytefish/pgbulkinsert/pgsql/handlers

Thanks,

2016-05-10 15:08 GMT+02:00 Cat :

> On Tue, May 10, 2016 at 03:00:55PM +0200, Nicolas Paris wrote:
> > > The way I want is :
> > > csv -> binary -> postgresql
> > >
> > > Is this just to be quicker or are you going to add some business logic
> > > while converting CSV data?
> > > As you mentioned ETL, I assume the second, as I don't think that
> > > converting CSV to binary and then loading it to PostgreSQL will be more
> > > convenient than loading directly from CSV... however quick it is, you
> > > still have to load the data from CSV anyway.
> > >
> >
> > Right, an ETL process means a lot of business logic:
> > get the data (csv or other) -> transform it -> produce a binary -> copy
> > from binary from stdin
> >
> > Producing 100 GB CSVs is a waste of time.
>
> Ah. You need to fiddle with the data. Then you need to weigh the pros of
> something agnostic to Postgres's internals against something that needs
> to be aware of them.
>
> You will need to delve into the source code for data types more complex
> than INTEGER, TEXT and BYTEA (which was the majority of my data when I
> was just looking into it).


Re: [SPAM] [GENERAL] COPY command & binary format

2016-05-10 Thread Cat
On Tue, May 10, 2016 at 03:00:55PM +0200, Nicolas Paris wrote:
> > The way I want is :
> > csv -> binary -> postgresql
> >
> > Is this just to be quicker or are you going to add some business logic
> > while converting CSV data?
> > As you mentioned ETL, I assume the second, as I don't think that
> > converting CSV to binary and then loading it to PostgreSQL will be more
> > convenient than loading directly from CSV... however quick it is, you
> > still have to load the data from CSV anyway.
> >
>
> Right, an ETL process means a lot of business logic:
> get the data (csv or other) -> transform it -> produce a binary -> copy
> from binary from stdin
>
> Producing 100 GB CSVs is a waste of time.

Ah. You need to fiddle with the data. Then you need to weigh the pros of
something agnostic to Postgres's internals against something that needs
to be aware of them.

You will need to delve into the source code for data types more complex
than INTEGER, TEXT and BYTEA (which was the majority of my data when I
was just looking into it).

-- 
  "A search of his car uncovered pornography, a homemade sex aid, women's 
  stockings and a Jack Russell terrier."
- 
http://www.dailytelegraph.com.au/news/wacky/indeed/story-e6frev20-118083480


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [SPAM] [GENERAL] COPY command & binary format

2016-05-10 Thread Cat
On Tue, May 10, 2016 at 01:38:12PM +0200, Nicolas Paris wrote:
> The way I want is :
> csv -> binary -> postgresql
> 
> And if possible, transforming csv to binary through java.
> 
> Use case is ETL process.

Not sure what the point would be, tbh, if the data is already in CSV.
You might as well submit the CSV to Postgres and let it deal with it.
It'll probably be faster. It'll also be more portable. The BINARY
format is what Postgres uses internally (more or less). I had to
look at the source code to figure out how to insert a timestamp
(FYI: Postgres stores timestamps relative to an epoch in the year 2000,
not 1970, amongst other fun things).
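
The epoch difference can be sketched in Java. This assumes a server using
integer datetimes (the default in current releases), where a binary
timestamp is a 64-bit count of microseconds since 2000-01-01 00:00:00 and a
binary date is a 32-bit count of days since 2000-01-01. The class and
method names are made up for illustration.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

/** Illustrative conversions from java.time values to PostgreSQL's 2000-based epoch. */
final class PgEpochSketch {
    static final Instant PG_EPOCH = Instant.parse("2000-01-01T00:00:00Z");
    static final LocalDate PG_EPOCH_DATE = LocalDate.of(2000, 1, 1);

    /**
     * Microseconds since 2000-01-01 00:00:00 UTC, the payload of a binary
     * timestamptz. For a plain timestamp the same count is taken from the
     * wall-clock value instead. Sub-microsecond digits are truncated.
     */
    static long toPgTimestampMicros(Instant instant) {
        return ChronoUnit.MICROS.between(PG_EPOCH, instant);
    }

    /** Days since 2000-01-01, the payload of a binary date. */
    static int toPgDateDays(LocalDate date) {
        return Math.toIntExact(ChronoUnit.DAYS.between(PG_EPOCH_DATE, date));
    }
}
```

Both values would then be written big-endian behind their length words (8
and 4 respectively). The Unix epoch itself maps to a negative number here,
which is an easy way to check that the offset was applied.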




Re: [SPAM] [GENERAL] COPY command & binary format

2016-05-10 Thread Nicolas Paris
2016-05-10 14:47 GMT+02:00 Moreno Andreo :

> On 10/05/2016 13:38, Nicolas Paris wrote:
>
> 2016-05-10 13:04 GMT+02:00 Moreno Andreo :
>
>> On 10/05/2016 12:56, Nicolas Paris wrote:
>>
>> Hello,
>>
>> What is the way to build a binary format (instead of a csv)? Is there
>> a specification for this file?
>> http://www.postgresql.org/docs/9.5/static/sql-copy.html
>>
>> I always create binary files with
>> COPY table TO 'path/to/file' WITH BINARY
>>
>>
> Fine, this works this way:
> postgresql -> binary
> binary -> postgresql
>
> The way I want is :
> csv -> binary -> postgresql
>
> Is this just to be quicker or are you going to add some business logic
> while converting CSV data?
> As you mentioned ETL, I assume the second, as I don't think that
> converting CSV to binary and then loading it to PostgreSQL will be more
> convenient than loading directly from CSV... however quick it is, you
> still have to load the data from CSV anyway.
>

Right, an ETL process means a lot of business logic:
get the data (csv or other) -> transform it -> produce a binary -> copy
from binary from stdin

Producing 100 GB CSVs is a waste of time.



> Binary file format is briefly described in the last part of the doc you
> linked, under "Binary format", and there's also reference to source files.
>
>
> And if possible, transforming csv to binary through java.
>
> This is beyond my knowledge, ATM. I'm just starting with Java and JDBC is
> still in the TODO list, sorry... :-)
>
> Cheers
> Moreno.-
>

The documentation explains a bit. Moreover, I have found a detailed answer
here:
http://stackoverflow.com/questions/14242117/java-library-to-write-binary-format-for-postgres-copy

My ultimate goal is to encapsulate it in a Talend component. (Talend is an
open-source, Java-based ETL software.)

Thanks, I'll keep you posted.


Re: [SPAM] [GENERAL] COPY command & binary format

2016-05-10 Thread Moreno Andreo

  
  
On 10/05/2016 13:38, Nicolas Paris wrote:

> 2016-05-10 13:04 GMT+02:00 Moreno Andreo :
>
>> On 10/05/2016 12:56, Nicolas Paris wrote:
>>
>>> Hello,
>>>
>>> What is the way to build a binary format (instead of a csv)? Is there
>>> a specification for this file?
>>> http://www.postgresql.org/docs/9.5/static/sql-copy.html
>>
>> I always create binary files with
>> COPY table TO 'path/to/file' WITH BINARY
>
> Fine, this works this way:
> postgresql -> binary
> binary -> postgresql
>
> The way I want is :
> csv -> binary -> postgresql

Is this just to be quicker or are you going to add some business
logic while converting CSV data?
As you mentioned ETL, I assume the second, as I don't think that
converting CSV to binary and then loading it to PostgreSQL will be
more convenient than loading directly from CSV... however quick it
is, you still have to load the data from CSV anyway.

Binary file format is briefly described in the last part of the doc
you linked, under "Binary format", and there's also reference to
source files.

> And if possible, transforming csv to binary through java.

This is beyond my knowledge, ATM. I'm just starting with Java and
JDBC is still in the TODO list, sorry... :-)

Cheers
Moreno.-
  





Re: [GENERAL] COPY command & binary format

2016-05-10 Thread Pujol Mathieu



On 10/05/2016 at 12:56, Nicolas Paris wrote:

> Hello,
>
> What is the way to build a binary format (instead of a csv)? Is there
> a specification for this file?
>
> http://www.postgresql.org/docs/9.5/static/sql-copy.html
>
> Could I create such a format from Java?
>
> I guess this would be far faster, and maybe safer, than CSVs
>
> Thanks in advance,

Hi
Making a driver that does what you want is not difficult. You will achieve
better performance than loading data from CSV, and you will also
have better precision for floating-point values (there is no text conversion).
The link you provided describes the file format in the
"Binary Format" section.
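
To illustrate the precision point: in the binary format a double precision
value travels as its 8 IEEE 754 bytes in network byte order, so it
round-trips exactly, whereas a text rendering with too few digits does not.
A small Java sketch follows. The names are made up, and the six-decimal
text format is only an assumed example of a lossy rendering, not what COPY
itself emits.

```java
import java.nio.ByteBuffer;
import java.util.Locale;

/** Illustrative comparison of binary and fixed-decimal text round trips. */
final class Float8BinarySketch {
    /** Encodes a double as 8 big-endian bytes (ByteBuffer is big-endian by default). */
    static byte[] encode(double value) {
        return ByteBuffer.allocate(8).putDouble(value).array();
    }

    /** Decodes 8 big-endian bytes back into a double. */
    static double decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getDouble();
    }

    /** Round trip through text with a fixed number of decimals, for comparison. */
    static double viaText(double value, int decimals) {
        String text = String.format(Locale.ROOT, "%." + decimals + "f", value);
        return Double.parseDouble(text);
    }
}
```

For x = 0.1 + 0.2, decode(encode(x)) is equal to x, while viaText(x, 6)
comes back as 0.3 and is not.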

Mathieu Pujol



Re: [SPAM] [GENERAL] COPY command & binary format

2016-05-10 Thread Nicolas Paris
2016-05-10 13:04 GMT+02:00 Moreno Andreo :

> On 10/05/2016 12:56, Nicolas Paris wrote:
>
> Hello,
>
> What is the way to build a binary format (instead of a csv)? Is there
> a specification for this file?
> http://www.postgresql.org/docs/9.5/static/sql-copy.html
>
> I always create binary files with
> COPY table TO 'path/to/file' WITH BINARY
>
>
Fine, this works this way:
postgresql -> binary
binary -> postgresql

The way I want is :
csv -> binary -> postgresql

And if possible, transforming csv to binary through java.

Use case is ETL process.


Re: [GENERAL] COPY command & binary format

2016-05-10 Thread Sameer Kumar
On Tue, May 10, 2016 at 4:36 PM Sameer Kumar 
wrote:

> On Tue, May 10, 2016 at 4:26 PM Nicolas Paris  wrote:
>
>> Hello,
>>
>> What is the way to build a binary format (instead of a csv)? Is there
>> a specification for this file?
>> http://www.postgresql.org/docs/9.5/static/sql-copy.html
>>
>> Could I create such a format from Java?
>>
>
> You can use the JDBC COPY API to copy to STDOUT and then compress the
> output before writing it to a file with the usual Java file operations.
> You will have to follow the reverse process when reading from this file
> and loading it into a table.
>
> But why would you want to do that?
>
>
>>
>> I guess this would be far faster, and maybe safer, than CSVs
>>
>
> I don't think that assumption is right. COPY is not meant for backup; it
> is for LOAD and UNLOAD.
>
> What you probably need is pg_dump with -Fc format.
> http://www.postgresql.org/docs/current/static/app-pgdump.html
>
>

As someone else suggested upthread, you can use the binary format in the
COPY command (the default is text).


>
>> Thanks in advance,
>>
-- 
--
Best Regards
Sameer Kumar | DB Solution Architect
*ASHNIK PTE. LTD.*

101 Cecil Street, #11-11 Tong Eng Building, Singapore 069 533

T: +65 6438 3504 | M: +65 8110 0350 | www.ashnik.com


Re: [GENERAL] COPY command & binary format

2016-05-10 Thread Sameer Kumar
On Tue, May 10, 2016 at 4:26 PM Nicolas Paris  wrote:

> Hello,
>
> What is the way to build a binary format (instead of a csv)? Is there
> a specification for this file?
> http://www.postgresql.org/docs/9.5/static/sql-copy.html
>
> Could I create such a format from Java?
>
>

You can use the JDBC COPY API to copy to STDOUT and then compress the
output before writing it to a file with the usual Java file operations.
You will have to follow the reverse process when reading from this file
and loading it into a table.
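
A minimal sketch of the compression half of that round trip, using only
java.util.zip. With the PostgreSQL JDBC driver, the wrapped streams would
be handed to CopyManager.copyOut and CopyManager.copyIn; that part is left
out here because it needs a live connection. The class and method names
are made up for illustration, and readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Illustrative gzip wrappers for COPY output and input streams. */
final class CompressedCopySketch {
    /** Wraps a target (for example a file stream) so that COPY output is gzip-compressed. */
    static OutputStream compressing(OutputStream target) throws IOException {
        return new GZIPOutputStream(target);
    }

    /** Wraps a compressed source so that COPY input sees the original bytes. */
    static InputStream decompressing(InputStream source) throws IOException {
        return new GZIPInputStream(source);
    }

    /** In-memory variant, convenient for trying the round trip without a file. */
    static byte[] compress(byte[] raw) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (OutputStream gz = compressing(buffer)) {
                gz.write(raw);
            } // closing the gzip stream writes the gzip trailer
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reverse of compress(byte[]). */
    static byte[] decompress(byte[] packed) {
        try (InputStream gz = decompressing(new ByteArrayInputStream(packed))) {
            return gz.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The stream wrappers work the same way for text, CSV and binary COPY data,
since gzip treats the content as opaque bytes.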

But why would you want to do that?


>
> I guess this would be far faster, and maybe safer than CSVs
>

I don't think that assumption is right. COPY is not meant for backup; it
is for LOAD and UNLOAD.

What you probably need is pg_dump with -Fc format.
http://www.postgresql.org/docs/current/static/app-pgdump.html


>
> Thanks in advance,
>


Re: [SPAM] [GENERAL] COPY command & binary format

2016-05-10 Thread Moreno Andreo

  
  
On 10/05/2016 12:56, Nicolas Paris wrote:

> Hello,
>
> What is the way to build a binary format (instead of a csv)? Is there
> a specification for this file?
> http://www.postgresql.org/docs/9.5/static/sql-copy.html

I always create binary files with
COPY table TO 'path/to/file' WITH BINARY

Cheers
Moreno.-
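
A file produced this way starts with a fixed signature and ends with a
16-bit -1 trailer, so it is easy to sanity-check from Java. A minimal
sketch, assuming the layout from the "Binary Format" section of the COPY
documentation and a file written without OIDs; the class and method names
are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Illustrative reader that walks a binary COPY stream and counts its tuples. */
final class PgBinaryCopyReaderSketch {
    // 11-byte signature "PGCOPY\n\377\r\n\0" from the COPY documentation.
    private static final byte[] SIGNATURE = {
        'P', 'G', 'C', 'O', 'P', 'Y', '\n', (byte) 0xFF, '\r', '\n', 0
    };

    /** Checks the header, skips every field and returns the number of tuples. */
    static int countTuples(InputStream source) throws IOException {
        DataInputStream in = new DataInputStream(source); // big-endian reads
        byte[] signature = new byte[SIGNATURE.length];
        in.readFully(signature);
        if (!Arrays.equals(signature, SIGNATURE)) {
            throw new IOException("not a PostgreSQL binary COPY stream");
        }
        in.readInt();                          // flags field, ignored here
        int extensionLength = in.readInt();    // header extension area length
        in.readFully(new byte[extensionLength]);

        int tuples = 0;
        while (true) {
            short fieldCount = in.readShort();
            if (fieldCount == -1) {
                return tuples;                 // file trailer reached
            }
            for (int i = 0; i < fieldCount; i++) {
                int length = in.readInt();     // -1 means NULL, no payload
                if (length > 0) {
                    in.readFully(new byte[length]);
                }
            }
            tuples++;
        }
    }

    /** In-memory variant, convenient for quick checks. */
    static int countTuples(byte[] data) {
        try {
            return countTuples(new ByteArrayInputStream(data));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A truncated file fails with an EOFException from readFully or readShort
instead of silently returning a partial count.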
  





[GENERAL] COPY command & binary format

2016-05-10 Thread Nicolas Paris
Hello,

What is the way to build a binary format (instead of a csv)? Is there
a specification for this file?
http://www.postgresql.org/docs/9.5/static/sql-copy.html

Could I create such a format from Java?

I guess this would be far faster, and maybe safer, than CSVs

Thanks in advance,