Re: [GENERAL] postgres pg_bulkload c filter function in c programming

2016-12-29 Thread rajmhn
Thanks Francis.

Meanwhile just gone through usage of perl in postgres function. In turn,
this Postgres function can be called in pg_bulkload. I think, this task can
be done. Let me give a try.



--
View this message in context: 
http://postgresql.nabble.com/postgres-pg-bulkload-c-filter-function-in-c-programming-tp5936651p5936707.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgres pg_bulkload c filter function in c programming

2016-12-29 Thread rajmhn
Yes. 

I can able to apply those mentioned transformation in pgloader &
pg_bulkload-SQL filter. Yet to take performance stats. Meanwhile, I'm trying
to figure out the other best possible option. We are counting more on
performance, error & audit handling.

Thanks 



 




--
View this message in context: 
http://postgresql.nabble.com/postgres-pg-bulkload-c-filter-function-in-c-programming-tp5936651p5936705.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgres pg_bulkload c filter function in c programming

2016-12-29 Thread Alvaro Herrera
rajmhn wrote:

> But, how this can be accomplished when it have 100's of columns from source.
> Need to apply transformations only for few columns as mentioned above. 

Did you try pgloader?  See http://pgloader.io/

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgres pg_bulkload c filter function in c programming

2016-12-29 Thread Francisco Olarte
On Thu, Dec 29, 2016 at 8:41 PM, rajmhn  wrote:
> Thanks Francis.That seems to be a good solution.

Yep, but not for your problem as ...

>
> Thought to use pg_bulkload, a third party library instead of copy, where
> reject handling can be done in efficient way.

Mine was just an idea to do the part of the load you described
assuming pg_bulkload usage was optional. Not being it, it will not
work. MAYBE you can use the technique to preprocess the files for
pg_bulkload ( if possible this is nice, as the goood thing of
preprocessing them is you repeat until you get them right, no
DB-touchy ).

> Transformation(FILTER)
> functions can be implemented with any languages in pg_bulkload before it was
> loaded to table. SQL, C, PLs are ok, but you should write functions as fast
> as possible because they are called many times.



> In this case, function should be written in Perl and called inside the
> Postgressql function. Do you think that will work it out? But pg_bulkload is
> preferring C function over SQL function for performance.

I'm not familiar with pg_bulkload usage. I've read about it but all my
loading problemas have been solved better by using copy ( especially
factoring total time, I already know to use copy and a couple dozen
languages in which to write filters to preclean data for copy. In the
time I learn enough of pg_bulkload I can load filter and load a lot of
data ).

Regarding C vs perl, it seems pg_bulkload does server side processing.
In the server the funcion calling overhead is HUGE, specially when
transitioning between different languages. IMO the time spent doing
the data processing in perl would be 0 when compared with the time to
pass the data around to perl. C will be faster because the calling
barrier is smaller inside the server.

Just for data processing of things like you I've normally found
filters like the one I described can easily saturate an SSD array, and
the difference in time for processing is dwarfed by the difference in
time for developing the filter. In fact in any modern OS with write
through and readahead disk management the normal difference between
filtering in perl or C is perl may use 10% of 1 core, C 1%, perl
filter is developed in 15 minutes, C in an hour, and perl filter takes
some extra milliseconds to start. AND, if you are not familiar with
processing data in C you can easily code a slower solution than in
perl ( as perl was dessigned for this ).


> I will try this option as you suggested.

Just remember my option is not using pg_bulkload with perl stored
procedures. I cannot recommend anything if you use pg_bulkload.

I suggested using copy and perl to preclean the data. It just seemed
to me from the description of your problem you were using a too
complex tool. Now that you are introducing new terms, like reject
handling, I'll step out until I can make a sugestion ( don't bother to
define it for me, it seems a bulkload related term and I'm not able to
study that tool ).


FrancisCO Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgres pg_bulkload c filter function in c programming

2016-12-29 Thread rajmhn
Thanks Francis.That seems to be a good solution. 

Thought to use pg_bulkload, a third party library instead of copy, where
reject handling can be done in efficient way. Transformation(FILTER)
functions can be implemented with any languages in pg_bulkload before it was
loaded to table. SQL, C, PLs are ok, but you should write functions as fast
as possible because they are called many times.

In this case, function should be written in Perl and called inside the
Postgressql function. Do you think that will work it out? But pg_bulkload is
preferring C function over SQL function for performance.

I will try this option as you suggested.

Thanks



--
View this message in context: 
http://postgresql.nabble.com/postgres-pg-bulkload-c-filter-function-in-c-programming-tp5936651p5936695.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgres pg_bulkload c filter function in c programming

2016-12-29 Thread Francisco Olarte
Hi:

On Thu, Dec 29, 2016 at 3:01 PM, Jan de Visser  wrote:
> On Thursday, December 29, 2016 5:10:08 AM EST rajmhn wrote:
...
>> I'm new to C. Gone through this documentation. Not clear, how to start.
...
> It seems to me it would be much easier to load the data into a temporary
> table, and from there transform into the form you need it to be. If you're not
> experienced in C (and probably even if you are) you can import *a lot* of data
> in the time it would take you to develop that custom filter.
> Besides, your requirements don't appear to be that difficult.

for his type of requirements I would recommend going the perl ( any
similar language will do, but after using it for 25 years I find it is
the king for that ) + pipe route. I mean, write a script which spits
"copy xxx from stdin\n" plus the transformed rows, pipe it trough
psql.

Total time is difficult to beat, as debugging is very easy, write the
filtering function using <> - chomp - split - s/// - join - print and
debug it by feeding it some lines with head.

Untested code from which I remembter of the specs, could be something like:

print "Copy my_table from stdin;\n";
my $e = "The magic col-e default value";
while(<>) {
  chomp; # get rid of OPTIONAl line terminator
  # Get the columns and do the c-d column swap
  my ($a,$b,$d,$c)=split /\|/, $_;
  # Add dashed to the date in column b:
  $b =~ s/^(\d\d\d\d)(\d\d)(\d\d)$/$1-$2-$3/;
  # zap not numerics a to 0:
  ($a=~/^\d+$/) or ($a = 0);
  # And send it with the default separators ( scaping left as an
exercise to the reader, ask if needed, I've wrtten and tested it
several times ).
  print join("\t", $a,$b,$c,$d,$e),"\n";
}

Test it with "head the-datafile | perl the_script.pl |
my_favourite_pager" until correct, the beauty of this approache  is
you do not touch the db in debug, feed it to psql when done. In my
experience the perl script overhead is unnoticeable in any 2k+ machine
(and perl was specifically dessigned to be good at this kind of things
).

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgres pg_bulkload c filter function in c programming

2016-12-29 Thread rajmhn
Thanks Jan

I'm converting the code from Oracle to Postgres. Both version of code will
be available for different users.

In Oracle, doing these kind of transformation in SQL loader. Need to follow
the same kind of approach in Postgres. SQL filter approach was very easy in
terms of coding. From documentation found, C filter was very much faster
than SQL.

I'm very new to C. Tried few samples, as mentioned in this link.

https://www.postgresql.org/docs/current/static/xfunc-c.html

It worked. 

But, how this can be accomplished when it have 100's of columns from source.
Need to apply transformations only for few columns as mentioned above. 

Totally struck, nobody is here to help. If you can guide me with some syntax
with the sample data provided above, will be grateful. I can manage it from
there.

Thanks 



--
View this message in context: 
http://postgresql.nabble.com/postgres-pg-bulkload-c-filter-function-in-c-programming-tp5936651p5936658.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] postgres pg_bulkload c filter function in c programming

2016-12-29 Thread Jan de Visser
On Thursday, December 29, 2016 5:10:08 AM EST rajmhn wrote:
> Gurus,
> 
> Reading the data from file and loading it using pg_bulkload- C filter. As
> per documentation, C filter is much faster than SQL filter.
> 
> I'm new to C. Gone through this documentation. Not clear, how to start.
> https://www.postgresql.org/docs/current/static/xfunc-c.html.
> 
> Can someone kindly guide me to create C code, so that it can be called in
> postgres function?
> 

It seems to me it would be much easier to load the data into a temporary 
table, and from there transform into the form you need it to be. If you're not 
experienced in C (and probably even if you are) you can import *a lot* of data 
in the time it would take you to develop that custom filter. 

Besides, your requirements don't appear to be that difficult.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general