I have found cursors example in plperl. now it works fine. the code is below.
CREATE OR REPLACE FUNCTION uniprot_frekvencije.pattern_counter2("patLength" integer) RETURNS character varying AS $BODY$ my $sth = spi_query("select sequence from entry"); my $patLen = $_[0]; my $patt = ''; my $row; my %patterns=(); while (defined ($row = spi_fetchrow($sth))) { my $seq = $row->{sequence}; for (my $x = 0;$x<=length($seq) - $patLen;$x++){ $patt=substr($seq,$x,$patLen); if (! defined $patterns{$patt}) { $patterns{$patt}=1; }else{ $patterns{$patt}++; } } } foreach $patt (keys %patterns){ my $sql="insert into patterns values('".$patt."',".$patterns{$patt}.")"; spi_exec_query($sql); } return ''; $BODY$ LANGUAGE plperl VOLATILE COST 100; On Mon, Jul 11, 2011 at 4:42 PM, Viktor Bojović <viktor.bojo...@gmail.com>wrote: > Thanx Wayne and Pavel, > > i will try to make this work with cursors. Theoretically there can be more > than 4 bilion records in result, but only 58mil returned which took 3 days. > Possibly i will have to create temporary table and store results form > %patterns into it after each 500k records, and group them at the end. > > I didn't manage to find example where plpgsql uses hash arrays or where > plperl uses cursors. > Any of these methods links/examples would be helpful to me. > > Thanx in advance > > > On Fri, Jul 8, 2011 at 9:49 PM, <lists-pg...@useunix.net> wrote: > >> Hi Viktor, >> >> I'm not sure what your requirements are in terms of performance and >> stability of the your result set. See Pavel's response. A cursor issues >> a single query and renders a single result set. The result set is >> static, the cursor just gives you finer control/performance when >> retrieving rows from the set. Using a transaction will also render better >> performance when %patterns contains a large number of keys/values, >> insert all of them in one transaction, the same one you opened for the >> cursor. >> >> Your method issues many queries and will take longer for each successive >> query. And the number of queries will increase as table size increases. >> It could also return duplicate rows and/or missed rows due to other >> transactions completing between your select query. >> >> If you can tolerate the above issues then so be it, if not you really >> should look at cursors. >> >> Also there might be a bug in your code if you delete entries from >> 'entry'. Your depending on $rowCountAll to remain static which is not the >> case if you ever delete entries. You can fix this by skipping the >> "select count(1)" step and just breaking your loop when less then >> $windowSize entries are returned from the "select sequence.." query. >> >> Wayne >> >> >> On Fri, Jul 08, 2011 at 08:55:36PM +0200, Viktor Bojovi?? wrote: >> > Thanx Wayne, >> > at the end i did it that way and it works. >> > The code is below. >> > >> > CREATE FUNCTION pattern_counter1("patLength" integer) RETURNS character >> > varying >> > LANGUAGE plperl >> > AS $_X$ >> > my $rvCnt = spi_exec_query("select count(1) as cnt from entry"); >> > #my $rowCountAll = $rvCnt->{processed}; >> > my $row = $rvCnt->{rows}[0]; >> > my $rowCountAll = $row->{cnt}; >> > my $windowSize = 500000; >> > my %patterns=(); >> > for (my $p=0;$p<$rowCountAll;$p+=$windowSize){ >> > my $sql="select sequence from entry limit $windowSize offset $p"; >> > >> > my $rv = spi_exec_query($sql); >> > my $rowCount = $rv->{processed}; >> > my $patLen = $_[0]; >> > my $patt = ''; >> > >> > foreach my $rn (0 .. $rowCount -1){ >> > my $row = $rv->{rows}[$rn]; >> > my $seq = $row->{sequence}; >> > for (my $x = 1;$x<=length($seq) - $patLen;$x++){ >> > $patt=substr($seq,$x,$patLen); >> > if (! defined $patterns{$patt}) { >> > $patterns{$patt}=1; >> > }else{ >> > $patterns{$patt}++; >> > } >> > } >> > } >> > } >> > >> > foreach $patt (keys %patterns){ >> > my $sql="insert into patterns values('".$patt."',".$patterns{$patt}.")"; >> > spi_exec_query($sql); >> > } >> > return $tmp; >> > $_X$; >> > >> > >> > On Fri, Jul 8, 2011 at 8:50 PM, <lists-pg...@useunix.net> wrote: >> > >> > > I'm have the same situation with large tables. Take a look at using a >> > > cursor to fetch several thousand rows at a time. I presume what's >> > > happening is that perl is attempting to create a massive list/array in >> > > memory. If you use a cursor the list should only contain X number of >> > > rows where X in the number specified at each fetch execution. You'll >> > > need to define the cursor inside a transaction block. >> > > >> > > - begin transaction >> > > - define the cursor >> > > - fetch rows from cursor >> > > - while row count from previous step > 0, execute previous step >> > > - terminate transaction >> > > >> > > Or you could use plpgsql instead of plperl, FOR loops over result sets >> in >> > > plpgsql implicitly use cursors... it's just a little less code. >> > > >> > > Hope that helps, >> > > Wayne >> > > >> > > On Tue, Jul 05, 2011 at 10:29:03PM +0200, Viktor Bojovi?? wrote: >> > > > Hi, >> > > > while reading 20GB table through PL/PERL function , it constantly >> grows >> > > in >> > > > RAM. >> > > > I wanted to ask you which is the best way to read table inside that >> > > > function without such memory consumption. >> > > > Thanks in advance >> > > > >> > > > Code is here: >> > > > >> > > > CREATE FUNCTION pattern_counter("patLength" integer) >> > > > RETURNS varchar AS >> > > > $BODY$ >> > > > my $rv = spi_exec_query("select sequence from entry"); >> > > > my $rowCount = $rv->{processed}; >> > > > my $patLen = $_[0]; >> > > > my $patt = ''; >> > > > my %patterns=(); >> > > > foreach my $rn (0 .. $rowCount -1){ >> > > > my $row = $rv->{rows}[$rn]; >> > > > my $seq = $row->{sequence}; >> > > > for (my $x = 1;$x<=length($seq) - $patLen;$x++){ >> > > > $patt=substr($seq,$x,$patLen); >> > > > if (! defined $patterns{$patt}) { >> > > > $patterns{$patt}=1; >> > > > }else{ >> > > > $patterns{$patt}++; >> > > > } >> > > > } >> > > > } >> > > > foreach $patt (keys %patterns){ >> > > > my $sql="insert into patterns >> > > values('".$patt."',".$patterns{$patt}.")"; >> > > > spi_exec_query($sql); >> > > > } >> > > > return ''; >> > > > $BODY$ >> > > > LANGUAGE plperl VOLATILE >> > > > COST 100; >> > > > >> > > > >> > > > >> > > > -- >> > > > --------------------------------------- >> > > > Viktor Bojovi?? >> > > > --------------------------------------- >> > > > Wherever I go, Murphy goes with me >> > > >> > >> > >> > >> > -- >> > --------------------------------------- >> > Viktor Bojovi?? >> > --------------------------------------- >> > Wherever I go, Murphy goes with me >> > > > > -- > --------------------------------------- > Viktor Bojović > > --------------------------------------- > Wherever I go, Murphy goes with me > -- --------------------------------------- Viktor Bojović --------------------------------------- Wherever I go, Murphy goes with me