Ben,

If all your data files will be roughly 10 MB like your sample, it may not be necessary to go the rcols way suggested by Chris (which might be faster, but is also a bit complicated since your data parsing is not trivial).

You can simply parse your data file first into two Perl arrays, $re and $im, like:
$re->[$row][$column] = $data[2*$i+1]; # odd-indexed fields hold the real parts
$im->[$row][$column] = $data[2*$i+2]; # even-indexed fields the imaginary parts

and at the end convert them into a complex piddle with:
return pdl($re) + pdl($im) * i;

See http://pastebin.com/Rzt5CZ9Z
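For illustration, a minimal self-contained sketch of that two-pass approach (the sample data, field layout, and names below are illustrative assumptions, not your actual parser):

```perl
use strict;
use warnings;

# Stand-in for the data file: each line holds re/im pairs for one row.
my $sample = "1.0 0.5 2.0 -0.5\n3.0 0.25 4.0 -0.25\n";
open my $fh, '<', \$sample or die "open: $!";

# First pass: plain-Perl nested arrays -- cheap to fill element-by-element.
my ($re, $im);
my $row = 0;
while (my $line = <$fh>) {
    chomp $line;
    my @data = split /\s+/, $line;
    for my $i (0 .. $#data / 2) {
        $re->[$row][$i] = $data[2 * $i];      # even fields: real parts
        $im->[$row][$i] = $data[2 * $i + 1];  # odd fields: imaginary parts
    }
    $row++;
}
print "$re->[1][1] $im->[1][1]\n";   # prints "4.0 -0.25"

# With PDL loaded, a single conversion then builds the complex piddle:
#   use PDL; use PDL::Complex;
#   my $z = pdl($re) + pdl($im) * i;
```

The PDL calls are left in a comment so the sketch runs without PDL installed; the conversion itself is the one-liner above.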

On my laptop, the processing time went from 4.5 s to 0.5 s.

As for memory consumption during parsing, I expect that even with 100+ instances of these matrices you should be fine with 1 GB of RAM.

--
kmx

On 6.7.2015 4:25, Benjamin Silva wrote:
Hi Chris,

I had just posted the code snippet in the hope that someone would look at it and spot something obviously wrong. The actual subroutine that parses the input file is a bit more complex, since the file I'm parsing isn't all matrix data; there is a header and some other material that I pull out, so there are several other loops going on in there. That said, I've included a working copy of the script as well as a sample file that it operates on. Any idea why it's so slow? Please forgive my awful code.

link to my code:
http://pastebin.com/JS9gTs6t

link to the sample file it operates on:
http://www.5plates.com/sample.zip

Note that this example looks relatively fast (4 seconds), but the script is reading just a single matrix from a file that holds only 2 matrices. I will be using this script on a file that has 100+ instances of these matrices, so reading each of those 100 matrices turns 4 seconds into several minutes.

Please let me know if this helps, or if you can't read the files linked.

Thanks!
-Ben



On Sun, Jul 5, 2015 at 1:11 PM, Chris Marshall <[email protected]> wrote:

    Hi Ben-

    It helps if you include a small working code example rather than
    an out-of-context snippet---something we can run.  Some thoughts:

    - rcols and wcols are useful for reading and writing 2D piddles

    - $zmatrix_pre seems to be a Perl array with N x M piddle elements

    - PDL is optimized for large data operations
       - Reading in complex values one-by-one is going to be very slow when
         you have 350x350 elements.
       - You should be reading all the values in one operation and creating
         the PDL from that.
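The bulk-read idea can be sketched like this (the inline data, the de-interleave loop, and the reshape step are illustrative assumptions about the file layout, not the actual parser):

```perl
use strict;
use warnings;

# Stand-in for the whole matrix block read from the file.
my $block = "0.1 0.2 0.3 0.4\n0.5 0.6 0.7 0.8\n";

# One split for the entire block instead of one cplx() call per element.
my @vals = split ' ', $block;

# De-interleave alternating re/im values into two flat lists.
my (@re, @im);
while (@vals) {
    push @re, shift @vals;
    push @im, shift @vals;
}
print scalar(@re), " ", $re[3], " ", $im[3], "\n";   # prints "4 0.7 0.8"

# With PDL, two reshapes and one add then build the complex piddle at once:
#   use PDL; use PDL::Complex;
#   my $z = pdl(\@re)->reshape($ncols, $nrows)
#         + pdl(\@im)->reshape($ncols, $nrows) * i;
```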

    I can't help more since your code doesn't even set $row or $column, but
    maybe the above thoughts will give you an idea.  You can use wcols()
    to write out a 2D piddle and rcols() to read it back. Something like
    this might be applicable from a pdl2 session:

    pdl> $im = sequence(5,5)/25;

    pdl> $re = random(5,5);

    pdl> use PDL::Complex

    pdl> p i
    0 +1i

    pdl> $c = $re + i * $im;

    pdl> p $c

    [
     [0.971898    +0i 0.754039+0.04i  0.50257+0.08i 0.190826+0.12i 0.613931+0.16i]
     [0.132726  +0.2i 0.327291+0.24i 0.251733+0.28i 0.184122+0.32i 0.787163+0.36i]
     [0.103273  +0.4i 0.793739+0.44i 0.286722+0.48i  0.71684+0.52i 0.939528+0.56i]
     [0.114506  +0.6i 0.750494+0.64i 0.757878+0.68i 0.761478+0.72i 0.827088+0.76i]
     [ 0.69723  +0.8i 0.438457+0.84i 0.177937+0.88i 0.321631+0.92i 0.750218+0.96i]
    ]


    I recommend trying out small cases in the pdl2 or perldl shells to
    see how things work.  Once you see the patterns it is easier to
    apply them to bigger data in a program or script.

    Cheers,
    Chris



    On 7/5/2015 14:57, Benjamin Silva wrote:
    Hello,

    I recently recoded some of my old scripts to use the PDL libraries
    instead of some subroutines I had written myself.  These scripts
    were for doing matrix inversion and matrix manipulation for medium
    sized matrices of complex numbers (~350x350 max matrix size). These
    matrices are housed in plain text files, and I have a parser that
    goes through and builds a piddle out of the data in the file. My old
    subroutines were slow to do the processing, but were extremely fast
    for reading in the file and creating the matrix. The new subroutine,
    using PDL, is extremely fast to do the processing, but now it is
    crazy slow for reading in the file and creating the initial PDL.  My
    method for creating the PDL is shown below.  Can anyone please let
    me know if there's a faster way to do this?  Almost all of the time
    savings I've achieved by going to PDL have been consumed by the
    slower file parsing and PDL building.  I've run the code through a
    profiler, and it's definitely wasting a lot of cycles on the 3rd
    line in the while loop where I'm creating the cplx data structure.

    open (FILE, "$input_file") or die;
    while ($inline1 = <FILE>) {
        chomp $inline1;
        @data = split(/\s+/, $inline1);
        $zmatrix_pre[$row][$column] = cplx($data[0] + $data[1]*i);
    }

    Thanks for any help!
    -Ben





------------------------------------------------------------------------------
Don't Limit Your Business. Reach for the Cloud.
GigeNET's Cloud Solutions provide you with the tools and support that
you need to offload your IT needs and focus on growing your business.
Configured For All Businesses. Start Your Cloud Today.
https://www.gigenetcloud.com/


_______________________________________________
pdl-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/pdl-general
