Ben-

You can set the value of $PDL::undefval so that missing column data is filled with a specific value. If you don't have the same number of columns in all rows, you'll need to use explicit column numbers so that all the data is read. Otherwise, as the two examples below show, the column count comes from the first row and any extra columns in later rows will be dropped.  E.g.,

pdl> #cat > in44.cols
 1 2 3 4
 1 2 3
 1 2 3 4

pdl> p rcols 'in44.cols', []
Reading data into piddles of type: [ Double ]
Read in 12 elements.

[
 [1 1 1]
 [2 2 2]
 [3 3 3]
 [4 0 4]
]

while

pdl> #cat > in34.cols
 1 2 3
 1 2 3 4
 1 2 3 4

pdl> p rcols 'in34.cols', []    # uh-oh!
Reading data into piddles of type: [ Double ]
Read in 9 elements.

[
 [1 1 1]
 [2 2 2]
 [3 3 3]
]

pdl> p rcols 'in34.cols', [0..3]  # this works again
Reading data into piddles of type: [ Double ]
Read in 12 elements.

[
 [1 1 1]
 [2 2 2]
 [3 3 3]
 [0 4 4]
]


The zero is from the default value of $PDL::undefval.
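
A minimal sketch of changing the fill value, assuming a throwaway copy of the in34.cols data written to a temporary file (the temporary file and the choice of -1 are illustrative, not part of the session above). If rcols fills short rows from $PDL::undefval as described, the missing entry should come out as -1 instead of 0:

```perl
use strict;
use warnings;
use PDL;
use File::Temp qw(tempfile);

# Recreate the in34.cols example: the first row is one column short.
my ($fh, $file) = tempfile(SUFFIX => '.cols', UNLINK => 1);
print {$fh} " 1 2 3\n", " 1 2 3 4\n", " 1 2 3 4\n";
close $fh;

# Illustrative fill value for missing fields (the default is 0).
$PDL::undefval = -1;

# Explicit column numbers so the fourth column is not dropped.
my $data = rcols($file, [0..3]);
print $data;
```

Remember to set $PDL::undefval back to 0 afterwards if other code relies on the default.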

Hope this helps,
Chris

On 7/6/2015 09:12, Chris Marshall wrote:
Ben-

You *really* need to look at the rcols() routine: pdldoc rcols
I don't know much about your data as it seems a bit irregular
and too large to verify the format but I was able to slurp it into
a containing 2D PDL:

pdl> $data = rcols 'sample.s321p', [], { EXCLUDE=>'/^[#\!]/' };
Reading data into piddles of type: [ Double ]
Read in 468018 elements.

pdl> ?vars
PDL variables in package main::

Name         Type   Dimension       Flow State          Mem
----------------------------------------------------------------
$data        Double D [52002,9]            -C 0.00KB

pdl> p cat($data->statsover)->mv(-1,0)  # the cat()->mv() makes the stats line up vertically

[
 [ 2326.8359 375207.06 1.5208731e-06 -0.97827545 61000000 4653.4882 375203.45]
 [-0.00028558513 0.0058766133 4.8714989e-06 -0.92295157 0.021193039 0.00058756778 0.0058765568]
 [ 0.0023870574 0.055326859 1.7099293e-06 -0.97826891 0.99688132 0.01008911 0.055326327]
 [-0.00025093421 0.0014797211 5.2744513e-06 -0.044927633 0.07611833 0.00054094668 0.0014797069]
 [ 0.0023014344 0.055315241 1.7512188e-06 -0.97827491 0.98798882 0.010026523 0.055314709]
 [-0.00024539816 0.0016970776 6.2292964e-06 -0.091357195 0.075922572 0.00054341477 0.0016970613]
 [ 0.0022908007 0.055309355 1.3068813e-06 -0.97826886 0.98751119 0.0098599264 0.055308823]
 [-0.00024407073 0.0017278177 6.06088e-06 -0.091357195 0.07590755 0.0005446816 0.0017278011]
 [-3.2840509e-07 5.2955969e-05 0 -0.0086078672 0 6.5678493e-07 5.2955459e-05]
]


pdl> p $data->stats
258.537987240325 125070.089699174 1.190347227888e-06 -0.9782754458665 61000000 517.071919985122 125069.956082351

pdl> ?vars
PDL variables in package main::

Name         Type   Dimension       Flow State          Mem
----------------------------------------------------------------
$data        Double D [52002,9]            P 3.57MB


That is less than 4MB of data.  You could possibly read the data
for all 100+ arrays into one 2D PDL and then use slicing operations
and reshape to extract the matrix forms.  Try things out in the
pdl2 or perldl shell to see what works.
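
A rough sketch of that slice-and-reshape idea, under some loudly stated assumptions: the combined PDL has one file row per entry along dim 0 and two columns (real and imaginary parts) along dim 1, every matrix is $n x $n, and the matrices are stored back to back. The names $n, $nmat and $k are made up for illustration, and sequence() stands in for the output of rcols:

```perl
use strict;
use warnings;
use PDL;

my $n    = 3;    # hypothetical matrix size (n x n)
my $nmat = 2;    # hypothetical number of matrices in the file

# Stand-in for the 2D PDL returned by rcols: dim 0 = file row, dim 1 = column.
my $data = sequence($n * $n * $nmat, 2);

# Pull out matrix number $k (counting from 0).
my $k     = 1;
my $first = $k * $n * $n;
my $last  = $first + $n * $n - 1;

# "(0)" and "(1)" select one column and drop that dimension; copy() detaches
# the result from $data so reshape() does not touch the parent.
my $re = $data->slice("$first:$last,(0)")->copy->reshape($n, $n);
my $im = $data->slice("$first:$last,(1)")->copy->reshape($n, $n);

print "re = $re\nim = $im\n";
```

Whether dim 0 of the reshaped matrix corresponds to rows or columns depends on the order in which the file lists the elements, so a transpose may be needed.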

Cheers,
Chris

On 7/5/2015 22:25, Benjamin Silva wrote:
Hi Chris,

I had just posted the code snippet with the hopes someone would look at it and see something obviously wrong. The actual subroutine that parses the input file is a bit more complex since the file I am parsing isn't all just matrix data; there is a header and some other stuff that I pull out, so there are a bunch of other loops and such going on in there. That said, I've included a working copy of the script as well as a sample file that it operates on. Any idea why it's so slow? Please forgive my awful code.

link to my code:
http://pastebin.com/JS9gTs6t

link to the sample file it operates on:
http://www.5plates.com/sample.zip

Note that this example looks relatively fast (4 seconds), but this script is reading just a single matrix from a file that houses just 2 matrices. I will be using this script on a file that has 100+ instances of these matrices. That means I'll have to read each of these 100 matrices which makes 4 seconds turn into several minutes.

Please let me know if this helps, or if you can't read the files linked.

Thanks!
-Ben



On Sun, Jul 5, 2015 at 1:11 PM, Chris Marshall wrote:

    Hi Ben-

    It helps if you include a small working code example rather than
    an out-of-context snippet---something we can run.  Some thoughts:

    - rcols and wcols are useful for reading and writing 2D piddles

    - $zmatrix_pre seems to be a perl array object with N x M piddle
    elements

    - PDL is optimized for large data operations
       - Reading in complex values one-by-one is going to be very slow
         when you have 350x350 elements.
       - You should be reading all the values in one operation and
         creating the PDL from that.
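
    A minimal sketch of the "read everything, then build the PDL once"
    approach, assuming each data line holds one "re im" pair and the
    matrix size $n is already known (the in-memory sample text and the
    variable names are made up for illustration). The loop only pushes
    plain Perl numbers; pdl() is called once per part at the end:

    ```perl
    use strict;
    use warnings;
    use PDL;

    # Hypothetical stand-in for the matrix section of the input file:
    # one "re im" pair per line, here for a 2x2 matrix.
    my $text = join '', map { "$_\n" } '1.0 0.5', '2.0 -0.5', '3.0 1.5', '4.0 -1.5';
    my $n    = 2;

    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    my (@re, @im);
    while (my $line = <$fh>) {
        my ($re, $im) = split ' ', $line;   # split ' ' skips leading whitespace
        push @re, $re + 0;                  # + 0 forces a numeric value
        push @im, $im + 0;
    }
    close $fh;

    # One PDL constructor call per part instead of one per element.
    my $re = pdl(\@re)->reshape($n, $n);
    my $im = pdl(\@im)->reshape($n, $n);
    print "re = $re\nim = $im\n";
    ```

    The two real piddles can then be combined into a complex one in a
    single vectorized step, along the lines of the $re + i * $im
    session shown in this message.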

    I can't help more since your code doesn't even set $row or $col but
    maybe the above thoughts will give you an idea.  You can use wcols()
    to write out a 2D piddle and rcols() to read it back.  Something like
    this might be applicable from a pdl2 session:

    pdl> $im = sequence(5,5)/25;

    pdl> $re = random(5,5);

    pdl> use PDL::Complex

    pdl> p i
    0 +1i

    pdl> $c = $re + i * $im;

    pdl> p $c

    [
     [0.971898 +0i 0.754039+0.04i 0.50257+0.08i 0.190826+0.12i 0.613931+0.16i]
     [0.132726 +0.2i 0.327291+0.24i 0.251733+0.28i 0.184122+0.32i 0.787163+0.36i]
     [0.103273 +0.4i 0.793739+0.44i 0.286722+0.48i 0.71684+0.52i 0.939528+0.56i]
     [0.114506 +0.6i 0.750494+0.64i 0.757878+0.68i 0.761478+0.72i 0.827088+0.76i]
     [ 0.69723 +0.8i 0.438457+0.84i 0.177937+0.88i 0.321631+0.92i 0.750218+0.96i]
    ]


    I recommend trying out small cases in the pdl2 or perldl shells
    to see how things work.  Once you see the patterns it is easier
    to apply to bigger data in a program or script.

    Cheers,
    Chris



    On 7/5/2015 14:57, Benjamin Silva wrote:
    Hello,

    I recently recoded some of my old scripts to use the PDL
    libraries instead of some subroutines I had written myself.
    These scripts were for doing matrix inversion and matrix
    manipulation for medium sized matrices of complex numbers
    (~350x350 max matrix size). These matrices are housed in plain
    text files, and I have a parser that goes through and builds a
    piddle out of the data in the file.  My old subroutines were
    slow to do the processing, but were extremely fast for reading
    in the file and creating the matrix.  The new subroutine, using
    PDL, is extremely fast to do the processing, but now it is crazy
    slow for reading in the file and creating the initial PDL.  My
    method for creating the PDL is shown below.  Can anyone please
    let me know if there's a faster way to do this?  Almost all of
    the time savings I've achieved by going to PDL have been
    consumed by the slower file parsing and PDL building.  I've run
    the code through a profiler, and it's definitely wasting a lot
    of cycles on the 3rd line in the while loop where I'm creating
    the cplx data structure.

    open (FILE, "$input_file") or die;
    while($inline1=<FILE>){
    chomp $inline1;
    @data = split(/\s+/, $inline1);
    $zmatrix_pre[$row][$column] = cplx($data[0]+$data[1]*i);
    }

    Thanks for any help!
    -Ben




