moin Rob,

On 2011-09-29 14:16, Rob Freeman wrote:
> Hi Bryan,
> 
> I haven't seen this appear on the list yet. Are you sure you sent from
> the address you are subscribed with? I've been caught by that one
> before elsewhere.

I had begun to suspect something similar myself... now testing with the
proper address; thanks for the heads-up!

> Thanks for the feedback. And David, Chris, and Mark too.
> 
> Bryan, nice to find someone working on a similar problem. I'll check
> out your work. At first glance your ($indices,$values) pairs might be
> similar to how I have addressed the problem up to now. Not within PDL,
> but as a regular Perl hash.
> 
> Your logarithmic search time sounds interesting. I get linear time
> searching for word pairs. Frankly I was glad to get it down from
> exponential time in the size of the vectors.

Amen.

> It may not be the same
> search we are talking about. I'm searching for pairs between every
> element of a vector, and every element of another.

Ah, that's a bit trickier.  I've written a helper routine
PDL::CCS::Ops::ccs_binop_align_mia() to do basically just this, which I
use to implement the standard binary arithmetic operations.  The major
gotcha is that "missing" (read 'zero') values in either operand do not
get aligned at all (which is why matrix multiplication resp. dot-product
are such a PITA); hence the "_mia" suffix (= 'missing-is-annihilator').
It's worst-case quadratic time, but for very sparse pdls (including my
co-occurrence matrices), it's much closer to linear (sorted parallel
search of operands' index vectors).  It requires a bit of twiddling on
the perl end to ensure that all vectors get aligned (block-wise
iteration) because PDL needs to know the size of the output pdls before
they've been computed, which is generally A Bad Thing for sparse matrix
operations.
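To illustrate the idea (just a rough Python sketch of the alignment strategy, not the actual PDL::CCS::Ops::ccs_binop_align_mia code): each sparse operand is a sorted list of (index, value) pairs, and a single parallel pass emits op(a,b) only where the indices match, so "missing" positions in either operand annihilate the result.

```python
def align_mia(a, b, op):
    """Sorted parallel merge of two sparse vectors a, b.

    a, b: lists of (index, value), sorted by index, no duplicate indices.
    Returns a sorted list of (index, op(va, vb)) for shared indices only
    ("missing-is-annihilator"), in O(len(a) + len(b)) steps.
    """
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        ia, va = a[i]
        ib, vb = b[j]
        if ia < ib:
            i += 1          # index missing in b: annihilated, skip
        elif ib < ia:
            j += 1          # index missing in a: annihilated, skip
        else:
            out.append((ia, op(va, vb)))
            i += 1
            j += 1
    return out

# Example: element-wise product of two sparse vectors
x = [(0, 2.0), (3, 1.5), (7, 4.0)]
y = [(3, 2.0), (5, 1.0), (7, 0.5)]
print(align_mia(x, y, lambda p, q: p * q))   # [(3, 3.0), (7, 2.0)]
```

The merge itself is linear in the number of stored values; the quadratic worst case only bites when nearly every index of one operand has to be checked against nearly every index of the other.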

> However I do it, though, my memory usage just explodes when I try a
> cross product. Too many intermediate products being summed, nesting
> two or three levels of substitutions over 10k+ dimensions. I need to
> keep all intermediate products in RAM at once or it is way too slow.
> But with it all in RAM the memory usage blows out.

Yup.  Cross-product is nasty even with sparse pdls.  Occasionally I had
to drill open the CCS::Nd objects (really just perl arrays) and work
directly on the underlying ($indices,$values) pairs -- cross-product is
pretty simple to define by direct manipulation of $indices, although
memory usage is still O(Nnz**2), where Nnz is the number of non-zero values.

marmosets,
        Bryan

> -Rob
> 
> P.S. Mark, I'm not sure if you're suggesting you manage to reduce your
> multi-dimensional problem down to one dimension, or that you can get
> rid of uncertainty by using multi-dimensions.
> 
> On Thu, Sep 29, 2011 at 5:30 PM, Bryan Jurish <[email protected]> wrote:
>> morning all,
>>
>> Apologies for barging in on the discussion (this being my first post to
>> this list); just saw one of my buzzwords and thought I might have
>> something to contribute (and of course a chance for a shameless plug)... ;-)
>>
>> On 2011-09-28 16:04, chm wrote:
>>> On 9/28/2011 8:15 AM, David Mertens wrote:
>>>> Hey Rob,
>>>>
>>>> I've CC'd the PDL list in case somebody there can speak more to your
>>>> concerns about PDL. I've never heard of anybody needing 10s of
>>>> thousands of
>>>> dimensions, and PDL might only support 256. Anybody know? As far as
>>>> sparse
>>>> matrix support, I never used it and it's not a crowning feature. Can
>>>> anybody
>>>> else speak more to the matter?
>>
>> PDL's sparse matrix support is basically non-existent, afaik.  There's a
>> proof-of-concept implementation floating around somewhere in the core I
>> looked at a few years ago, but iirc any kind of operation at all on a
>> sparse piddle will blow it up into a dense one (or at least returns a
>> dense pdl, which often amounts to the same thing as far as memory
>> overflow is concerned).
>>
>> I also needed large sparse pdls in my work (very sparse 3d word
>> co-occurrence matrices on the order of 50000**3), and wound up rolling
>> my own bastard implementation out of PDL::PP and perl hackery; the
>> module is available here:
>>
>>  http://search.cpan.org/~moocow/PDL-CCS-1.14/CCS/Nd.pm
>>
>> The basic idea is to store a sparse N-dimensional pdl as a pair
>> ($indices,$values), where $indices is a lexicographically sorted list of
>> index-vectors (a la whichND(), indexND()), and $values are the
>> corresponding (non-zero) values.  Index operations are logarithmic time
>> with vsearchvec(), and basic arithmetic operations and ufuncs can be
>> twiddled with some block-wise processing heuristics.  Abstract matrix
>> multiplication is hairy, and when it comes down to it these things are
>> perl objects and not "real" pdls in their own right, so a lot of pdl's
>> functionality gets lost.  It might be useful for the word-matrix
>> application though...
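>> For illustration, here's a rough Python analogue of that scheme (names
>> and class are my own invention, not the CCS::Nd API): keep the index
>> vectors lexicographically sorted, and a single-element lookup becomes a
>> binary search -- what vsearchvec() does on the PDL side -- so it's
>> logarithmic in the number of stored values.
>>
>> ```python
>> from bisect import bisect_left
>>
>> class SparseND:
>>     def __init__(self, entries, missing=0.0):
>>         # entries: {index_tuple: value}; sort index vectors
>>         # lexicographically, keeping values parallel to them
>>         items = sorted(entries.items())
>>         self.indices = [ix for ix, _ in items]
>>         self.values = [v for _, v in items]
>>         self.missing = missing
>>
>>     def get(self, ix):
>>         """O(log Nnz) lookup; absent indices read as 'missing'."""
>>         pos = bisect_left(self.indices, ix)
>>         if pos < len(self.indices) and self.indices[pos] == ix:
>>             return self.values[pos]
>>         return self.missing
>>
>> m = SparseND({(0, 3): 1.5, (2, 1): 4.0, (7, 7): -2.0})
>> print(m.get((2, 1)))   # 4.0
>> print(m.get((5, 5)))   # 0.0 (missing)
>> ```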
>>
>> That having been said, I'd love to see some "real" sparse n-d matrix
>> support in the pdl core; I'm just not sure to what degree the
>> assumptions implicit in the core code might prevent an efficient sparse
>> matrix implementation (e.g. explicit iteration over all possible values,
>> implicit assumption of dense output pdls, etc.)
>>
>> marmosets,
>>        Bryan
>>
>> --
>> Bryan Jurish                           "There is *always* one more bug."
>> [email protected]           -Lubarsky's Law of Cybernetic Entomology
-- 
***************************************************

Bryan Jurish
Deutsches Textarchiv
Berlin-Brandenburgische Akademie der Wissenschaften

Jägerstr. 22/23
10117 Berlin

Tel.:      +49 (0)30 20370 539
E-Mail:    [email protected]

***************************************************

_______________________________________________
Perldl mailing list
[email protected]
http://mailman.jach.hawaii.edu/mailman/listinfo/perldl
