On 2015-07-24 at 09:25:18 -0400, David Mertens wrote:
> Hello Kostas,

Hello Kostas,

The approach that I use in the helper modules that I have in Data::Frame
differs based on what you want to use it for:

  - PDL::SV <https://metacpan.org/pod/PDL::SV>: a PDL that can store any
    sort of Perl scalar. It uses a Perl arrayref internally that is
    mapped using integers and supports `slice`, `dice`, `at`, and
    `unpdl` as well as stringification.

    The mapping is one-to-one, so the internal array will have the same
    number of elements as elements in the PDL.

    It is mainly meant as a container that can be used to give a
    consistent interface inside of Data::Frame.

  - PDL::Factor <https://metacpan.org/pod/PDL::Factor>: a PDL that is
    used for categorical data. Also uses an integer to map to an array,
    but uses the concept of levels (like in R) to give each category a
    unique internal integer ID.

    Supports checking equality and inequality as well as slice, dice,
    and stringification.
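
To make the levels idea concrete, here is a minimal sketch in plain Perl.
It is not the actual PDL::Factor code: `encode_levels` is a hypothetical
helper, and sorting the levels (as R does by default) is an assumption
made here only so the integer IDs are predictable.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of PDL::Factor: turn a list of strings
# into integer codes plus the array of unique levels they index into.
sub encode_levels {
    my @values = @_;
    my %seen;
    my @unique = grep { !$seen{$_}++ } @values;
    my @levels = sort @unique;          # assumption: sorted, as in R
    my %id_of;
    @id_of{@levels} = 0 .. $#levels;    # level string => integer ID
    my @codes = map { $id_of{$_} } @values;
    return (\@codes, \@levels);
}

my @words = qw(cat dog cat bird dog);
my ($codes, $levels) = encode_levels(@words);

print "levels: @$levels\n";    # levels: bird cat dog
print "codes:  @$codes\n";     # codes:  1 2 1 0 2

# Decoding maps each integer code back through the levels array.
my @decoded = map { $levels->[$_] } @$codes;
print "decoded: @decoded\n";   # decoded: cat dog cat bird dog
```

Only the integer codes need to live in a numeric PDL; the levels array
stays on the Perl side and is consulted when stringifying or comparing.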

I'm not particularly happy with the internals and interface as they are
now since they are far from complete, but with more feedback on what
features it needs, that can be improved. Also, I will need to write
documentation.

Cheers,
- Zaki Mughal

P.S. I recently worked with MALLET for an NLP task. It works
surprisingly well through Inline::Java. I'll see if I can take
some of its ideas as well.

> 
> To follow up on what Bryan said, I wonder what sort of PDL functionality
> you hope to use with a piddle of words, as opposed to a normal Perl array.
> I have a hard time imagining you'll need the multidimensional handling PDL
> provides. Even if you want a list of lists, PDL will only work with a
> collection of lists that have identical length. A Perl list of lists can
> accommodate variable length lists, and those lists can accommodate strings
> of variable length. Perl's map and grep are pretty flexible and fast, too.
> 
> On the other hand, if you're doing computational linguistics, the typical
> approach I've seen is to map all words to integers and analyze the
> collections of integers. You can build a hash lookup table to map from the
> words to the integers, and a regular Perl array of the words themselves can
> map integer offsets to the original words.
> 
> Of course, I could be wrong. What is the actual problem you are trying to
> solve?
> 
> David
> 
> On Fri, Jul 24, 2015 at 9:10 AM, Bryan Jurish <[email protected]>
> wrote:
> 
> > moin Konstantinos,
> >
> > afaik, builtin support only includes PDL::Char, which is restricted to
> > fixed-length strings encoded as byte-values (e.g. ASCII).  There's also
> > Zakariyya Mughal's Data::Frame which seems capable of handling variable-length
> > strings, but I'm unclear on the details; perhaps he can chime in.  Whenever
> > I need to do something like this (very often, since I work with text data),
> > I usually end up building an extra hash+array pair for mapping back and
> > forth between strings and integer-IDs, and let PDL work with just the IDs.
> > Not pretty, but it works.
> >
> > marmosets,
> >   Bryan
> >
> > On Fri, Jul 24, 2015 at 2:38 PM, Konstantinos Billis <[email protected]>
> > wrote:
> >
> >> Hi people,
> >>
> >>
> >> Just a quick question. I am using PDL to build arrays, for example with
> >> the "zeroes" function.  If I understand correctly, the elements of those
> >> arrays should contain only numbers (or bad, inf etc). Could I use any other
> >> function for creating lists/arrays of strings/words instead of numbers? In
> >> other words, for example, to initialize an array with NULLs and then add
> >> strings or words in particular positions of the array.
> >>
> >>
> >> Many Thanks,
> >> Kostas
> >>
> >>
> >> ------------------------------------------------------------------------------
> >>
> >> _______________________________________________
> >> pdl-general mailing list
> >> [email protected]
> >> https://lists.sourceforge.net/lists/listinfo/pdl-general
> >>
> >>
> >
> >
> > --
> > Bryan Jurish                           "There is *always* one more bug."
> > [email protected]         -Lubarsky's Law of Cybernetic Entomology
> >
> >
> 
> 
> -- 
>  "Debugging is twice as hard as writing the code in the first place.
>   Therefore, if you write the code as cleverly as possible, you are,
>   by definition, not smart enough to debug it." -- Brian Kernighan



