I cannot speak for Kostas, but I often mix text and numbers. Consider an
example where I have a list of text identifiers and corresponding numbers,
e.g., p-values. Now I want to find identifiers with p < 0.05. If $id and $p
were piddles with text identifiers and p-values, respectively, I could do

my $sel = which($p < 0.05);
wcols $id($sel);

to print the identifiers corresponding to my selection. Alas, I cannot do
this, so I have to do what Bryan does: employ a hash and an array for
converting between strings and integer ids, and then, in order to print the
selected identifiers, I'd have to use a loop. Much less elegant. It would be
nice to have (variable-length) strings implemented in PDL.
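
For reference, the workaround might look roughly like the sketch below. It
assumes PDL is installed; the identifiers and p-values are made-up
illustration data, and the names @id, %index_of and @selected are
hypothetical. The strings stay in a plain Perl array, and only the numbers
live in a piddle:

```perl
use strict;
use warnings;
use PDL;

# Hypothetical example data: identifiers stay in a plain Perl array,
# the p-values go into a piddle in the same order.
my @id = qw(geneA geneB geneC geneD);
my $p  = pdl(0.20, 0.01, 0.049, 0.75);

# Hash for the reverse direction: identifier -> integer position.
my %index_of;
@index_of{@id} = 0 .. $#id;

# which() returns the integer positions where the condition holds.
my $sel = which($p < 0.05);

# Turn the index piddle into a Perl list and slice the identifier array.
my @selected = @id[ $sel->list ];

print "$_\n" for @selected;    # prints geneB, then geneC
```

The array slice stands in for the explicit loop, but the strings and the
numbers still have to be kept in step by hand.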

Marek

On Fri, 24 Jul 2015 at 14:25 David Mertens <[email protected]> wrote:

> Hello Kostas,
>
> To follow up on what Bryan said, I wonder what sort of PDL functionality
> you hope to use with a piddle of words, as opposed to a normal Perl array.
> I have a hard time imagining you'll need the multidimensional handling PDL
> provides. Even if you want a list of lists, PDL will only work with a
> collection of lists that have identical length. A Perl list of lists can
> accommodate variable length lists, and those lists can accommodate strings
> of variable length. Perl's map and grep are pretty flexible and fast, too.
>
> On the other hand, if you're doing computational linguistics, the typical
> approach I've seen is to map all words to integers and analyze the
> collections of integers. You can build a hash lookup table to map from the
> words to the integers, and a regular Perl array of the words themselves can
> map integer offsets to the original words.
>
> Of course, I could be wrong. What is the actual problem you are trying to
> solve?
>
> David
>
> On Fri, Jul 24, 2015 at 9:10 AM, Bryan Jurish <[email protected]>
> wrote:
>
>> moin Konstantinos,
>>
>> afaik, builtin support only includes PDL::Char, which is restricted to
>> fixed-length strings encoded as byte-values (e.g. ASCII).  There's also
>> Zakariyya Mughal's Data::Frame which seems capable of handling
>> variable-length strings, but I'm unclear on the details; perhaps he can
>> chime in.  Whenever I need to do something like this (very often, since I
>> work with text data), I usually end up building an extra hash+array pair
>> for mapping back and forth between strings and integer-IDs, and let PDL
>> work with just the IDs.  Not pretty, but it works.
>>
>> marmosets,
>>   Bryan
>>
>> On Fri, Jul 24, 2015 at 2:38 PM, Konstantinos Billis <[email protected]>
>> wrote:
>>
>>> Hi people,
>>>
>>>
>>> Just a quick question. I am using PDL to build arrays, for example with
>>> the "zeroes" function.  If I understand correctly, the elements of those
>>> arrays should contain only numbers (or bad, inf etc). Could I use any other
>>> function for creating lists/arrays of strings/words instead of numbers? In
>>> other words, for example, to initialize an array with NULLs and then add
>>> strings or words in particular positions of the array.
>>>
>>>
>>> Many Thanks,
>>> Kostas
>>>
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> _______________________________________________
>>> pdl-general mailing list
>>> [email protected]
>>> https://lists.sourceforge.net/lists/listinfo/pdl-general
>>>
>>>
>>
>>
>> --
>> Bryan Jurish                           "There is *always* one more bug."
>> [email protected]         -Lubarsky's Law of Cybernetic Entomology
>>
>>
>
>
> --
>  "Debugging is twice as hard as writing the code in the first place.
>   Therefore, if you write the code as cleverly as possible, you are,
>   by definition, not smart enough to debug it." -- Brian Kernighan
>
-- 
Dr Marek Gierliński
Data Analyst
The Data Analysis Group
The Barton Group
Division of Computational Biology and GRE
College of Life Sciences
University of Dundee, Dundee, Scotland, UK.
Tel:+44 1382 386427
www.compbio.dundee.ac.uk/dag.html