On Aug 7, 2015, at 12:22 AM, Federico Calboli wrote:
>
> On 7 Aug 2015, at 01:59, Bert Gunter wrote:
>>
>> Boris:
>>
>> You may be right, but it seems like ESP to me, based on the OP's
>> non-description of "likelihood of coming from the same noisy process". My
>> response would be: seek local statistical help, as your replies indicate a
>> good deal of statistical confusion.
>>
>> Cheers,
>> Bert
>
> Bert,
>
> as this is R-help and not Cross Validated, I am looking for a pre-canned
> function that would test whether the order of characters in two character
> vectors comes from the same (noisy) process. I would thus expect you to say
> something along the lines of:
>
> function X uses method Y to do something like that
> function W uses method Z to do something like that
> …
>
> look into those, figure out exactly what you are testing and use the most
> appropriate function.
>
> The whys and wherefores are for me to deal with, I just want to know whether
> someone has built a function that does, or seems to do, what I asked for. As
> I said, this is R-help, and I seek help for R use.
> findFn("levenshtein")
found 57 matches; retrieving 3 pages
2 3
Downloaded 44 links in 17 packages.
stringdist::stringdist( paste0(x, collapse=""), paste0(letters[y],
collapse="") )
[1] 30
--
HTH;
David.
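A base-R alternative to the stringdist call above, offered as a sketch: `utils::adist()` computes the generalized Levenshtein distance (insertions, deletions and substitutions, each costing 1 by default). This uses the `x` printed in the original post rather than a fresh `sample()`, and pastes the character `y` directly. The `letters[y]` in the call above presumably assumes `y` holds integer positions; indexing `letters` with a character vector returns `NA`. Also note that `stringdist()` defaults to optimal string alignment, which counts a swap of two adjacent characters as a single edit, so its value for the same pair can be smaller than `adist()`'s. (`findFn()` itself comes from the sos package.)

```r
# Federico's example vectors, typed in from the output in the original post
x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]

# Collapse each vector into one string; each letter becomes one character
sx <- paste(x, collapse = "")
sy <- paste(y, collapse = "")

# Levenshtein distance with unit costs (utils is attached by default);
# adist() returns a matrix, here 1 x 1
d <- adist(sx, sy)
d[1, 1]
```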
>
> I do concede that my original question might have left many wondering, but I
> thought my reply to Boris would have cleared up any doubts. I am therefore
> puzzled by the great deal of confusion on your part in understanding the
> purpose of my question and, in general, of this list.
>
> Best wishes
>
> F
>
>
>>
>>
>>
>> On Thursday, August 6, 2015, Boris Steipe wrote:
>> You are looking for what is known as the "Cayley distance" between vectors -
>> an edit distance that allows only transpositions. RSeek mentions PerMallows
>> (https://cran.r-project.org/web/packages/PerMallows/PerMallows.pdf) and
>> Rankcluster
>> (https://cran.r-project.org/web/packages/Rankcluster/Rankcluster.pdf) as
>> packages that support work with Cayley distances. It seems to me that
>> distCayley() in Rankcluster does what you want. From the examples:
>>
>> library(Rankcluster)
>> x=1:5
>> y=c(2,3,1,4,5)
>> distCayley(x,y)
>> 8
>>
>>
>> Cheers,
>> Boris
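For readers without Rankcluster installed, the Cayley distance can be sketched in base R from its textbook definition: the minimum number of transpositions (swaps of any two elements, not necessarily adjacent) needed to turn one ordering into the other, which equals n minus the number of cycles of the permutation mapping `x` onto `y`. The name `cayley_distance` is hypothetical and this is not Rankcluster's implementation; it assumes both vectors contain the same distinct items.

```r
# Hypothetical helper: Cayley distance between two orderings of the same
# distinct items, computed as n minus the number of permutation cycles.
cayley_distance <- function(x, y) {
  stopifnot(length(x) == length(y), setequal(x, y), !anyDuplicated(x))
  p <- match(y, x)            # y[i] sits at position p[i] in x
  n <- length(p)
  seen <- logical(n)
  cycles <- 0L
  for (i in seq_len(n)) {
    if (!seen[i]) {
      cycles <- cycles + 1L   # found a new cycle; walk it to the end
      j <- i
      while (!seen[j]) {
        seen[j] <- TRUE
        j <- p[j]
      }
    }
  }
  n - cycles
}

# Federico's vectors, typed in from the original post
x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]
cayley_distance(x, y)   # 3: one swap for x/a, two for the f/n/v rotation
```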
>>
>>
>>
>>
>>
>> On Aug 6, 2015, at 9:51 AM, Federico Calboli wrote:
>>
>>>>
>>>> On 6 Aug 2015, at 15:40, Bert Gunter wrote:
>>>>
>>>> Define "goodness of match". For exact matches, see ?"==", all.equal,
>>>> etc.
>>>
>>> Fair point. I would define it as a number that tells me how likely it is
>>> that the same (noisy) process produced both lists.
>>>
>>> BW
>>>
>>> F
>>>
>>>
>>>
>>>
>>>>
>>>> Bert
>>>>
>>>> On Thursday, August 6, 2015, Federico Calboli wrote:
>>>> Hi All,
>>>>
>>>> let's assume I have a vector of letters drawn from the alphabet without replacement:
>>>>
>>>> x = sample(letters, 15, replace = FALSE)
>>>> x
>>>> [1] "z" "t" "g" "l" "u" "d" "w" "x" "a" "q" "k" "j" "f" "n" "v"
>>>>
>>>> y = x[c(1:7,9:8, 10:12, 14, 15, 13)]
>>>>
>>>> I would now like to test how good a match y is for x. Obviously I can
>>>> transform the letters into numbers and use a rank test, but I was left
>>>> wondering whether this is the only solution and whether there are more
>>>> appropriate solutions already implemented in R (I am not going to
>>>> reinvent the wheel if I can avoid it).
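The rank-test route can be sketched with base R's `cor.test()`: `match(y, x)` gives each letter's position in `x`, so identical orders give 1, 2, ..., n, and Kendall's tau summarizes how many pairs are out of order. This is a sketch under a stated assumption: the test compares `y` against a null of a completely unrelated random ordering, and does not by itself give the probability that one noisy process produced both vectors. The variable names `pos`, `ct` and `disc` are illustrative, and `x` is copied from the printed output above.

```r
x <- c("z", "t", "g", "l", "u", "d", "w", "x", "a", "q", "k", "j", "f", "n", "v")
y <- x[c(1:7, 9:8, 10:12, 14, 15, 13)]

# Position of each element of y within x; 1, 2, ..., n means identical order
pos <- match(y, x)
n <- length(pos)

# Kendall's tau between the reference order and the observed one.
# With n = 15 and no ties, cor.test() uses the exact null distribution.
ct <- cor.test(seq_len(n), pos, method = "kendall")
ct$estimate   # tau = 99/105, about 0.943
ct$p.value

# Discordant pairs (the Kendall distance): pairs i < j with pos[i] > pos[j]
disc <- sum(outer(pos, pos, ">")[upper.tri(matrix(0, n, n))])
disc          # 3
```

The discordant-pair count is also the minimum number of adjacent swaps separating the two orders, which makes it a natural companion to the Cayley distance (any swaps).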
>>>>
>>>> BW
>>>>
>>>> F
>>>>
>>>>
>>>> --
>>>> Federico Calboli
>>>> Ecological Genetics Research Unit
>>>> Department of Biosciences
>>>> PO Box 65 (Biocenter 3, Viikinkaari 1)
>>>> FIN-00014 University of Helsinki
>>>> Finland
>>>>
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>>>
>>>>
>>>> --
>>>> Bert Gunter
>>>>
>>>> "Data is not information. Information is not knowledge. And knowledge is
>>>> certainly not wisdom."
>>>> -- Clifford Stoll
>>>>
>>>
>>>
>>
>>
>
David Winsemius
Alameda, CA, USA