This is a reply to the message copied below from Roland Sookias

Hi Roland,
This depends slightly on the structure of the data. If you have crocodile tails 
with a set of discrete shape states in the same anatomical part, then I would 
recommend coding it as a multistate character. MrBayes provides pretty 
flexible coding for multistate characters, and I think it will handle up to at 
least 10 states in a single character. There might be an equivalent 
implementation in R now, but I haven't done this in a while. Anyone else want to 
weigh in about an R implementation for multistate characters? 

If you really have more than 10 states for a single character I'd be concerned 
about whether multiple observers can consistently code those states anyway.
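
As a rough illustration of the multistate coding, here is a minimal Python sketch that writes a one-character matrix as a NEXUS data block with datatype=standard. The taxon names and states are made up, the helper name to_nexus is hypothetical, and the exact format line your program expects should be checked against its manual.

```python
def to_nexus(matrix):
    """Render {taxon: "states"} as a NEXUS data block with datatype=standard."""
    lengths = {len(states) for states in matrix.values()}
    if len(lengths) != 1:
        raise ValueError("all taxa must have the same number of characters")
    nchar = lengths.pop()
    width = max(len(name) for name in matrix)
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(matrix)} nchar={nchar};",
        "  format datatype=standard missing=? gap=-;",
        "  matrix",
    ]
    # One row per taxon: padded name, then the string of state symbols.
    for name, states in matrix.items():
        lines.append(f"    {name.ljust(width)}  {states}")
    lines += ["  ;", "end;"]
    return "\n".join(lines)


# Hypothetical taxa; a single tail-shape character with states 0-3.
tail_shape = {"Taxon_A": "0", "Taxon_B": "2", "Taxon_C": "3", "Taxon_D": "?"}
nexus_text = to_nexus(tail_shape)
print(nexus_text)
```

Each digit is one discrete shape state of the same anatomical part, and "?" marks a taxon that could not be scored.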

If instead you have truly inapplicable characters, for example crocodiles 
without tails, and then ones with tails with various shapes, then I think the 
best approach is to have two characters: a presence/absence character and a 
shape character.
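
A small Python sketch of that two-character scheme, assuming hypothetical shape labels and taxa. The function name code_tail and the state numbering are illustrative only. The shape character is scored "-" (inapplicable) when the tail is absent; as far as I know most programs then treat "-" much like "?", so check how yours handles it.

```python
# Hypothetical shape labels mapped to state symbols.
SHAPE_STATES = {"straight": "0", "curved": "1", "paddle": "2"}


def code_tail(observation):
    """Return (presence, shape) codes for one taxon.

    observation is None if the taxon is unscored, "absent" if it has
    no tail, and otherwise one of the labels in SHAPE_STATES.
    """
    if observation is None:
        return ("?", "?")          # nothing known about this taxon
    if observation == "absent":
        return ("0", "-")          # no tail, so shape is inapplicable
    return ("1", SHAPE_STATES[observation])


observations = {
    "Taxon_A": "straight",
    "Taxon_B": "paddle",
    "Taxon_C": "absent",
    "Taxon_D": None,
}
coded = {taxon: "".join(code_tail(obs)) for taxon, obs in observations.items()}
print(coded)
```

The first column carries the presence/absence signal and the second carries shape only for taxa that actually have the structure.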

You also can binarize every state as separate characters, but as you say it 
introduces an implicit homology among qualitatively different absence states. 
I've never liked this approach, but there are some articles defending it. I 
think its validity may depend on a relatively even frequency distribution of 
each state. Also, if you have other anatomical parts in the analysis, then you 
are effectively weighting more heavily those parts that you split into more 
binary characters - they will tend to drive the result. Some software allows you 
to explicitly correct character weights, which would solve this, but not all 
software provides that. Note - this binarizing by state is a common approach in 
phylogeny of language cognates - but I believe that has come about because of a 
tail wagging the dog problem. BEAST, unless they changed it recently, didn't 
support multistate character evolution, and the language phylo people use BEAST, 
not MrBayes. 
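
To make the weighting point concrete, here is a minimal Python sketch of binarizing one multistate character and down-weighting the resulting binary characters so that the anatomical part still counts as one character overall. The 1/k weighting is just one simple assumption about how to correct the weights, not a recommendation from any particular program, and the function names and data are hypothetical.

```python
def binarize(column, states):
    """Split one multistate column into one 0/1 column per state."""
    out = {}
    for taxon, state in column.items():
        if state == "?":
            out[taxon] = "?" * len(states)   # missing stays missing everywhere
        else:
            out[taxon] = "".join("1" if state == s else "0" for s in states)
    return out


def equal_part_weights(n_binary_chars):
    """Weights that sum to 1, so the split part counts as a single character."""
    return [1 / n_binary_chars] * n_binary_chars


# Hypothetical three-state character for four taxa.
tail_shape = {"Taxon_A": "0", "Taxon_B": "2", "Taxon_C": "1", "Taxon_D": "?"}
binary = binarize(tail_shape, states="012")
weights = equal_part_weights(3)
print(binary)
print(weights)
```

Without the weights, a part split into three binary characters contributes three characters to the analysis, while an unsplit part contributes one. Note also that every "0" in a binary column lumps together taxa with qualitatively different states.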

There are a bunch of articles about this issue but I'd have to go dig them up. 
I think I cited them in some of my earlier papers that did anatomical and 
cultural phylogenetic work.


Message: 1
Date: Thu, 8 Feb 2018 17:27:54 +0100
From: Roland Sookias
Subject: [R-sig-phylo] Not inferring homology within "absence" state
        in phylogenetic analysis

 Dear all

Maybe someone has some insight here...

I am coming up against a problem when it comes to phylogenetic analysis.
Basically I want to conduct a parsimony (or other phylogenetic) analysis where 
"inapplicable" scores are treated as separate states *for each taxon* .

I.e. I want to hypothesize shared ancestry for taxa scored with one state 
(let's say state 0), but not hypothesize shared ancestry for the other taxa. 
However, I still want to penalize a change in state from and to state 0.

There are three approaches which I have thought about, but none seems to fit 
the bill:

-Score all taxa not showing state 0 as separate states. This should do what I 
want, but the problem here is the limit on the number of states in most 

-Scoring binary presence/absence. The problem is here that it could end up 
being parsimonious to group the "absence" state together, when there is no 
reason to infer homology within (i.e. for taxa scored with) this state.

-Score as 0 and inapplicable. The problem is this does not penalize a change 
from 0 to inapplicable.

A real-life example is the shape of the ilium in crocodiles. I want to say 
that it is likely that a particular curve in the dorsal margin of the ilium in 
crocodile-line crocodilians is homologous, but I don't want to hypothesize 
homology of those taxa "lacking" this state. They are all equally far from each 

Thanks very much indeed


Roland Sookias
