[R] Simulation - Natural Selection

2011-01-05 Thread Ben Ward

Hi,

I've been modelling some data from my work over the past few days. I
repeatedly challenge microbes with a given concentration of cleaner
until the concentration required to inhibit or kill them increases, at
which point they are challenged with a slightly higher concentration
each day. I'm doing this for two different cleaners, and I'm recording
the concentration required to kill them (as a percentage), the challenge
number, the cleaner (a two-level variable), and the lineage they're in,
because I have several different lineages. I expect the values to rise
for one cleaner but not the other, as the microbes acquire resistance to
one but not the other. That has happened, but I have wide variation,
because one lineage acquired a very dramatic change that has made it
immune to 50%, whereas the others have shown a much more gradual
increase. As a result I have very weak p-values for the cleaner
variable, because it is secondary to the challenge variable, which has
the most explanatory power: without time and these challenges, selection
would not happen.

I was using two bacterial species, but one kept giving highly erratic
results and kept becoming cross-contaminated. However, if I include its
data, it pushes the cleaner variable past the p = 0.05 threshold, so I
may just have a problem with lack of data. So I've been asking about
bootstrapping, which I plan to do on my cases, and then fit a model to
see what the confidence looks like. I assume that if I bootstrap, it
will re-select whole cases and not jumble everything up; otherwise a
microbe (to take the most extreme value as an example) with 50%
concentration tolerance at the beginning would make no sense at all.
I'm also planning to fit models lineage by lineage, rather than pooling
them all into one, just to see what happens.
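Ben's assumption holds only if the resampling unit is the whole case. A minimal sketch of a case (cluster) bootstrap, written in Python rather than R, resamples whole lineages with replacement within each cleaner, so every lineage's tolerance series stays intact, and recomputes the difference between the two cleaners' mean per-lineage slopes (tolerance against challenge number). The data layout, the function names and the toy numbers are hypothetical, not Ben's data; in R the same idea can be written with sample() over lineage IDs or with the boot package. With only a handful of lineages, any bootstrap interval will itself be rough.

```python
import random
import statistics


def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_difference(lineages):
    """Mean per-lineage slope for cleaner "A" minus that for cleaner "B".

    `lineages` is a list of (cleaner, challenges, tolerances) tuples, one per
    lineage; this layout is a hypothetical choice for the sketch.
    """
    slopes = {"A": [], "B": []}
    for cleaner, challenges, tolerances in lineages:
        slopes[cleaner].append(slope(challenges, tolerances))
    return statistics.fmean(slopes["A"]) - statistics.fmean(slopes["B"])


def case_bootstrap(lineages, n_boot=2000, level=0.95, seed=1):
    """Percentile bootstrap CI for slope_difference, resampling whole lineages."""
    rng = random.Random(seed)
    group_a = [lin for lin in lineages if lin[0] == "A"]
    group_b = [lin for lin in lineages if lin[0] == "B"]
    stats = []
    for _ in range(n_boot):
        # Resample whole lineages within each cleaner, never single rows, so
        # every resampled series is one real lineage's full history.
        resample = (rng.choices(group_a, k=len(group_a))
                    + rng.choices(group_b, k=len(group_b)))
        stats.append(slope_difference(resample))
    stats.sort()
    tail = (1 - level) / 2
    return stats[round(tail * n_boot)], stats[round((1 - tail) * n_boot) - 1]


if __name__ == "__main__":
    # Made-up illustration only: three lineages per cleaner, five challenges.
    ch = [1, 2, 3, 4, 5]
    toy = [
        ("A", ch, [1.0, 1.5, 2.0, 3.0, 3.5]),
        ("A", ch, [1.0, 1.2, 1.6, 2.0, 2.4]),
        ("A", ch, [1.0, 3.0, 12.5, 25.0, 50.0]),  # one dramatic jump
        ("B", ch, [1.0, 1.0, 1.2, 1.2, 1.4]),
        ("B", ch, [1.0, 1.1, 1.1, 1.3, 1.3]),
        ("B", ch, [1.0, 1.0, 1.0, 1.2, 1.2]),
    ]
    print("observed difference:", slope_difference(toy))
    print("95% percentile CI:", case_bootstrap(toy))
```

Stratifying by cleaner guarantees every resample contains both groups, so the statistic is always defined.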


But what I really wanted to ask in this email is whether there's a
package or function for natural-selection simulation that I could use to
simulate the experiment. I want to start with a distribution of
concentration tolerance values, taken from the inhibitory concentration
values of my first batch of microbes, back when term began, and draw
3000 from it. Values in that draw that fall below the exposure
concentration used in my experiment are then removed, or have a high
chance of being removed. From what is left, a draw is made again (or
perhaps a copy operation rather than a random draw) until I have 3000
again. So that they don't all have exactly the same concentration, a
value can then be added to some of them that increases their
concentration tolerance slightly, but not by a great deal, except in a
few individuals, where it may be increased dramatically (some sort of
exponential distribution, perhaps). When the distribution of this
simulated population of microbes has reached the next concentration
(possibly judged by the mean or mode of the distribution; I have a
series of 1-in-2 dilutions, so 100%, 50%, 25% and so on), it moves on to
the next concentration.
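The scheme described above can be written directly as a loop, without assuming any particular package. A minimal sketch in Python (the same structure ports to R with sample(), rexp() and median()): draw 3000 tolerances from an initial pool, kill cells below the current exposure (with a small chance of survival), regrow to 3000 by copying random survivors, give some cells a small exponentially distributed increase and a rare few a large jump, and step up the two-fold dilution ladder once the population median reaches the next concentration. Everything numeric here is a hypothetical placeholder: the lognormal starting pool, the survival, mutation and jump probabilities, starting at the bottom of the ladder, and applying increases multiplicatively (chosen because the ladder is geometric). In practice the pool would be Ben's first-term inhibitory concentrations.

```python
import random
import statistics

# Two-fold dilution ladder in percent cleaner, lowest to highest:
# 0.390625, 0.78125, ..., 25, 50, 100.
LADDER = [100 / 2 ** k for k in range(8, -1, -1)]


def simulate(initial_pool, n=3000, days=60, p_survive_below=0.05,
             p_mutate=0.1, small_mean=0.02, p_big=0.001, big_factor=2.0,
             seed=42):
    """Simulate daily selection, regrowth and mutation of tolerance values.

    Returns (history, pop): history is a list of (day, exposure) pairs and
    pop is the final list of tolerances. All default parameter values are
    hypothetical placeholders, not estimates from real data.
    """
    rng = random.Random(seed)
    pop = rng.choices(initial_pool, k=n)  # initial draw of n cells
    level = 0  # index into LADDER; start at the lowest concentration
    history = []
    for day in range(days):
        exposure = LADDER[level]
        # Selection: cells below the exposure die unless they get lucky.
        survivors = [t for t in pop
                     if t >= exposure or rng.random() < p_survive_below]
        if not survivors:
            break  # population went extinct at this exposure
        # Regrowth: copy random survivors until the population is back to n.
        pop = rng.choices(survivors, k=n)
        # Mutation: some cells have tolerance multiplied by (1 + x), with x
        # exponential with mean small_mean; a rare few are multiplied by
        # big_factor.
        for i in range(n):
            if rng.random() < p_mutate:
                pop[i] *= 1 + rng.expovariate(1 / small_mean)
            if rng.random() < p_big:
                pop[i] *= big_factor
        history.append((day, exposure))
        # Move up the ladder once the median reaches the next concentration.
        if level + 1 < len(LADDER) and statistics.median(pop) >= LADDER[level + 1]:
            level += 1
    return history, pop


if __name__ == "__main__":
    # Made-up starting tolerances (percent); replace with the first-term values.
    r = random.Random(0)
    pool = [r.lognormvariate(0.0, 1.0) for _ in range(200)]
    history, final = simulate(pool)
    print("final exposure:", history[-1][1] if history else None)
    print("final median tolerance:", statistics.median(final))
```

Using the median as the trigger is one of the options mentioned (mean or mode would also work); swapping it for a quantile would make the population move up only when most cells tolerate the next dilution.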


I know it's probably quite a heavy undertaking; it was just a thought
that came to me. If anybody has any experience in this area of R, or
knows of something that allows this to be done, please let me know.


Thanks,
Ben.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Simulation - Natural Selection

2011-01-05 Thread Mike Marchywka





 Date: Wed, 5 Jan 2011 15:48:46 +
 From: benjamin.w...@bathspa.org
 To: r-help@r-project.org
 Subject: [R] Simulation - Natrual Selection


You can't really have a p-value without a specific hypothesis to test;
if you have one, then all your other questions are probably easy to answer.
Generally you want to sample from things that are iid, or perhaps you
want to test the "identically distributed" part of that assumption.

Generally you want to have done a literature search ahead of time and
to have some idea of the likely evolutionary dynamics of your system,
given your design and things like your forcing functions.
Most statisticians would not take a posteriori designs seriously as
anything other than exploratory or hypothesis-generating, and indeed it
can be hard to avoid rationalization and selection bias (problems that
always and only affect people who disagree with me, LOL). You are
looking for predictive value. While it is not always worthwhile doing
blind tests, they may be worth considering (do you know which group gets
which treatment?).


 But what I really wanted to ask in this email is whether there's a
 package or function for natural-selection simulation that I could use
 to simulate the experiment.


http://www.google.com/#sclient=psy&hl=en&q=%22R+package%22+natural+selection

But, as implied above, R has a lot of analysis tools, and you may find
something more useful that isn't tied to the keywords you suggest. You
may also find that you could write a differential equation to express
your results, although that approach isn't often used for natural
selection.
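One standard equation of the kind mentioned above (chosen here purely for illustration; neither poster specified a model) is the two-type selection equation dp/dt = s p (1 - p), where p is the fraction of resistant cells and s is a hypothetical selection coefficient per day. A minimal Python sketch compares its closed-form solution with a forward-Euler integration; fitting s to real tolerance data (for example with nls() in R) is not shown.

```python
import math


def resistant_fraction_exact(t, p0, s):
    """Closed-form solution of dp/dt = s * p * (1 - p) with p(0) = p0."""
    growth = p0 * math.exp(s * t)
    return growth / (1 - p0 + growth)


def resistant_fraction_euler(t_end, p0, s, dt=0.001):
    """Forward-Euler integration of the same equation, as a numerical check."""
    p = p0
    for _ in range(round(t_end / dt)):
        p += dt * s * p * (1 - p)
    return p


if __name__ == "__main__":
    # Hypothetical values: 1% resistant at the start, s = 0.5 per day.
    for day in (0, 5, 10, 20):
        exact = resistant_fraction_exact(day, 0.01, 0.5)
        approx = resistant_fraction_euler(day, 0.01, 0.5)
        print(f"day {day:2d}: exact {exact:.4f}, Euler {approx:.4f}")
```

The resistant fraction follows an S-shaped curve, so a model like this predicts a slow start, a rapid rise, and then saturation; a lineage with one dramatic jump would look more like a step than this smooth curve.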



Re: [R] Simulation - Natural Selection

2011-01-05 Thread Ben Ward

On 05/01/2011 16:37, Mike Marchywka wrote:






You can't really have a p-value without a specific hypothesis to test;
if you have one, then all your other questions are probably easy to answer.
Generally you want to sample from things that are iid, or perhaps you
want to test the "identically distributed" part of that assumption.
My hypothesis is that with Cleaner A (I don't really want to go into
names or brands), concentration tolerance values will rise, or rather
that the microbial culture I keep exposed to it will show such a rise,
reflecting acquisition of antimicrobial resistance. This has largely
happened. With Cleaner B, I expect that this will not happen or, if it
does, that it will be less dramatic and take longer. So I expect the
cleaner variable in my model to have p below 0.05, quite high
explanatory power, and a satisfying coefficient. The notion behind the
hypothesis is that one cleaner may have a more complex chemical
structure, requiring more mutations before any resistance develops.
I can't really do anything with genes or chemical structure at my
current institution and at my level, because there is no equipment for
that sort of thing, and it was felt that it would go too far for a
3rd-year project. So I'm using the concentration required to kill them,
or to stop them from growing, as an indication.


Re: [R] Simulation - Natural Selection

2011-01-05 Thread Ben Ward

On 05/01/2011 17:40, Bert Gunter wrote:

My hypothesis was specified before I did my experiment. Whilst far from
perfect, I've tried to do the best I can to assess the rise in
resistance without going into genetics, as that's not possible here
(although it may be at the next institution, where I've applied for an
MSc).

With my hypothesis (I mentioned it below), I was of the frame of mind
that a nonsignificant p-value on the cleaner variable (for now; the
experiment is far from over) indicated a lack of evidence for rejecting
the null. And so at the minute, it looks like the type of cleaner makes
no difference.

I have no fundamental objection, but be careful. I would simply
qualify your last sentence by saying that it means the experimental
noise is too great to determine the size of the cleaner effect
precisely. Scientific reality tells us that it is never exactly 0; what
your results show is that your uncertainty about the value of the
difference encompasses both positive and negative values. This does NOT
rule out a difference large enough to be of scientific interest; a
confidence interval for the difference (MUCH better than a P value)
would help you determine that. If the interval is narrow enough that the
difference, positive or negative, is too small to be of scientific
interest, then you're done. If the interval is wide, then it tells you
that you need more data, a better (less noisy) experiment, etc.

-- Bert
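Bert's reading of a confidence interval can be written down as an explicit rule. A minimal Python sketch, assuming a large-sample interval for a difference in means (Welch standard error with a normal quantile; for small samples a t interval, such as the one t.test() reports in R, is wider and more appropriate) and a hypothetical smallest difference of scientific interest, `delta`, which has to be chosen on scientific grounds, ideally before looking at the data.

```python
import statistics


def diff_ci(a, b, level=0.95):
    """Approximate CI for mean(a) - mean(b).

    Uses the Welch standard error with a normal quantile, a large-sample
    approximation; with small samples a t-based interval is wider.
    """
    diff = statistics.fmean(a) - statistics.fmean(b)
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return diff - z * se, diff + z * se


def interpret(ci, delta):
    """Classify a CI following the reasoning above.

    `delta` is a hypothetical smallest difference of scientific interest.
    """
    lo, hi = ci
    if lo > 0 or hi < 0:
        return "excludes zero: check whether the whole interval matters scientifically"
    if -delta < lo and hi < delta:
        return "negligible: any difference is too small to matter"
    return "inconclusive: interval too wide; more data or a less noisy experiment needed"


if __name__ == "__main__":
    print(interpret((-0.2, 0.3), delta=1.0))
    print(interpret((-4.0, 6.0), delta=1.0))
```

An interval that covers zero therefore has two very different readings, and only the width relative to `delta` tells them apart.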

At the moment I wouldn't call the confidence interval small; it's
definitely wide, and at the minute it covers zero. My R-squared is
currently 0.5, mostly due to the few extreme cases of adaptation I
mentioned before, but I'm hesitant to remove them, because papers in my
literature study that also evolve bacteria show that there is often
(sometimes wide) variation in the paths populations take. So although
it is mathematically a bit undesirable and makes both me and the model
uncertain, it is consistent with what is known, or has previously been
shown, about the reality of selection. Again, if I include the data from
the bacteria dropped from the study, all of that improves and the
uncertainty is reduced.


It may also be worth mentioning that I am also taking a more traditional
approach (by that I mean a more Statistics 101 approach; indeed, that is
all the stats tuition my course covered as a taught element), in case
what I've described above did not work or was not ideal, because we (my
supervisor and I) did foresee that a model might report a lot of
uncertainty. Indeed, we did expect some populations to adapt and some
not to. So I've also been collecting data on the width of the zones of
inhibition produced by placing disks of cleaner on plates of growth and
measuring the resulting dead zone. I can get lots of data from this with
only a few plates, and doing this at the start of the study, a few times
in the middle, and at the end will allow me to do more traditional
analysis, for example a t.test on the dead zone widths at the end of the
study between cleaners A and B, or a non-parametric equivalent, maybe
even a permutation test. The modelling is already beyond what my
supervisor expects of me, but I felt it would add value and a lot more
insight to the study, allowing more variables to be accounted for than a
more short-sighted traditional test.
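For the end-of-study comparison of zone-of-inhibition widths, a permutation test only assumes that the cleaner labels are exchangeable under the null hypothesis. A minimal Python sketch of a two-sided Monte Carlo permutation test on the difference in mean width between cleaners A and B; the number of permutations, the seed and the example widths are hypothetical. In R, t.test(), wilcox.test() (the usual non-parametric equivalent) or a loop over sample() would do the same job.

```python
import random
import statistics


def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided Monte Carlo permutation p-value for mean(a) - mean(b)."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign the cleaner labels
        diff = statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
        # Small tolerance so relabellings equal to the observed split still
        # count despite floating-point rounding in the means.
        if abs(diff) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up zone widths in mm, for illustration only.
    widths_a = [12.1, 11.4, 13.0, 12.7, 11.9]
    widths_b = [14.2, 15.1, 13.8, 14.9, 15.3]
    print("p =", permutation_test(widths_a, widths_b))
```

Because the labels are shuffled within the observed data, the test uses the same widths under every relabelling, which is why it needs no normality assumption.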

