[R] Simulation - Natural Selection
Hi,

I've been modelling some data from my work over the past few days. I repeatedly challenge microbes with a certain concentration of cleaner until the concentration required to inhibit or kill them increases, at which point they are challenged with a slightly higher concentration each day. I'm doing this for two different cleaners, and I'm collecting the concentration required to kill them as a percentage, the challenge number, the cleaner as a two-level variable, and the lineage they're in, because I have several different lineages.

I'm expecting the values to rise for one cleaner but not the other, as they acquire resistance to one but not the other. That has happened, but I have wide variation because one lineage acquired a very dramatic change which has made it immune to 50%, whereas the others have exhibited a much more gradual increase. So I have very weak p-values for the cleaner variable, because it is secondary to the challenge vector, which has the most explanatory power: without time and these challenges, the selection would not happen. I was using two bacterial species, but one was keen on giving highly erratic results and insisted on becoming cross-contaminated. BUT if I include its data, it shoves cleaner over the 0.05 p-value threshold, so I may just have a problem with lack of data.

So I've been asking about bootstrapping, which I plan to do to my cases, and then fit a model to see what the confidence is like. I assume that if I bootstrap it will re-select whole cases and not jumble everything up; otherwise a microbe with (to take the most extreme value as an example) 50% concentration tolerance at the beginning would make no sense at all. I'm also planning on fitting models lineage by lineage, rather than putting them into one whole, just to have a look at what happens.
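As a rough illustration of re-selecting whole cases, the sketch below resamples entire lineages with replacement, so every row of a chosen lineage stays together and in order. Everything in it is an assumption made for the example: the data frame `dat`, its column names (`mic`, `challenge`, `cleaner`, `lineage`), the simulated values, the choice to resample lineages separately within each cleaner, and the simple `lm()` fit. It is not the real data or the real model.

```r
set.seed(1)

# Hypothetical data: 6 lineages (3 per cleaner), 10 challenges each.
dat <- expand.grid(challenge = 1:10, lineage = factor(1:6))
dat$cleaner <- factor(ifelse(as.integer(dat$lineage) <= 3, "A", "B"))
# mic = inhibitory concentration (%); made to rise faster under cleaner A.
dat$mic <- 5 + ifelse(dat$cleaner == "A", 1.5, 0.2) * dat$challenge +
  rnorm(nrow(dat), sd = 2)

# Resample whole lineages with replacement, separately within each cleaner
# (an assumption, so that both cleaners appear in every resample).
boot_once <- function(d) {
  pieces <- list()
  for (cl in levels(d$cleaner)) {
    ids <- unique(as.character(d$lineage[d$cleaner == cl]))
    drawn <- sample(ids, length(ids), replace = TRUE)
    for (id in drawn) {
      # All rows of the drawn lineage are kept together.
      pieces[[length(pieces) + 1]] <- d[d$lineage == id, ]
    }
  }
  do.call(rbind, pieces)
}

# Refit the model on each resample; keep the difference in slopes (B minus A).
boot_coef <- replicate(500, {
  b <- boot_once(dat)
  coef(lm(mic ~ challenge * cleaner, data = b))[["challenge:cleanerB"]]
})

# Percentile bootstrap interval for the difference in slopes.
ci <- quantile(boot_coef, c(0.025, 0.975))
print(ci)
```

With only a handful of lineages, an interval built this way will be coarse, so it is best read as a rough check and not as a precise estimate.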
But what I really wanted to know from this email was whether there's a package or function for natural selection simulation I could make use of, to see if I can simulate the experiment. I want to start with a distribution of concentration tolerance values, taken from the inhibitory concentration values from my first lot of microbes, back when term began. Draw 3000 from this. Then values in that draw that fall below the exposure concentration I used in my experiment are removed, or have a high chance of being removed. Then, from what is left, a draw is made again (or perhaps a copy operation rather than a random draw) until I have 3000 again. So that they don't all have exactly the same concentration, a value can then be added to some of them that increases their concentration tolerance slightly, but not by a great deal, except in a few individuals, where it may be increased dramatically (some sort of exponential distribution perhaps). Then, when the distribution of this simulated population of microbes has reached the next concentration (possibly judged by the mean or mode of the distribution), they move on to the next concentration (I have a series of 1 in 2 dilutions, so 100%, 50%, 25% and so on).

I know it's probably quite a heavy thing; it was just a thought that came to me. If anybody has any experience in this area of R or knows of something that allows this to be done, please let me know.

Thanks,
Ben.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
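The simulation described in the original question (draw 3000 tolerances, remove most of those below the exposure concentration, refill by copying survivors, add small and occasionally large gains, step up the dilution series when the mean reaches the next concentration) can be sketched in base R without a dedicated package. This is only a minimal sketch under stated assumptions: the lognormal starting distribution, the kill probability, the mutation probabilities and rates, the additive exponential gains, and the names `one_round`, `p_kill`, `p_small` and `p_big` are all invented for illustration and would need replacing with values from the real experiment.

```r
set.seed(42)

n_pop <- 3000
# Two-fold dilution series in %, lowest first: 1.5625, 3.125, ..., 100.
doses <- 100 / 2^(6:0)

# Hypothetical starting tolerances (%); real ones would come from the
# first inhibitory-concentration measurements.
tol <- rlnorm(n_pop, meanlog = log(1.5), sdlog = 0.3)
start_mean <- mean(tol)

one_round <- function(tol, dose, n = length(tol),
                      p_kill = 0.95, p_small = 0.10, p_big = 0.001) {
  # Selection: individuals below the dose are removed with probability p_kill.
  dead <- (tol < dose) & (runif(length(tol)) < p_kill)
  survivors <- tol[!dead]
  if (length(survivors) == 0) return(numeric(0))  # population went extinct
  # Regrowth: copy survivors at random until the population is back to n.
  new_tol <- survivors[sample.int(length(survivors), n, replace = TRUE)]
  # Mutation: small gains in some individuals, rare large jumps in a few.
  small <- runif(n) < p_small
  big <- runif(n) < p_big
  new_tol[small] <- new_tol[small] + rexp(sum(small), rate = 10)
  new_tol[big] <- new_tol[big] + rexp(sum(big), rate = 0.1)
  pmin(new_tol, 100)  # tolerance cannot exceed 100% concentration
}

n_rounds <- 60
level <- 1
dose_used <- rep(NA_real_, n_rounds)
mean_tol <- rep(NA_real_, n_rounds)

for (i in seq_len(n_rounds)) {
  tol <- one_round(tol, doses[level], n = n_pop)
  if (length(tol) == 0) break
  dose_used[i] <- doses[level]
  mean_tol[i] <- mean(tol)
  # Move on once the mean tolerance has reached the next concentration.
  if (level < length(doses) && mean_tol[i] >= doses[level + 1]) {
    level <- level + 1
  }
}

print(tail(data.frame(round = seq_len(n_rounds), dose = dose_used,
                      mean_tol = mean_tol)))
```

Running the loop several times with different seeds gives a set of simulated lineages, which makes it possible to see how often one rare large jump produces the kind of outlying lineage described in the thread.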
Re: [R] Simulation - Natural Selection
Date: Wed, 5 Jan 2011 15:48:46 + From: benjamin.w...@bathspa.org To: r-help@r-project.org Subject: [R] Simulation - Natrual Selection
You can't really have a p-value without a specific hypothesis to test; if you have that, then all your other questions are probably easy to answer. Generally you want to sample from things that are iid or maybe you want to test the identical i. Generally you want to have done a lit search ahead of time and to have some idea of the likely evolution dynamics of your system given your design and things like your forcing functions etc. Most statisticians would not take seriously a posteriori designs as being anything other than exploratory or hypothesis generating, and indeed it can be hard to avoid rationalization and selection bias (problems that always and only affect people who disagree with me LOL). You are looking for predictive value. While it is not always worthwhile doing blind tests, it may be something worth considering (do you know which group gets what thing?).

> But what I really wanted to know from this email was whether there's a package or function for natural selection simulation I could make use of, to see if I can simulate the experiment.

http://www.google.com/#sclient=psyhl=enq=%22R+package%22+natural+selection

but as implied above, R has lots of analysis stuff, and maybe you would find something more useful that is not linked to the keywords you suggest. You may find, for whatever reason, that you could write a differential equation to express your results, but that isn't often used with natural selection.

> I want to start with a distribution of concentration tolerance values, taken from the inhibitory concentration values from my first lot of microbes, back when term began. Draw 3000 from this. Then values in that draw that fall below the exposure concentration I used in my experiment are removed, or have a high chance of being removed.
Re: [R] Simulation - Natural Selection
On 05/01/2011 16:37, Mike Marchywka wrote:
> You can't really have a p-value without a specific hypothesis to test; if you have that, then all your other questions are probably easy to answer. Generally you want to sample from things that are iid or maybe you want to test the identical i.

My hypothesis is that the microbial culture I keep exposed to Cleaner A (I don't really want to go into names or brands) will exhibit a rise in concentration tolerance values, reflecting acquisition of antimicrobial resistance. This has largely happened. With Cleaner B, I expect that this will not happen or, if it does, that it will be less dramatic and take longer. So I am expecting the cleaner variable in my model to have a p-value below 0.05, quite high explanatory power, and a satisfying coefficient. The notion behind the hypothesis is that one cleaner might have a more complex chemical structure, requiring more mutations to develop some resistance. I can't really do anything with genes or chemical structure at my current institution and at my level, because there is no equipment for that sort of thing and because they felt it would be too far for a 3rd-year project. So I'm using the concentration required to kill them, or stop them from growing, as an indication.
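One possible way to express this hypothesis in a linear model is sketched below. A rise under one cleaner but not the other corresponds to the two cleaners having different slopes over the challenges, which is a challenge-by-cleaner interaction and not only a cleaner main effect. That reading is an assumption, as are the simulated data and the names `dat`, `mic`, `challenge`, `cleaner` and `lineage`. The sketch also ignores the repeated measurements within a lineage, which a fuller analysis would need to handle.

```r
set.seed(7)

# Hypothetical data: 8 lineages (4 per cleaner), 12 challenges each.
dat <- expand.grid(challenge = 1:12, lineage = factor(1:8))
dat$cleaner <- factor(ifelse(as.integer(dat$lineage) <= 4, "A", "B"))
slope <- ifelse(dat$cleaner == "A", 1.0, 0.1)  # tolerance rises mainly under A
dat$mic <- 3 + slope * dat$challenge + rnorm(nrow(dat), sd = 1.5)

# Cleaner as a main effect only: the same slope is forced on both cleaners.
fit_main <- lm(mic ~ challenge + cleaner, data = dat)
# Interaction: each cleaner gets its own slope over the challenges.
fit_int <- lm(mic ~ challenge * cleaner, data = dat)

# F-test for whether allowing separate slopes improves the fit.
cmp <- anova(fit_main, fit_int)
print(cmp)

# Confidence intervals for the coefficients, including the slope difference.
ci <- confint(fit_int)
print(ci)
```

In this encoding, the row `challenge:cleanerB` of the interval table is the estimated difference in daily rise between the two cleaners.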
Re: [R] Simulation - Natural Selection
On 05/01/2011 17:40, Bert Gunter wrote:

>> My hypothesis was specified before I did my experiment. Whilst far from perfect, I've tried to do the best I can to assess the rise in resistance without going into genetics, as that's not possible (although it may be at the next institution I've applied to for my MSc). With my hypothesis (I mentioned it below), I was of the frame of mind that a nonsignificant p-value on the cleaner variable (for now; the experiment is far from over) indicated a lack of evidence for rejecting the null. And so at the minute, it looks like the type of cleaner makes no difference.

> I have no fundamental objection, but be careful. I would simply qualify your last sentence by saying that it means that the experimental noise is too great to precisely determine the size of the cleaner effect. Scientific reality tells us that it is never exactly 0; what your results show is that your uncertainty about the value of the difference encompasses both positive and negative values. This does NOT mean that the difference might not be scientifically large enough to be of interest -- a confidence interval for the difference (MUCH better than a P value) would help you determine that. If the interval is narrow enough that the difference, positive or negative, is too small to be of scientific interest, then you're done. If the interval is large, then it tells you that you need more data, a better experiment (less noisy) etc.
>
> -- Bert

At the moment I wouldn't call the confidence interval small; it's definitely wide, and at the minute it covers zero. My R-squared at the minute is also 0.5. This is mostly due to the few extreme cases of adaptation I mentioned before, but I'm hesitant to remove them, as papers in my literature study which also evolve bacteria show that there is often (sometimes wide) variation in the paths populations take.
So whilst it is mathematically a bit undesirable, and makes me and the model uncertain, it does fall into place with what is known, or has previously been shown, of the reality of selection. Again, if I include the data from the bacteria dropped from the study, all that improves and uncertainty is reduced.

It may also be worth mentioning that I am also taking a more traditional approach (by that I mean a more Statistics 101 approach; indeed that is all the stats tuition covered in my course as a taught element), in case what I've described above did not work or was not ideal, because we (my supervisor and I) did foresee that a model report may contain a lot of uncertainty. Indeed, we did expect some populations to adapt and some not to. So I've also been collecting data on the width of the zones of inhibition, by putting disks of cleaner on plates of growth and measuring the dead zone that results. I can get lots of data from this with only a few plates. Doing this at the start of the study, a few times in the middle, and at the end will allow me to do more traditional analysis, for example a t.test on the dead zone widths at the end of the study between cleaner A and B, or a non-parametric equivalent, maybe even a permutation test. The modelling stuff is already beyond what my supervisor expects of me, but I felt it would add value and a lot more insight to the study, allowing more variables to be accounted for than a more short-sighted traditional test.
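The traditional comparisons mentioned here can all be run in base R. The sketch below uses made-up zone widths; the vectors `zone_a` and `zone_b`, their sizes, means and spreads are assumptions for illustration only. It shows a Welch t-test (which also reports a confidence interval for the difference in means, the quantity Bert recommends looking at), a Wilcoxon rank-sum test as the non-parametric equivalent, and a simple permutation test on the difference in means.

```r
set.seed(3)

# Hypothetical zone-of-inhibition widths (mm) at the end of the study.
zone_a <- rnorm(20, mean = 8, sd = 2)   # cleaner A: narrower dead zones
zone_b <- rnorm(20, mean = 12, sd = 2)  # cleaner B: wider dead zones

# Welch two-sample t-test; conf.int is the 95% interval for mean(A) - mean(B).
tt <- t.test(zone_a, zone_b)
print(tt$conf.int)

# Non-parametric alternative: Wilcoxon rank-sum test.
wt <- wilcox.test(zone_a, zone_b)
print(wt$p.value)

# Permutation test: shuffle the cleaner labels and recompute the difference.
obs <- mean(zone_a) - mean(zone_b)
pooled <- c(zone_a, zone_b)
perm <- replicate(5000, {
  idx <- sample(length(pooled), length(zone_a))
  mean(pooled[idx]) - mean(pooled[-idx])
})
# Two-sided p-value, counting the observed labelling as one permutation.
p_perm <- (sum(abs(perm) >= abs(obs)) + 1) / (length(perm) + 1)
print(p_perm)
```

If several measurements come from the same plate, they are not independent, so plate-level means may be a safer unit for these tests than individual disks.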