Re: [R] New var
Sent from my iPhone > On Jun 4, 2017, at 1:36 PM, David L Carlsonwrote: > > Since the number of choices is small (6), how about this? > > Starting with Jeff's initial DFM: > > DFM <- structure(list(obs = 1:6, start = structure(c(16467, 14710, 13152, > 13787, 15126, 12696), class = "Date"), end = structure(c(17167, > 14975, 13636, 13879, 15340, 12753), class = "Date"), D = c(700, > 265, 484, 92, 214, 57), bin = structure(c(6L, 3L, 5L, 1L, 3L, > 1L), .Label = c("[0,100)", "[100,200)", "[200,300)", "[300,400)", > "[400,500)", "[500,Inf)"), class = c("ordered", "factor"))), .Names = > c("obs", > "start", "end", "D", "bin"), row.names = c(NA, -6L), class = "data.frame") > > Construct a matrix of the six alternatives: > > tvals <- c(1, -1, -1, -1, -1, 0, 1, -1, -1, -1, 0, 0, 1, -1, -1, 0, 0, >0, 1, -1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0) > tmat <- matrix(tvals, 6, 5, byrow=TRUE) > colnames(tmat) <- paste0("t", 1:5) > tmat > # t1 t2 t3 t4 t5 > # [1,] 1 -1 -1 -1 -1 > # [2,] 0 1 -1 -1 -1 > # [3,] 0 0 1 -1 -1 > # [4,] 0 0 0 1 -1 > # [5,] 0 0 0 0 1 > # [6,] 0 0 0 0 0 > > idx <-as.numeric(DFM$bin) > (DFM <- data.frame(DFM, tmat[idx, ])) > #obs startend D bin t1 t2 t3 t4 t5 > # 1 1 2015-02-01 2017-01-01 700 [500,Inf) 0 0 0 0 0 > # 2 2 2010-04-11 2011-01-01 265 [200,300) 0 0 1 -1 -1 > # 3 3 2006-01-04 2007-05-03 484 [400,500) 0 0 0 0 1 > # 4 4 2007-10-01 2008-01-01 92 [0,100) 1 -1 -1 -1 -1 > # 5 5 2011-06-01 2012-01-01 214 [200,300) 0 0 1 -1 -1 > # 6 6 2004-10-05 2004-12-01 57 [0,100) 1 -1 -1 -1 -1 > > > David L. Carlson > Department of Anthropology > Texas A University > > -Original Message- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Val > Sent: Sunday, June 4, 2017 11:31 AM > To: Jeff Newmiller > Cc: r-help@R-project.org > Subject: Re: [R] New var > > Thank you Jeff and All, > > Within a given time period (say 700 days, from the start day), I am > expecting measurements taken at each time interval;. In this case "0" means > measurement taken, "1" not taken (stopped or opted out and " -1" don't > consider that time period for that individual. This will be compared with > the actual measurements taken (Observed- expected) within each time > interval. > > > > > On Sat, Jun 3, 2017 at 9:50 PM, Jeff Newmiller > wrote: > >> # read.table is NOT part of the data.table package >> #library(data.table) >> DFM <- read.table( text= >> 'obs start end >> 1 2/1/2015 1/1/2017 >> 2 4/11/2010 1/1/2011 >> 3 1/4/2006 5/3/2007 >> 4 10/1/2007 1/1/2008 >> 5 6/1/2011 1/1/2012 >> 6 10/5/2004 12/1/2004 >> ',header = TRUE, stringsAsFactors = FALSE) >> # cleaner way to compute D >> DFM$start <- as.Date( DFM$start, format="%m/%d/%Y" ) >> DFM$end <- as.Date( DFM$end, format="%m/%d/%Y" ) >> DFM$D <- as.numeric( DFM$end - DFM$start, units="days" ) >> # categorize your data into groups >> DFM$bin <- cut( DFM$D >> , breaks=c( seq( 0, 500, 100 ), Inf ) >> , right=FALSE # do not include the right edge >> , ordered_result = TRUE >> ) >> # brute force method you should have been able to figure out to show us >> some work >> DFM$t1 <- ifelse( DFM$D < 100, 1, 0 ) >> DFM$t2 <- ifelse( 100 <= DFM$D & DFM$D < 200, 1, ifelse( DFM$D < 100, -1, >> 0 ) ) >> DFM$t3 <- ifelse( 200 <= DFM$D & DFM$D < 300, 1, ifelse( DFM$D < 200, -1, >> 0 ) ) >> DFM$t4 <- ifelse( 300 <= DFM$D & DFM$D < 400, 1, ifelse( DFM$D < 300, -1, >> 0 ) ) >> DFM$t5 <- ifelse( 400 <= DFM$D & DFM$D < 500, 1, ifelse( DFM$D < 400, -1, >> 0 ) ) >> # brute force method with ordered factor >> DFM$tf1 <- ifelse( "[0,100)" == DFM$bin, 1, 0 ) >> DFM$tf2 <- ifelse( "[100,200)" == DFM$bin, 1, ifelse( "[100,200)" < >> DFM$bin, 0, -1 ) ) >> DFM$tf3 <- ifelse( "[200,300)" == DFM$bin, 1, ifelse( "[200,300)" < >> DFM$bin, 0, -1 ) ) >> DFM$tf4 <- ifelse( "[300,400)" == DFM$bin, 1, ifelse( "[300,400)" < >> DFM$bin, 0, -1 ) ) >> DFM$tf5 <- ifelse( "[400,500)" == DFM$bin, 1, ifelse( "[400,500)" < >> DFM$bin, 0, -1 ) ) >> # less obvious approach using the fact that factors are integers >> # and using the outer function to find all combinations of elements of two >> vectors >> # and the sign function >> DFM[ , paste0( "tm", 1:5 )] <- outer( as.integer( DFM$bin ) >>, 1:5 >>, FUN = function(x,y) { >> z <- sign(y-x)+1L >> ifelse( 2 == z, -1L, z ) >> } >>) >> >> # my result, provided using dput for precise representation >> DFMresult <- structure(list(obs = 1:6, start = structure(c(16467, 14710, >> 13152, 13787, 15126, 12696), class = "Date"), end = structure(c(17167, >> 14975, 13636, 13879, 15340, 12753), class = "Date"),
Re: [R] Regression expression to delete one or more spaces at end of string
Double the [[]] and add a + for one-or-more characters: sub("[[:blank:]]+$", "", COLNAMES) > On Aug 2, 2016, at 12:46 PM, Dennis Fisherwrote: > > R 3.3.1 > OS X > > Colleagues, > > I have encountered an unexpected regex problem > > I have read an Excel file into R using the readxl package. Columns names are: > > COLNAMES <- c("Study ID", "Test and Biological Matrix", "Subject No. ", > "Collection Date", > "Collection Time", "Scheduled Time Point", "Concentration", "Concentration > Units", > "LLOQ", "ULOQ", "Comment”) > > As you can see, there is a trailing space in “Subject No. “. I would like to > delete that space. The following works: > sub(“ $”, “”, COLNAMES) > However, I would like a more general approach that removes any trailing > whitespace. > > I tried variations such as: > sub("[:blank:]$", "", COLNAMES) > (also, without the $ and ‘space' instead of ‘blank') without success — to my > surprise, characters other than the trailing space were deleted but the > trailing space remained. > > Guidance on the correct syntax would be appreciated. > > Dennis > > Dennis Fisher MD > P < (The "P Less Than" Company) > Phone / Fax: 1-866-PLessThan (1-866-753-7784) > www.PLessThan.com > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Dr. David Forrest d...@vims.edu 804-684-7900w 757-968-5509h 804-413-7125c #240 Andrews Hall Virginia Institute of Marine Science Route 1208, Greate Road PO Box 1346 Gloucester Point, VA, 23062-1346 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] [FORGED] Generating random data with non-linear correlation between two variables
Please specify your goal in the oracle/psql analytical functions you know or specify what you mean by nonlinear correlation Sent from my iPhone > On Apr 9, 2016, at 6:09 AM, Muhammad Bilal> wrote: > > No its not. I am doing all these experiments for my own learning purpose. I > am Oracle SQL & PLSQL programmer and I can do these things with Oracle > analytical functions. > > However at present I am keen to learn R, with no other interest right now. > > Thanks > -- > Muhammad Bilal > Research Assistant and PhD Student, > Bristol Enterprise, Research and Innovation Centre (BERIC), > University of the West of England (UWE), > Frenchay Campus, > Bristol, > BS16 1QY > > muhammad2.bi...@live.uwe.ac.uk > > > > From: Rolf Turner > Sent: 09 April 2016 04:46 > To: Muhammad Bilal > Cc: r-help@r-project.org > Subject: Re: [FORGED] [R] Generating random data with non-linear correlation > between two variables > >> On 09/04/16 06:57, Muhammad Bilal wrote: >> Hi All, >> >> I am new to R and don't know how to achieve it. >> >> I am interested in generating a hypothetical dataframe that is consisted of >> say two variables named v1 and v2, based on the following constraints: >> 1. The range of v1 is 500-1500. >> 2. The mean of v1 is say 1100 >> 3. The range of v2 is 300-950. >> 4. The mean of v2 is say 400 >> 5. There exists a positive trend between these two variables, meaning that >> as v1 increases, v2 be also increase. >> 6. But the trend should be slightly non-linear. i.e., curved line. >> >> Is it possible to automatically generate through functions like rnorm. >> >> Any help will be highly appreciated. > > This sounds to me very much like a homework problem. We don't do > people's homework for them on this list. > > cheers, > > Rolf Turner > > -- > Technical Editor ANZJS > Department of Statistics > University of Auckland > Phone: +64-9-373-7599 ext. 88276 > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How numerical data is stored inside ts time series objects
On Apr 21, 2015, at 9:39 PM, Paul paul.domas...@gmail.com wrote: ... I rummaged around the help files for str, summary, dput, args. This seems like a more complicated language than Matlab, VBA, or even C++'s STL of old (which was pretty thoroughly documented). A function like str() returns an object description, and I'm guessing the conventions with which the object is described depends a lot on the person who wrote the handling code for the class. The description for the variable y seems particularly elaborate. Would I be right in assuming that the notation is ad-hoc and not documented? For example, the two invocations str(x) and str(y) show a Time-Series and a ts. And there are many lines of output for str(y) that is heavy in punctuation. The details of how str() represents your x and y variables is within the utils::stl.default() function. You can hunt this down and see the code with: methods(class=class(x)) # Find the class-specific handlers -- no str() methods(str) # Find the methods for the generic getAnywhere(str.default) # or getFromNamespace('str.default','utils') Within the utils::str.default code, this 'Time-Series' specific code only triggers if the object doesn't match a long list of other items (for example: is.function(), is.list(), is.vector(object) || (is.array(object) is.atomic(object)) ...) else if (stats::is.ts(object)) { tsp.a - stats::tsp(object) str1 - paste0( Time-Series , le.str, from , format(tsp.a[1L]), to , format(tsp.a[2L]), :) std.attr - c(tsp, class) } This handling is not dependent on who wrote the ts class, but on who wrote the str.default function. A more explict way to look at the difference without the str() summarization is with dput(x) and dput(y): dput(x) structure(c(464L, 675L, 703L, 887L, 1139L, 1077L, 1318L, 1260L, 1120L, 963L, 996L, 960L, 530L, 883L, 894L, 1045L, 1199L, 1287L, 1565L, 1577L, 1076L, 918L, 1008L, 1063L, 544L, 635L, 804L, 980L, 1018L, 1064L, 1404L, 1286L, 1104L, 999L, 996L, 1015L), .Tsp = c(1, 3.916667, 12), class = ts) dput(y) structure(c(464L, 675L, 703L, 887L, 1139L, 1077L, 1318L, 1260L, 1120L, 963L, 996L, 960L, 530L, 883L, 894L, 1045L, 1199L, 1287L, 1565L, 1577L, 1076L, 918L, 1008L, 1063L, 544L, 635L, 804L, 980L, 1018L, 1064L, 1404L, 1286L, 1104L, 999L, 996L, 1015L), .Dim = c(36L, 1L), .Dimnames = list(NULL, V1), .Tsp = c(1, 3.916667, 12), class = ts) Also, Matlab sometimes needs a squeeze() to drop degenerate dimensions, and R's drop() is similar, and is less-black-magic looking than the [[1]] code: str(drop(x)) Time-Series [1:36] from 1 to 3.92: 464 675 703 887 1139 1077 1318 1260 1120 963 ... str(drop(y)) Time-Series [1:36] from 1 to 3.92: 464 675 703 887 1139 1077 1318 1260 1120 963 ... stl(drop(x),s.window='per') stl(drop(y),s.window='per') Maybe str.default() should do Time-Series interpretation of is.ts() objects for matrices as well as vectors. Dave __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Mac Yosemite R.app console typing / autorepeat performance
I am having issues with performance in the R.app console on Mac Yosemite / OS X 10.10.1. 1) While editing commands on the command line, left or right arrow gradually slows within a session from usable rates to rates like 2/second or slower. 2) Autorepeat works differently for different characters: # - ## repeats normally f - f single character a - popup menu of differently-accented ‘a’ characters. arrow keys on an existing command line - navigate as expected I’m mostly irritated by the slowing navigation. On trying to generate a repeatable example, I think it happens from having lots of text in the console window: # open an R.app session # try autorepeating ‘#' across the console line while counting how many seconds it takes. # put a lot of output in the window: rnorm(10) # Autorepeat ‘#’ for the same number of seconds and see how much of the line is filled. I get about 1/3 of the film as above. If I do a couple more rnorm(10)s, the character insertions get queued up and slower than they typing and autorepeat. Is there a way to limit the scroll-history of the console buffer? Dave -- __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] SQL vs R
It sounds as if your underlying MySQL database is too slow for your purposes. Whatever you layer on top of it will be constrained by the underlying database. To speed up the process significantly, you may need to do work on the database backend part of the process. Dave On May 6, 2014, at 7:08 AM, Dr Eberhard Lisse e...@lisse.na wrote: Thanks, tried all of that, too slow. el on 2014-05-06, 12:00 Gabor Grothendieck said the following: On Tue, May 6, 2014 at 5:12 AM, Dr Eberhard Lisse e...@lisse.na wrote: Jeff It's in MySQL, at the moment roughly 1.8 GB, if I pull it into a dataframe it saves to 180MB. I work from the dataframe. But, it's not only a size issue it's also a speed issue and hence I don't care what I am going to use, as long as it is fast. sqldf is easy to understand for me but it takes ages. If alternatives were roughly similar in speed I would remain with sqldf. dplyr sounds faster, and promising, but the intrinsic stuff is way beyond me (elderly Gynaecologist) on the learning curve... You can create indices in sqldf and that can speed up processing substantially for certain operations. See examples 4h and 4i on the sqldf home page: http://sqldf.googlecode.com. Also note that sqldf supports not only the default SQLite backend but also MySQL, h2 and postgresql. See ?sqldf for info on using sqldf with MySQL and the others. -- Dr. Eberhard W. Lisse \/ Obstetrician Gynaecologist (Saar) e...@lisse.na/ * | Telephone: +264 81 124 6733 (cell) PO Box 8421 \ / Bachbrecht, Namibia ;/ __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Dr. David Forrest d...@vims.edu 804-684-7900w 757-968-5509h 804-413-7125c #240 Andrews Hall Virginia Institute of Marine Science Route 1208, Greate Road PO Box 1346 Gloucester Point, VA, 23062-1346 signature.asc Description: Message signed with OpenPGP using GPGMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Double Pareto Log Normal Distribution DPLN
Hi Bander, I'm pushing this discussion back to the list, because I'm not sure of the shape/rate parameters for rpareto and rexp and how they'd be applied across this mix of typo'd papers. # Reed Equation 6 http://www.math.uvic.ca/faculty/reed/dPlN.3.pdf exponentiated per end of sec 3: rdpln-function(n,a=1,b=1,t=1,v=0){exp(v+t*rnorm(n,sd=t)+rexp(n,rate=1/a)-rexp(n,rate=1/b))} # Reed Equation 10: library(VGAM) rdpln2-function(n,a,b,v,t){ rlnorm(n,meanlog=v,sdlog=t)* rpareto(n,location=1,shape=1/a)/ rpareto(n,location=1,shape=1/b)} boxplot(data.frame(x1= log(rdpln(a=2.5,b=.01,t=0.45,v=6.5,n=10)), x2= log(rdpln2(a=2.5,b=.01,t=0.45,v=6.5,n=10 # Reed equation 8 http://www.math.uvic.ca/faculty/reed/dPlN.3.pdf # with S1 errata #1 from http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0048964 ddpln - function(x, a=1, b=1, v=0, t=1){ # Density of Double Pareto LogNormal distribution # from b. alzahrani cs_2...@hotmail.com email of 2013-11-13 # per formula 8 from http://cs.stanford.edu/people/jure/pubs/dpln-kdd08.pdf c - (a * b /(a+b)) norm1-pnorm((log(x)-v-(a*t^2))/t) norm2-pnorm((log(x)-v+(b*t^2))/t) expo1- a*v+(a^2*t^2/2) expo2- -b*v+(b^2*t^2/2) # edited from the paper's eqn 8 with: s/t/v/ z- (x^(-a-1)) * exp(expo1)*( norm1) y- (x^( b-1)) * exp(expo2)*(1-norm2) # 1-norm is the complementary CDF of N(0,1) c*(z+y) } Dave On Nov 14, 2013, at 9:12 AM, David Forrest d...@vims.edu wrote: I think exponentiation of eqn 6 from the Reed paper generates DPLN variates directly, so maybe: rdpln=function(n,a=1,b=1,t=1,v=0){exp(v+t*rnorm(n,sd=t)+rexp(n,rate=1/a)-rexp(n,rate=1/b))} Dave On Nov 13, 2013, at 4:34 PM, b. alzahrani cs_2...@hotmail.com wrote: You help is much appreciated. Just one last point if you could help, Now I want to pass this curve to a function that can generate random numbers distributed according to DPLN ( right curve). I found the package Runuran can do that http://cran.r-project.org/web/packages/Runuran/Runuran.pdf but I do not know how to do it I think it would be something similar to page 8 and 9. Regards ** Bander * From: d...@vims.edu To: cs_2...@hotmail.com Subject: Re: [R] Double Pareto Log Normal Distribution DPLN Date: Wed, 13 Nov 2013 21:13:43 + I read the parameters in Fig 4, right as DPLN[2.5,0.1,0.45,6.5], so: x- 10^seq(0,4,by=.1) plot(x,ddpln(x,a=2.5,b=.01,v=6.5,t=0.45),log='xy',type='l') ... and the attached graph does not look dissimilar the figure--It starts at 10^-2, goes through 10^-4 at x=100, and elbows down around 900 and passes through 10^-6 at about 2000. The correction of Reed helps -- The uncorrected Reed Eq9 equation suggests that the the 't' in Sehshadri Eq9 should be a 'v' , but it doesn't exactly make sense with the extra 'a' in there. If the errata clears that up, then your expo2 term looks just like the expo1 term, but with a=-b. On Nov 13, 2013, at 3:43 PM, b. alzahrani wrote: Thank you very much for the help and the change you suggested in my code, I also found a correction on equation 9 that has been published by Reed ( here http://www.math.uvic.ca/faculty/reed/dPlN.3.pdf i.e. the original paper on Double Pareto Log Normal Distribution ). can you please see the correction in this linkhttp://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0048964 (in Supporting Information section, appendix S1), does your suggested code coincide with the correction on this link? as I can see Actually, I am interested in the most right curve in figure 4. and if we plot the curve with same order of the paper's parameters you suggests: plot(x,ddpln(x,a=2.8,b=.01,v=6.5,t=0.45),log='xy',type='l') the curve is different from the one in the paper?? Thanks ** Bander Alzahrani, * From: d...@vims.edu To: cs_2...@hotmail.com CC: r-h...@stat.math.ethz.ch Subject: Re: [R] Double Pareto Log Normal Distribution DPLN Date: Wed, 13 Nov 2013 19:09:34 + ...Additionally...your set of parameters match none of the curves in figure 4. I think the ordering of the parameters as listed on the graphs is different than in the text of the article. The 'v' parameter controls the location of the 'elbow' and should be near log(x) in each graph, while the 't' parameter is the sharpness of the elbow. So eyeballing the elbows: log(c(80,150,300))= 4.382027 5.010635 5.703782 These appear to match positional parameter #4 in the legends, which you apply as parameter t in your function. # editing your function a bit: ddpln - function(x, a=2.5, b=0.01, v=log(100), t=v/10){ #
Re: [R] Double Pareto Log Normal Distribution DPLN
Looks like there are typos in equation 8 of http://cs.stanford.edu/people/jure/pubs/dpln-kdd08.pdf (expo2 doesn't depend on 'v') or equation 9 of http://www.math.uvic.ca/faculty/reed/dPlN.3.pdf ('a' is not specified). Dave On Nov 13, 2013, at 11:43 AM, b. alzahrani cs_2...@hotmail.com wrote: Hi I found this paper http://cs.stanford.edu/people/jure/pubs/dpln-kdd08.pdf that models the DPLN distribution as in equation 8. I implemented this in R but cannot get the same curve as in Figure 4. can you please check if my code below is correct: e.g. is the use of pnorm() correct here? ddlpn - function(x){ a=2.8 b=0.01 v=0.45 t=6.5 j - (a * b /(a+b)) norm1-pnorm((log(x)-v-(a*t^2))/t) expo1- a*v+(a^2*t^2/2) z-exp(expo1)*(x^(-a-1))*(norm1) norm2-pnorm((log(x)-v+(b*t^2))/t) expo2- -b*t+(b^2*t^2/2) y- x^(b-1)*exp(expo2)*(1-norm2) # 1-norm is the complementary CDF of N(0,1) j*(z+y) } ** Bander Alzahrani, Teacher Assistant Information Systems Department Faculty of Computing Information Technology King Abdulaziz University * From: d...@vims.edu To: cs_2...@hotmail.com CC: r-h...@stat.math.ethz.ch Subject: Re: [R] Double Pareto Log Normal Distribution Date: Tue, 12 Nov 2013 16:51:22 + http://www.math.uvic.ca/faculty/reed/dPlN.3.pdf is the original ref and has the equations. library(VGAM) for *pareto() and library(stats) for *lnorm() should get you most of the way there. On Nov 12, 2013, at 10:47 AM, b. alzahrani cs_2...@hotmail.com wrote: Hi guys I would like to generate random number Double Pareto Log Normal Distribution (DPLN). does anyone know how to do this in R or if there is any built-in function. Thanks ** Bander * [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Dr. David Forrest d...@vims.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Dr. David Forrest d...@vims.edu signature.asc Description: Message signed with OpenPGP using GPGMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Double Pareto Log Normal Distribution DPLN
...Additionally...your set of parameters match none of the curves in figure 4. I think the ordering of the parameters as listed on the graphs is different than in the text of the article. The 'v' parameter controls the location of the 'elbow' and should be near log(x) in each graph, while the 't' parameter is the sharpness of the elbow. So eyeballing the elbows: log(c(80,150,300))= 4.382027 5.010635 5.703782 These appear to match positional parameter #4 in the legends, which you apply as parameter t in your function. # editing your function a bit: ddpln - function(x, a=2.5, b=0.01, v=log(100), t=v/10){ # Density of Double Pareto LogNormal distribution # from b. alzahrani cs_2...@hotmail.com email of 2013-11-13 # per formula 8 from http://cs.stanford.edu/people/jure/pubs/dpln-kdd08.pdf c - (a * b /(a+b)) norm1-pnorm((log(x)-v-(a*t^2))/t) norm2-pnorm((log(x)-v+(b*t^2))/t) expo1- a*v+(a^2*t^2/2) expo2- -b*v+(b^2*t^2/2) # edited from the paper's eqn 8 with: s/t/v/ z- (x^(-a-1)) * exp(expo1)*(norm1) y- (x^(b-1)) * exp(expo2)*(1-norm2) # 1-norm is the complementary CDF of N(0,1) c*(z+y) } x-10^seq(0,5,by=0.1) # 0-10k plot(x,ddpln(x,a=2.8,b=.01,v=3.8,t=0.35),log='xy',type='l') # Similar to fig 4 left. plot(x,ddpln(x,a=2.5,b=.01,v=log(2)),log='xy',type='l') plot(x,ddpln(x,a=2.5,b=.01,v=log(10)),log='xy',type='l') plot(x,ddpln(x,a=2.5,b=.01,v=log(100)),log='xy',type='l') plot(x,ddpln(x,a=2.5,b=.01,v=log(1000)),log='xy',type='l') plot(x,ddpln(x,a=2.5,b=.01,v=log(1)),log='xy',type='l') Dave On Nov 13, 2013, at 11:43 AM, b. alzahrani cs_2...@hotmail.com wrote: Hi I found this paper http://cs.stanford.edu/people/jure/pubs/dpln-kdd08.pdf that models the DPLN distribution as in equation 8. I implemented this in R but cannot get the same curve as in Figure 4. can you please check if my code below is correct: e.g. is the use of pnorm() correct here? ddlpn - function(x){ a=2.8 b=0.01 v=0.45 t=6.5 j - (a * b /(a+b)) norm1-pnorm((log(x)-v-(a*t^2))/t) expo1- a*v+(a^2*t^2/2) z-exp(expo1)*(x^(-a-1))*(norm1) norm2-pnorm((log(x)-v+(b*t^2))/t) expo2- -b*t+(b^2*t^2/2) y- x^(b-1)*exp(expo2)*(1-norm2) # 1-norm is the complementary CDF of N(0,1) j*(z+y) } ** Bander Alzahrani, Teacher Assistant Information Systems Department Faculty of Computing Information Technology King Abdulaziz University * From: d...@vims.edu To: cs_2...@hotmail.com CC: r-h...@stat.math.ethz.ch Subject: Re: [R] Double Pareto Log Normal Distribution Date: Tue, 12 Nov 2013 16:51:22 + http://www.math.uvic.ca/faculty/reed/dPlN.3.pdf is the original ref and has the equations. library(VGAM) for *pareto() and library(stats) for *lnorm() should get you most of the way there. On Nov 12, 2013, at 10:47 AM, b. alzahrani cs_2...@hotmail.com wrote: Hi guys I would like to generate random number Double Pareto Log Normal Distribution (DPLN). does anyone know how to do this in R or if there is any built-in function. Thanks ** Bander * [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Dr. David Forrest d...@vims.edu [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Dr. David Forrest d...@vims.edu signature.asc Description: Message signed with OpenPGP using GPGMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Double Pareto Log Normal Distribution
http://www.math.uvic.ca/faculty/reed/dPlN.3.pdf is the original ref and has the equations. library(VGAM) for *pareto() and library(stats) for *lnorm() should get you most of the way there. On Nov 12, 2013, at 10:47 AM, b. alzahrani cs_2...@hotmail.com wrote: Hi guys I would like to generate random number Double Pareto Log Normal Distribution (DPLN). does anyone know how to do this in R or if there is any built-in function. Thanks ** Bander * [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. -- Dr. David Forrest d...@vims.edu signature.asc Description: Message signed with OpenPGP using GPGMail __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Weights in mixed models
Hello everyone, I wonder if sample size can be used as weight in a weighted mixed model. Or should I use just the inverse of the variance? For example, the class'lmer' in the 'lme4' package have an option 'weights' (as in the class 'lm' of 'stats'). In the help of lme4 there is an example using 'herd size' as weight in a mixed model. So, if I have a measurement data (eg height) of 10 groups (sample size ranging from 30 to 3000 individuals for each group) can I use this number (N, sample size) in the 'weights' option? Is this wrong? Finally, what to do if the results (coefficients) of weighing by 'inverse of the variance' or by 'sample size' are very different, even opposite? Thank you very much in advance David __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.