Re: [R] New var

2017-06-04 Thread David R Forrest


Sent from my iPhone

> On Jun 4, 2017, at 1:36 PM, David L Carlson  wrote:
> 
> Since the number of choices is small (6), how about this?
> 
> Starting with Jeff's initial DFM:
> 
> DFM <- structure(list(obs = 1:6, start = structure(c(16467, 14710, 13152, 
> 13787, 15126, 12696), class = "Date"), end = structure(c(17167, 
> 14975, 13636, 13879, 15340, 12753), class = "Date"), D = c(700, 
> 265, 484, 92, 214, 57), bin = structure(c(6L, 3L, 5L, 1L, 3L, 
> 1L), .Label = c("[0,100)", "[100,200)", "[200,300)", "[300,400)", 
> "[400,500)", "[500,Inf)"), class = c("ordered", "factor"))), .Names = 
> c("obs", 
> "start", "end", "D", "bin"), row.names = c(NA, -6L), class = "data.frame")
> 
> Construct a matrix of the six alternatives:
> 
> tvals <- c(1, -1, -1, -1, -1, 0, 1, -1, -1, -1, 0, 0, 1, -1, -1, 0, 0, 
>0, 1, -1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0)
> tmat <- matrix(tvals, 6, 5, byrow=TRUE)
> colnames(tmat) <- paste0("t", 1:5)
> tmat
> #  t1 t2 t3 t4 t5
> # [1,]  1 -1 -1 -1 -1
> # [2,]  0  1 -1 -1 -1
> # [3,]  0  0  1 -1 -1
> # [4,]  0  0  0  1 -1
> # [5,]  0  0  0  0  1
> # [6,]  0  0  0  0  0
> 
> idx <-as.numeric(DFM$bin)
> (DFM <- data.frame(DFM, tmat[idx, ]))
> #obs  startend   D   bin t1 t2 t3 t4 t5
> # 1   1 2015-02-01 2017-01-01 700 [500,Inf)  0  0  0  0  0
> # 2   2 2010-04-11 2011-01-01 265 [200,300)  0  0  1 -1 -1
> # 3   3 2006-01-04 2007-05-03 484 [400,500)  0  0  0  0  1
> # 4   4 2007-10-01 2008-01-01  92   [0,100)  1 -1 -1 -1 -1
> # 5   5 2011-06-01 2012-01-01 214 [200,300)  0  0  1 -1 -1
> # 6   6 2004-10-05 2004-12-01  57   [0,100)  1 -1 -1 -1 -1
> 
> 
> David L. Carlson
> Department of Anthropology
> Texas A University
> 
> -Original Message-
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Val
> Sent: Sunday, June 4, 2017 11:31 AM
> To: Jeff Newmiller 
> Cc: r-help@R-project.org
> Subject: Re: [R] New var
> 
> Thank you Jeff and All,
> 
> Within a given time period (say 700 days, from the start day),  I am
> expecting measurements taken at each time interval;. In this case "0" means
> measurement taken, "1"  not taken (stopped or opted out  and " -1"  don't
> consider that time period for that individual. This will be compared with
> the actual measurements taken (Observed- expected)  within each time
> interval.
> 
> 
> 
> 
> On Sat, Jun 3, 2017 at 9:50 PM, Jeff Newmiller 
> wrote:
> 
>> # read.table is NOT part of the data.table package
>> #library(data.table)
>> DFM <- read.table( text=
>> 'obs start end
>> 1 2/1/2015   1/1/2017
>> 2 4/11/2010  1/1/2011
>> 3 1/4/2006   5/3/2007
>> 4 10/1/2007  1/1/2008
>> 5 6/1/2011   1/1/2012
>> 6 10/5/2004 12/1/2004
>> ',header = TRUE, stringsAsFactors = FALSE)
>> # cleaner way to compute D
>> DFM$start <- as.Date( DFM$start, format="%m/%d/%Y" )
>> DFM$end <- as.Date( DFM$end, format="%m/%d/%Y" )
>> DFM$D <- as.numeric( DFM$end - DFM$start, units="days" )
>> # categorize your data into groups
>> DFM$bin <- cut( DFM$D
>>  , breaks=c( seq( 0, 500, 100 ), Inf )
>>  , right=FALSE # do not include the right edge
>>  , ordered_result = TRUE
>>  )
>> # brute force method you should have been able to figure out to show us
>> some work
>> DFM$t1 <- ifelse( DFM$D < 100, 1, 0 )
>> DFM$t2 <- ifelse( 100 <= DFM$D & DFM$D < 200, 1, ifelse( DFM$D < 100, -1,
>> 0 ) )
>> DFM$t3 <- ifelse( 200 <= DFM$D & DFM$D < 300, 1, ifelse( DFM$D < 200, -1,
>> 0 ) )
>> DFM$t4 <- ifelse( 300 <= DFM$D & DFM$D < 400, 1, ifelse( DFM$D < 300, -1,
>> 0 ) )
>> DFM$t5 <- ifelse( 400 <= DFM$D & DFM$D < 500, 1, ifelse( DFM$D < 400, -1,
>> 0 ) )
>> # brute force method with ordered factor
>> DFM$tf1 <- ifelse( "[0,100)" == DFM$bin, 1, 0 )
>> DFM$tf2 <- ifelse( "[100,200)" == DFM$bin, 1, ifelse( "[100,200)" <
>> DFM$bin, 0, -1 ) )
>> DFM$tf3 <- ifelse( "[200,300)" == DFM$bin, 1, ifelse( "[200,300)" <
>> DFM$bin, 0, -1 ) )
>> DFM$tf4 <- ifelse( "[300,400)" == DFM$bin, 1, ifelse( "[300,400)" <
>> DFM$bin, 0, -1 ) )
>> DFM$tf5 <- ifelse( "[400,500)" == DFM$bin, 1, ifelse( "[400,500)" <
>> DFM$bin, 0, -1 ) )
>> # less obvious approach using the fact that factors are integers
>> # and using the outer function to find all combinations of elements of two
>> vectors
>> # and the sign function
>> DFM[ , paste0( "tm", 1:5 )] <- outer( as.integer( DFM$bin )
>>, 1:5
>>, FUN = function(x,y) {
>>  z <- sign(y-x)+1L
>>  ifelse( 2 == z, -1L, z )
>>  }
>>)
>> 
>> # my result, provided using dput for precise representation
>> DFMresult <- structure(list(obs = 1:6, start = structure(c(16467, 14710,
>> 13152, 13787, 15126, 12696), class = "Date"), end = structure(c(17167,
>> 14975, 13636, 13879, 15340, 12753), class = "Date"), 

Re: [R] Regression expression to delete one or more spaces at end of string

2016-08-02 Thread David R Forrest
Double the [[]] and add a + for one-or-more characters: 

sub("[[:blank:]]+$", "", COLNAMES)



> On Aug 2, 2016, at 12:46 PM, Dennis Fisher  wrote:
> 
> R 3.3.1
> OS X
> 
> Colleagues, 
> 
> I have encountered an unexpected regex problem
> 
> I have read an Excel file into R using the readxl package.  Columns names are:
> 
> COLNAMES  <- c("Study ID", "Test and Biological Matrix", "Subject No. ", 
> "Collection Date", 
> "Collection Time", "Scheduled Time Point", "Concentration", "Concentration 
> Units", 
> "LLOQ", "ULOQ", "Comment”)
> 
> As you can see, there is a trailing space in “Subject No. “.  I would like to 
> delete that space.  The following works:
>   sub(“ $”, “”, COLNAMES)
> However, I would like a more general approach that removes any trailing 
> whitespace.
> 
> I tried variations such as:
>   sub("[:blank:]$", "", COLNAMES)
> (also, without the $ and ‘space' instead of ‘blank') without success — to my 
> surprise, characters other than the trailing space were deleted but the 
> trailing space remained.
> 
> Guidance on the correct syntax would be appreciated.
> 
> Dennis
> 
> Dennis Fisher MD
> P < (The "P Less Than" Company)
> Phone / Fax: 1-866-PLessThan (1-866-753-7784)
> www.PLessThan.com
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Dr. David Forrest
d...@vims.edu
804-684-7900w
757-968-5509h
804-413-7125c
#240 Andrews Hall
Virginia Institute of Marine Science
Route 1208, Greate Road
PO Box 1346
Gloucester Point, VA, 23062-1346










__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] [FORGED] Generating random data with non-linear correlation between two variables

2016-04-09 Thread David R Forrest
Please specify your goal in the oracle/psql analytical functions you know or 
specify what you mean by nonlinear correlation 

Sent from my iPhone

> On Apr 9, 2016, at 6:09 AM, Muhammad Bilal  
> wrote:
> 
> No its not. I am doing all these experiments for my own learning purpose. I 
> am Oracle SQL & PLSQL programmer and  I can do these things with Oracle 
> analytical functions.
> 
> However at present I am keen to learn R, with no other interest right now.
> 
> Thanks
> --
> Muhammad Bilal
> Research Assistant and PhD Student,
> Bristol Enterprise, Research and Innovation Centre (BERIC),
> University of the West of England (UWE),
> Frenchay Campus,
> Bristol,
> BS16 1QY
> 
> muhammad2.bi...@live.uwe.ac.uk
> 
> 
> 
> From: Rolf Turner 
> Sent: 09 April 2016 04:46
> To: Muhammad Bilal
> Cc: r-help@r-project.org
> Subject: Re: [FORGED] [R] Generating random data with non-linear correlation 
> between two variables
> 
>> On 09/04/16 06:57, Muhammad Bilal wrote:
>> Hi All,
>> 
>> I am new to R and don't know how to achieve it.
>> 
>> I am interested in generating a hypothetical dataframe that is consisted of 
>> say two variables named v1 and v2, based on the following constraints:
>> 1. The range of v1 is 500-1500.
>> 2. The mean of v1 is say 1100
>> 3. The range of v2 is 300-950.
>> 4. The mean of v2 is say 400
>> 5. There exists a positive trend between these two variables, meaning that 
>> as v1 increases, v2 be also increase.
>> 6. But the trend should be slightly non-linear. i.e., curved line.
>> 
>> Is it possible to automatically generate through functions like rnorm.
>> 
>> Any help will be highly appreciated.
> 
> This sounds to me very much like a homework problem.  We don't do
> people's homework for them on this list.
> 
> cheers,
> 
> Rolf Turner
> 
> --
> Technical Editor ANZJS
> Department of Statistics
> University of Auckland
> Phone: +64-9-373-7599 ext. 88276
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How numerical data is stored inside ts time series objects

2015-04-22 Thread David R Forrest

 On Apr 21, 2015, at 9:39 PM, Paul paul.domas...@gmail.com wrote:
...
 I rummaged around the help files for str, summary, dput, args.  This
 seems like a more complicated language than Matlab, VBA, or even C++'s
 STL of old (which was pretty thoroughly documented).  A function like
 str() returns an object description, and I'm guessing the conventions
 with which the object is described depends a lot on the person who
 wrote the handling code for the class.  The description for the
 variable y seems particularly elaborate.
 
 Would I be right in assuming that the notation is ad-hoc and not
 documented?  For example, the two invocations str(x) and str(y) show a
 Time-Series and a ts.  And there are many lines of output for str(y)
 that is heavy in punctuation.
 

The details of how str() represents your x and y variables is within the 
utils::stl.default() function.  You can hunt this down and see the code with:

  methods(class=class(x))  # Find the class-specific handlers -- no str()
  methods(str) # Find the methods for the generic
  getAnywhere(str.default)   # or getFromNamespace('str.default','utils')
  

Within the utils::str.default code, this 'Time-Series' specific code only 
triggers if the object doesn't match a long list of other items (for example: 
is.function(), is.list(), is.vector(object) || (is.array(object)  
is.atomic(object)) ...)   

else if (stats::is.ts(object)) {
tsp.a - stats::tsp(object)
str1 - paste0( Time-Series , le.str,  from , 
format(tsp.a[1L]),  to , format(tsp.a[2L]), 
:)
std.attr - c(tsp, class)
}

This handling is not dependent on who wrote the ts class, but on who wrote the 
str.default function.  

A more explict way to look at the difference without the str() summarization is 
with dput(x) and dput(y):

 dput(x)
structure(c(464L, 675L, 703L, 887L, 1139L, 1077L, 1318L, 1260L, 
1120L, 963L, 996L, 960L, 530L, 883L, 894L, 1045L, 1199L, 1287L, 
1565L, 1577L, 1076L, 918L, 1008L, 1063L, 544L, 635L, 804L, 980L, 
1018L, 1064L, 1404L, 1286L, 1104L, 999L, 996L, 1015L), .Tsp = c(1, 
3.916667, 12), class = ts)
 dput(y)
structure(c(464L, 675L, 703L, 887L, 1139L, 1077L, 1318L, 1260L, 
1120L, 963L, 996L, 960L, 530L, 883L, 894L, 1045L, 1199L, 1287L, 
1565L, 1577L, 1076L, 918L, 1008L, 1063L, 544L, 635L, 804L, 980L, 
1018L, 1064L, 1404L, 1286L, 1104L, 999L, 996L, 1015L), .Dim = c(36L, 
1L), .Dimnames = list(NULL, V1), .Tsp = c(1, 3.916667, 
12), class = ts)


Also, Matlab sometimes needs a squeeze() to drop degenerate dimensions, and R's 
drop() is similar, and is less-black-magic looking than the [[1]] code:


 str(drop(x))
 Time-Series [1:36] from 1 to 3.92: 464 675 703 887 1139 1077 1318 1260 1120 
963 ...
 str(drop(y))
 Time-Series [1:36] from 1 to 3.92: 464 675 703 887 1139 1077 1318 1260 1120 
963 ...

stl(drop(x),s.window='per')
stl(drop(y),s.window='per') 

Maybe str.default() should do Time-Series interpretation of is.ts() objects for 
matrices as well as vectors.

Dave

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Mac Yosemite R.app console typing / autorepeat performance

2015-01-13 Thread David R Forrest
I am having issues with performance in the R.app console on Mac Yosemite / OS X 
10.10.1.  

1) While editing commands on the command line, left or right arrow gradually 
slows within a session from usable rates to rates like 2/second or slower.

2) Autorepeat works differently for different characters:

  # - ##  repeats normally
  f - f   single character
  a - popup menu of differently-accented ‘a’ characters.
  arrow keys on an existing command line - navigate as expected

I’m mostly irritated by the slowing navigation.

On trying to generate a repeatable example, I think it happens from having lots 
of text in the console window:

# open an R.app session
# try autorepeating ‘#' across the console line while counting how many seconds 
it takes.
# put a lot of output in the window:
rnorm(10)
# Autorepeat ‘#’ for the same number of seconds and see how much of the line is 
filled.   I get about 1/3 of the film as above.

If I do a couple more rnorm(10)s, the character insertions get queued up 
and slower than they typing and autorepeat.

Is there a way to limit the scroll-history of the console buffer?

Dave
--









__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] SQL vs R

2014-05-06 Thread David R Forrest
It sounds as if your underlying MySQL database is too slow for your purposes.  
Whatever you layer on top of it will be constrained by the underlying database. 
 To speed up the process significantly, you may need to do work on the database 
backend part of the process.

Dave


On May 6, 2014, at 7:08 AM, Dr Eberhard Lisse e...@lisse.na wrote:

 Thanks,
 
 tried all of that, too slow.
 
 el
 
 on 2014-05-06, 12:00 Gabor Grothendieck said the following:
 On Tue, May 6, 2014 at 5:12 AM, Dr Eberhard Lisse e...@lisse.na wrote:
 Jeff
 
 It's in MySQL, at the moment roughly 1.8 GB, if I pull it into a
 dataframe it saves to 180MB. I work from the dataframe.
 
 But, it's not only a size issue it's also a speed issue and hence
 I don't care what I am going to use, as long as it is fast.
 
 sqldf is easy to understand for me but it takes ages.  If
 alternatives were roughly similar in speed I would remain with
 sqldf.
 
 dplyr sounds faster, and promising, but the intrinsic stuff is
 way beyond me (elderly Gynaecologist) on the learning curve...
 
 You can create indices in sqldf and that can speed up processing
 substantially for certain operations.  See examples 4h and 4i on
 the sqldf home page: http://sqldf.googlecode.com.  Also note that
 sqldf supports not only the default SQLite backend but also MySQL,
 h2 and postgresql.  See ?sqldf for info on using sqldf with MySQL
 and the others.
 
 
 -- 
 Dr. Eberhard W. Lisse  \/ Obstetrician  Gynaecologist (Saar)
 e...@lisse.na/ * |   Telephone: +264 81 124 6733 (cell)
 PO Box 8421 \ /
 Bachbrecht, Namibia ;/
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

--
Dr. David Forrest
d...@vims.edu
804-684-7900w
757-968-5509h
804-413-7125c
#240 Andrews Hall
Virginia Institute of Marine Science
Route 1208, Greate Road
PO Box 1346
Gloucester Point, VA, 23062-1346












signature.asc
Description: Message signed with OpenPGP using GPGMail
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Double Pareto Log Normal Distribution DPLN

2013-11-14 Thread David R Forrest
Hi Bander,

I'm pushing this discussion back to the list, because I'm not sure of the 
shape/rate parameters for rpareto and rexp and how they'd be applied across 
this mix of typo'd papers.

# Reed Equation 6 http://www.math.uvic.ca/faculty/reed/dPlN.3.pdf exponentiated 
per end of sec 3:
rdpln-function(n,a=1,b=1,t=1,v=0){exp(v+t*rnorm(n,sd=t)+rexp(n,rate=1/a)-rexp(n,rate=1/b))}

# Reed Equation 10:
library(VGAM)
rdpln2-function(n,a,b,v,t){ rlnorm(n,meanlog=v,sdlog=t)*
   rpareto(n,location=1,shape=1/a)/
   rpareto(n,location=1,shape=1/b)}
 
boxplot(data.frame(x1= log(rdpln(a=2.5,b=.01,t=0.45,v=6.5,n=10)), x2= 
log(rdpln2(a=2.5,b=.01,t=0.45,v=6.5,n=10  

# Reed equation 8 http://www.math.uvic.ca/faculty/reed/dPlN.3.pdf 
# with S1 errata #1 from 
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0048964 

ddpln - function(x, a=1, b=1, v=0, t=1){
 # Density of Double Pareto LogNormal distribution
 # from b. alzahrani cs_2...@hotmail.com email of 2013-11-13
 # per formula 8 from http://cs.stanford.edu/people/jure/pubs/dpln-kdd08.pdf

  c - (a * b /(a+b))

  norm1-pnorm((log(x)-v-(a*t^2))/t)
  norm2-pnorm((log(x)-v+(b*t^2))/t)
  expo1-  a*v+(a^2*t^2/2)
  expo2- -b*v+(b^2*t^2/2)  # edited from the paper's eqn 8  with: s/t/v/

  z- (x^(-a-1)) * exp(expo1)*(  norm1)
  y- (x^( b-1)) * exp(expo2)*(1-norm2)  # 1-norm is the complementary CDF of 
N(0,1)

  c*(z+y)
}


Dave



On Nov 14, 2013, at 9:12 AM, David Forrest d...@vims.edu
 wrote:

 
 I think exponentiation of eqn 6 from the Reed paper generates DPLN variates 
 directly, so maybe:
 
 rdpln=function(n,a=1,b=1,t=1,v=0){exp(v+t*rnorm(n,sd=t)+rexp(n,rate=1/a)-rexp(n,rate=1/b))}
 
 
 Dave
 
 
 On Nov 13, 2013, at 4:34 PM, b. alzahrani cs_2...@hotmail.com
 wrote:
 
 You help is much appreciated. Just one last point if you could help, Now I 
 want to pass this curve to a function that can generate random numbers 
 distributed according to DPLN ( right curve).
 
 I found the package Runuran can do that 
 http://cran.r-project.org/web/packages/Runuran/Runuran.pdf  but I do not 
 know how to do it I think it would be something similar to page 8 and 9.
 
 Regards
 **
 Bander 
 *
 
 
 
 From: d...@vims.edu
 To: cs_2...@hotmail.com
 Subject: Re: [R] Double Pareto Log Normal Distribution DPLN
 Date: Wed, 13 Nov 2013 21:13:43 +
 
 
 I read the parameters in Fig 4, right as DPLN[2.5,0.1,0.45,6.5], so:
 
 x- 10^seq(0,4,by=.1)
 plot(x,ddpln(x,a=2.5,b=.01,v=6.5,t=0.45),log='xy',type='l')
 
 ... and the attached graph does not look dissimilar the figure--It starts at 
 10^-2, goes through 10^-4 at x=100, and elbows down around 900 and passes 
 through 10^-6 at about 2000.
 
 The correction of Reed helps -- The uncorrected Reed Eq9 equation suggests 
 that the the 't' in Sehshadri Eq9 should be a 'v' , but it doesn't exactly 
 make sense with the extra 'a' in there.  If the errata clears that up, then 
 your expo2 term looks just like the expo1 term, but with a=-b.
 
 
 
 
 
 On Nov 13, 2013, at 3:43 PM, b. alzahrani wrote:  Thank you very much for 
 the help and the change you suggested in my code, I also found a correction 
 on equation 9 that has been published by Reed ( here 
 http://www.math.uvic.ca/faculty/reed/dPlN.3.pdf i.e. the original paper on 
 Double Pareto Log Normal Distribution ).   can you please see the 
 correction in this 
 linkhttp://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0048964
  (in Supporting Information section, appendix S1), does your suggested code 
 coincide with the correction on this link? as I can see   Actually, I am 
 interested in the most right curve in figure 4. and if we plot the curve 
 with same order of the paper's parameters you suggests: 
 plot(x,ddpln(x,a=2.8,b=.01,v=6.5,t=0.45),log='xy',type='l') the curve is 
 different from the one in the paper??   Thanks   
 **  Bander 
 Alzahrani,  *  From: 
 d...@vims.edu   To: cs_2...@hotmail.com   CC: r-h...@stat.math.ethz.ch  
  Subject: Re: [R] Double Pareto Log Normal Distribution DPLN   Date: Wed, 
 13 Nov 2013 19:09:34 + ...Additionally...your set of parameters 
 match none of the curves in figure 4. I think the ordering of the 
 parameters as listed on the graphs is different than in the text of the 
 article. The 'v' parameter controls the location of the 'elbow' and 
 should be near log(x) in each graph, while the 't' parameter is the 
 sharpness of the elbow. So eyeballing the elbows: 
 log(c(80,150,300))= 4.382027 5.010635 5.703782 These appear to match 
 positional parameter #4 in the legends, which you apply as parameter t in 
 your function.   # editing your function a bit: ddpln - 
 function(x, a=2.5, b=0.01, v=log(100), t=v/10){   # 

Re: [R] Double Pareto Log Normal Distribution DPLN

2013-11-13 Thread David R Forrest

Looks like there are typos in equation 8 of 
http://cs.stanford.edu/people/jure/pubs/dpln-kdd08.pdf  (expo2 doesn't depend 
on 'v') or equation 9 of http://www.math.uvic.ca/faculty/reed/dPlN.3.pdf ('a' 
is not specified).

Dave

On Nov 13, 2013, at 11:43 AM, b. alzahrani cs_2...@hotmail.com
 wrote:

 
 Hi
 
 I found this paper http://cs.stanford.edu/people/jure/pubs/dpln-kdd08.pdf 
 that models the DPLN distribution as in equation 8. I implemented this in R 
 but cannot get the same curve as in Figure 4. can you please check if my code 
 below is correct: e.g. is the use of pnorm() correct here?
 
 ddlpn - function(x){
  a=2.8
  b=0.01
  v=0.45
  t=6.5
  j - (a * b /(a+b))
 
  norm1-pnorm((log(x)-v-(a*t^2))/t)
  expo1- a*v+(a^2*t^2/2)
 
  z-exp(expo1)*(x^(-a-1))*(norm1)
 
  norm2-pnorm((log(x)-v+(b*t^2))/t)
 expo2- -b*t+(b^2*t^2/2)
 
  y- x^(b-1)*exp(expo2)*(1-norm2)  # 1-norm is the complementary CDF of N(0,1)
  j*(z+y)
 }
 **
 Bander Alzahrani, Teacher Assistant
 Information Systems Department
 Faculty of Computing  Information Technology
 King Abdulaziz University
 
 *
 
 
 
 From: d...@vims.edu
 To: cs_2...@hotmail.com
 CC: r-h...@stat.math.ethz.ch
 Subject: Re: [R] Double Pareto Log Normal Distribution
 Date: Tue, 12 Nov 2013 16:51:22 +
 
 
 http://www.math.uvic.ca/faculty/reed/dPlN.3.pdf is the original ref and has 
 the equations.
 
 library(VGAM) for *pareto() and library(stats) for *lnorm() should get you 
 most of the way there.
 
 On Nov 12, 2013, at 10:47 AM, b. alzahrani cs_2...@hotmail.com
 wrote:
 
 Hi guys
 I would like to generate random number Double Pareto Log Normal 
 Distribution (DPLN). does anyone know how to do this in R or if there is 
 any built-in function.
 
 Thanks
 
 **
 Bander 
 *
 
 
   
 [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 --
 Dr. David Forrest
 d...@vims.edu
 
 
 
 
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

--
Dr. David Forrest
d...@vims.edu




signature.asc
Description: Message signed with OpenPGP using GPGMail
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Double Pareto Log Normal Distribution DPLN

2013-11-13 Thread David R Forrest
...Additionally...your set of parameters match none of the curves in figure 4.

I think the ordering of the parameters as listed on the graphs is different 
than in the text of the article.

The 'v' parameter controls the location of the 'elbow' and should be near 
log(x) in each graph, while the 't' parameter is the sharpness of the elbow.  
So eyeballing the elbows:

   log(c(80,150,300))= 4.382027 5.010635 5.703782

These appear to match positional parameter #4 in the legends, which you apply 
as parameter t in your function.  


# editing your function a bit:

ddpln - function(x, a=2.5, b=0.01, v=log(100), t=v/10){
  # Density of Double Pareto LogNormal distribution
  # from b. alzahrani cs_2...@hotmail.com email of 2013-11-13
  # per formula 8 from http://cs.stanford.edu/people/jure/pubs/dpln-kdd08.pdf

   c - (a * b /(a+b))

   norm1-pnorm((log(x)-v-(a*t^2))/t)
   norm2-pnorm((log(x)-v+(b*t^2))/t)
   expo1-  a*v+(a^2*t^2/2)
   expo2- -b*v+(b^2*t^2/2)  # edited from the paper's eqn 8  with: s/t/v/

   z- (x^(-a-1)) * exp(expo1)*(norm1)
   y- (x^(b-1))  * exp(expo2)*(1-norm2)  # 1-norm is the complementary CDF of 
N(0,1)

   c*(z+y)
}

x-10^seq(0,5,by=0.1) # 0-10k

plot(x,ddpln(x,a=2.8,b=.01,v=3.8,t=0.35),log='xy',type='l')  # Similar to fig 4 
left.

plot(x,ddpln(x,a=2.5,b=.01,v=log(2)),log='xy',type='l')
plot(x,ddpln(x,a=2.5,b=.01,v=log(10)),log='xy',type='l')
plot(x,ddpln(x,a=2.5,b=.01,v=log(100)),log='xy',type='l')
plot(x,ddpln(x,a=2.5,b=.01,v=log(1000)),log='xy',type='l')
plot(x,ddpln(x,a=2.5,b=.01,v=log(1)),log='xy',type='l')

Dave

On Nov 13, 2013, at 11:43 AM, b. alzahrani cs_2...@hotmail.com
 wrote:

 
 Hi
 
 I found this paper http://cs.stanford.edu/people/jure/pubs/dpln-kdd08.pdf 
 that models the DPLN distribution as in equation 8. I implemented this in R 
 but cannot get the same curve as in Figure 4. can you please check if my code 
 below is correct: e.g. is the use of pnorm() correct here?
 
 ddlpn - function(x){
  a=2.8
  b=0.01
  v=0.45
  t=6.5
  j - (a * b /(a+b))
 
  norm1-pnorm((log(x)-v-(a*t^2))/t)
  expo1- a*v+(a^2*t^2/2)
 
  z-exp(expo1)*(x^(-a-1))*(norm1)
 
  norm2-pnorm((log(x)-v+(b*t^2))/t)
 expo2- -b*t+(b^2*t^2/2)
 
  y- x^(b-1)*exp(expo2)*(1-norm2)  # 1-norm is the complementary CDF of N(0,1)
  j*(z+y)
 }
 **
 Bander Alzahrani, Teacher Assistant
 Information Systems Department
 Faculty of Computing  Information Technology
 King Abdulaziz University
 
 *
 
 
 
 From: d...@vims.edu
 To: cs_2...@hotmail.com
 CC: r-h...@stat.math.ethz.ch
 Subject: Re: [R] Double Pareto Log Normal Distribution
 Date: Tue, 12 Nov 2013 16:51:22 +
 
 
 http://www.math.uvic.ca/faculty/reed/dPlN.3.pdf is the original ref and has 
 the equations.
 
 library(VGAM) for *pareto() and library(stats) for *lnorm() should get you 
 most of the way there.
 
 On Nov 12, 2013, at 10:47 AM, b. alzahrani cs_2...@hotmail.com
 wrote:
 
 Hi guys
 I would like to generate random number Double Pareto Log Normal 
 Distribution (DPLN). does anyone know how to do this in R or if there is 
 any built-in function.
 
 Thanks
 
 **
 Bander 
 *
 
 
   
 [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.
 
 --
 Dr. David Forrest
 d...@vims.edu
 
 
 
 
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

--
Dr. David Forrest
d...@vims.edu






signature.asc
Description: Message signed with OpenPGP using GPGMail
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Double Pareto Log Normal Distribution

2013-11-12 Thread David R Forrest

http://www.math.uvic.ca/faculty/reed/dPlN.3.pdf is the original ref and has the 
equations.

library(VGAM) for *pareto() and library(stats) for *lnorm() should get you most 
of the way there.

On Nov 12, 2013, at 10:47 AM, b. alzahrani cs_2...@hotmail.com
 wrote:

 Hi guys
 I would like to generate random number Double Pareto Log Normal Distribution 
 (DPLN). does anyone know how to do this in R or if there is any built-in 
 function.
 
 Thanks
 
 **
 Bander 
 *
 
 
 
   [[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

--
Dr. David Forrest
d...@vims.edu







signature.asc
Description: Message signed with OpenPGP using GPGMail
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Weights in mixed models

2010-07-24 Thread David R.
Hello everyone,

I wonder if sample size can be used as weight  in a weighted mixed model. Or 
should I use just the inverse of the variance?

For example, the class'lmer' in the 'lme4' package have an option 'weights'
(as in the class 'lm' of 'stats'). In the help of lme4 there is an example 
using 
'herd size' as weight in a mixed model.
 
 So, if I have a measurement data (eg height) of 10 groups (sample size ranging 
from 30 to 3000 individuals for each group) can I  use this number (N, sample 
size) in the 'weights' option? Is this wrong? 

 
Finally, what to do if the results (coefficients) of weighing by 'inverse of 
the 
variance' or by 'sample size' are very different, even opposite?


Thank you very much in advance
David

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.