[R] constructing a dataframe from a database of newspaper articles
I am hoping for some assistance with formatting a large text file that consists of a series of individual records. Each record includes specific labels/field names; a sample of one record (one of the longest) is at the end of this post.

What I want to do is reformat the data so that each individual record becomes a row (some cells will hold a lot of text). The column variables I want are:

(a) HD in one column
(b) BY in one column
(c) WC data in one column
(d) PD data in one column
(e) SC data in one column
(f) PG data in one column
(g) LP and TD text in one column; this column can contain quite a lot of text, e.g. 1900 words.

The other fields are unwanted. If there were 150 individual records, the formatted result would be a dataset of 7 columns by 150 rows.

I was advised to:

1. read in the file using readLines, giving a character vector with one element per input line.
2. convert that to lines of the form "id op text", where each such line is a field and multiline fields have been collapsed into a single line of text. This step involves detailed processing and you could do it in a loop or you could try a vectorized approach. A vectorized approach will likely involve using
3. the lines created above could be converted to a data frame with three columns and
4. reshape used to create a wide data frame.
5. then write it out using write.csv.

I have got as far as being able to read the text into R; I am unsure whether the warning is a problem. I am, however, not at all sure what I need to do next.

Any assistance is much appreciated,

Bob

(A) syntax

> mht <- scan(what="c:\\cm-mht1.txt")
> readLines("c:\\cm-mht1.txt", n = -1)
[8376] "© 2006 Dow Jones Reuters Business Interactive LLC (trading as Factiva). All"
[8377] "rights reserved."
Warning message:
incomplete final line found by readLines on 'c:\cm-mht1.txt'

(B) sample data

HD Was Charles Manson temporarily insane when he led a wild killing rampage in the US in 1969?
BY By Deborah Cassrels.
WC 1834 words
PD 23 June 2001
SN Courier Mail
SC COUMAI
PG 30
LA English
CY (c) 2001 Queensland Newspapers Pty Ltd
LP Was Charles Manson temporarily insane when he led a wild killing rampage in the US in 1969? Clearly he was mad and bad. But would Queensland have placed him before its Mental Health Tribunal, found him of unsound mind at the time of his crimes, institutionalised him and treated his illness? WHY is Queensland the only jurisdiction in the Commonwealth with a Mental Health Tribunal which establishes if an accused is fit to face trial or of unsound mind at the time of an alleged offence? Why is mental incompetence not determined in an adversarial court by a jury? Under the Mental Health Act 1974, the tribunal, a statutory body operating since 1985, comprises three-yearly appointments of a Supreme Court judge and two assisting psychiatrists, whose advice does not have to be accepted. The judge alone constitutes the tribunal, an inquisitorial process conducted in the Supreme Court in Brisbane.
TD Victims or family are not notified of hearings or allowed to submit victim impact statements. They are prohibited from talking to the media until 28 days after the decision. And when patients return to the community there is no requirement for neighbours or victims to be notified. Is this legislation enlightened or are we just suckers, falling for time and money-saving strategies? The tribunal has earned a reputation as progressive, humane and economical among some judges who have presided over it. The inaugural chair, former Supreme Court judge Angelo Vasta QC, thinks the tribunal system is enlightened and it saves an enormous amount of expenditure. He points to the humane side of treating the ill in a secure hospital rather than punishing them for offences but is uncomfortable with borderline cases. Whether people are mad or bad ought to be established by a very thorough investigation.
The associated Patient Review Tribunals (of which there are five) consist of three to six members, including the chair who is a legal officer, a medical practitioner and a mental health professional. A psychiatrist is not required. The other three have no specific qualifications and can include former patients. The tribunals operate in closed hearings and patients of unsound mind or unfit for trial are reviewed every 12 months. Leave is granted either by the Mental Health Tribunal or the Patient Review Tribunal, which determine when a restricted patient is discharged into the community. Says the Director of Mental Health, Dr Peggy Brown: In the case of serious offences you can be assured the period of monitoring is quite
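The five advised steps can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: every field is assumed to start a line with one of the two-letter tags seen in the sample record, every record is assumed to begin with an HD line, and `parse_records()`, `field_tags` and the output file name are hypothetical names made up for this sketch. Instead of reshape() it fills a record-by-tag matrix directly, which gives the same wide layout. The "incomplete final line" warning only says that the file does not end with a newline; the lines are still read.

```r
# Field tags seen in the sample record; extend this vector if the file has more.
field_tags <- c("HD", "BY", "WC", "PD", "SN", "SC", "PG", "LA", "CY", "LP", "TD")

# parse_records() is a hypothetical helper, not part of any package.
parse_records <- function(lines, tags = field_tags) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]                          # drop empty lines
  stopifnot(length(lines) > 0, grepl("^HD( |$)", lines[1]))
  tag_re <- paste0("^(", paste(tags, collapse = "|"), ")( |$)")
  starts <- grepl(tag_re, lines)                         # TRUE where a new field begins

  # Collapse each multiline field into a single string.
  field_id <- cumsum(starts)
  text <- unname(vapply(split(lines, field_id), paste, character(1), collapse = " "))
  tag <- substr(text, 1, 2)
  value <- trimws(substring(text, 3))

  # A new record starts at every HD field; fill a record-by-tag matrix.
  record <- cumsum(tag == "HD")
  wide <- matrix(NA_character_, nrow = max(record), ncol = length(tags),
                 dimnames = list(NULL, tags))
  wide[cbind(record, match(tag, tags))] <- value

  # Join LP and TD into one text column, treating a missing part as empty.
  body <- wide[, c("LP", "TD"), drop = FALSE]
  body[is.na(body)] <- ""
  data.frame(wide[, c("HD", "BY", "WC", "PD", "SC", "PG"), drop = FALSE],
             TEXT = trimws(paste(body[, "LP"], body[, "TD"])),
             stringsAsFactors = FALSE)
}

# Made-up miniature input with two records, for illustration only.
sample_lines <- c(
  "HD Headline one", "BY By A. Author", "WC 12 words", "PD 23 June 2001",
  "SC COUMAI", "PG 30", "LP First paragraph", "continues here.", "TD Rest of text.",
  "HD Headline two", "WC 5 words", "LP Only lead.")
articles <- parse_records(sample_lines)
dim(articles)

# For the real file (path taken from the post; output name is hypothetical):
# articles <- parse_records(readLines("c:\\cm-mht1.txt"))
# write.csv(articles, "articles.csv", row.names = FALSE)
```

One limitation of this sketch: trailing lines such as the Factiva copyright notice carry no tag, so they would be appended to the last field of the preceding record and may need to be filtered out first.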
Re: [R] compile R with ACML support | RHEL 4
I doubt if /opt/acml3.5.0/gnu/lib is in your library path (it might be in your ldcache paths). So you need to set LD_LIBRARY_PATH or supply -L.

Look in config.log to find out what actually happened.

BTW: this was more of an R-devel question than R-help.

On Sat, 22 Jul 2006, Evan Cooch wrote:

> Greetings -
>
> I'm trying to compile R under GNU/Linux (RHEL 4) on a multi-Opteron box, with ACML support.
>
> First, I downloaded and installed ACML 3.5 - GNU version, although I'm not entirely sure what the differences are - from the AMD website. The ACML libraries were installed to /opt/acml3.5.0/
>
> Second, I ran
>
> ./configure --with-blas='-lacml'
>
> The configure went fine, except that at the end of the output, it reports that readline is the only external library configured.
>
> External libraries: readline
>
> OK, so next, I try
>
> ./configure --with-lapack
>
> Same thing - only readline is referenced.
>
> So, clearly, I'm missing a particular step. I'm guessing that I need to change some environment variable (or two), or tweak something at some other stage, to get R to properly reference the ACML libraries. I'm puzzled why --with-blas='-acml' doesn't do the trick?
>
> Suggestions? Pointers to the obvious mistake?
>
> Thanks...

--
Brian D. Ripley, [EMAIL PROTECTED]
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
[R] diff, POSIXct, POSIXlt, POSIXt
Dear Listers,

I have encountered a strange problem using diff() and POSIXt:

dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004","15/07/2004","15/10/2004","15/4/2005","15/07/2005","15/10/2005","15/4/2006")
dts <- strptime(dts, "%d/%m/%Y")
class(dts)
[1] "POSIXt"  "POSIXlt"
diff(dts)
Time differences of 7862400, 7948800, 15811200, 7862400, 7948800, 15724800, 7862400, 7948800,0 secs

In this case the result is not the one expected: it is expressed in seconds and not in days, and the true difference between the last two dates is not 0.

Now, if one uses a vector of 9 dates only (whichever date is removed), things come out well:

diff(dts[-1])
Time differences of 92, 183, 91, 92, 182, 91, 92, 182 days

Also if one constrains dts to POSIXct:

dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004","15/07/2004","15/10/2004","15/4/2005","15/07/2005","15/10/2005","15/4/2006")
dts <- as.POSIXct(strptime(dts, "%d/%m/%Y"))
diff(dts)
Time differences of 91, 92, 183, 91, 92, 182, 91, 92, 182 days

Any rationale in that?

Patrick
[R] How to pass eval.max from lme() to nlminb?
Dear R community,

I'm fitting a complex mixed-effects model that requires numerous iterations and function evaluations. I note that nlminb accepts a list of control parameters, including eval.max. Is there a way to change the default eval.max value for nlminb when it is being called from lme?

Thanks for any thoughts,

Andrew
--
Andrew Robinson
Department of Mathematics and Statistics    Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia Fax: +61-3-8344-4599
Email: [EMAIL PROTECTED]                    http://www.ms.unimelb.edu.au
[R] Why the contrain does not work for selecting a particular range of data?
Dear:

Continuing the issue of 'ifelse'! I am selecting the data whose 'x2'=1 for maximizing the likelihood. I used two ways to do this but the results are different.

1. Way one: I use only the data with x2=1 and run the program. It works for me. The program is as below:

function (parameters,y1,x11)
{
  p<-parameters[1]
  alpha1<-parameters[2]
  beta1<-parameters[3]
  delta1<-parameters[4]
  lamda1<-parameters[5]
  mu<-alpha1*((x11)^beta1)*exp(-delta1*(x11^lamda1))
  ifelse(y1>0|x11>0, L<-lgamma(y1+p)+p*(log(p)-log(mu+p))+y1*(log(mu)-log(mu+p))-lfactorial(y1)-lgamma(p), Inf)
  L
}

This is working for me.

2. Way two: I select the data whose x2=1 from the whole range of data. It runs, but the value of the MLE is not right in comparison. The program is:

function (parameters,y,x1,x2)
{
  p<-parameters[1]
  alpha1<-parameters[2]
  beta1<-parameters[3]
  delta1<-parameters[4]
  alpha2<-parameters[5]
  mu<-alpha1*((x1)^beta1)*exp(-delta1*(x1^alpha2))
  if(x1>0 & x2==1)
  {
    L<-lgamma(y+p)+p*(log(p)-log(mu+p))+y*(log(mu)-log(mu+p))-lfactorial(y)-lgamma(p)
  }
  L
}

The reason I wrote the program the second way is that I want to use one program to get results for different ranges of the data.

Can anyone help? Please!

Thanks!

Xin Shi

My estimation function for way two is:

function (parameters, y, x1, x2)
{
  nx1 <- length(x1);
  nx2 <- length(x2);
  ny <- length(y);
  x1 <- matrix(x1,nrow=nx1,ncol=1);
  x2 <- matrix(x2,nrow=nx2,ncol=1);
  y <- matrix(y,nrow=ny,ncol=1);
  ##Likelihood
  ##--
  Lvec <- matrix(0,nrow=nx1,ncol=1)
  for (i in 1:ny) {
    Lvec[i] <- nb_L3(parameters, y[i],x1[i],x2[i])
    LL <- -sum(Lvec)
  }
  LL
}
Re: [R] How to pass eval.max from lme() to nlminb?
G'day Andrew,

AR == Andrew Robinson [EMAIL PROTECTED] writes:

    AR> I'm fitting a complex mixed-effects model that requires
    AR> numerous iterations and function evaluations. I note that
    AR> nlminb accepts a list of control parameters, including
    AR> eval.max. Is there a way to change the default eval.max value
    AR> for nlminb when it is being called from lme?

Looking at the code of lme.formula, I can only find this snippet:

[...]
    optRes <- if (controlvals$opt == "nlminb") {
        nlminb(c(coef(lmeSt)), function(lmePars) -logLik(lmeSt, lmePars),
               control = list(iter.max = controlvals$msMaxIter,
                              trace = controlvals$msVerbose))
    }
    else {
        optim(c(coef(lmeSt)), function(lmePars) -logLik(lmeSt, lmePars),
              control = list(trace = controlvals$msVerbose,
                             maxit = controlvals$msMaxIter,
                             reltol = if (numIter == 0) controlvals$msTol
                                      else 100 * .Machine$double.eps),
              method = controlvals$optimMethod)
    }
[...]

This seems to indicate that you can only change the values for 'iter.max' and 'trace' in the call to 'nlminb()' by setting values for 'msMaxIter' and 'msVerbose', using 'lmeControl', when calling 'lme()'.

Cheers,

        Berwin

== Full address
Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
School of Mathematics and Statistics        +61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway
Crawley WA 6009                       e-mail: [EMAIL PROTECTED]
Australia                             http://www.maths.uwa.edu.au/~berwin
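As a rough illustration of the route described in this reply, the two settings that reach nlminb() can be supplied through lmeControl(). The model and the Orthodont data set below are stand-ins chosen only to make the sketch runnable; they are not from the thread. Later nlme releases may expose more optimiser settings (an msMaxEval argument, for instance), so ?lmeControl for the installed version is the place to check.

```r
library(nlme)  # recommended package shipped with R

# msMaxIter is what lme() forwards to the optimiser as its iteration cap;
# set msVerbose = TRUE to see the optimiser trace.
ctrl <- lmeControl(msMaxIter = 200, msVerbose = FALSE)

# Orthodont is an example data set from nlme, used here only for illustration.
fit <- lme(distance ~ age, random = ~ 1 | Subject,
           data = Orthodont, control = ctrl)

ctrl$msMaxIter   # the value stored in the control list
```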
Re: [R] diff, POSIXct, POSIXlt, POSIXt
Try converting to POSIXct:

> str(dts)
'POSIXlt', format: chr [1:10] "2003-04-15" "2003-07-15" "2003-10-15" "2004-04-15" "2004-07-15" "2004-10-15" "2005-04-15" ...
> dts
 [1] "2003-04-15" "2003-07-15" "2003-10-15" "2004-04-15" "2004-07-15" "2004-10-15" "2005-04-15" "2005-07-15"
 [9] "2005-10-15" "2006-04-15"
> dts <- as.POSIXct(dts)
> dts
 [1] "2003-04-15 EDT" "2003-07-15 EDT" "2003-10-15 EDT" "2004-04-15 EDT" "2004-07-15 EDT" "2004-10-15 EDT"
 [7] "2005-04-15 EDT" "2005-07-15 EDT" "2005-10-15 EDT" "2006-04-15 EDT"
> diff(dts)
Time differences of 91, 92, 183, 91, 92, 182, 91, 92, 182 days

On 7/23/06, Patrick Giraudoux [EMAIL PROTECTED] wrote:
> Dear Listers,
>
> I have encountered a strange problem using diff() and POSIXt:
> [...]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
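The conversion suggested here can be written out as a small self-contained sketch. Two choices below are assumptions added for the illustration and are not from the thread: the time zone is fixed to UTC so that daylight-saving shifts cannot affect the gaps, and the Date class is shown as an alternative for data that carries no time of day.

```r
dts <- c("15/4/2003", "15/7/2003", "15/10/2003", "15/04/2004", "15/07/2004",
         "15/10/2004", "15/4/2005", "15/07/2005", "15/10/2005", "15/4/2006")

# Option 1: convert the POSIXlt result of strptime() to POSIXct before diff().
ct <- as.POSIXct(strptime(dts, "%d/%m/%Y", tz = "UTC"))
gaps_ct <- as.numeric(diff(ct), units = "days")

# Option 2: use the Date class, which has no time-of-day component at all.
gaps_date <- as.numeric(diff(as.Date(dts, format = "%d/%m/%Y")))

gaps_ct
gaps_date
```

Asking for the units explicitly with as.numeric(..., units = "days") avoids depending on the unit that difftime picks automatically.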
[R] test
just a test

--
Lep pozdrav / With regards,
    Gregor Gorjanc

--
University of Ljubljana     PhD student
Biotechnical Faculty
Zootechnical Department     URI: http://www.bfro.uni-lj.si/MR/ggorjan
Groblje 3                   mail: gregor.gorjanc at bfro.uni-lj.si
SI-1230 Domzale             tel: +386 (0)1 72 17 861
Slovenia, Europe            fax: +386 (0)1 72 17 888
--
"One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try." Sophocles ~ 450 B.C.
Re: [R] Warning Messages using rq -quantile regressions
On Jul 23, 2006, at 5:27 AM, roger koenker wrote:

When computing the median from a sample with an even number of distinct values there is inherently some ambiguity about its value: any value between the middle order statistics is a median. Similarly, in regression settings the optimization problem solved by the br version of the simplex algorithm, modified to do general quantile regression, identifies cases where there may be non-uniqueness of this type. When there are continuous covariates this is quite rare; when covariates are discrete it is relatively common, at least when tau is chosen from the rationals. For univariate quantiles R provides several methods of resolving this sort of ambiguity by interpolation; br doesn't try to do this, instead returning the first vertex solution that it comes to.

Should we worry about this? My answer would be no. Viewed from an asymptotic perspective any choice of a unique value among the multiple solutions is a 1/n perturbation -- with 2500 observations this is unlikely to be interesting. More to the point, inference about the coefficients of the model, which provides O(1/sqrt(n)) intervals, is perfectly capable of assessing the meaningful uncertainty about these values.

Finally, if you would prefer an estimation procedure that produces unique values more like the interpolation procedures in the univariate setting, you could try the fn option for the algorithm. Interior point methods for solving linear programming problems have the feature that they tend to converge to the centroid of solution sets when such sets exist. This approach provides a means to assess the magnitude of the non-uniqueness in a particular application.

I hope that this helps,

url:   www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]           Department of Economics
vox:   217-333-4558                University of Illinois
fax:   217-244-6678                Champaign, IL 61820

On Jul 22, 2006, at 9:07 PM, Neil KM wrote:

> I am new to using quantile regressions in R. I have estimated a set of coefficients using the method="br" algorithm with the rq command at various quantiles along the entire distribution. My data set contains approximately 2,500 observations and I have 7 predictor variables. I receive the following warning message:
>
> Solution may be nonunique in: rq.fit.br(x, y, tau = tau, ...)
>
> There are 13 warnings of this type after I run a single model. My results are similar to the results I received in other stat programs using quantile reg procedures. I am unclear what these warning messages imply and if there are problems with model fit/convergence that I may need to consider. Any help would be appreciated. Thanks!
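The univariate ambiguity described in the reply can be checked with base R alone. The four-point sample below is made up for the illustration: the absolute-error loss is flat between the two middle order statistics, and two of the quantile() types resolve the tie differently.

```r
x <- c(1, 3, 5, 7)                      # even number of distinct values

# The median minimises the sum of absolute deviations; every candidate
# between the middle order statistics (3 and 5) gives the same loss.
abs_loss <- function(m) sum(abs(x - m))
losses <- sapply(c(3, 3.5, 4, 5), abs_loss)
losses

# Two of the conventions offered by quantile() for resolving the ambiguity.
q_type1 <- quantile(x, 0.5, type = 1, names = FALSE)  # an order statistic
q_type7 <- quantile(x, 0.5, type = 7, names = FALSE)  # linear interpolation
c(q_type1, q_type7)

# With the quantreg package installed, the interior point option named in the
# reply would be requested along the lines of (y and x are placeholders):
# rq(y ~ x, tau = 0.5, method = "fn")
```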
Re: [R] diff, POSIXct, POSIXlt, POSIXt
> Try converting to POSIXct:

That's what I did finally (see the previous e-mail).

dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004","15/07/2004","15/10/2004","15/4/2005","15/07/2005","15/10/2005","15/4/2006")
dts <- as.POSIXct(strptime(dts, "%d/%m/%Y"))
diff(dts)
Time differences of 91, 92, 183, 91, 92, 182, 91, 92, 182 days

> What is the problem you are trying to solve?

Actually, I don't understand why diff() gives the expected result with POSIXct but not with POSIXlt. Both POSIXct and POSIXlt are of class POSIXt. The documentation of diff() states that 'diff' is a generic function with a default method and ones for classes 'ts', 'POSIXt' and 'Date'. It does not mention differences between POSIXct and POSIXlt.

Moreover, using diff() with POSIXlt returned (wrong) numbers... and not an error. This may sometimes be difficult to detect in the middle of a program. Must one keep in mind that diff() is reliably applicable only to POSIXct? If so, should it not be mentioned in the documentation?

All the best,

Patrick

jim holtman wrote:
> Try converting to POSIXct:
> [...]
Re: [R] Why the contrain does not work for selecting a particular range of data?
On 7/23/2006 4:07 AM, Xin wrote:
> Dear:
>
> Continuing the issue of 'ifelse'! I am selecting the data whose 'x2'=1 for maximizing the likelihood. I used two ways to do this but the results are different.

In the first case you used ifelse(), in the second you used if(). They behave differently: ifelse() evaluates all tests in a vector, if() only evaluates one. You probably want ifelse() in both cases.

Duncan Murdoch

> 1. Way one: I use only the data with x2=1 and run the program. It works for me.
> [...]
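The difference between the two constructs can be seen on a tiny made-up example, followed by one possible way to get a single function that works on any subset of the data. `nb_negloglik()` is a hypothetical name and the logical-mask approach is a swapped-in alternative to wrapping the formula in ifelse(); the formula itself is the one from the post, which matches the negative binomial density in its mean/size parameterisation. In current R releases a condition of length greater than one inside if() is an error, while older releases used only the first element.

```r
x1 <- c(2, 0, 5)
x2 <- c(1, 1, 0)

# ifelse() is vectorised: one answer per element of the test.
flag <- ifelse(x1 > 0 & x2 == 1, "use", "skip")
flag

# Hypothetical vectorised version of the likelihood: select the rows first,
# then evaluate the formula on the selected rows only.
nb_negloglik <- function(parameters, y, x1, x2) {
  p      <- parameters[1]
  alpha1 <- parameters[2]
  beta1  <- parameters[3]
  delta1 <- parameters[4]
  alpha2 <- parameters[5]
  use <- x1 > 0 & x2 == 1              # logical mask, evaluated for every row
  y  <- y[use]
  x1 <- x1[use]
  mu <- alpha1 * x1^beta1 * exp(-delta1 * x1^alpha2)
  L <- lgamma(y + p) + p * (log(p) - log(mu + p)) +
       y * (log(mu) - log(mu + p)) - lfactorial(y) - lgamma(p)
  -sum(L)                              # negative log-likelihood to minimise
}
```

Because the selection happens inside the function, the same code could serve another range of the data by changing only the mask.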
Re: [R] diff, POSIXct, POSIXlt, POSIXt
Moving this to r-devel.

Looking at the diff.POSIXt code we see the problem is that it takes the length of the input using length, which is wrong since in the case of POSIXlt the length is always 9 (or maybe length should be defined differently for POSIXlt?). Try this, which gives the same problem:

dts[-1] - dts[-length(dts)]

We get a more sensible answer if length is calculated correctly:

dts[-1] - dts[-length(dts[[1]])]

On 7/23/06, Patrick Giraudoux [EMAIL PROTECTED] wrote:
> > Try converting to POSIXct:
>
> That's what I did finally (see the previous e-mail).
> [...]
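The structural point made here can be inspected directly. A POSIXlt object is a list of date-time components, each holding one entry per date, whereas POSIXct is a plain numeric vector. What length() reports for POSIXlt may differ between R versions, so this sketch looks at the underlying structure with unclass() instead of relying on length(); the three dates and the UTC time zone are choices made for the illustration.

```r
lt <- strptime(c("15/4/2003", "15/7/2003", "15/10/2003"), "%d/%m/%Y", tz = "UTC")

parts <- unclass(lt)        # the underlying list of components
names(parts)                # sec, min, hour, mday, mon, year, wday, yday, isdst, ...
n_dates <- length(parts$mday)   # one entry per date

ct <- as.POSIXct(lt)
n_ct <- length(unclass(ct))     # a numeric vector, one number per date

c(n_dates, n_ct)
```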
[R] Is there anywhere recycle()?
Hello!

I am writing a function which should recycle one of its arguments if the length of the argument is appropriate, i.e. something like

foo <- function(x, a)
{
  n <- length(x)
  if(length(a) < n) {
    # recycle a
    oldA <- a
    a <- vector(length=n)
    a[1:n] <- oldA
  }
  ## ...
  return(a)
}

foo(c(1, 2), a=c(1, 2))
foo(c(1, 2), a=c(1))

I am now wondering if there is any general/generic function for such a task.

Thanks!

--
Lep pozdrav / With regards,
    Gregor Gorjanc
Re: [R] Is there anywhere recycle()?
Try:

foo2 <- function(x, a) cbind(x,a)[,2]

On 7/23/06, Gregor Gorjanc [EMAIL PROTECTED] wrote:
> Hello!
>
> I am writing a function which should recycle one of its arguments if the length of the argument is appropriate
> [...]
Re: [R] Is there anywhere recycle()?
Hi,

Gabor Grothendieck wrote:
> Try:
>
> foo2 <- function(x, a) cbind(x,a)[,2]

Thank you for this. It does work to some extent, but not much better than my foo.

R> foo2(c(1, 2, 3), a=1)
[1] 1 1 1
18:14:08 R> foo2(c(1, 2, 3), a=c(1,2,3,4))
[1] 1 2 3 4
Warning message:
number of rows of result is not a multiple of vector length (arg 1) in: cbind(1, x, a)
18:14:13 R> foo2(c(1, 2, 3), a=c(1,2,3))
[1] 1 2 3
18:14:18 R> foo2(c(1, 2, 3), a=c(1,2))
[1] 1 2 1
Warning message:
number of rows of result is not a multiple of vector length (arg 2) in: cbind(1, x, a)

> On 7/23/06, Gregor Gorjanc [EMAIL PROTECTED] wrote:
> > Hello!
> > [...]

--
Lep pozdrav / With regards,
    Gregor Gorjanc
Re: [R] Is there anywhere recycle()?
Here is another possibility:

  rep(a, length = length(x))
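The `rep()` suggestion can be tried on small vectors. The sketch below uses made-up inputs; `recycle_to()` is a hypothetical helper name, not a base R function, and the warning on non-multiple lengths is an added assumption that mimics what `cbind()` reported in the thread.

```r
# recycle_to() is a hypothetical helper, not part of base R.
recycle_to <- function(a, x) {
  n <- length(x)
  if (length(a) > 0 && n %% length(a) != 0) {
    warning("length of 'x' is not a multiple of length of 'a'")
  }
  # rep() with length.out repeats 'a' and cuts the result to exactly n elements.
  rep(a, length.out = n)
}

r1 <- recycle_to(1, c(1, 2, 3))
r2 <- suppressWarnings(recycle_to(c(1, 2), c(1, 2, 3)))
```

Here `r1` is `1 1 1` and `r2` is `1 2 1`. Note that `rep()` also truncates silently when `a` is longer than `x`, which may or may not be the behaviour wanted.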
[R] Iterated Data Input/Output with Random Forests
Hi,

I am currently writing code to input a few thousand files, run them through the Random Forests package, and then output corresponding results. When I use the code below:

  zz <- textConnection("ex.lm.out", "w")
  sink(zz)
  tempData <- read.delim(paste("allSnps",1,"Phenotype.phn",sep=""),header=TRUE,sep=",",quote="\"",dec=".")
  tempData[[1]] <- factor(tempData[[1]])
  tempData.rf <- randomForest(tempData[[1]]~.,data=tempData,importance=TRUE,proximity=TRUE,outscale=TRUE,replace=TRUE)
  tempData.rf
  zz <- file(paste("ex",1,".data",sep=""), "w")
  cat(ex.lm.out, sep="\n", file=zz)
  sink()
  close(zz)

I am able to successfully input and output for one file. However, if I try to use a for loop or a while statement, e.g.

  for(i in 1:2) {
    zz <- textConnection("ex.lm.out", "w")
    sink(zz)
    tempData <- read.delim(paste("allSnps",i,"Phenotype.phn",sep=""),header=TRUE,sep=",",quote="\"",dec=".")
    tempData[[1]] <- factor(tempData[[1]])
    tempData.rf <- randomForest(tempData[[1]]~.,data=tempData,importance=TRUE,proximity=TRUE,outscale=TRUE,replace=TRUE)
    tempData.rf
    zz <- file(paste("ex",i,".data",sep=""), "w")
    cat(ex.lm.out, sep="\n", file=zz)
    sink()
    close(zz)
  }

I get no error statements but the output is blank. Without the for statement, setting i <- 1 works fine.

One other related question: right now I am trying to get the loop to work by using the paste() function with a variable (i). However, the paste function returns a string. If I wanted to make a loop over

  tempData$pheno1
  tempData$pheno2
  tempData$pheno3
  ...

the paste() function will not work. Is there some other method to achieve the desired effect?

Thank you in advance! I have only been working with R for a few days so please bear with my lack of knowledge!

John Zhou
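One plausible cause of the blank output, offered as an assumption since the thread does not confirm it: R auto-prints a bare expression such as `tempData.rf` only at top level, not inside a `for` loop, so the loop needs an explicit `print()`. The sketch below uses `lm()` on the built-in `cars` data as a stand-in for `randomForest()`; `fit_one()` and the temporary output directory are hypothetical, and `capture.output()` replaces the `sink()`/`textConnection()` pair.

```r
# Hypothetical stand-in for the per-file model fit: a linear model on built-in data.
fit_one <- function(i) lm(dist ~ speed, data = cars[seq_len(10 * i), ])

out_dir <- tempdir()
for (i in 1:2) {
  fit <- fit_one(i)
  # A bare `fit` on its own line would print nothing inside the loop,
  # so the object is printed explicitly and the text is captured.
  txt <- capture.output(print(fit))
  writeLines(txt, file.path(out_dir, paste("ex", i, ".data", sep = "")))
}
```

Each pass writes one file (`ex1.data`, `ex2.data`) holding the printed model summary.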
Re: [R] constructing a dataframe from a database of newspaper articles
I was going to suggest that you use Perl, but here is my attempt at keeping it in R. This reads in each line, tries to determine if it has one of the 'separator' words at the beginning of the line, and then constructs the output.

  mySeps <- c("HD", "BY", "WC", "PD", "SC", "PG", "LP", "TD", "SN", "LA", "CY")  # section separators
  myInc <- 0  # record number
  foundHD <- FALSE
  myFile <- file('c:/datafile.txt', 'r')
  myRec <- list()  # contains the data from each record
  myOutput <- list()  # list with each record
  while(length(x <- readLines(myFile, n=1)) > 0){
      first <- gsub("^\\s*(\\w+).*", "\\1", x)  # get the first word
      if (!foundHD){  # skip till HD found (assumes this is the start of the article)
          if (first == "HD") foundHD <- TRUE
          else next
      }
      if (first == "NS"){  # skip to next HD (assumes NS ignores the rest)
          foundHD <- FALSE
          myOutput[[myInc <- myInc + 1]] <- myRec
          myRec <- list()
          next
      }
      if (first %in% mySeps){
          myKey <- first  # use as key to myRec
          x <- sub(first, '', x)
      }
      myRec[[myKey]] <- paste(myRec[[myKey]], x)  # collect data from each mySep
  }
  # convert the list to 'long' dataframe for reshape
  myResult <- NULL
  for (i in 1:length(myOutput)){
      .x <- cbind(i, names(myOutput[[i]]), unlist(myOutput[[i]][names(myOutput[[i]])]))
      myResult <- rbind(myResult, .x)
  }
  myDF <- as.data.frame(myResult)
  myWide <- reshape(myDF, timevar="V2", idvar='i', direction='wide')
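The steps advised in the original post (read lines, collapse multi-line fields, build a long data frame, reshape to wide, write a CSV) can also be sketched in a vectorized way. This is a minimal sketch on two made-up records, not the code from the thread; it assumes the first line starts with a tag, that every record begins with HD, and that no line of article text happens to begin with one of the tags.

```r
# Two short, made-up records in the tagged layout described in the thread.
txt <- c("HD First headline", "BY By A. Writer", "WC 12 words",
         "LP Lead paragraph", "continues here.", "TD Body text.", "NS end",
         "HD Second headline", "WC 5 words", "LP Only a lead.", "NS end")
tags   <- c("HD", "BY", "WC", "PD", "SC", "PG", "LP", "TD", "SN", "LA", "CY", "NS")
wanted <- c("HD", "BY", "WC", "PD", "SC", "PG", "LP", "TD")

first  <- sub("^\\s*(\\S+).*$", "\\1", txt)   # first word of every line
is_tag <- first %in% tags                     # TRUE where a new field starts
field  <- cumsum(is_tag)                      # continuation lines inherit the field number
body   <- ifelse(is_tag, sub("^\\s*\\S+\\s*", "", txt), txt)

# One row per field: record id, tag, and the collapsed text.
long <- data.frame(id    = cumsum(first == "HD")[is_tag],
                   field = first[is_tag],
                   text  = as.vector(tapply(body, field, paste, collapse = " ")),
                   stringsAsFactors = FALSE)
long <- long[long$field %in% wanted, ]

# One row per record, one column per tag; absent fields become NA.
wide <- reshape(long, idvar = "id", timevar = "field", direction = "wide")
write.csv(wide, file.path(tempdir(), "articles.csv"), row.names = FALSE)
```

The LP and TD columns could then be pasted together if a single text column is wanted.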
Re: [R] Iterated Data Input/Output with Random Forests
For your last question about 'paste', try

  tempData[paste('pheno', i, sep='')]

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
Re: [R] Iterated Data Input/Output with Random Forests
While tempData[paste('pheno',i,sep='')] does give the appropriate column, when I try to use that expression in the factor function:

  factor(tempData[paste('pheno',i,sep='')])

I get

  Error in sort(unique.default(x), na.last = TRUE) : 'x' must be atomic
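A likely explanation for this error, stated as a general fact about R rather than something confirmed in the thread: single brackets on a data frame return a one-column data frame (a list), while double brackets return the column itself as an atomic vector, which is what `factor()` expects. The sketch below uses a small made-up `tempData`; the column names follow the poster's `pheno1`, `pheno2` pattern.

```r
# Made-up stand-in for the poster's data.
tempData <- data.frame(pheno1 = c("a", "b", "a"),
                       pheno2 = c("x", "x", "y"),
                       stringsAsFactors = FALSE)

for (i in 1:2) {
  col <- paste("pheno", i, sep = "")
  # [[ ]] extracts the column vector; [ ] would give a one-column data frame.
  tempData[[col]] <- factor(tempData[[col]])
}

single <- tempData["pheno1"]    # still a data frame
double <- tempData[["pheno1"]]  # the factor column itself
```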
[R] Matthew Nash has left the SGDP
I will be out of the office starting Wed 02/01/2006 and will not return until Sat 02/07/2060. I have left the SGDP. I am contactable at [EMAIL PROTECTED]
Re: [R] constructing a dataframe from a database of newspaper articles
Gabor indicated that a line was corrupted. Here is the line again:

  first <- gsub("^\\s*(\\w+).*", "\\1", x)  # get the first word

It was supposed to be '\\w+' and '\\1'. For some reason, my browser must have substituted the extraneous references.
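To see what the corrected line does, it can be run on a few lines taken from the sample record in the thread. This is only an illustration of the regular expression: optional leading whitespace, then the first run of word characters captured as `\\1`, then the rest of the line discarded.

```r
x <- c("HD Was Charles Manson temporarily insane",
       "  BY By Deborah Cassrels.",
       "WC 1834 words")

# Replace the whole line with its first word.
first <- gsub("^\\s*(\\w+).*", "\\1", x)
```

The result is the character vector `"HD" "BY" "WC"`, which is then compared against the list of separators.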
Re: [R] diff, POSIXct, POSIXlt, POSIXt
Hi, Gabor:

For my 0.02 euros, I vote to make length(POSIXlt) = length of the series, NOT the length of the list = 9 always. I've stubbed my toe on that one many times. I always fix it by converting first to POSIXct.

The key question is what would users naively expect to get from length(a_time_series)? I think most people not familiar with the POSIXlt format would expect the number of observations. After struggling for a while with code that did not perform as expected, I finally traced one such problem to the fact that length(a_time_series) = 9 if class(a_time_series) = POSIXlt, independent of the number of observations.

How much code would break if this was changed? Each use of length(POSIXlt_object) would have to be replaced by something like length(as.list(POSIXlt_object)). However, since length(POSIXlt_object) is always 9, I doubt if length(POSIXlt_object) occurs very often.

Currently to get the number of observations in a POSIXlt_object, you might find constructs like length(POSIXlt_object[[1]]). Or you will find people converting the POSIXlt to POSIXct and then computing the length. In either case, changing length(POSIXlt_object) to the number of observations would not break any of this code.

Thanks for raising this question.
Spencer Graves

Gabor Grothendieck wrote:
Moving this to r-devel.

Looking at the diff.POSIXt code we see the problem is that it takes the length of the input using length, which is wrong since in the case of POSIXlt the length is always 9 (or maybe length should be defined differently for POSIXlt?). Try this, which gives the same problem:

  dts[-1] - dts[-length(dts)]

We get a more sensible answer if length is calculated correctly:

  dts[-1] - dts[-length(dts[[1]])]

On 7/23/06, Patrick Giraudoux [EMAIL PROTECTED] wrote:
"Try converting to POSIXct:" That's what I did finally (see the previous e-mail).

  dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004","15/07/2004","15/10/2004","15/4/2005","15/07/2005","15/10/2005","15/4/2006")
  dts <- as.POSIXct(strptime(dts, "%d/%m/%Y"))
  diff(dts)
  Time differences of 91, 92, 183, 91, 92, 182, 91, 92, 182 days

"What is the problem you are trying to solve?" Actually, I don't understand why using diff() and POSIXct provides the expected result and not using POSIXlt. Both POSIXct and POSIXlt are of class POSIXt. The doc of diff() stresses that 'diff' is a generic function with a default method and ones for classes 'ts', 'POSIXt' and 'Date'. It does not mention differences between POSIXct and POSIXlt. Moreover, using diff() with POSIXlt has provided (wrong) numbers... and not an error. This may be difficult to detect sometimes along programme lines. Must one keep in mind that diff() is reliably applicable only to POSIXct? In this case, should it not be mentioned in the documentation?

All the best,
Patrick

jim holtman wrote:
Try converting to POSIXct:

  str(dts)
  'POSIXlt', format: chr [1:10] "2003-04-15" "2003-07-15" "2003-10-15" "2004-04-15" "2004-07-15" "2004-10-15" "2005-04-15" ...
  dts
  [1] "2003-04-15" "2003-07-15" "2003-10-15" "2004-04-15" "2004-07-15" "2004-10-15" "2005-04-15" "2005-07-15"
  [9] "2005-10-15" "2006-04-15"
  dts <- as.POSIXct(dts)
  dts
  [1] "2003-04-15 EDT" "2003-07-15 EDT" "2003-10-15 EDT" "2004-04-15 EDT" "2004-07-15 EDT" "2004-10-15 EDT"
  [7] "2005-04-15 EDT" "2005-07-15 EDT" "2005-10-15 EDT" "2006-04-15 EDT"
  diff(dts)
  Time differences of 91, 92, 183, 91, 92, 182, 91, 92, 182 days

On 7/23/06, Patrick Giraudoux [EMAIL PROTECTED] wrote:
Dear Listers,

I have encountered a strange problem using diff() and POSIXt:

  dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004","15/07/2004","15/10/2004","15/4/2005","15/07/2005","15/10/2005","15/4/2006")
  dts <- strptime(dts, "%d/%m/%Y")
  class(dts)
  [1] "POSIXt" "POSIXlt"
  diff(dts)
  Time differences of 7862400, 7948800, 15811200, 7862400, 7948800, 15724800, 7862400, 7948800, 0 secs

In this case the result is not the one expected: it is expressed in seconds and not in days, and the difference between the last two dates is not 0.

Now, if one uses a vector of 9 dates only (whatever the date removed), things come out well:

  diff(dts[-1])
  Time differences of 92, 183, 91, 92, 182, 91, 92, 182 days

Also if one constrains dts to POSIXct:

  dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004","15/07/2004","15/10/2004","15/4/2005","15/07/2005","15/10/2005","15/4/2006")
  dts <- as.POSIXct(strptime(dts, "%d/%m/%Y"))
  diff(dts)
  Time differences of 91, 92, 183, 91, 92, 182, 91, 92, 182 days

Any rationale in that?

Patrick
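The workaround discussed in this thread, converting to POSIXct before calling `diff()` or `length()`, can be sketched on the first four dates from the original post. Setting `tz = "UTC"` is an added assumption made here to keep daylight-saving shifts out of the differences; the thread used the local time zone.

```r
dts <- c("15/4/2003", "15/7/2003", "15/10/2003", "15/04/2004")

lt <- strptime(dts, "%d/%m/%Y", tz = "UTC")  # POSIXlt: a list of date components
ct <- as.POSIXct(lt)                         # POSIXct: one number per observation

gaps     <- diff(ct)                         # difftime between consecutive dates
gap_days <- as.numeric(gaps, units = "days") # plain numbers, explicitly in days
n_obs    <- length(ct)                       # number of observations
```

Asking for the units explicitly with `as.numeric(..., units = "days")` avoids depending on whichever unit `diff()` happens to choose.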
Re: [R] dotchart with log scale?
Dear all,

I would like to draw a dot chart on a log scale. What is the syntax for this? A barchart may use log="x", but trying this with dotchart() leads to an error message.

OK, I extended dotchart() with a log option. The main reason I am sticking to dotchart() is the adjusting of axes with respect to long text labels. This is the definition I am using now:

  mydotchart <- function (x, labels = NULL, groups = NULL, gdata = NULL,
      cex = par("cex"), pch = 21, gpch = 21, bg = par("bg"),
      color = par("fg"), gcolor = par("fg"), lcolor = "gray",
      xlim = range(x[is.finite(x)]), main = NULL,
      ### new option
      xlab = NULL, ylab = NULL, log=FALSE, ...)
  {
      opar <- par("mai", "cex", "yaxs")
      on.exit(par(opar))
      par(cex = cex, yaxs = "i")
      n <- length(x)
      if (is.matrix(x)) {
          if (is.null(labels)) labels <- rownames(x)
          if (is.null(labels)) labels <- as.character(1:nrow(x))
          labels <- rep(labels, length.out = n)
          if (is.null(groups)) groups <- col(x, as.factor = TRUE)
          glabels <- levels(groups)
      }
      else {
          if (is.null(labels)) labels <- names(x)
          glabels <- if (!is.null(groups)) levels(groups)
      }
      plot.new()
      linch <- if (!is.null(labels)) max(strwidth(labels, "inch"), na.rm = TRUE) else 0
      if (is.null(glabels)) {
          ginch <- 0
          goffset <- 0
      }
      else {
          ginch <- max(strwidth(glabels, "inch"), na.rm = TRUE)
          goffset <- 0.4
      }
      if (!(is.null(labels) && is.null(glabels))) {
          nmai <- par("mai")
          nmai[2] <- nmai[4] + max(linch + goffset, ginch) + 0.1
          par(mai = nmai)
      }
      if (is.null(groups)) {
          o <- 1:n
          y <- o
          ylim <- c(0, n + 1)
      }
      else {
          o <- sort.list(as.numeric(groups), decreasing = TRUE)
          x <- x[o]
          groups <- groups[o]
          color <- rep(color, length.out = length(groups))[o]
          lcolor <- rep(lcolor, length.out = length(groups))[o]
          offset <- cumsum(c(0, diff(as.numeric(groups)) != 0))
          y <- 1:n + 2 * offset
          ylim <- range(0, y + 2)
      }
      ### instead of log=""
      plot.window(xlim = xlim, ylim = ylim, log = ifelse(log, "x", ""))
      lheight <- par("csi")
      if (!is.null(labels)) {
          linch <- max(strwidth(labels, "inch"), na.rm = TRUE)
          loffset <- (linch + 0.1)/lheight
          labs <- labels[o]
          mtext(labs, side = 2, line = loffset, at = y, adj = 0,
              col = color, las = 2, cex = cex, ...)
      }
      abline(h = y, lty = "dotted", col = lcolor)
      points(x, y, pch = pch, col = color, bg = bg)
      if (!is.null(groups)) {
          gpos <- rev(cumsum(rev(tapply(groups, groups, length)) + 2) - 1)
          ginch <- max(strwidth(glabels, "inch"), na.rm = TRUE)
          goffset <- (max(linch + 0.2, ginch, na.rm = TRUE) + 0.1)/lheight
          mtext(glabels, side = 2, line = goffset, at = gpos, adj = 0,
              col = gcolor, las = 2, cex = cex, ...)
          if (!is.null(gdata)) {
              abline(h = gpos, lty = "dotted")
              points(gdata, gpos, pch = gpch, col = gcolor, bg = bg, ...)
          }
      }
      axis(1)
      box()
      title(main = main, xlab = xlab, ylab = ylab, ...)
      invisible()
  }

Many thanks for your attention.
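The one functional change in the definition above is the `log` argument passed to `plot.window()`. The sketch below isolates that step with made-up values and draws to a null PDF device so that no file is produced; it is an illustration of the mechanism, not a replacement for the full function.

```r
pdf(NULL)  # null device: nothing is written to disk

x <- c(a = 1, b = 10, c = 1000)  # made-up values spanning several decades
plot.new()
# log = "x" puts the horizontal axis on a log scale; "" would leave it linear.
plot.window(xlim = range(x), ylim = c(0, length(x) + 1), log = "x")
abline(h = seq_along(x), lty = "dotted", col = "gray")
points(x, seq_along(x), pch = 21)
axis(1)
box()

is_log <- par("xlog")  # TRUE once the log scale is in effect
dev.off()
```

All values must be positive for a log axis, so the default `xlim = range(x[is.finite(x)])` in the function above would need care if `x` contains zeros.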
[R] Using DDF With Data Import into R
I am a brand-new R user. The first project I am looking at requires the use of a US Census data file. This data file is very large, and comes with an associated data definition file (DDF). I would like to know if I can use the DDF file in association with the import of the data file (as opposed to creating a very large scan statement). This was do-able in SAS, but I have been unable to find an equivalent capability in R.

Thanks very much for the help!
Kat
[R] Re constructing a dataframe from a database of newspaper articles
From: Bob Green [EMAIL PROTECTED]
I am hoping for some assistance with formatting a large text file which consists of a series of individual records. Each record includes specific labels/field names (a sample of 1 record (one of the longest ones) is below - at end of post). What I want to do is reformat the data, so that each individual record becomes a row (some cells will have a lot of text). For example, the column variables I want are (a) HD in one column (b) BY in one column (c) WC data in one column, (d) PD data in one column, (e) SC data in one column (f) PG data in one column (g) LP and TD text in one column - this column can contain quite a lot of text, e.g. 1900 words. The other fields are unwanted. If there were 150 individual records, when formatted this would be a 7 column by 150 row dataset.

Most transparently,

  txt <- readLines("c:\\cm-mht1.txt")
  no_of_records <- length(grep("^HD", txt))
  res <- matrix(nrow=no_of_records, ncol=8)
  idx <- 0
  for (i in 1:length(txt)) {
    if (regexpr("^HD", txt[i])!=-1) idx <- idx+1
    if (regexpr("^HD", txt[i])!=-1) res[idx, 1] <- txt[i]
    if (regexpr("^BY", txt[i])!=-1) res[idx, 2] <- txt[i]
    ...
    if (regexpr("^TD", txt[i])!=-1) res[idx, 8] <- txt[i]
  }
  res[,7] <- paste(res[,7], res[,8], sep="; ")
  res <- res[,-8]

| David Duffy (MBBS PhD)                                         ,-_|\
| email: [EMAIL PROTECTED]  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v
Re: [R] Re constructing a dataframe from a database of newspaper articles
On Mon, 24 Jul 2006, David Duffy wrote:
From: Bob Green [EMAIL PROTECTED]
I am hoping for some assistance with formatting a large text file which consists of a series of individual records. Each record includes specific labels/field names (a sample of 1 record (one of the longest ones) is below - at end of post). What I want to do is reformat the data, so that each individual record becomes a row (some cells will have a lot of text). For example, the column variables I want are (a) HD in one column (b) BY in one column (c) WC data in one column, (d) PD data in one column, (e) SC data in one column (f) PG data in one column (g) LP and TD text in one column - this column can contain quite a lot of text, e.g. 1900 words. The other fields are unwanted. If there were 150 individual records, when formatted this would be a 7 column by 150 row dataset.

Oops, I forgot to add the bit about multiple lines per field...

  txt <- readLines("c:\\cm-mht1.txt")
  txt <- gsub("[ ]+", " ", txt)
  txt <- gsub("^[ ]+", "", txt)
  no_of_records <- length(grep("^HD", txt))
  res <- matrix("", nrow=no_of_records, ncol=7)
  idx <- 0
  typ <- 0
  for (i in 1:length(txt)) {
    if (regexpr("^HD", txt[i])!=-1) {
      idx <- idx+1
      typ <- 1
    } else if (regexpr("^BY", txt[i])!=-1) {
      typ <- 2
    }
    ...
    } else if (regexpr("(^LP)|(^TD)", txt[i])!=-1) {
      typ <- 7
    } else if (regexpr("^[A-Z][A-Z]", txt[i])!=-1) {
      typ <- 0
    }
    if (typ>0) {
      res[idx,typ] <- paste(res[idx,typ], txt[i], sep=" ")
    }
  }

| David Duffy (MBBS PhD)                                         ,-_|\
| email: [EMAIL PROTECTED]  ph: INT+61+7+3362-0217 fax: -0101  /     *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v
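The loop above elides the middle branches, so here is a compact, runnable variant of the same state-machine idea on made-up lines. It is a sketch under assumptions, not the author's code: the `cols` lookup table replaces the chain of `else if` branches, the tag is stripped from the stored text, and a field is taken to start wherever a line begins with two capitals and a space (which would misfire on article text such as a line starting "US in 1969").

```r
# Made-up lines in the tagged layout; SN is an unwanted field.
txt <- c("HD Headline one", "BY By A. Writer", "SN Some Paper",
         "LP Lead text", "more lead text", "TD Body text",
         "HD Headline two", "LP Second lead")

# Column for each wanted tag; LP and TD share column 7 as in the thread.
cols <- c(HD = 1, BY = 2, WC = 3, PD = 4, SC = 5, PG = 6, LP = 7, TD = 7)

res <- matrix("", nrow = sum(grepl("^HD ", txt)), ncol = 7)
idx <- 0  # current record
typ <- 0  # current column, 0 means "ignore this field"
for (line in txt) {
  if (grepl("^[A-Z][A-Z] ", line)) {
    tag <- substr(line, 1, 2)
    typ <- if (tag %in% names(cols)) cols[[tag]] else 0
    if (tag == "HD") idx <- idx + 1
    line <- substring(line, 4)  # drop the tag and the space after it
  }
  if (typ > 0 && idx > 0) {
    # Continuation lines keep the previous typ and are appended to that cell.
    res[idx, typ] <- trimws(paste(res[idx, typ], line))
  }
}
```

With these inputs the first row holds "Headline one", "By A. Writer" and, in column 7, "Lead text more lead text Body text".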
[R] Saving R objects
I am trying to find the best way to save the following object I am creating:

  library(multcomp)
  data(recovery)
  Dcirec <- simint(minutes~blanket, data=recovery, conf.level=0.9, alternative="less")

I am probably not doing it the most efficient way. Here is what I am doing:

  a <- print(Dcirec)
  write(a, file="mult_test.dat", append=T)

or

  save(Dcirec, file="mult.out")

Which is the best way to save it, so that I can access its contents outside the R environment?
Re: [R] Saving R objects
It depends on what information you want to save and how the program on the other end needs it. For the save version I would at least use ascii = TRUE to get it in a more readable fashion. Look at

  file.show("mult_test.dat")
  file.show("mult.out")  # but use ascii=TRUE on your save statement.

to see what you are getting.

Other possibilities are to use the R2HTML or XML packages to output to HTML or XML.

You might want to handle the various components of Dcirec separately. To see what's inside:

  unclass(Dcirec)
  str(Dcirec)
  dput(Dcirec)

and use cat statements to output the components in the format of your choice, possibly in conjunction with sprintf.
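The options mentioned above can be sketched side by side. Because `simint()` needs the multcomp package, this sketch substitutes an `lm()` fit on the built-in `cars` data; the object, file names and temporary directory are all stand-ins chosen for illustration.

```r
fit <- lm(dist ~ speed, data = cars)  # stand-in for the Dcirec object
out_dir <- tempdir()

# 1. The printed summary, as plain text.
writeLines(capture.output(print(fit)), file.path(out_dir, "fit_print.txt"))

# 2. Selected components, formatted with sprintf() and written with cat().
coefs <- coef(fit)
coef_file <- file.path(out_dir, "fit_coef.txt")
cat(sprintf("%s\t%.4f\n", names(coefs), coefs), sep = "", file = coef_file)

# 3. The whole object in R's own format, ASCII-encoded.
save(fit, file = file.path(out_dir, "fit.RData"), ascii = TRUE)
```

Option 2 is usually the easiest for another program to read, since the layout is fully under your control; option 3 is only useful if the other end is also R.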
[R] How to obtain 95th percentile of a normal distribution of a continuous variable
Hi, How do I get R to output the 95% cutoff from a distribution of a continuous variable? summary() only displays a few statistics. Thanks!
Re: [R] How to obtain 95th percentile of a normal distribution of a continuous variable
?quantile
On 7/23/06, jenny tan [EMAIL PROTECTED] wrote:
> Hi, How do I get R to output the 95% cutoff from a distribution of a continuous variable? summary() only displays a few statistics. Thanks!
-- WenSui Liu (http://spaces.msn.com/statcompute/blog) Senior Decision Support Analyst Health Policy and Clinical Effectiveness Cincinnati Children Hospital Medical Center
Re: [R] How to obtain 95th percentile of a normal distribution of a continuous variable
?quantile
jenny tan wrote:
> Hi, How do I get R to output the 95% cutoff from a distribution of a continuous variable? summary() only displays a few statistics. Thanks!
-- Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat. Centre for Resource and Environmental Studies The Australian National University Canberra ACT 0200 Australia T: +61 2 6125 7800 email: Simon.Blomberg_at_anu.edu.au F: +61 2 6125 0757 CRICOS Provider # 00120C
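To make the pointer to ?quantile concrete, here is a minimal sketch. The data are simulated purely for illustration (the original poster's variable is not shown), and the name `x` is an assumption. It shows two readings of the question: the empirical 95th percentile of the observed values, and the 95th percentile of a normal distribution with the sample's mean and standard deviation.

```r
set.seed(1)
x <- rnorm(1000, mean = 50, sd = 10)  # simulated stand-in for the real variable

# Empirical 95th percentile of the observed data.
emp_95 <- quantile(x, probs = 0.95)

# 95th percentile of a normal distribution fitted by the sample mean and sd.
norm_95 <- qnorm(0.95, mean = mean(x), sd = sd(x))

print(emp_95)
print(norm_95)
```

The two values agree closely only when the data really are approximately normal; `quantile()` makes no such assumption.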
Re: [R] diff, POSIXct, POSIXlt, POSIXt
Just one more comment. It is possible to define length.POSIXlt yourself, in which case diff works with POSIXlt objects:
    length.POSIXlt <- function(x) length(x[[1]])
    diff(dts)
    Time differences of 91, 92, 183, 91, 92, 182, 91, 92, 182 days
On 7/23/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:
> Moving this to r-devel. Looking at the diff.POSIXt code, we see the problem is that it takes the length of the input using length(), which is wrong since in the case of POSIXlt the length is always 9 (or maybe length should be defined differently for POSIXlt?). Try this, which gives the same problem:
>     dts[-1] - dts[-length(dts)]
> We get a more sensible answer if the length is calculated correctly:
>     dts[-1] - dts[-length(dts[[1]])]
> On 7/23/06, Patrick Giraudoux [EMAIL PROTECTED] wrote:
>>> Try converting to POSIXct:
>> That's what I did finally (see the previous e-mail).
>>     dts <- c("15/4/2003", "15/7/2003", "15/10/2003", "15/04/2004", "15/07/2004", "15/10/2004", "15/4/2005", "15/07/2005", "15/10/2005", "15/4/2006")
>>     dts <- as.POSIXct(strptime(dts, "%d/%m/%Y"))
>>     diff(dts)
>>     Time differences of 91, 92, 183, 91, 92, 182, 91, 92, 182 days
>>> What is the problem you are trying to solve?
>> Actually, I don't understand why using diff() with POSIXct provides the expected result and using POSIXlt does not. Both POSIXct and POSIXlt are of class POSIXt. The documentation of diff() stresses that 'diff' is a generic function with a default method and ones for classes 'ts', 'POSIXt' and 'Date'. It does not mention differences between POSIXct and POSIXlt. Moreover, using diff() with POSIXlt has provided (wrong) numbers... and not an error. This may sometimes be difficult to detect within program code. Must one keep in mind that diff() is reliably applicable only to POSIXct? In that case, should it not be mentioned in the documentation? All the best, Patrick
>> jim holtman wrote:
>>> Try converting to POSIXct:
>>>     str(dts)
>>>     'POSIXlt', format: chr [1:10] "2003-04-15" "2003-07-15" "2003-10-15" "2004-04-15" "2004-07-15" "2004-10-15" "2005-04-15" ...
>>>     dts
>>>     [1] "2003-04-15" "2003-07-15" "2003-10-15" "2004-04-15" "2004-07-15" "2004-10-15" "2005-04-15" "2005-07-15"
>>>     [9] "2005-10-15" "2006-04-15"
>>>     dts <- as.POSIXct(dts)
>>>     dts
>>>     [1] "2003-04-15 EDT" "2003-07-15 EDT" "2003-10-15 EDT" "2004-04-15 EDT" "2004-07-15 EDT" "2004-10-15 EDT"
>>>     [7] "2005-04-15 EDT" "2005-07-15 EDT" "2005-10-15 EDT" "2006-04-15 EDT"
>>>     diff(dts)
>>>     Time differences of 91, 92, 183, 91, 92, 182, 91, 92, 182 days
>>> On 7/23/06, Patrick Giraudoux [EMAIL PROTECTED] wrote:
>>>> Dear Listers, I have encountered a strange problem using diff() and POSIXt:
>>>>     dts <- c("15/4/2003", "15/7/2003", "15/10/2003", "15/04/2004", "15/07/2004", "15/10/2004", "15/4/2005", "15/07/2005", "15/10/2005", "15/4/2006")
>>>>     dts <- strptime(dts, "%d/%m/%Y")
>>>>     class(dts)
>>>>     [1] "POSIXt"  "POSIXlt"
>>>>     diff(dts)
>>>>     Time differences of 7862400, 7948800, 15811200, 7862400, 7948800, 15724800, 7862400, 7948800, 0 secs
>>>> In this case the result is not the one expected: it is expressed in seconds and not in days, and the difference between the last two dates is not 0. Now, if one uses a vector of only 9 dates (whichever date is removed), things work well:
>>>>     diff(dts[-1])
>>>>     Time differences of 92, 183, 91, 92, 182, 91, 92, 182 days
>>>> Also if one constrains dts to POSIXct:
>>>>     dts <- c("15/4/2003", "15/7/2003", "15/10/2003", "15/04/2004", "15/07/2004", "15/10/2004", "15/4/2005", "15/07/2005", "15/10/2005", "15/4/2006")
>>>>     dts <- as.POSIXct(strptime(dts, "%d/%m/%Y"))
>>>>     diff(dts)
>>>>     Time differences of 91, 92, 183, 91, 92, 182, 91, 92, 182 days
>>>> Any rationale in that? Patrick
>>> -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem you are trying to solve?
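The workaround discussed in this thread (convert to POSIXct before differencing) can be sketched as below. This is an illustration, not a statement about how any particular R version implements diff(): the thread describes behaviour observed in 2006, and later R versions may handle POSIXlt differently. The time zone `"UTC"` is my assumption, chosen so that daylight-saving shifts cannot affect the day counts, and the variable names are illustrative.

```r
dates <- c("15/4/2003", "15/7/2003", "15/10/2003", "15/04/2004", "15/07/2004",
           "15/10/2004", "15/4/2005", "15/07/2005", "15/10/2005", "15/4/2006")

# strptime() returns POSIXlt, a list of date-time components.
lt <- strptime(dates, "%d/%m/%Y", tz = "UTC")

# POSIXct stores one number per date-time, so vector operations behave as expected.
ct <- as.POSIXct(lt)

# Difference consecutive dates and state the unit explicitly
# instead of relying on the automatically chosen one.
gaps <- diff(ct)
units(gaps) <- "days"
days <- as.numeric(gaps)

# Equivalent explicit form using difftime().
days2 <- as.numeric(difftime(ct[-1], ct[-length(ct)], units = "days"))

print(days)
```

Setting the unit explicitly addresses one of the symptoms reported above, where the result silently came back in seconds rather than days.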
Re: [R] diff, POSIXct, POSIXlt, POSIXt
OK. Got it. Thanks a lot everybody. However, I feel that although the problem can be handled technically by any user aware of it, it should be fixed in R in a more general way: either by modifying the diff() code so that it really handles any kind of POSIXt (POSIXlt and POSIXct) with the same final result (as claimed in the documentation), or by mentioning explicitly in the documentation that diff(), as currently written, correctly handles only POSIXct (and not any POSIXt or POSIXlt). There is a danger of wrong output for users (even those reading the documentation) if things are left as they are, and I detected this problem just by chance. All the best, Patrick
Gabor Grothendieck wrote:
> Just one more comment. It is possible to define length.POSIXlt yourself, in which case diff works with POSIXlt objects:
>     length.POSIXlt <- function(x) length(x[[1]])
>     diff(dts)
>     Time differences of 91, 92, 183, 91, 92, 182, 91, 92, 182 days
> On 7/23/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:
>> Moving this to r-devel. Looking at the diff.POSIXt code, we see the problem is that it takes the length of the input using length(), which is wrong since in the case of POSIXlt the length is always 9 (or maybe length should be defined differently for POSIXlt?). Try this, which gives the same problem:
>>     dts[-1] - dts[-length(dts)]
>> We get a more sensible answer if the length is calculated correctly:
>>     dts[-1] - dts[-length(dts[[1]])]
>> On 7/23/06, Patrick Giraudoux [EMAIL PROTECTED] wrote:
>>>> Try converting to POSIXct:
>>> That's what I did finally (see the previous e-mail).
>>>     dts <- c("15/4/2003", "15/7/2003", "15/10/2003", "15/04/2004", "15/07/2004", "15/10/2004", "15/4/2005", "15/07/2005", "15/10/2005", "15/4/2006")
>>>     dts <- as.POSIXct(strptime(dts, "%d/%m/%Y"))
>>>     diff(dts)
>>>     Time differences of 91, 92, 183, 91, 92, 182, 91, 92, 182 days
>>>> What is the problem you are trying to solve?
>>> Actually, I don't understand why using diff() with POSIXct provides the expected result and using POSIXlt does not. Both POSIXct and POSIXlt are of class POSIXt. The documentation of diff() stresses that 'diff' is a generic function with a default method and ones for classes 'ts', 'POSIXt' and 'Date'. It does not mention differences between POSIXct and POSIXlt. Moreover, using diff() with POSIXlt has provided (wrong) numbers... and not an error. This may sometimes be difficult to detect within program code. Must one keep in mind that diff() is reliably applicable only to POSIXct? In that case, should it not be mentioned in the documentation? All the best, Patrick
>>> jim holtman wrote:
>>>> Try converting to POSIXct:
>>>>     str(dts)
>>>>     'POSIXlt', format: chr [1:10] "2003-04-15" "2003-07-15" "2003-10-15" "2004-04-15" "2004-07-15" "2004-10-15" "2005-04-15" ...
>>>>     dts
>>>>     [1] "2003-04-15" "2003-07-15" "2003-10-15" "2004-04-15" "2004-07-15" "2004-10-15" "2005-04-15" "2005-07-15"
>>>>     [9] "2005-10-15" "2006-04-15"
>>>>     dts <- as.POSIXct(dts)
>>>>     dts
>>>>     [1] "2003-04-15 EDT" "2003-07-15 EDT" "2003-10-15 EDT" "2004-04-15 EDT" "2004-07-15 EDT" "2004-10-15 EDT"
>>>>     [7] "2005-04-15 EDT" "2005-07-15 EDT" "2005-10-15 EDT" "2006-04-15 EDT"
>>>>     diff(dts)
>>>>     Time differences of 91, 92, 183, 91, 92, 182, 91, 92, 182 days
>>>> On 7/23/06, Patrick Giraudoux [EMAIL PROTECTED] wrote:
>>>>> Dear Listers, I have encountered a strange problem using diff() and POSIXt:
>>>>>     dts <- c("15/4/2003", "15/7/2003", "15/10/2003", "15/04/2004", "15/07/2004", "15/10/2004", "15/4/2005", "15/07/2005", "15/10/2005", "15/4/2006")
>>>>>     dts <- strptime(dts, "%d/%m/%Y")
>>>>>     class(dts)
>>>>>     [1] "POSIXt"  "POSIXlt"
>>>>>     diff(dts)
>>>>>     Time differences of 7862400, 7948800, 15811200, 7862400, 7948800, 15724800, 7862400, 7948800, 0 secs
>>>>> In this case the result is not the one expected: it is expressed in seconds and not in days, and the difference between the last two dates is not 0. Now, if one uses a vector of only 9 dates (whichever date is removed), things work well:
>>>>>     diff(dts[-1])
>>>>>     Time differences of 92, 183, 91, 92, 182, 91, 92, 182 days
>>>>> Also if one constrains dts to POSIXct:
>>>>>     dts <- c("15/4/2003", "15/7/2003", "15/10/2003", "15/04/2004", "15/07/2004", "15/10/2004", "15/4/2005", "15/07/2005", "15/10/2005", "15/4/2006")
>>>>>     dts <- as.POSIXct(strptime(dts, "%d/%m/%Y"))
>>>>>     diff(dts)
>>>>>     Time differences of 91, 92, 183, 91, 92, 182, 91, 92, 182 days
>>>>> Any rationale in that? Patrick
>>>> -- Jim Holtman Cincinnati, OH +1 513