[R] constructing a dataframe from a database of newspaper articles

2006-07-23 Thread Bob Green


I am hoping for some assistance with formatting a large text file which
consists of a series of individual records. Each record includes specific
labels/field names (a sample of one record, one of the longest, is at the
end of this post). What I want to do is reformat the data so that each
individual record becomes a row (some cells will contain a lot of text).
The column variables I want are: (a) HD in one column, (b) BY in one
column, (c) WC data in one column, (d) PD data in one column, (e) SC data
in one column, (f) PG data in one column, and (g) LP and TD text together
in one column. This last column can contain quite a lot of text, e.g. 1900
words. The other fields are unwanted.

If there were 150 individual records, the formatted result would be a
dataset of 7 columns by 150 rows.

I was advised to:

1. read in the file using readLines giving a character vector one element 
per input line.
2. convert that to lines of the form:
id op text
where each such line is a field and multiline fields have been collapsed 
into a single line of text. This step involves
detailed processing and you could do it in a loop or you could try a 
vectorized approach. A vectorized approach
will likely involve using
3. the lines created above could be converted to a data frame with three 
columns and
4. reshape used to create a wide data frame.
5. then write it out using write.csv.

I have got as far as reading the text into R, although I am unsure whether
the warning is a problem. However, I am not at all sure what I need to do next.

Any assistance is much appreciated,


Bob

(A) syntax

mht <- scan(what="c:\\cm-mht1.txt").
readLines("c:\\cm-mht1.txt", n = -1)

[8376] "© 2006 Dow Jones Reuters Business Interactive LLC (trading as Factiva). All "
[8377] "rights reserved. "

Warning message:
incomplete final line found by readLines on 'c:\cm-mht1.txt'

(B) sample data

   HD Was Charles Manson temporarily insane when he led a wild killing
   rampage in the US in 1969?
   BY By Deborah Cassrels.
   WC 1834 words
   PD 23 June 2001
   SN Courier Mail
   SC COUMAI
   PG 30
   LA English
   CY (c) 2001 Queensland Newspapers Pty Ltd

   LP Was Charles Manson temporarily insane when he led a wild killing
   rampage in the US in 1969? Clearly he was mad and bad. But would
   Queensland have placed him before its Mental Health Tribunal, found him of
   unsound mind at the time of his crimes, institutionalised him and
   treated his illness? WHY is Queensland the only jurisdiction in the
   Commonwealth with a Mental Health Tribunal which establishes if an accused
   is fit to face trial or of unsound mind at the time of an alleged offence?
   Why is mental incompetence not determined in an adversarial court by a
   jury? Under the Mental Health Act 1974, the tribunal, a statutory body
   operating since 1985, comprises three-yearly appointments of a Supreme
   Court judge and two assisting psychiatrists, whose advice does not have to
   be accepted. The judge alone constitutes the tribunal, an inquisitorial
   process conducted in the Supreme Court in Brisbane.

   TD Victims or family are not notified of hearings or allowed to submit
   victim impact statements. They are prohibited from talking to the media
   until 28 days after the decision. And when patients return to the
   community there is no requirement for neighbours or victims to be
   notified. Is this legislation enlightened or are we just suckers, falling
   for time and money-saving strategies? The tribunal has earned a reputation
   as progressive, humane and economical among some judges who have presided
   over it. The inaugural chair, former Supreme Court judge Angelo Vasta QC,
   thinks the tribunal system is enlightened and it saves an enormous
   amount of expenditure. He points to the humane side of treating the ill
   in a secure hospital rather than punishing them for offences but is
   uncomfortable with borderline cases. Whether people are mad or bad ought
   to be established by a very thorough investigation.
   The associated Patient Review Tribunals (of which there are five) consist
   of three to six members, including the chair who is a legal officer, a
   medical practitioner and a mental health professional. A psychiatrist is
   not required. The other three have no specific qualifications and can
   include former patients. The tribunals operate in closed hearings and
   patients of unsound mind or unfit for trial are reviewed every 12 months.
   Leave is granted either by the Mental Health Tribunal or the Patient
   Review Tribunal, which determine when a restricted patient is discharged
   into the community. Says the Director of Mental Health, Dr Peggy Brown:
   In the case of serious offences you can be assured the period of
   monitoring is quite 
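
The five advised steps can be sketched roughly as follows. This is only an illustration under several assumptions: the helper name parse_records, the list of tags, and the tiny two-record sample are invented here, every record is assumed to start with an HD line, and a continuation line that happens to begin with one of the tags followed by a space would be misread as a new field.

```r
# Hypothetical helper (not from the thread): one row per record, one column per field.
parse_records <- function(lines,
                          tags = c("HD", "BY", "WC", "PD", "SN", "SC",
                                   "PG", "LA", "CY", "LP", "TD"),
                          keep = c("HD", "BY", "WC", "PD", "SC", "PG", "TEXT")) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]                       # drop blank lines
  # Step 2: a field starts with a known tag; other lines continue the field above.
  starts <- grepl(paste0("^(", paste(tags, collapse = "|"), ") "), lines)
  field  <- cumsum(starts)
  ok     <- field > 0                                 # ignore text before the first tag
  text   <- unname(vapply(split(lines[ok], field[ok]), paste,
                          character(1), collapse = " "))
  # Step 3: long data frame with columns id, tag, value.
  long <- data.frame(tag = substr(text, 1, 2), value = substring(text, 4),
                     stringsAsFactors = FALSE)
  long$id <- cumsum(long$tag == "HD")                 # assumption: HD opens each record
  long$tag[long$tag %in% c("LP", "TD")] <- "TEXT"     # LP and TD share one column
  long <- long[long$tag %in% keep, ]
  long <- aggregate(long["value"], by = long[c("id", "tag")],
                    FUN = paste, collapse = " ")
  # Step 4: reshape to wide, one row per record.
  wide <- reshape(long, idvar = "id", timevar = "tag", direction = "wide")
  names(wide) <- sub("^value\\.", "", names(wide))
  wide[, intersect(keep, names(wide)), drop = FALSE]
}

# Made-up sample in the same layout as the record shown above.
sample_lines <- c(
  "   HD First headline that runs",
  "   over two lines",
  "   BY By A. Writer.",
  "   WC 12 words",
  "   PD 23 June 2001",
  "   SN Courier Mail",
  "   SC COUMAI",
  "   PG 30",
  "",
  "   LP Lead paragraph.",
  "",
  "   TD Rest of the text.",
  "   HD Second headline",
  "   BY By B. Writer.",
  "   WC 5 words",
  "   PD 24 June 2001",
  "   SC COUMAI",
  "   PG 2",
  "   LP Short lead.",
  "   TD Short body."
)
articles <- parse_records(sample_lines)

# Step 5 (the file name is an assumption):
# write.csv(articles, "articles.csv", row.names = FALSE)
```

For the real file, the input would come from readLines() as in step 1. The "incomplete final line" warning only says that the file does not end with a newline; the lines are still read.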

Re: [R] compile R with ACML support | RHEL 4

2006-07-23 Thread Prof Brian Ripley
I doubt if /opt/acml3.5.0/gnu/lib is in your library path (it might be in 
your ldcache paths).  So you need to set LD_LIBRARY_PATH or supply -L

Look in config.log to find out what actually happened.

BTW: this was more of an R-devel question than R-help.

On Sat, 22 Jul 2006, Evan Cooch wrote:

> Greetings -
>
> I'm trying to compile R under GNU/Linux (RHEL 4) on a multi-Opteron box,
> with ACML support.
>
> First, I downloaded and installed ACML 3.5 - GNU version, although I'm
> not entirely sure what the differences are - from the AMD website. The
> ACML libraries were installed to /opt/acml3.5.0/
>
> Second, I ran ./configure --with-blas='-lacml'
>
> The configure went fine, except that at the end of the output, it
> reports that readline is the only external library configured.
>
> External libraries: readline
>
> OK, so next, I try
>
> ./configure --with-lapack
>
> Same thing - only readline is referenced.
>
> So, clearly, I'm missing a particular step. I'm guessing that I need to
> change some environment variable (or two), or tweak something at some
> other stage, to get R to properly reference the ACML libraries. I'm
> puzzled why --with-blas='-acml' doesn't do the trick?
>
> Suggestions? Pointers to the obvious mistake?
>
> Thanks...

-- 
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] diff, POSIXct, POSIXlt, POSIXt

2006-07-23 Thread Patrick Giraudoux
Dear Listers,

I have encountered a strange problem using diff() and POSIXt:

dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004","15/07/2004","15/10/2004","15/4/2005","15/07/2005","15/10/2005","15/4/2006")
dts <- strptime(dts, "%d/%m/%Y")
class(dts)

[1] "POSIXt"  "POSIXlt"

diff(dts)

Time differences of  7862400,  7948800, 15811200,  7862400,  7948800,
15724800,  7862400,  7948800,        0 secs

In this case the result is not the one expected: it is expressed in seconds
rather than days, and the last difference is reported as 0 although the
last two dates are not the same.

Now, if one uses a vector of 9 dates only (whichever date is removed),
things work well:

diff(dts[-1])

Time differences of  92, 183,  91,  92, 182,  91,  92, 182 days

The same holds if one coerces dts to POSIXct:

dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004","15/07/2004","15/10/2004","15/4/2005","15/07/2005","15/10/2005","15/4/2006")
dts <- as.POSIXct(strptime(dts, "%d/%m/%Y"))
diff(dts)

Time differences of  91,  92, 183,  91,  92, 182,  91,  92, 182 days

Is there any rationale for that?

Patrick



[R] How to pass eval.max from lme() to nlminb?

2006-07-23 Thread Andrew Robinson
Dear R community,

I'm fitting a complex mixed-effects model that requires numerous
iterations and function evaluations.  I note that nlminb accepts a
list of control parameters, including eval.max.  Is there a way to
change the default eval.max value for nlminb when it is being called
from lme?

Thanks for any thoughts,

Andrew
-- 
Andrew Robinson  
Department of Mathematics and Statistics            Tel: +61-3-8344-9763
University of Melbourne, VIC 3010 Australia         Fax: +61-3-8344-4599
Email: [EMAIL PROTECTED]                            http://www.ms.unimelb.edu.au



[R] Why the contrain does not work for selecting a particular range of data?

2006-07-23 Thread Xin
Dear all,

Continuing the issue of 'ifelse'! I am selecting the data for which x2 = 1
in order to maximize the likelihood. I tried two ways of doing this, but the
results are different.

1. Way one: I use only the data for x2 = 1 and run the program. The program
is as follows:

function (parameters,y1,x11)
{
p <- parameters[1]
alpha1 <- parameters[2]
beta1 <- parameters[3]
delta1 <- parameters[4]
lamda1 <- parameters[5]

mu <- alpha1*((x11)^beta1)*exp(-delta1*(x11^lamda1))

ifelse(y1>0|x11>0,
L <- lgamma(y1+p)+p*(log(p)-log(mu+p))+y1*(log(mu)-log(mu+p))-lfactorial(y1)-lgamma(p)
,Inf)

L
}

This is working for me.

2. Way two: I select the data for which x2 = 1 from the whole range of data.
It runs, but the value of the MLE is not right compared with way one. The
program is:

function (parameters,y,x1,x2)
{
p <- parameters[1]
alpha1 <- parameters[2]
beta1 <- parameters[3]
delta1 <- parameters[4]
alpha2 <- parameters[5]

mu <- alpha1*((x1)^beta1)*exp(-delta1*(x1^alpha2))

if(x1>0 & x2==1)
{
L <- lgamma(y+p)+p*(log(p)-log(mu+p))+y*(log(mu)-log(mu+p))-lfactorial(y)-lgamma(p)
}

L
}

The reason I wrote the program the second way is that I want to use one
program to get results for different ranges of the data.

Can anyone help? Please!

Thanks!



Xin Shi



My estimation function for way two is:

function (parameters, y, x1,x2)
{
nx1 <- length(x1);
nx2 <- length(x2);
ny <- length(y);

x1 <- matrix(x1,nrow=nx1,ncol=1);
x2 <- matrix(x2,nrow=nx2,ncol=1);
y <- matrix(y,nrow=ny,ncol=1);

## Likelihood
## --
Lvec <- matrix(0,nrow=nx1,ncol=1)

for (i in 1:ny)
{
Lvec[i] <- nb_L3(parameters, y[i],x1[i],x2[i])
LL <- -sum(Lvec)
}

LL
}



Re: [R] How to pass eval.max from lme() to nlminb?

2006-07-23 Thread Berwin A Turlach
G'day Andrew,

>>>>> "AR" == Andrew Robinson [EMAIL PROTECTED] writes:

    AR> I'm fitting a complex mixed-effects model that requires
    AR> numerous iterations and function evaluations.  I note that
    AR> nlminb accepts a list of control parameters, including
    AR> eval.max.  Is there a way to change the default eval.max value
    AR> for nlminb when it is being called from lme?

Looking at the code of lme.formula, I can only find this snippet:

[...]
optRes <- if (controlvals$opt == "nlminb") {
    nlminb(c(coef(lmeSt)), function(lmePars) -logLik(lmeSt, lmePars),
           control = list(iter.max = controlvals$msMaxIter,
                          trace = controlvals$msVerbose))
}
else {
    optim(c(coef(lmeSt)), function(lmePars) -logLik(lmeSt, lmePars),
          control = list(trace = controlvals$msVerbose,
                         maxit = controlvals$msMaxIter,
                         reltol = if (numIter == 0) controlvals$msTol
                                  else 100 * .Machine$double.eps),
          method = controlvals$optimMethod)
}
[...]

this seems to indicate that you can only change the values for
'iter.max' and 'trace' in the call to 'nlminb()' by setting values for
'msMaxIter' and 'msVerbose', using 'lmeControl', when calling 'lme()'.
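
A minimal sketch of setting those two values through lmeControl() might look like this. The model and the Orthodont data set (shipped with nlme) are only stand-ins chosen for illustration, and the value 200 is arbitrary; other nlme versions may offer further control settings, so ?lmeControl is worth checking.

```r
library(nlme)

# msMaxIter is passed on as iter.max and msVerbose as trace (see the snippet above).
ctrl <- lmeControl(msMaxIter = 200, msVerbose = TRUE)

# Stand-in model, used only to show where the control object goes.
fit <- lme(distance ~ age, random = ~ 1 | Subject,
           data = Orthodont, control = ctrl)
```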

Cheers,

Berwin

== Full address 
Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
School of Mathematics and Statistics        +61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway
Crawley WA 6009                       e-mail: [EMAIL PROTECTED]
Australia                             http://www.maths.uwa.edu.au/~berwin



Re: [R] diff, POSIXct, POSIXlt, POSIXt

2006-07-23 Thread jim holtman
Try converting to POSIXct:

> str(dts)
'POSIXlt', format: chr [1:10] "2003-04-15" "2003-07-15" "2003-10-15"
"2004-04-15" "2004-07-15" "2004-10-15" "2005-04-15" ...
> dts
 [1] "2003-04-15" "2003-07-15" "2003-10-15" "2004-04-15" "2004-07-15"
     "2004-10-15" "2005-04-15" "2005-07-15"
 [9] "2005-10-15" "2006-04-15"
> dts <- as.POSIXct(dts)
> dts
 [1] "2003-04-15 EDT" "2003-07-15 EDT" "2003-10-15 EDT" "2004-04-15 EDT"
     "2004-07-15 EDT" "2004-10-15 EDT"
 [7] "2005-04-15 EDT" "2005-07-15 EDT" "2005-10-15 EDT" "2006-04-15 EDT"
> diff(dts)
Time differences of  91,  92, 183,  91,  92, 182,  91,  92, 182 days








-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



[R] test

2006-07-23 Thread Gregor Gorjanc
just a test
-- 
Lep pozdrav / With regards,
Gregor Gorjanc

--
University of Ljubljana     PhD student
Biotechnical Faculty
Zootechnical Department     URI: http://www.bfro.uni-lj.si/MR/ggorjan
Groblje 3                   mail: gregor.gorjanc at bfro.uni-lj.si

SI-1230 Domzale             tel: +386 (0)1 72 17 861
Slovenia, Europe            fax: +386 (0)1 72 17 888

--
One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try. Sophocles ~ 450 B.C.



Re: [R] Warning Messages using rq -quantile regressions

2006-07-23 Thread roger koenker

On Jul 23, 2006, at 5:27 AM, roger koenker wrote:

> When computing the median from a sample with an even number of distinct
> values there is inherently some ambiguity about its value: any value
> between the middle order statistics is a median. Similarly, in regression
> settings the optimization problem solved by the "br" version of the simplex
> algorithm, modified to do general quantile regression, identifies cases
> where there may be non-uniqueness of this type. When there are continuous
> covariates this is quite rare; when covariates are discrete it is
> relatively common, at least when tau is chosen from the rationals. For
> univariate quantiles R provides several methods of resolving this sort of
> ambiguity by interpolation; "br" doesn't try to do this, instead returning
> the first vertex solution that it comes to. Should we worry about this?
> My answer would be no. Viewed from an asymptotic perspective, any choice
> of a unique value among the multiple solutions is a 1/n perturbation --
> with 2500 observations this is unlikely to be interesting. More to the
> point, inference about the coefficients of the model, which provides
> O(1/sqrt(n)) intervals, is perfectly capable of assessing the meaningful
> uncertainty about these values. Finally, if you would prefer an estimation
> procedure that produced unique values more like the interpolation
> procedures in the univariate setting, you could try the "fn" option for
> the algorithm. Interior point methods for solving linear programming
> problems have the feature that they tend to converge to the centroid of
> solution sets when such sets exist. This approach provides a means to
> assess the magnitude of the non-uniqueness in a particular application.
>
> I hope that this helps,
>
> url:    www.econ.uiuc.edu/~roger        Roger Koenker
> email   [EMAIL PROTECTED]               Department of Economics
> vox:    217-333-4558                    University of Illinois
> fax:    217-244-6678                    Champaign, IL 61820
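
The univariate ambiguity described above can be seen with base R alone. The following is only a small illustration with made-up numbers (the vector y and the helper abs_loss are assumptions, not part of the original message): every value between the two middle order statistics gives the same sum of absolute deviations, and the quantile() types differ in which of those values they report.

```r
y <- c(1, 3, 7, 10)                        # even number of distinct values

# The median minimises the sum of absolute deviations.
abs_loss <- function(m) sum(abs(y - m))

inside  <- sapply(c(3, 4, 5, 7), abs_loss) # candidates between the middle values 3 and 7
outside <- abs_loss(2)                     # a candidate outside that interval

# Two of R's quantile definitions resolve the ambiguity differently.
q_type1 <- quantile(y, 0.5, type = 1, names = FALSE)  # picks an order statistic
q_type7 <- quantile(y, 0.5, type = 7, names = FALSE)  # interpolates (the default)
```

Here every element of inside is 13 while outside is 15, so any value in [3, 7] is a median; type 1 returns 3 and type 7 returns 5.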


> On Jul 22, 2006, at 9:07 PM, Neil KM wrote:
>
>> I am new to using quantile regressions in R. I have estimated a set of
>> coefficients using the method="br" algorithm with the rq command at
>> various quantiles along the entire distribution.
>>
>> My data set contains approximately 2,500 observations and I have 7
>> predictor variables. I receive the following warning message:
>>
>> Solution may be nonunique in: rq.fit.br(x, y, tau = tau, ...)
>>
>> There are 13 warnings of this type after I run a single model. My
>> results are similar to the results I received in other stat programs
>> using quantile reg procedures. I am unclear what these warning messages
>> imply and if there are problems with model fit/convergence that I may
>> need to consider. Any help would be appreciated. Thanks!



Re: [R] diff, POSIXct, POSIXlt, POSIXt

2006-07-23 Thread Patrick Giraudoux
> Try converting to POSIXct:

That's what I did finally (see the previous e-mail).

dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004","15/07/2004","15/10/2004","15/4/2005","15/07/2005","15/10/2005","15/4/2006")
dts <- as.POSIXct(strptime(dts, "%d/%m/%Y"))
diff(dts)

Time differences of  91,  92, 183,  91,  92, 182,  91,  92, 182 days

> What is the problem you are trying to solve?

Actually, I don't understand why diff() gives the expected result with
POSIXct but not with POSIXlt. Both POSIXct and POSIXlt inherit from class
POSIXt. The documentation of diff() states that 'diff' is a generic
function with a default method and ones for classes "ts", "POSIXt" and
"Date". It does not mention differences between POSIXct and POSIXlt.

Moreover, using diff() with POSIXlt returned (wrong) numbers and not an
error. This can be difficult to detect in the middle of a program. Must one
keep in mind that diff() is reliably applicable only to POSIXct? If so,
should it not be mentioned in the documentation?

All the best,

Patrick









Re: [R] Why the contrain does not work for selecting a particular range of data?

2006-07-23 Thread Duncan Murdoch
On 7/23/2006 4:07 AM, Xin wrote:
> Dear:
>
> Continuing the issue of 'ifelse'! I selecting the data whose 'x2'=1 for
> maximizing likelihood. I used two way to do this but the results are
> different.

In the first case you used ifelse(), in the second you used if().  They 
behave differently:  ifelse() evaluates all tests in a vector, if() only 
evaluates one.  You probably want ifelse() in both cases.

Duncan Murdoch
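
The difference can be illustrated with a small sketch. The vectors and the use of log() as a stand-in for the per-observation likelihood are assumptions made for this example only; recent R versions signal an error when if() is given a condition of length greater than one.

```r
x1 <- c(1, 2, 3, 4)
x2 <- c(1, 0, 1, 1)

# ifelse() tests every element and returns a vector of the same length.
keep <- x1 > 0 & x2 == 1
per_obs <- ifelse(keep, log(x1), 0)

# if() needs a single TRUE or FALSE, so it has to be applied one element at a time.
per_obs_loop <- numeric(length(x1))
for (i in seq_along(x1)) {
  if (x1[i] > 0 && x2[i] == 1) {
    per_obs_loop[i] <- log(x1[i])
  }
}
```

Both versions give the same vector, with a zero contribution for the observation where x2 is not 1.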




Re: [R] diff, POSIXct, POSIXlt, POSIXt

2006-07-23 Thread Gabor Grothendieck
Moving this to r-devel.

Looking at the diff.POSIXt code, we see the problem: it takes the
length of the input using length(), which is wrong since in the case
of POSIXlt the length is always 9 (or maybe length should be
defined differently for POSIXlt?).  Try this, which gives the same
problem:

   dts[-1] - dts[-length(dts)]

We get a more sensible answer if length is calculated correctly:

  dts[-1] - dts[-length(dts[[1]])]
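
As a hedged sketch of the workaround discussed in this thread, the dates can be converted to POSIXct before differencing. The tz = "UTC" argument and the variable names are assumptions added here so that daylight-saving changes do not affect the result; more recent versions of R may define length() for POSIXlt differently, so the original problem may no longer reproduce.

```r
dts <- c("15/4/2003", "15/7/2003", "15/10/2003", "15/04/2004", "15/07/2004",
         "15/10/2004", "15/4/2005", "15/07/2005", "15/10/2005", "15/4/2006")

lt <- strptime(dts, "%d/%m/%Y", tz = "UTC")  # POSIXlt: a list of date-time components
ct <- as.POSIXct(lt)                         # POSIXct: one number per date-time

gaps <- diff(ct)
units(gaps) <- "days"                        # make the unit explicit
day_gaps <- as.numeric(gaps)
```

With these dates day_gaps is 91, 92, 183, 91, 92, 182, 91, 92, 182, matching the POSIXct output reported earlier in the thread.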




[R] Is there anywhere recycle()?

2006-07-23 Thread Gregor Gorjanc
Hello!

I am writing a function which should recycle one of its arguments if the
length of the argument is appropriate, i.e. something like

foo <- function(x, a)
{
  n <- length(x)
  if(length(a) < n) { # recycle a
    oldA <- a
    a <- vector(length=n)
    a[1:n] <- oldA
  }
  ## ...
  return(a)
}

foo(c(1, 2), a=c(1, 2))
foo(c(1, 2), a=c(1))

I am now wondering if there are any general/generic functions for such a task.

Thanks!

-- 
Lep pozdrav / With regards,
Gregor Gorjanc

--
University of Ljubljana     PhD student
Biotechnical Faculty
Zootechnical Department     URI: http://www.bfro.uni-lj.si/MR/ggorjan
Groblje 3                   mail: gregor.gorjanc at bfro.uni-lj.si

SI-1230 Domzale             tel: +386 (0)1 72 17 861
Slovenia, Europe            fax: +386 (0)1 72 17 888

--
One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try. Sophocles ~ 450 B.C.



Re: [R] Is there anywhere recycle()?

2006-07-23 Thread Gabor Grothendieck
Try:

foo2 <- function(x, a) cbind(x,a)[,2]




Re: [R] Is there anywhere recycle()?

2006-07-23 Thread Gregor Gorjanc
Hi,

Gabor Grothendieck wrote:
 Try:
 
 foo2 <- function(x, a) cbind(x,a)[,2]
 

Thank you for this. It works to some extent, but not much better
than my foo.

R> foo2(c(1, 2, 3), a=1)
[1] 1 1 1

18:14:08
R> foo2(c(1, 2, 3), a=c(1,2,3,4))
[1] 1 2 3 4
Warning message:
number of rows of result
is not a multiple of vector length (arg 1) in: cbind(1, x, a)

18:14:13
R> foo2(c(1, 2, 3), a=c(1,2,3))
[1] 1 2 3

18:14:18
R> foo2(c(1, 2, 3), a=c(1,2))
[1] 1 2 1
Warning message:
number of rows of result
is not a multiple of vector length (arg 2) in: cbind(1, x, a)




-- 
Lep pozdrav / With regards,
Gregor Gorjanc

--
University of Ljubljana PhD student
Biotechnical Faculty
Zootechnical Department URI: http://www.bfro.uni-lj.si/MR/ggorjan
Groblje 3   mail: gregor.gorjanc at bfro.uni-lj.si

SI-1230 Domzale tel: +386 (0)1 72 17 861
Slovenia, Europe fax: +386 (0)1 72 17 888

--
One must learn by doing the thing; for though you think you know it,
 you have no certainty until you try. Sophocles ~ 450 B.C.



Re: [R] Is there anywhere recycle()?

2006-07-23 Thread Gabor Grothendieck
Here is another possibility:

rep(a, length = length(x))
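
As a minimal sketch of how this could be wrapped up: the helper name recycle_to and its length checks are assumptions made for illustration, not an existing base R function. rep() with length.out recycles a to the length of x, truncating the last cycle if needed:

```r
# Hypothetical helper (not part of base R): recycle 'a' to the length of 'x'.
recycle_to <- function(x, a) {
  n <- length(x)
  # Assumed policy: 'a' must be non-empty and no longer than 'x'.
  stopifnot(length(a) >= 1, length(a) <= n)
  rep(a, length.out = n)
}

recycle_to(c(1, 2, 3), a = 1)        # 1 1 1
recycle_to(c(1, 2, 3), a = c(1, 2))  # 1 2 1
```

Unlike the cbind() version, this gives no warning when length(x) is not a multiple of length(a); whether that is desirable depends on the caller.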



[R] Iterated Data Input/Output with Random Forests

2006-07-23 Thread johnzhou
Hi,

I am currently writing code to input a few thousand files, run them through the
Random Forests package, and then output corresponding results.

When I use the code below:

zz <- textConnection("ex.lm.out", "w")
sink(zz)
tempData <- read.delim(paste("allSnps", 1, "Phenotype.phn", sep=""), header=TRUE, sep=",", quote="\"", dec=".")
tempData[[1]] <- factor(tempData[[1]])
tempData.rf <- randomForest(tempData[[1]]~., data=tempData, importance=TRUE, proximity=TRUE, outscale=TRUE, replace=TRUE)
tempData.rf
zz <- file(paste("ex", 1, ".data", sep=""), "w")
cat(ex.lm.out, sep="\n", file=zz)
sink()
close(zz)

I am able to successfully input and output for one file. However, if I try to
use a for loop or a while statement e.g.

for(i in 1:2)
{
  zz <- textConnection("ex.lm.out", "w")
  sink(zz)
  tempData <- read.delim(paste("allSnps", i, "Phenotype.phn", sep=""), header=TRUE, sep=",", quote="\"", dec=".")
  tempData[[1]] <- factor(tempData[[1]])
  tempData.rf <- randomForest(tempData[[1]]~., data=tempData, importance=TRUE, proximity=TRUE, outscale=TRUE, replace=TRUE)
  tempData.rf
  zz <- file(paste("ex", i, ".data", sep=""), "w")
  cat(ex.lm.out, sep="\n", file=zz)
  sink()
  close(zz)
}

I get no error messages, but the output is blank. Without the for statement,
setting i <- 1 works fine.
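
One possible explanation, offered as an assumption since the thread does not confirm it: at top level R auto-prints the value of a bare expression such as tempData.rf, but inside a for loop it does not, so nothing reaches the sink unless print() is called explicitly. A minimal sketch, using capture.output() in place of sink() and the loop index as a stand-in for the fitted model object:

```r
# A bare expression inside a loop is evaluated but not auto-printed.
silent <- capture.output(for (i in 1:2) i)

# Calling print() explicitly sends the output to the capture (or sink).
shown <- capture.output(for (i in 1:2) print(i))

length(silent)  # 0
shown           # "[1] 1" "[1] 2"
```

If this is the cause, replacing the bare tempData.rf line inside the loop with print(tempData.rf) should restore the output.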

One other related question is that right now I am trying to get the loop to
work by using the paste() function with a variable (i). However, the paste
function returns a string.

If I wanted to make a loop of

tempData$pheno1
tempData$pheno2
tempData$pheno3
...

the paste() function will not work. Is there some other method to achieve the
desired effect? Thank you in advance! I have only been working with R for a few
days so please bear with my lack of knowledge!

John Zhou



Re: [R] constructing a dataframe from a database of newspaper articles

2006-07-23 Thread jim holtman
I was going to suggest that you use Perl, but here is my attempt at keeping
it in R.  This reads in each line, tries to determine if it has one of the
'separator' words at the beginning of the line, and then constructs the
output.


mySeps <- c("HD", "BY", "WC", "PD", "SC", "PG", "LP", "TD",
            "SN", "LA", "CY")  # section separators
myInc <- 0   # record number
foundHD <- FALSE
myFile <- file('c:/datafile.txt', 'r')
myRec <- list()  # contains the data from each record
myOutput <- list()  # list with each record
while(length(x <- readLines(myFile, n=1)) > 0){
    first <- gsub("^\\s*(\\w+).*", "\\1", x)  # get the first word
    if (!foundHD){  # skip till HD found (assumes this is the start of the article)
        if (first == "HD") foundHD <- TRUE
        else next
    }
    if (first == "NS"){ # skip to next HD (assumes the rest after NS is ignored)
        foundHD <- FALSE
        myOutput[[myInc <- myInc + 1]] <- myRec
        myRec <- list()
        next
    }
    if (first %in% mySeps){
        myKey <- first  # use as key to myRec
        x <- sub(first, '', x)
    }
    myRec[[myKey]] <- paste(myRec[[myKey]], x)  # collect data from each mySep
}
close(myFile)
# convert the list to 'long' dataframe for reshape
myResult <- NULL
for (i in 1:length(myOutput)){
    .x <- cbind(i, names(myOutput[[i]]),
                unlist(myOutput[[i]][names(myOutput[[i]])]))
    myResult <- rbind(myResult, .x)
}
myDF <- as.data.frame(myResult)
myWide <- reshape(myDF, timevar="V2", idvar='i', direction='wide')
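
To illustrate only the final long-to-wide step, here is a small self-contained sketch. The two records and their text are made up for the example; the column names i, V2 and V3 mirror what as.data.frame() would produce from the three-column matrix built above.

```r
# Hypothetical 'long' data: one row per (record, field) pair.
long <- data.frame(
  i  = c(1, 1, 2, 2),
  V2 = c("HD", "BY", "HD", "BY"),
  V3 = c("Headline one", "By A. Writer", "Headline two", "By B. Writer"),
  stringsAsFactors = FALSE
)

# One row per record id 'i'; one column per field code found in 'V2'.
wide <- reshape(long, timevar = "V2", idvar = "i", direction = "wide")

nrow(wide)    # 2 records
names(wide)   # includes "V3.HD" and "V3.BY"
```

The resulting data frame could then be written out with write.csv(), as suggested in the original question.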






Re: [R] Iterated Data Input/Output with Random Forests

2006-07-23 Thread jim holtman
For your last question of the 'paste', try

tempData[paste('pheno', i, sep='')]






-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?



Re: [R] Iterated Data Input/Output with Random Forests

2006-07-23 Thread johnzhou
While tempData[paste('pheno',i,sep='')] does give the appropriate column, when I
try to use that expression in the factor function:

factor(tempData[paste('pheno',i,sep='')])

I get

Error in sort(unique.default(x), na.last=TRUE) : 'x' must be atomic.
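
The error is consistent with how data frame indexing works: single brackets return a one-column data frame, which is not an atomic vector, whereas double brackets return the column itself. A minimal sketch with a made-up data frame (the column values are hypothetical):

```r
# Hypothetical data with columns named pheno1, pheno2.
tempData <- data.frame(
  pheno1 = c("a", "b", "a"),
  pheno2 = c("x", "x", "y"),
  stringsAsFactors = FALSE
)

for (i in 1:2) {
  col <- paste("pheno", i, sep = "")        # column name built as a string
  # [[ ]] extracts the column vector, so factor() receives an atomic vector.
  tempData[[col]] <- factor(tempData[[col]])
}

is.data.frame(tempData["pheno1"])  # TRUE: single bracket keeps the data frame
is.factor(tempData[["pheno1"]])    # TRUE: double bracket gives the column
```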




[R] Matthew Nash has left the SGDP

2006-07-23 Thread M . Nash




I will be out of the office starting Wed 02/01/2006 and will not return
until Sat 02/07/2060.

I have left the SGDP. I am contactable at [EMAIL PROTECTED]



Re: [R] constructing a dataframe from a database of newspaper articles

2006-07-23 Thread jim holtman
Gabor indicated that a line was corrupted.  Here is what it should be:

first <- gsub("^\\s*(\\w+).*", "\\1", x)  # get the first word

It was supposed to be '\\w+' and '\\1'.

For some reason, my browser must have substituted the extraneous references.



Re: [R] diff, POSIXct, POSIXlt, POSIXt

2006-07-23 Thread Spencer Graves
Hi, Gabor:

  For my 0.02 euros, I vote to make length(POSIXlt) = length of the 
series, NOT the length of the list = 9 always.  I've stubbed my toe on 
that one many times.  I always fix it by converting first to POSIXct.

  The key question is what would users naively expect to get from 
length(a_time_series)?  I think most people not familiar with the 
POSIXlt format would expect the number of observations.  After 
struggling for a while with code that did not perform as expected, I 
finally traced one such problem to the fact that length(a_time_series)= 
9 if class(a_time_series) = POSIXlt, independent of the number of 
observations.

  How much code would break if this was changed?  Each use of 
length(POSIXlt_object) would have to be replaced by something like 
length(as.list(POSIXlt_object)).  However, since length(POSIXlt_object) 
is always 9, I doubt if length(POSIXlt_object) occurs very often.

  Currently to get the number of observations in a POSIXlt_object, you 
might find constructs like length(POSIXlt_object[[1]]).  Or you will 
find people converting the POSIXlt to POSIXct and then computing the 
length.  In either case, changing length(POSIXlt_object) to the number 
of observations would not break any of this code.

  Thanks for raising this question.
  Spencer Graves
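
A minimal sketch of the POSIXct workaround mentioned above, using the dates from this thread. The tz = "UTC" argument is an assumption added here so that daylight-saving shifts cannot produce fractional days; the original posts used the local time zone.

```r
dts <- c("15/4/2003", "15/7/2003", "15/10/2003", "15/04/2004", "15/07/2004",
         "15/10/2004", "15/4/2005", "15/07/2005", "15/10/2005", "15/4/2006")

# strptime() returns POSIXlt, which is stored as a list of components.
lt <- strptime(dts, "%d/%m/%Y", tz = "UTC")
# POSIXct stores one number per date, so length() counts observations.
ct <- as.POSIXct(lt)

length(ct)                         # 10
gaps <- diff(ct)                   # a difftime object
as.numeric(gaps, units = "days")   # 91 92 183 91 92 182 91 92 182
```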

Gabor Grothendieck wrote:
 Moving this to r-devel.
 
 Looking at the diff.POSIXt code we see the problem is that it takes the
 length of the input using length which is wrong since in the case
 of POSIXlt the length is always 9 (or maybe length should be
 defined differently for POSIXlt?).  Try this which gives the same
 problem:
 
dts[-1] - dts[-length(dts)]
 
 We get a more sensible answer if length is calculated correctly:
 
   dts[-1] - dts[-length(dts[[1]])]
 
 
 On 7/23/06, Patrick Giraudoux [EMAIL PROTECTED] wrote:
 Try converting to POSIXct:
 That's what I did finally (see the previous e-mail).

 dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004","15/07/2004","15/10/2004","15/4/2005","15/07/2005","15/10/2005","15/4/2006")

 dts <- as.POSIXct(strptime(dts, "%d/%m/%Y"))
 diff(dts)

 Time differences of  91,  92, 183,  91,  92, 182,  91,  92, 182 days

 What is the problem you are trying to solve?
 Actually, I don't understand why using diff() and POSIXct provides the
 expected result and not using POSIXlt. Both POSIXct and POSIXlt are of
 class POSIXt. The doc of diff() stresses that 'diff' is a generic
 function with a default method and ones for classes 'ts', 'POSIXt'
 and 'Date'. It does not mention differences between POSIXct and POSIXlt.

 Moreover, using diff() with POSIXlt has provided (wrong) numbers... and
 not an error. This may sometimes be difficult to detect within program
 code. Must one keep in mind that diff() is reliably applicable only to
 POSIXct? In that case, should it not be mentioned in the documentation?

 All the best,

 Patrick







 jim holtman wrote:
 Try converting to POSIXct:

 > str(dts)
 'POSIXlt', format: chr [1:10] "2003-04-15" "2003-07-15" "2003-10-15"
 "2004-04-15" "2004-07-15" "2004-10-15" "2005-04-15" ...
 > dts
  [1] "2003-04-15" "2003-07-15" "2003-10-15" "2004-04-15" "2004-07-15"
 "2004-10-15" "2005-04-15" "2005-07-15"
  [9] "2005-10-15" "2006-04-15"
 > dts <- as.POSIXct(dts)
 > dts
  [1] "2003-04-15 EDT" "2003-07-15 EDT" "2003-10-15 EDT" "2004-04-15
 EDT" "2004-07-15 EDT" "2004-10-15 EDT"
  [7] "2005-04-15 EDT" "2005-07-15 EDT" "2005-10-15 EDT" "2006-04-15 EDT"
 > diff(dts)
 Time differences of  91,  92, 183,  91,  92, 182,  91,  92, 182 days


 On 7/23/06, *Patrick Giraudoux* [EMAIL PROTECTED]
 mailto:[EMAIL PROTECTED] wrote:

 Dear Listers,

 I have encountered a strange problem using diff() and POSIXt:

 
 dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004","15/07/2004","15/10/2004","15/4/2005","15/07/2005","15/10/2005","15/4/2006")

 dts <- strptime(dts, "%d/%m/%Y")
 class(dts)

 [1] "POSIXt"  "POSIXlt"

 diff(dts)

 Time differences of  7862400,  7948800, 15811200,  7862400,  7948800,
 15724800,  7862400,  7948800,0 secs

 In this case the result is not the one expected: expressed in seconds
 and not in days, and the difference between the two last dates is
 not 0.

 Now, if one uses a vector of only 9 dates (whichever date is removed),
 things work well:

 diff(dts[-1])

 Time differences of  92, 183,  91,  92, 182,  91,  92, 182 days

 Also if one constrains dts to POSIXct

 dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004","15/07/2004","15/10/2004","15/4/2005","15/07/2005","15/10/2005","15/4/2006")

 dts <- as.POSIXct(strptime(dts, "%d/%m/%Y"))
 diff(dts)

 Time differences of  91,  92, 183,  91,  92, 182,  91,  92, 182 days

 Any rationale in that?

 Patrick


Re: [R] dotchart with log scale?

2006-07-23 Thread Johannes Hüsing
 Dear all,
 I would like to draw a dot chart on a log scale.
 What is the syntax for this? A barchart may use
 log="x", but trying this with dotchart() leads
 to an error message.

ok, I extended dotchart() with a log option.
The main reason I am sticking to dotchart() is
adjusting of axes with respect to long text
labels.

This is the definition I am using now:


mydotchart <-
  function (x, labels = NULL, groups = NULL, gdata = NULL, cex = par("cex"),
            pch = 21, gpch = 21, bg = par("bg"), color = par("fg"),
            gcolor = par("fg"),
            lcolor = "gray", xlim = range(x[is.finite(x)]), main = NULL,
###   new option
            xlab = NULL, ylab = NULL, log = FALSE, ...) {
    opar <- par("mai", "cex", "yaxs")
    on.exit(par(opar))
    par(cex = cex, yaxs = "i")
    n <- length(x)
    if (is.matrix(x)) {
      if (is.null(labels))
        labels <- rownames(x)
      if (is.null(labels))
        labels <- as.character(1:nrow(x))
      labels <- rep(labels, length.out = n)
      if (is.null(groups))
        groups <- col(x, as.factor = TRUE)
      glabels <- levels(groups)
    }
    else {
      if (is.null(labels))
        labels <- names(x)
      glabels <- if (!is.null(groups))
        levels(groups)
    }
    plot.new()
    linch <- if (!is.null(labels))
      max(strwidth(labels, "inch"), na.rm = TRUE)
    else 0
    if (is.null(glabels)) {
      ginch <- 0
      goffset <- 0
    }
    else {
      ginch <- max(strwidth(glabels, "inch"), na.rm = TRUE)
      goffset <- 0.4
    }
    if (!(is.null(labels) && is.null(glabels))) {
      nmai <- par("mai")
      nmai[2] <- nmai[4] + max(linch + goffset, ginch) + 0.1
      par(mai = nmai)
    }
    if (is.null(groups)) {
      o <- 1:n
      y <- o
      ylim <- c(0, n + 1)
    }
    else {
      o <- sort.list(as.numeric(groups), decreasing = TRUE)
      x <- x[o]
      groups <- groups[o]
      color <- rep(color, length.out = length(groups))[o]
      lcolor <- rep(lcolor, length.out = length(groups))[o]
      offset <- cumsum(c(0, diff(as.numeric(groups)) != 0))
      y <- 1:n + 2 * offset
      ylim <- range(0, y + 2)
    }
###   instead of log=""
    plot.window(xlim = xlim, ylim = ylim, log = ifelse(log, "x", ""))
    lheight <- par("csi")
    if (!is.null(labels)) {
      linch <- max(strwidth(labels, "inch"), na.rm = TRUE)
      loffset <- (linch + 0.1)/lheight
      labs <- labels[o]
      mtext(labs, side = 2, line = loffset, at = y, adj = 0,
            col = color, las = 2, cex = cex, ...)
    }
    abline(h = y, lty = "dotted", col = lcolor)
    points(x, y, pch = pch, col = color, bg = bg)
    if (!is.null(groups)) {
      gpos <- rev(cumsum(rev(tapply(groups, groups, length)) +
                         2) - 1)
      ginch <- max(strwidth(glabels, "inch"), na.rm = TRUE)
      goffset <- (max(linch + 0.2, ginch, na.rm = TRUE) + 0.1)/lheight
      mtext(glabels, side = 2, line = goffset, at = gpos, adj = 0,
            col = gcolor, las = 2, cex = cex, ...)
      if (!is.null(gdata)) {
        abline(h = gpos, lty = "dotted")
        points(gdata, gpos, pch = gpch, col = gcolor, bg = bg,
               ...)
      }
    }
    axis(1)
    box()
    title(main = main, xlab = xlab, ylab = ylab, ...)
    invisible()
  }

Many thanks for your attention.



[R] Using DDF With Data Import into R

2006-07-23 Thread Data Meister
I am a brand-new R user. The first project I am looking at requires the use
of a US Census data file. This data file is very large, and comes with an
associated data definition file (DDF). I would like to know if I can use the
DDF file in association with the import of the data file (as opposed to
creating a very large scan statement). This was doable in SAS, but I have
been unable to find an equivalent capability in R. Thanks very much for the
help!

Kat
  



[R] Re constructing a dataframe from a database of newspaper articles

2006-07-23 Thread David Duffy
> From: Bob Green [EMAIL PROTECTED]
>
> I am hoping for some assistance with formatting a large text file which
> consists of a series of individual records. Each record includes specific
> labels/field names (a sample of 1 record (one of the longest ones) is
> below - at end of post). What I want to do is reformat the data, so that
> each individual record becomes a row (some cells will have a lot of text).
> For example, the column variables I want are (a) HD in one column,
> (b) BY in one column, (c) WC data in one column, (d) PD data in one
> column, (e) SC data in one column, (f) PG data in one column, (g) LP and TD
> text in one column - this column can contain quite a lot of text, e.g. 1900
> words. The other fields are unwanted.
>
> If there were 150 individual records, when formatted this would be a 7
> column by 150 row dataset.

Most transparently,

txt <- readLines("c:\\cm-mht1.txt")
no_of_records <- length(grep("^HD", txt))
res <- matrix(nr=no_of_records, nc=8)
idx <- 0
for (i in 1:length(txt)) {
  # each HD line starts a new record
  if (regexpr("^HD", txt[i])!=-1) idx <- idx+1

  if (regexpr("^HD", txt[i])!=-1) res[idx, 1] <- txt[i]
  if (regexpr("^BY", txt[i])!=-1) res[idx, 2] <- txt[i]
  ...
  if (regexpr("^TD", txt[i])!=-1) res[idx, 8] <- txt[i]
}
# merge LP (column 7) and TD (column 8) into a single column
res[,7] <- paste(res[,7], res[,8], sep="; ")
res <- res[,-8]


| David Duffy (MBBS PhD) ,-_|\
| email: [EMAIL PROTECTED]  ph: INT+61+7+3362-0217 fax: -0101  / *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v



Re: [R] Re constructing a dataframe from a database of newspaper articles

2006-07-23 Thread David Duffy
On Mon, 24 Jul 2006, David Duffy wrote:

> > From: Bob Green [EMAIL PROTECTED]
> >
> > I am hoping for some assistance with formatting a large text file which
> > consists of a series of individual records. Each record includes specific
> > labels/field names (a sample of 1 record (one of the longest ones) is
> > below - at end of post). What I want to do is reformat the data, so that
> > each individual record becomes a row (some cells will have a lot of text).
> > For example, the column variables I want are (a) HD in one column,
> > (b) BY in one column, (c) WC data in one column, (d) PD data in one
> > column, (e) SC data in one column, (f) PG data in one column, (g) LP and TD
> > text in one column - this column can contain quite a lot of text, e.g. 1900
> > words. The other fields are unwanted.
> >
> > If there were 150 individual records, when formatted this would be a 7
> > column by 150 row dataset.

Oops, I forgot to add the bit about multiple lines per field...

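txt <- readLines("c:\\cm-mht1.txt")
txt <- gsub("[ ]+", " ", txt)    # squeeze runs of spaces
txt <- gsub("^[ ]+", "", txt)    # strip leading spaces
no_of_records <- length(grep("^HD", txt))
res <- matrix("", nr=no_of_records, nc=7)
idx <- 0
typ <- 0
for (i in 1:length(txt)) {
  if (regexpr("^HD", txt[i])!=-1) {
    idx <- idx+1
    typ <- 1
  } else if (regexpr("^BY", txt[i])!=-1) {
    typ <- 2
  # ... (branches for the remaining fields)
  } else if (regexpr("(^LP)|(^TD)", txt[i])!=-1) {
    typ <- 7
  } else if (regexpr("^[A-Z][A-Z]", txt[i])!=-1) {
    typ <- 0    # any other two-letter label: unwanted field
  }
  # lines without a label keep the current typ, so they extend the field
  if (typ>0) {
    res[idx,typ] <- paste(res[idx,typ], txt[i], sep=" ")
  }
}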
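
A self-contained sketch of the same loop, run on a few made-up lines rather
than the real file, may help to check the logic. The sample text, the `wanted`
lookup and the four-column layout are assumptions for illustration only. It
also strips the two-letter label from each field, which the loop above does
not do. Like the loop above, it treats a line that starts with two capital
letters as a new field, so a continuation line that happens to begin that way
would be misread.

```r
# Hypothetical sample standing in for readLines("c:\\cm-mht1.txt")
txt <- c(
  "HD  First headline",
  "BY  A. Writer",
  "WC  120 words",
  "LP  Opening paragraph of the first",
  "    article, continued on a second line.",
  "TD  Body of the first article.",
  "CO  unwanted field",
  "HD  Second headline",
  "BY  B. Writer",
  "LP  Lead paragraph of the second article."
)

txt <- gsub("[ ]+", " ", txt)    # squeeze runs of spaces
txt <- gsub("^[ ]+", "", txt)    # strip leading spaces

# Column index for each wanted label; LP and TD share one column
wanted <- c(HD = 1, BY = 2, WC = 3, LP = 4, TD = 4)
res <- matrix("", nrow = sum(grepl("^HD ", txt)), ncol = 4,
              dimnames = list(NULL, c("HD", "BY", "WC", "LP_TD")))

idx <- 0
typ <- 0
for (line in txt) {
  if (grepl("^[A-Z][A-Z] ", line)) {          # line opens a new field
    tag <- substr(line, 1, 2)
    if (tag == "HD") idx <- idx + 1           # HD marks the start of a record
    typ <- if (tag %in% names(wanted)) wanted[[tag]] else 0
    line <- sub("^[A-Z][A-Z] ", "", line)     # drop the two-letter label
  }
  if (typ > 0) {                              # unlabelled lines extend the field
    res[idx, typ] <- trimws(paste(res[idx, typ], line))
  }
}

articles <- as.data.frame(res, stringsAsFactors = FALSE)
str(articles)
```

From here write.csv(articles, ...) would give the one-row-per-record file
asked for in the original post.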


| David Duffy (MBBS PhD) ,-_|\
| email: [EMAIL PROTECTED]  ph: INT+61+7+3362-0217 fax: -0101  / *
| Epidemiology Unit, Queensland Institute of Medical Research   \_,-._/
| 300 Herston Rd, Brisbane, Queensland 4029, Australia  GPG 4D0B994A v



[R] Saving R objects

2006-07-23 Thread Nair, Murlidharan T
I am trying to find the best way to save the following object I am creating:

library(multcomp)
data(recovery)
Dcirec <- simint(minutes~blanket, data=recovery, conf.level=0.9,
                 alternative="less")

I am probably not doing it in the most efficient way.
Here is what I am doing:

a <- print(Dcirec)
write(a, file="mult_test.dat", append=T)
or
save(Dcirec, file="mult.out")

Which is the best way to save it, so that I can access its contents outside the
R environment?



Re: [R] Saving R objects

2006-07-23 Thread Gabor Grothendieck
It depends on what information you want to save and how the
program on the other end needs it.

For the save version I would at least use ascii = TRUE to get it
in a more readable fashion.

Look at

file.show("mult_test.dat")
file.show("mult.out")  # but use ascii=TRUE on your save statement.

to see what you are getting.

Other possibilities are to use R2HTML or XML packages to output
to HTML or XML.   You might want to handle the various components
of Dcirec separately.  To see what's inside:

   unclass(Dcirec)
   str(Dcirec)
   dput(Dcirec)

and use cat statements to output the components in the format of
your choice possibly in conjunction with sprintf.
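
As a rough sketch of the cat/sprintf and dput suggestions, the example below
uses a small made-up list in place of Dcirec so that it runs without multcomp.
The component names, the values and the temporary file names are assumptions
for illustration, not the actual structure of a simint result; use
str(Dcirec) to find the real components.

```r
# Stand-in for a fitted object such as Dcirec (hypothetical values)
obj <- list(
  estimate   = c("b1-b0" = -2.13, "b2-b0" = -7.47),
  conf.level = 0.9
)

out <- tempfile(fileext = ".txt")

# One tab-separated line per value, formatted with sprintf, written with cat
cat(sprintf("conf.level\t%.2f\n", obj$conf.level), file = out)
cat(sprintf("%s\t%.3f\n", names(obj$estimate), obj$estimate),
    file = out, sep = "", append = TRUE)

# dput writes an ASCII representation that R can read back with dget
dump_file <- tempfile(fileext = ".R")
dput(obj, file = dump_file)
restored <- dget(dump_file)

file_lines <- readLines(out)
print(file_lines)
```

The tab-separated file is the one to hand to other programs; the dput file is
only useful for getting the object back into R.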


On 7/23/06, Nair, Murlidharan T [EMAIL PROTECTED] wrote:
> I am trying to find the best way to save the following object I am creating:
>
> library(multcomp)
> data(recovery)
> Dcirec <- simint(minutes~blanket, data=recovery, conf.level=0.9,
>                  alternative="less")
>
> I am probably not doing it in the most efficient way.
> Here is what I am doing:
>
> a <- print(Dcirec)
> write(a, file="mult_test.dat", append=T)
> or
> save(Dcirec, file="mult.out")
>
> Which is the best way to save it, so that I can access its contents outside
> the R environment?



[R] How to obtain 95th percentile of a normal distribution of a continuous variable

2006-07-23 Thread jenny tan
Hi,

How do I get R to output the 95% cutoff from a distribution of a continuous
variable?
summary() only displays a few statistics.

Thanks!



Re: [R] How to obtain 95th percentile of a normal distribution of a continuous variable

2006-07-23 Thread Wensui Liu
?quantile
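
A short sketch of what that help page leads to, on made-up data (the vector
`x` and its parameters are assumptions for illustration). quantile() gives the
empirical 95th percentile of the observed values; if the variable is taken to
be normally distributed, qnorm() gives the cutoff implied by its mean and
standard deviation.

```r
set.seed(1)                                # reproducible made-up data
x <- rnorm(1000, mean = 50, sd = 10)

# Empirical 95th percentile of the observed values
q_emp <- quantile(x, probs = 0.95)

# 95th percentile of a normal distribution with the sample mean and sd
q_norm <- qnorm(0.95, mean = mean(x), sd = sd(x))

print(q_emp)
print(q_norm)
```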

On 7/23/06, jenny tan [EMAIL PROTECTED] wrote:

> Hi,
>
> How do I get R to output the 95% cutoff from a distribution of a continuous
> variable?
> summary() only displays a few statistics.
>
> Thanks!




-- 
WenSui Liu
(http://spaces.msn.com/statcompute/blog)
Senior Decision Support Analyst
Health Policy and Clinical Effectiveness
Cincinnati Children Hospital Medical Center



Re: [R] How to obtain 95th percentile of a normal distribution of a continuous variable

2006-07-23 Thread Simon Blomberg
?quantile

jenny tan wrote:
> Hi,
>
> How do I get R to output the 95% cutoff from a distribution of a continuous
> variable?
> summary() only displays a few statistics.
>
> Thanks!

   


-- 
Simon Blomberg, B.Sc.(Hons.), Ph.D, M.App.Stat.
Centre for Resource and Environmental Studies
The Australian National University
Canberra ACT 0200
Australia
T: +61 2 6125 7800 email: Simon.Blomberg_at_anu.edu.au
F: +61 2 6125 0757
CRICOS Provider # 00120C



Re: [R] diff, POSIXct, POSIXlt, POSIXt

2006-07-23 Thread Gabor Grothendieck
Just one more comment. It is possible to define length.POSIXlt yourself
in which case diff works with POSIXlt objects.

> length.POSIXlt <- function(x) length(x[[1]])
> diff(dts)
Time differences of  91,  92, 183,  91,  92, 182,  91,  92, 182 days
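
For comparison, a minimal sketch of the POSIXct route discussed in this
thread, using the dates from the original post. Setting tz = "UTC" is an
assumption added here so that daylight-saving changes do not affect the
differences; the variable names `lt`, `ct` and `d` are illustrative.

```r
dts <- c("15/4/2003", "15/7/2003", "15/10/2003", "15/04/2004", "15/07/2004",
         "15/10/2004", "15/4/2005", "15/07/2005", "15/10/2005", "15/4/2006")

lt <- strptime(dts, "%d/%m/%Y", tz = "UTC")   # POSIXlt: a list of time components
ct <- as.POSIXct(lt)                          # POSIXct: seconds since the epoch

# Number of dates actually stored in the POSIXlt object, read from one
# of its components rather than from the length of the list itself
n_dates <- length(unclass(lt)$mday)

d <- diff(ct)                                 # differences on the POSIXct vector
print(as.numeric(d, units = "days"))
# 91 92 183 91 92 182 91 92 182
```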


On 7/23/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:
> Moving this to r-devel.
>
> Looking at the diff.POSIXt code we see the problem is that it takes the
> length of the input using length which is wrong since in the case
> of POSIXlt the length is always 9 (or maybe length should be
> defined differently for POSIXlt?).  Try this which gives the same
> problem:
>
>   dts[-1] - dts[-length(dts)]
>
> We get a more sensible answer if length is calculated correctly:
>
>   dts[-1] - dts[-length(dts[[1]])]


> On 7/23/06, Patrick Giraudoux [EMAIL PROTECTED] wrote:
> > > Try converting to POSIXct:
> > That's what I did finally (see the previous e-mail).
> >
> > dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004","15/07/2004",
> >          "15/10/2004","15/4/2005","15/07/2005","15/10/2005","15/4/2006")
> >
> > dts <- as.POSIXct(strptime(dts, "%d/%m/%Y"))
> > diff(dts)
> >
> > Time differences of  91,  92, 183,  91,  92, 182,  91,  92, 182 days
> >
> > > What is the problem you are trying to solve?
> > Actually, I don't understand why using diff() and POSIXct provides the
> > expected result and not using POSIXlt. Both POSIXct and POSIXlt are of
> > class POSIXt. The doc of diff() stresses that 'diff' is a generic
> > function with a default method and ones for classes 'ts', 'POSIXt'
> > and 'Date'. It does not mention differences between POSIXct and POSIXlt.
> >
> > Moreover, using diff() with POSIXlt has provided (wrong) numbers... and
> > not an error. This may be difficult to detect sometimes along programme
> > lines. Must one keep in mind that diff() is reliably applicable only on
> > POSIXct? In this case, should it not be mentioned in the documentation?
> >
> > All the best,
> >
> > Patrick
> >
> > jim holtman wrote:
> > > Try converting to POSIXct:
> > >
> > > > str(dts)
> > > 'POSIXlt', format: chr [1:10] "2003-04-15" "2003-07-15" "2003-10-15"
> > >    "2004-04-15" "2004-07-15" "2004-10-15" "2005-04-15" ...
> > > > dts
> > >  [1] "2003-04-15" "2003-07-15" "2003-10-15" "2004-04-15" "2004-07-15"
> > >      "2004-10-15" "2005-04-15" "2005-07-15"
> > >  [9] "2005-10-15" "2006-04-15"
> > > > dts <- as.POSIXct(dts)
> > > > dts
> > >  [1] "2003-04-15 EDT" "2003-07-15 EDT" "2003-10-15 EDT"
> > >      "2004-04-15 EDT" "2004-07-15 EDT" "2004-10-15 EDT"
> > >  [7] "2005-04-15 EDT" "2005-07-15 EDT" "2005-10-15 EDT" "2006-04-15 EDT"
> > > > diff(dts)
> > > Time differences of  91,  92, 183,  91,  92, 182,  91,  92, 182 days
> > >
> > > On 7/23/06, Patrick Giraudoux [EMAIL PROTECTED] wrote:
> > > >
> > > > Dear Listers,
> > > >
> > > > I have encountered a strange problem using diff() and POSIXt:
> > > >
> > > > dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004","15/07/2004",
> > > >          "15/10/2004","15/4/2005","15/07/2005","15/10/2005","15/4/2006")
> > > >
> > > > dts <- strptime(dts, "%d/%m/%Y")
> > > > class(dts)
> > > >
> > > > [1] "POSIXt"  "POSIXlt"
> > > >
> > > > diff(dts)
> > > >
> > > > Time differences of  7862400,  7948800, 15811200,  7862400,  7948800,
> > > > 15724800,  7862400,  7948800,        0 secs
> > > >
> > > > In this case the result is not the one expected: expressed in seconds
> > > > and not in days, and the difference between the two last dates is
> > > > not 0.
> > > >
> > > > Now, if one uses a vector of 9 dates only (whatever the date removed),
> > > > things come out well:
> > > >
> > > > diff(dts[-1])
> > > >
> > > > Time differences of  92, 183,  91,  92, 182,  91,  92, 182 days
> > > >
> > > > Also if one constrains dts to POSIXct:
> > > >
> > > > dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004","15/07/2004",
> > > >          "15/10/2004","15/4/2005","15/07/2005","15/10/2005","15/4/2006")
> > > >
> > > > dts <- as.POSIXct(strptime(dts, "%d/%m/%Y"))
> > > > diff(dts)
> > > >
> > > > Time differences of  91,  92, 183,  91,  92, 182,  91,  92, 182 days
> > > >
> > > > Any rationale in that?
> > > >
> > > > Patrick
> > >
> > > --
> > > Jim Holtman
> > > Cincinnati, OH
> > > +1 513 646 9390
> > >
> > > What is the problem you are trying to solve?
 


Re: [R] diff, POSIXct, POSIXlt, POSIXt

2006-07-23 Thread Patrick Giraudoux
OK. Got it. Thanks a lot everybody.

I feel, however, that although the problem can be handled by any user
aware of it, it should be fixed in R in a more general way, either by
modifying the diff() code so that it really handles any kind of POSIXt
(POSIXlt and POSIXct) with the same final result (as claimed in the
documentation), or by mentioning explicitly in the documentation that
diff(), as currently written, correctly handles only POSIXct (and not
any POSIXt or POSIXlt).

There is a danger of wrong output for users (even those reading the
documentation) if things are left as they are, and I detected this
problem just by chance.

All the best,

Patrick

Gabor Grothendieck wrote:
> Just one more comment. It is possible to define length.POSIXlt yourself
> in which case diff works with POSIXlt objects.
>
> > length.POSIXlt <- function(x) length(x[[1]])
> > diff(dts)
> Time differences of  91,  92, 183,  91,  92, 182,  91,  92, 182 days
>
>
> On 7/23/06, Gabor Grothendieck [EMAIL PROTECTED] wrote:
> > Moving this to r-devel.
> >
> > Looking at the diff.POSIXt code we see the problem is that it takes the
> > length of the input using length which is wrong since in the case
> > of POSIXlt the length is always 9 (or maybe length should be
> > defined differently for POSIXlt?).  Try this which gives the same
> > problem:
> >
> >   dts[-1] - dts[-length(dts)]
> >
> > We get a more sensible answer if length is calculated correctly:
> >
> >   dts[-1] - dts[-length(dts[[1]])]


> > On 7/23/06, Patrick Giraudoux [EMAIL PROTECTED] wrote:
> > > > Try converting to POSIXct:
> > > That's what I did finally (see the previous e-mail).
> > >
> > > dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004","15/07/2004",
> > >          "15/10/2004","15/4/2005","15/07/2005","15/10/2005","15/4/2006")
> > >
> > > dts <- as.POSIXct(strptime(dts, "%d/%m/%Y"))
> > > diff(dts)
> > >
> > > Time differences of  91,  92, 183,  91,  92, 182,  91,  92, 182 days
> > >
> > > > What is the problem you are trying to solve?
> > > Actually, I don't understand why using diff() and POSIXct provides the
> > > expected result and not using POSIXlt. Both POSIXct and POSIXlt are of
> > > class POSIXt. The doc of diff() stresses that 'diff' is a generic
> > > function with a default method and ones for classes 'ts', 'POSIXt'
> > > and 'Date'. It does not mention differences between POSIXct and
> > > POSIXlt.
> > >
> > > Moreover, using diff() with POSIXlt has provided (wrong) numbers... and
> > > not an error. This may be difficult to detect sometimes along programme
> > > lines. Must one keep in mind that diff() is reliably applicable only on
> > > POSIXct? In this case, should it not be mentioned in the documentation?
> > >
> > > All the best,
> > >
> > > Patrick
> > >
> > > jim holtman wrote:
> > > > Try converting to POSIXct:
> > > >
> > > > > str(dts)
> > > > 'POSIXlt', format: chr [1:10] "2003-04-15" "2003-07-15" "2003-10-15"
> > > >    "2004-04-15" "2004-07-15" "2004-10-15" "2005-04-15" ...
> > > > > dts
> > > >  [1] "2003-04-15" "2003-07-15" "2003-10-15" "2004-04-15" "2004-07-15"
> > > >      "2004-10-15" "2005-04-15" "2005-07-15"
> > > >  [9] "2005-10-15" "2006-04-15"
> > > > > dts <- as.POSIXct(dts)
> > > > > dts
> > > >  [1] "2003-04-15 EDT" "2003-07-15 EDT" "2003-10-15 EDT"
> > > >      "2004-04-15 EDT" "2004-07-15 EDT" "2004-10-15 EDT"
> > > >  [7] "2005-04-15 EDT" "2005-07-15 EDT" "2005-10-15 EDT" "2006-04-15 EDT"
> > > > > diff(dts)
> > > > Time differences of  91,  92, 183,  91,  92, 182,  91,  92, 182 days
> > > >
> > > > On 7/23/06, Patrick Giraudoux [EMAIL PROTECTED] wrote:
> > > > >
> > > > > Dear Listers,
> > > > >
> > > > > I have encountered a strange problem using diff() and POSIXt:
> > > > >
> > > > > dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004",
> > > > >          "15/07/2004","15/10/2004","15/4/2005","15/07/2005",
> > > > >          "15/10/2005","15/4/2006")
> > > > >
> > > > > dts <- strptime(dts, "%d/%m/%Y")
> > > > > class(dts)
> > > > >
> > > > > [1] "POSIXt"  "POSIXlt"
> > > > >
> > > > > diff(dts)
> > > > >
> > > > > Time differences of  7862400,  7948800, 15811200,  7862400,  7948800,
> > > > > 15724800,  7862400,  7948800,        0 secs
> > > > >
> > > > > In this case the result is not the one expected: expressed in seconds
> > > > > and not in days, and the difference between the two last dates is
> > > > > not 0.
> > > > >
> > > > > Now, if one uses a vector of 9 dates only (whatever the date
> > > > > removed), things come out well:
> > > > >
> > > > > diff(dts[-1])
> > > > >
> > > > > Time differences of  92, 183,  91,  92, 182,  91,  92, 182 days
> > > > >
> > > > > Also if one constrains dts to POSIXct:
> > > > >
> > > > > dts <- c("15/4/2003","15/7/2003","15/10/2003","15/04/2004",
> > > > >          "15/07/2004","15/10/2004","15/4/2005","15/07/2005",
> > > > >          "15/10/2005","15/4/2006")
> > > > >
> > > > > dts <- as.POSIXct(strptime(dts, "%d/%m/%Y"))
> > > > > diff(dts)
> > > > >
> > > > > Time differences of  91,  92, 183,  91,  92, 182,  91,  92, 182 days
> > > > >
> > > > > Any rationale in that?
> > > > >
> > > > > Patrick
> > > >
> > > > --
> > > > Jim Holtman
> > > > Cincinnati, OH
   +1 513