[R] Inverting a scale(X)

2010-07-03 Thread Godfrey van der Linden
G'day, All.

I have been trying to track down a problem in my R analysis script. I perform a 
scale() operation on a matrix then do further work.

Is there any way of inverting the scale() such that
sX <- scale(X)
Xprime <- inv.scale(x); # does inv.scale exist?

resulting in Xprime_{ij} == X_{ij} where Xprime_{ij} \in R

There must be some way of doing it but I'm such a newb that I haven't been able 
to find it.

Thanks

Godfrey

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Inverting a scale(X)

2010-07-03 Thread Peter Ehlers


On 2010-07-03 0:05, Godfrey van der Linden wrote:

G'day, All.

I have been trying to track down a problem in my R analysis script. I perform a 
scale() operation on a matrix then do further work.

Is there any way of inverting the scale() such that
 sX <- scale(X)
 Xprime <- inv.scale(x);  # does inv.scale exist?

resulting in Xprime_{ij} == X_{ij} where Xprime_{ij} \in R

There must be some way of doing it but I'm such a newb that I haven't been able 
to find it.

Thanks

Godfrey



If your sX hasn't lost the scaled:center and
scaled:scale attributes that it got from the
scale() operation, then you can just reverse
the scaling procedure using those. Multiply
columns by the scale attribute, then add the
center attribute. Something like:

 MN <- attr(sX, "scaled:center")
 SD <- attr(sX, "scaled:scale")
 Xprime <- t(apply(sX, 1, function(x){x * SD + MN}))

If the attributes have been lost by your further
work, then I'm afraid you're out of luck.
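
The reversal described above can be sketched as follows. This is only an illustration under stated assumptions: the matrix X is made-up example data (not the poster's), and sweep() is used here as an alternative to the apply() call. The comparison ignores attributes because the scaled:* attributes remain attached to the result.

```r
# Hypothetical example data; any numeric matrix would do.
X <- matrix(c(1, 2, 3, 4, 10, 20, 30, 50), ncol = 2)
sX <- scale(X)

# scale() stores the column means and standard deviations as attributes.
MN <- attr(sX, "scaled:center")
SD <- attr(sX, "scaled:scale")

# Undo the scaling column by column: multiply by SD, then add MN.
Xprime <- sweep(sweep(sX, 2, SD, "*"), 2, MN, "+")

# Compare values only, ignoring the leftover scaled:* attributes.
ok <- isTRUE(all.equal(Xprime, X, check.attributes = FALSE))
```

If later processing is likely to drop the attributes, saving MN and SD in separate variables right after the scale() call should avoid the "out of luck" case.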

  -Peter Ehlers



Re: [R] Assigning entries to categories

2010-07-03 Thread LogLord

Thanks for your help!
You are right, it is not a one-to-one assignment; that would indeed be very
easy... it's more like assigning 1000 entries to 60 categories...

Unfortunately, the ?match and ?merge did not help me a lot... I am a newbie
to such programming stuff in R.

It would be great if you could help me again to set this up.
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Assigning-entries-to-categories-tp2272697p2277140.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Best way to compute a sum

2010-07-03 Thread Roger Deangelis


  Although it does not apply to your series and is impractical, it seems to
me that the most accurate algorithm might be to first add all the rational
numbers whose sum and components can be represented without error in binary,
i.e. 2.5 + .5 or 1/16 + 1/16 + 1/8.

  You could also get very clever and look for a sum that should have an
exact binary representation when the individual components do not, i.e. .1 +
.2 + .2 = .5, and correct the sum.
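
A small sketch of the distinction being drawn, using the numbers from the paragraphs above plus the well-known 0.1 + 0.2 case (that last comparison is an added illustration, not part of the original post):

```r
# Sums whose terms and result are exactly representable in binary.
exact_a <- (2.5 + 0.5) == 3
exact_b <- (1/16 + 1/16 + 1/8) == 0.25

# 0.1, 0.2 and 0.3 have no finite binary expansion, so the exact test fails.
inexact <- (0.1 + 0.2) == 0.3

# Comparing up to a small tolerance succeeds.
close_enough <- isTRUE(all.equal(0.1 + 0.2, 0.3))
```

Here exact_a, exact_b and close_enough are TRUE, while inexact is FALSE.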

Roger
-- 
View this message in context: 
http://r.789695.n4.nabble.com/Best-way-to-compute-a-sum-tp2267566p2277096.html
Sent from the R help mailing list archive at Nabble.com.



[R] conditional dataframe search and find

2010-07-03 Thread oscar linares
 After some processing...
   ct.df <- data.frame(time,conc)
   ct.df
   gives
  time   conc
   1 0 164.495456
   2 1 133.671185
   3 2 108.622975
   4 3  88.268468
   5 4  71.728126
   6 5  58.287225
   7 6  47.364971
   8 7  38.489403
   9 8  31.276998
   10   9  25.416103
   11   10  20.653462
   12   11  16.783276
   13   12  13.638313
   14   13  11.082674
   15   14   9.005927
   16   15   7.318336

- Ignored:
   17   16   5.946977
   18   17   4.832592
   19   18   3.927028
   20   19   3.191155
   21   20   2.593175
   22   21   2.107248
   23   22   1.712378
   24   23   1.391501
   25   24   1.130752

   I need to find the time when conc < 25. I can read it off the table but I am
   looking for a programmatic solution.

   Thanks.
   Oscar


-- 
Oscar A. Linares, MD
Clinical Assistant Professor of Medicine
Department of Medicine
University of Toledo College of Medicine
Toledo, Ohio 43606-3390

Attending Physician
The Detroit Medical Center (DMC)
Harper University Hospital
Wayne State University School of Medicine
Detroit, Michigan 48201

Director
Translational Pharmacokinetics & Pharmacogenomics Unit,
La Plaisance Bay, Bolles Harbor, MI

Medical Director
Monroe Pain Center
Monroe, MI 48162
(http:www.monroepaincenter.com)

Phone (734) 240-8400
Cell (734) 637-7997
Fax (734) 243-6254

oalinare...@gmail.com



[R] Error in solve.default

2010-07-03 Thread Dmitrij Kudriavcev
Hello

I use a C++ program that calls R to solve matrix multiplications. Sometimes
I get an error in R:

Error in solve.default(V, R) :
  system is computationally singular: reciprocal condition number =
2.20828e-19
Execution halted

After that, the program crashes. The code that I execute is:

R.assign(arrMeans, string("R"));
R.assign(arrCov, string("V"));

SEXP ans;

int iRet = R.parseEval("solve(V, R)", ans);

where R is a vector of length n and V is an n-by-n matrix.

Can anyone tell me what this error means? I have checked my matrix and didn't
find this number. Is it because the matrix is too big?

How can I, at least, avoid the program crashing in R (it crashes inside the
parseEval function)?
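
One way to look at the situation from the R side is sketched below. The number in the message is R's estimate of the reciprocal condition number of V, not an entry of the matrix, which is why it cannot be found by inspecting V. The 2 x 2 matrix here is a made-up singular example, and whether catching the error in R also keeps the C++ host from terminating depends on how the embedding handles R errors, which is an assumption not verified here.

```r
# Hypothetical 2 x 2 matrix whose second column is twice the first,
# so it has no inverse.
V <- matrix(c(1, 2, 2, 4), nrow = 2)
R <- c(1, 1)

# The reciprocal condition number is (near) zero for such a matrix.
rc <- rcond(V)

# Catch the error instead of letting it stop the script.
ans <- tryCatch(solve(V, R), error = function(e) NULL)
if (is.null(ans)) message("solve() failed; V looks singular, rcond = ", rc)
```

Evaluating a string such as the tryCatch() call above, rather than a bare solve(V, R), would let the caller test for NULL instead of relying on R not signalling an error.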

WBR,
Dima



[R] help on bar chart

2010-07-03 Thread ppcrystal

Hey guys,

This is the bar chart that I am working on:

library(lattice);
data <- data.frame(
    X1 = c(2300, 1300, 1300, 450),
    X2 = c(2110, 2220, 1100, 660),
    Y = factor(c("sample1", "sample2", "sample3", "sample4"))
);
barchart(
    Y ~ X1 + X2,
    data,
    stack = TRUE,
    horiz = TRUE,
    lwd = 1.5,
    xlab = expression(bold("Sample size")),
    col = colors()[c(24,1)],
    xlim = c(0,5000),
    xat = seq(0,5000,1000)
);

I wanted to make a bar chart that has hatching lines inside the bar: with
sample 2 and 4 having vertical lines and sample 1 and 3 having horizontal
lines, like the following (I kind of photoshopped the image to demonstrate
what I wanted it to look like):

http://r.789695.n4.nabble.com/file/n2277107/test.png 

Does anyone know how I can add hatching to the bar charts?

Thanks very much for your time!!!
-- 
View this message in context: 
http://r.789695.n4.nabble.com/help-on-bar-chart-tp2277107p2277107.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] conditional dataframe search and find

2010-07-03 Thread Allan Engelhardt

> ct.df[ct.df$conc < 25, "time"]
[1] 10 11 12 13 14 15
> ct.df[ct.df$conc < 25, "time"][1]
[1] 10

See also help(order) if conc is not ordered.
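
A self-contained sketch of the same lookup, with the ordering step included. The data are rows 1 to 16 of the table posted in the question; the variable names ord and first_time are introduced here for illustration only.

```r
# Data as posted in the question (rows 1 to 16 of ct.df).
time <- 0:15
conc <- c(164.495456, 133.671185, 108.622975, 88.268468, 71.728126,
          58.287225, 47.364971, 38.489403, 31.276998, 25.416103,
          20.653462, 16.783276, 13.638313, 11.082674, 9.005927, 7.318336)
ct.df <- data.frame(time, conc)

# Sort by time first so that "first match" means "earliest time".
ord <- ct.df[order(ct.df$time), ]

# which() gives the row positions where conc < 25; take the first one.
first_time <- ord$time[which(ord$conc < 25)[1]]
first_time  # 10
```

If no concentration is below the threshold, which() returns an empty vector and first_time is NA, so a caller may want to check for that case.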

On 02/07/10 22:50, oscar linares wrote:

   time   conc
1 0 164.495456
2 1 133.671185
3 2 108.622975
4 3  88.268468
5 4  71.728126
6 5  58.287225
7 6  47.364971
8 7  38.489403
9 8  31.276998
10   9  25.416103
11   10  20.653462
12   11  16.783276
13   12  13.638313
14   13  11.082674
15   14   9.005927
16   15   7.318336





Re: [R] Double Integration

2010-07-03 Thread Christos Argyropoulos

There used to be an adapt package with an integrate function (I inverted 
the function/package name by mistake) in CRAN but it has been removed.
Anyone knows why?

Christos

 CC: argch...@hotmail.com; sarah_sanche...@yahoo.com; r-help@r-project.org
 From: dwinsem...@comcast.net
 To: rvarad...@jhmi.edu
 Subject: Re: [R] Double Integration
 Date: Fri, 2 Jul 2010 20:40:00 -0400
 
 And an adapt() in fCopulae.
 
 --  
 David.
 On Jul 2, 2010, at 7:06 PM, Ravi Varadhan wrote:
 
  There is no package called `integrate', but there is a function called
  `adaptIntegrate' in the cubature package.
 
  Ravi.
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org 
  ] On
  Behalf Of Christos Argyropoulos
  Sent: Friday, July 02, 2010 8:41 AM
  To: sarah_sanche...@yahoo.com; r-help@r-project.org
  Subject: Re: [R] Double Integration
 
 
  Function adapt in package integrate maybe?
 
  Date: Thu, 1 Jul 2010 05:30:25 -0700
  From: sarah_sanche...@yahoo.com
  To: r-help@r-project.org
  Subject: [R] Double Integration
 
  Dear R helpers
 
  I am working on the Bi-variate Normal distribution probabilities. I  
  need
  to double integrate the following function (actually simplified form  
  of
  bivariate normal distribution)
 
  f(x, y) = exp [ - 0.549451 * (x^2 + y^2 - 0.6 * x * y) ]
 
  where 2.696 < x < 3.54 and -1.51 < y < 1.98
 
  I need to solve something like
 
 
  INTEGRATE (2.696 to 3.54) dx INTEGRATE [(-1.51 to 1.98)] f(x, y) dy
 
  I have referred to stats::integrate but it deals with only one  
  variable.
 
  This example appears in Internal Credit Risk Model by Michael Ong  
  (page
  no. 160).
 
  Kindly guide.
 
  Regards
 
  Sarah
 
 
 
 
 
 
 
 David Winsemius, MD
 West Hartford, CT
 
  


Re: [R] merging plot labels in a lattice plot

2010-07-03 Thread Dennis Murphy
Hi:

On Fri, Jul 2, 2010 at 8:57 AM, Rajarshi Guha <rajarshi.g...@gmail.com> wrote:

 Hi, I have a lattice plot conditioned on two variables. Example code is:

 library(lattice)
 x <- data.frame(d=runif(100),
                 f1=sample(c('yes', 'no'),100,replace=TRUE),
                 f2=c(rep('Run1',30),rep('Run2',30),rep('Run3',40)))
 histogram(~d | f1 + f2, x)

 In the plot, for a given value of f2, there are two panels, one for
 'no' and one for 'yes'. But above each panel I get the value of f2.

 What I'd like to be able to do is to have the value of f2 span the two
 panels (ie merge the green rows and use a single label).


One alternative to Peter's suggestion is to use the strip.combined()
function in the Lattice book (p. 197), which merges the two strip labels
into one:

strip.combined <- function(which.given, which.panel, factor.levels, ...) {
  # First conditioning variable: draw the strip background, label on the left
  if (which.given == 1) {
    panel.rect(0, 0, 1, 1, col = "grey90", border = 1)
    panel.text(x = 0, y = 0.5, pos = 4,
               lab = factor.levels[which.panel[which.given]])
  }
  # Second conditioning variable: label on the right of the same strip
  if (which.given == 2) {
    panel.text(x = 1, y = 0.5, pos = 2,
               lab = factor.levels[which.panel[which.given]])
  }
}

and then call the histogram function as follows:

histogram(~ d | f1 + f2, data = x, strip = strip.combined)
or
histogram(~ d | f1 + f2, data = x, strip = strip.combined, as.table = TRUE)

if you prefer the Run* values to go from top to bottom instead.

If you'd prefer a different layout for the strip labels but of this general
form, you could write your own strip function modeled on the strip.combined()
function given above.


HTH,
Dennis

Any pointers as to how I could achieve this would be appreciated

 Thanks,

 --
 Rajarshi Guha
 NIH Chemical Genomics Center



[R] Change the frequency of a ts?

2010-07-03 Thread Nicholas R Frazier
I'm trying to convert a column of a table into a ts object.  The data is
monthly, so I want the ts frequency to be 12.

I did this ...

 filings.ts = as.ts(Filings.100K, frequency=12)
 filings.ts

Time Series:
Start = 1
End = 311
Frequency = 1
  [1] 246.9336 305.6789 ... ...

 tsp(filings.ts)
[1]   1 311   1
 tsp(filings.ts) <- c(1,311,12)
Error in attr(x, "tsp") <- value :
  invalid time series parameters specified

What am I doing wrong here?  I can't seem to be able to change the frequency
from 1 to 12.  Thanks!

Nick Frazier



Re: [R] Change the frequency of a ts?

2010-07-03 Thread Achim Zeileis

On Sat, 3 Jul 2010, Nicholas R Frazier wrote:


I'm trying to convert a column of a table into a ts object.  The data is
monthly, so I want the ts frequency to be 12.

I did this ...


filings.ts = as.ts(Filings.100K, frequency=12)


Use the constructor function ts(), not the coercion function as.ts(). The 
latter does not have a frequency argument. See ?ts.



filings.ts


Time Series:
Start = 1
End = 311
Frequency = 1
 [1] 246.9336 305.6789 ... ...


tsp(filings.ts)

[1]   1 311   1

tsp(filings.ts) <- c(1,311,12)

Error in attr(x, "tsp") <- value :
 invalid time series parameters specified

What am I doing wrong here?


Not reading the documentation?

c(1, 311, 12) are not valid time series properties because they would imply 
that your series has length (311 - 1) * 12 + 1 = 3721, which is not the case.
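
A short sketch of both points, assuming a stand-in vector of 311 values (the actual Filings.100K data are not shown in the thread):

```r
# Stand-in for the 311 monthly observations (the real values are not shown).
x <- seq_len(311)

# ts() accepts a frequency; start defaults to 1.
x.ts <- ts(x, frequency = 12)
p <- tsp(x.ts)   # start, end, frequency

# Number of observations that tsp = c(1, 311, 12) would require.
n_implied <- (311 - 1) * 12 + 1
```

With frequency 12 the end of the series is 1 + 310/12 rather than 311, which is why assigning c(1, 311, 12) to a series of only 311 observations is rejected.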

Z


I can't seem to be able to change the frequency
from 1 to 12.  Thanks!

Nick Frazier



Re: [R] Odp: Problem with aggregating data across time points

2010-07-03 Thread Chris Beeley
Thanks for all your help, that has worked a treat. To answer your questions, I 
want to include the zero rows because I am going to analyse using mixed models 
(with dummies for day of week, location etc.) and I thought it was necessary to 
include a complete list of time variables, but now I'm wondering if it is 
necessary. As for the empty rows, the database is generated automatically by 
the incidents reporting system and is a bit of a mess, so I want to make sure 
that the code doesn't stumble over such things. 

Thanks again all!



On 2 Jul 2010, at 17:14, David Winsemius <dwinsem...@comcast.net> wrote:

 
 On Jul 2, 2010, at 11:55 AM, Petr PIKAL wrote:
 
 Hi
 
 did you try aggregate?
 
 aggregate(data[, 5:8],list(data$Date), sum, na.rm=T)
    Group.1 verbal self.harm violence_objects violence
 1               0         0                0        0
 2 01/04/07     25        15                3        9
 3 02/04/07     24         6                8       13
 4 03/04/07     17        13                0       10
 aggregate(data[, 5:8],list(data$Location,data$Date), sum, na.rm=T)
 
 That addresses his A) request.
 
 Here is the application of aggregate to his B) request (I think):
 
 # Note that Date is not of class Date but is rather a factor that includes
 # "" as a level.
 
  aggregate(series[, 5:8],list(series$Date, series$Location), sum, na.rm=T)
     Group.1 Group.2 verbal self.harm violence_objects violence
 1                        0         0                0        0
 2                 A      0         0                0        0
 3  01/04/07       A      7         1                0        3
 4  02/04/07       A      8         2                0        1
 5  03/04/07       A      0         0                0        2
 6                 B      0         0                0        0
 7  01/04/07       B      3         2                0        1
 8  02/04/07       B      4         2                0        0
 9  03/04/07       B      4         0                0        3
 10                C      0         0                0        0
 11 01/04/07       C      4         2                3        2
 12 02/04/07       C      0         0                4        2
 13 03/04/07       C      1         1                0        5
 14                D      0         0                0        0
 15 01/04/07       D      7         6                0        3
 16 02/04/07       D      0         0                0        9
 17 03/04/07       D      4        11                0        0
 18                E      0         0                0        0
 19 01/04/07       E      4         3                0        0
 20 02/04/07       E      4         0                4        0
 21 03/04/07       E      8         1                0        0
 22                F      0         0                0        0
 23 01/04/07       F      0         1                0        0
 24 02/04/07       F      8         2                0        1
 
 So perhaps an output with less extraneous input would be better:
 
  with(series[series$Date != "", ],
       aggregate(list(verbal=verbal, self.harm=self.harm,
                      viol_obj=violence_objects, violence=violence),
                 list(Date, Location),
                 sum, na.rm=T)
      )
 
     Group.1 Group.2 verbal self.harm viol_obj violence
 1  01/04/07       A      7         1        0        3
 2  02/04/07       A      8         2        0        1
 3  03/04/07       A      0         0        0        2
 4  01/04/07       B      3         2        0        1
 5  02/04/07       B      4         2        0        0
 6  03/04/07       B      4         0        0        3
 7  01/04/07       C      4         2        3        2
 8  02/04/07       C      0         0        4        2
 9  03/04/07       C      1         1        0        5
 10 01/04/07       D      7         6        0        3
 11 02/04/07       D      0         0        0        9
 12 03/04/07       D      4        11        0        0
 13 01/04/07       E      4         3        0        0
 14 02/04/07       E      4         0        4        0
 15 03/04/07       E      8         1        0        0
 16 01/04/07       F      0         1        0        0
 17 02/04/07       F      8         2        0        1
 
 BTW, why do you have empty rows?
 
 Regards
 Petr
 
 
 
 
 Hello-
 
 I have a dataset which basically looks like this:
 
 Location   Sex   Date       Time   Verbal   Self harm   Violence_objects   Violence
 A          1     1-4-2007   1800   3        0           1                  3
 A          1     1-4-2007   1230   2        1           2                  4
 D          2     2-4-2007   1100   0        4           0                  0
 ...
 
 I've put a dput of the first section of the data at the end of this
 email. Basically I have these data for several days across all of the
 dates, so 2 or more on 1-4-2007, 2 or more on 2-4-2007, and so on
 

Re: [R] Change the frequency of a ts?

2010-07-03 Thread Stefan Grosse
On 03.07.2010 13:55, Nicholas R Frazier wrote:
 I'm trying to convert a column of a table into a ts object.  The data is
 monthly, so I want the ts frequency to be 12.
 
 I did this ...
 
 filings.ts = as.ts(Filings.100K, frequency=12)

try:

filings.ts <- ts(Filings.100K, frequency=12)

example:
test <- runif(312)
test.ts <- ts(test, frequency=12)
tsp(test.ts)
plot(test.ts)

Oh I am late, Achim was faster...

Cheers
Stefan



[R] Non-exported data sets?

2010-07-03 Thread Hadley Wickham
 Sure.  The code uses objects() to find the exported objects in the
 package, so I guess the offending object will be there.  You can check
 for yourself by loading the package and calling objects() on the package
 environment.

So I guess my question then is how do data sets and namespaces
interact?  All data objects are automatically exposed and cannot be
controlled through a namespace?

Following the hint "Two exceptions are allowed: if the R subdirectory
contains a file sysdata.rda (a saved image of R objects) this will be
lazy-loaded into the name space/package environment – this is intended
for system datasets that are not intended to be user-accessible via
data."  I also tried using sysdata.rda, but the contents still appear
to be exported.

Hadley


-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] ggplot qplot bar removing bars when truncating scale

2010-07-03 Thread Hadley Wickham
This is possible in ggplot2, but it's not an appropriate use of a bar
chart - because length is used to convey value, chopping the bottoms
of the bars off will give a misleading impression of the data.
Instead, use a dot plot:

data$Q <- unlist(lapply(data$Q, function(x) paste(strwrap(x, 20),
collapse = "\n")))

qplot(mean, Q, data = data, colour = variable, xlab = NULL, ylab = NULL)

Hadley

On Wed, Jun 30, 2010 at 10:12 AM, ml692787 <matthew.lester@gmail.com> wrote:

 I'm having problems with this example. It is posted with reproducible code
 below, both with the normal 0-6 scale and the desired 3-6 scale (with bars
 removed). How can I get the graph to have the desired 3-6 scale without
 removing the bars? Thanks!

 #Data
 mean=as.numeric(c(5.117647059,5,4.947368421,4.85,4.6875,4.545454545,4.473684211,4.470588235,4.428571429,4.08333,3.421052632,3.235294118))
 data=as.data.frame(cbind(mean,
   c("Achievement","Achievement","Achievement","Impact","Achievement","Achievement",
     "Achievement","Impact","Impact","Impact","Impact","Impact"),
   c("Update knowledge and skills","Meet requirements for current position",
     "Discover new job opportunities","Discover new job opportunities",
     "Transition to a new job","Meet requirements for certificaiton",
     "Personal enrichment","Update knowledge and skills",
     "Meet requirements for current position","Meet requirements for certificaiton",
     "Personal enrichment","Transition to a new job")))
 colnames(data)=c("mean","variable","Q")
 data[,1]=mean

 #Plot
 p=qplot(data=data,data$Q,data$mean,fill=data$variable,geom="bar",stat="identity",position="dodge",binwidth=2,ylab=NULL,xlab=NULL,width=.75)

 #With 0-6 Scale
 p + scale_x_discrete(expand=c(0,0)) +
 scale_y_continuous(limits=c(0,7),breaks=seq(from=0,to=6,by=.5),expand=c(0,0)) +
 coord_flip() +
 scale_fill_manual(values=c("darkmagenta","lightgoldenrod1")) +
                opts(
                        panel.background = theme_rect(colour = NA),
                        panel.background = theme_blank(),
                        panel.grid.minor = theme_blank(),
                        axis.title.x= theme_blank(),
                        axis.title.y= theme_blank(),
                        axis.text.y=theme_text(size=12,hjust=1),
                        legend.text=theme_text(size=14)
                        )

 #With 3-6 Scale (Bars Deleted)
 p + scale_x_discrete(expand=c(0,0)) +
 scale_y_continuous(limits=c(3,6),breaks=seq(from=3,to=6,by=.5),expand=c(0,0)) +
 coord_flip() +
 scale_fill_manual(values=c("darkmagenta","lightgoldenrod1")) +
                opts(
                        panel.background = theme_rect(colour = NA),
                        panel.background = theme_blank(),
                        panel.grid.minor = theme_blank(),
                        axis.title.x= theme_blank(),
                        axis.title.y= theme_blank(),
                        axis.text.y=theme_text(size=12,hjust=1),
                        legend.text=theme_text(size=14)
                        )

 There is probably an option I'm missing or maybe my data should be set up
 differently, any help would be much appreciated!!
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/ggplot-qplot-bar-removing-bars-when-truncating-scale-tp2272735p2272735.html
 Sent from the R help mailing list archive at Nabble.com.





-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/



Re: [R] Double Integration

2010-07-03 Thread Bogaso Christofer
Hi Ravi, your suggestion helped me a lot as well. If I look into that
function, I see that it calls another function:

.Call("doCubature", as.integer(fDim), body(f.check), 
as.double(lowerLimit), as.double(upperLimit), as.integer(maxEval), 
as.double(absError), as.double(tol), new.env(), PACKAGE =
"cubature")

How can I see the interior of this doCubature?

Thanks,

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Ravi Varadhan
Sent: 03 July 2010 04:36
To: 'Christos Argyropoulos'; sarah_sanche...@yahoo.com; r-help@r-project.org
Subject: Re: [R] Double Integration

There is no package called `integrate', but there is a function called
`adaptIntegrate' in the cubature package.
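
For comparison, a base-R sketch that does not use cubature at all: it nests stats::integrate, with the integrand and limits taken from the original question. This is an alternative technique, offered on the assumption that the integrand is smooth enough for nested one-dimensional quadrature; the helper name inner is introduced here for illustration.

```r
# Integrand from the original question.
f <- function(x, y) exp(-0.549451 * (x^2 + y^2 - 0.6 * x * y))

# Inner integral over y for a fixed x; sapply() makes it vectorised in x,
# which integrate() requires of its integrand.
inner <- function(x) {
  sapply(x, function(xi) integrate(function(y) f(xi, y), -1.51, 1.98)$value)
}

# Outer integral over x.
res <- integrate(inner, 2.696, 3.54)
res$value
```

The error estimate reported in res covers only the outer integration, so it should be read as a rough guide rather than a bound on the total error.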

Ravi.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Christos Argyropoulos
Sent: Friday, July 02, 2010 8:41 AM
To: sarah_sanche...@yahoo.com; r-help@r-project.org
Subject: Re: [R] Double Integration


Function adapt in package integrate maybe?

 Date: Thu, 1 Jul 2010 05:30:25 -0700
 From: sarah_sanche...@yahoo.com
 To: r-help@r-project.org
 Subject: [R] Double Integration
 
 Dear R helpers
 
 I am working on the Bi-variate Normal distribution probabilities. I 
 need
to double integrate the following function (actually simplified form of
bivariate normal distribution)
 
 f(x, y) = exp [ - 0.549451 * (x^2 + y^2 - 0.6 * x * y) ]
 
 where 2.696 < x < 3.54 and -1.51 < y < 1.98
 
 I need to solve something like
 
 
 INTEGRATE (2.696 to 3.54) dx INTEGRATE [(-1.51 to 1.98)] f(x, y) dy
 
 I have referred to stats::integrate but it deals with only one variable. 
 
 This example appears in Internal Credit Risk Model by Michael Ong 
 (page
no. 160).
 
 Kindly guide.
 
 Regards
 
 Sarah
 
 
 
 
 
   


Re: [R] Double Integration

2010-07-03 Thread Prof Brian Ripley

On Sat, 3 Jul 2010, Christos Argyropoulos wrote:

There used to be an adapt package with an integrate function (I 
inverted the function/package name by mistake) in CRAN but it has 
been removed. Anyone knows why?


It lacked a valid licence.  It wasn't actually removed, rather 
archived: see http://cran.r-project.org/src/contrib/Archive/adapt/




Christos


CC: argch...@hotmail.com; sarah_sanche...@yahoo.com; r-help@r-project.org
From: dwinsem...@comcast.net
To: rvarad...@jhmi.edu
Subject: Re: [R] Double Integration
Date: Fri, 2 Jul 2010 20:40:00 -0400

And an adapt() in fCopulae.

--
David.
On Jul 2, 2010, at 7:06 PM, Ravi Varadhan wrote:


There is no package called `integrate', but there is a function called
`adaptIntegrate' in the cubature package.

Ravi.

-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org
] On
Behalf Of Christos Argyropoulos
Sent: Friday, July 02, 2010 8:41 AM
To: sarah_sanche...@yahoo.com; r-help@r-project.org
Subject: Re: [R] Double Integration


Function adapt in package integrate maybe?


Date: Thu, 1 Jul 2010 05:30:25 -0700
From: sarah_sanche...@yahoo.com
To: r-help@r-project.org
Subject: [R] Double Integration

Dear R helpers

I am working on the Bi-variate Normal distribution probabilities. I
need

to double integrate the following function (actually simplified form
of
bivariate normal distribution)


f(x, y) = exp [ - 0.549451 * (x^2 + y^2 - 0.6 * x * y) ]

where 2.696 < x < 3.54 and -1.51 < y < 1.98

I need to solve something like


INTEGRATE (2.696 to 3.54) dx INTEGRATE [(-1.51 to 1.98)] f(x, y) dy

I have referred to stats::integrate but it deals with only one
variable.

This example appears in Internal Credit Risk Model by Michael Ong
(page

no. 160).


Kindly guide.

Regards

Sarah








David Winsemius, MD
West Hartford, CT






--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595



Re: [R] logistic regression - glm() - example in Dalgaard's book ISwR

2010-07-03 Thread Juliet Hannah
You may find both of Alan Agresti's books on categorical data analysis
useful. Try googling both books and then search for the word "grouped"
within each book. Agresti refers to the difference you describe as
grouped versus ungrouped data. The likelihoods differ, and all
summaries based on the likelihood will also differ.
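
A minimal sketch of the grouped versus ungrouped distinction, using small made-up counts (not Dalgaard's hypertension data). The same observations are fitted once as a success/failure matrix and once as one row per case:

```r
# Hypothetical grouped data: successes and failures at three levels of x.
grouped <- data.frame(x = c(0, 1, 2), yes = c(2, 4, 8), no = c(8, 6, 2))
fit_g <- glm(cbind(yes, no) ~ x, family = binomial, data = grouped)

# The same data expanded to one row per case with a 0/1 response.
ungrouped <- data.frame(
  x = rep(grouped$x, times = grouped$yes + grouped$no),
  y = unlist(Map(function(s, f) rep(c(1, 0), c(s, f)),
                 grouped$yes, grouped$no))
)
fit_u <- glm(y ~ x, family = binomial, data = ungrouped)

# Coefficients agree; residual degrees of freedom and deviance do not.
rbind(grouped = coef(fit_g), ungrouped = coef(fit_u))
c(df.residual(fit_g), df.residual(fit_u))   # 1 and 28
c(deviance(fit_g), deviance(fit_u))
```

The grouped fit counts 3 binomial observations and the ungrouped fit counts 30 Bernoulli observations, so the residual degrees of freedom (and the saturated model that the deviance is measured against) differ even though the estimated coefficients are the same.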

On Fri, Jul 2, 2010 at 11:33 PM, Paulo Barata <pbar...@infolink.com.br> wrote:

 Dear R-list members,

 I would like to pose a question about the use and results
 of the glm() function for logistic regression calculations.

 The question is based on an example provided on p. 229
 in P. Dalgaard, Introductory Statistics with R, 2nd. edition,
 Springer, 2008. By means of this example, I was trying to
 practice the different ways of entering data in glm().

 In his book, Dalgaard provides data from a case-based study
 about hypertension summarized in the form of a table. He shows
 two ways of entering the response (dependent) variable data
 in glm(): (1) as a matrix of successes/failures (diseased/
 healthy); (2) as the proportion of people diseased for each
 combination of independent variable's categories.

 I tried to enter the response variable in glm() in a third
 way: by reconstructing, from the table, the original data
 in a case-based format, that is, a data frame in which
 each row shows the data for one person. In this situation,
 the response variable would be coded as a numeric 0/1 vector,
 0=failure, 1=success, and so it would be entered in glm() as
 a numeric 0/1 vector.

 The program below presents the calculations for each of the
 three ways of entering data - the first and second methods
 were just copied from Dalgaard's book.

 The three methods produce the same results with regard
 to the estimated coefficients, when the output is seen
 with five decimals (although regression 3 seems to have
 produced slightly different std.errors).

 My main question is: Why are the residual deviance, its
 degrees of freedom and the AIC produced by regression 3
 completely different when compared to those produced by
 regressions 1 and 2 (which seem to produce equal results)?
 It seems that the degrees of freedom in regressions 1
 and 2 are based on the size (number of rows) of table d
 (see the output of the program below), but this table is
 just a way of summarizing the data. The degrees of
 freedom in regressions 1 and 2 are not based on the
 actual number of cases (people) examined, which is n=433.

 I understand that no matter the way of entering the data
 in glm(), we are always analyzing the same data, which
 are those presented in table format on Dalgaard's page
 229 (these data are in data.frame d in the program below).
 So I understand that the three ways of entering data
 in glm() should produce the same results.

 Secondarily, why are the std.errors in regression 3 slightly
 different from those calculated in regressions 1 and 2?

 I am using R version 2.11.1 on Windows XP.

 Thank you very much.

 Paulo Barata

 ##== begin =

 ## data in: P. Dalgaard, Introductory Statistics with R,
 ## 2nd. edition, Springer, 2008
 ## logistic regression - example in Dalgaard's Section 13.2,
 ## page 229

 rm(list=ls())

 ## data provided on Dalgaard's page 229:
 no.yes <- c("No","Yes")
 smoking <- gl(2,1,8,no.yes)
 obesity <- gl(2,2,8,no.yes)
 snoring <- gl(2,4,8,no.yes)
 n.tot <- c(60,17,8,2,187,85,51,23)
 n.hyp <- c(5,2,1,0,35,13,15,8)

 d <- data.frame(smoking,obesity,snoring,n.tot,n.hyp)
 ## d is the data to be analyzed, in table format
 ## d is the first table on Dalgaard's page 229
 ## n.tot = total number of cases
 ## n.hyp = people with hypertension
 d

 ## regression 1: Dalgaard's page 230
 ## response variable entered in glm() as a matrix of
 ## successes/failures
 hyp.tbl <- cbind(n.hyp,n.tot-n.hyp)
 regression1 <- glm(hyp.tbl~smoking+obesity+snoring,
                    family=binomial("logit"))

 ## regression 2: Dalgaard's page 230
 ## response variable entered in glm() as proportions
 prop.hyp <- n.hyp/n.tot
 regression2 <- glm(prop.hyp~smoking+obesity+snoring,
                    weights=n.tot,family=binomial("logit"))

 ## regression 3 (well below): data entered in glm()
 ## by means of 'reconstructed' variables
 ## variables with names beginning with 'r' are
 ## 'reconstructed' from data in data.frame d.
 ## The objective is to reconstruct the original
 ## data from which the table on Dalgaard's page 229
 ## has been produced

 rsmoking <- c(rep('No',d[1,4]),rep('Yes',d[2,4]),
               rep('No',d[3,4]),rep('Yes',d[4,4]),
               rep('No',d[5,4]),rep('Yes',d[6,4]),
               rep('No',d[7,4]),rep('Yes',d[8,4]))
 rsmoking <- factor(rsmoking)
 length(rsmoking)  # just a check

 robesity <- c(rep('No', d[1,4]),rep('No', d[2,4]),
               rep('Yes',d[3,4]),rep('Yes',d[4,4]),
               rep('No', d[5,4]),rep('No', d[6,4]),
               rep('Yes',d[7,4]),rep('Yes',d[8,4]))
 robesity <- factor(robesity)
 length(robesity)  # just a check

 

Re: [R] logistic regression - glm() - example in Dalgaard's book ISwR

2010-07-03 Thread David Winsemius


On Jul 2, 2010, at 11:33 PM, Paulo Barata wrote:



Dear R-list members,

I would like to pose a question about the use and results
of the glm() function for logistic regression calculations.

The question is based on an example provided on p. 229
in P. Dalgaard, Introductory Statistics with R, 2nd. edition,
Springer, 2008. By means of this example, I was trying to
practice the different ways of entering data in glm().

In his book, Dalgaard provides data from a case-based study
about hypertension summarized in the form of a table. He shows
two ways of entering the response (dependent) variable data
in glm(): (1) as a matrix of successes/failures (diseased/
healthy); (2) as the proportion of people diseased for each
combination of independent variable's categories.

I tried to enter the response variable in glm() in a third
way: by reconstructing, from the table, the original data
in a case-based format, that is, a data frame in which
each row shows the data for one person. In this situation,
the response variable would be coded as a numeric 0/1 vector,
0=failure, 1=success, and so it would be entered in glm() as
a numeric 0/1 vector.

The program below presents the calculations for each of the
three ways of entering data - the first and second methods
were just copied from Dalgaard's book.

The three methods produce the same results with regard
to the estimated coefficients, when the output is seen
with five decimals (although regression 3 seems to have
produced slightly different std.errors).

My main question is: Why are the residual deviance, its
degrees of freedom and the AIC produced by regression 3
completely different when compared to those produced by
regressions 1 and 2 (which seem to produce equal results)?
It seems that the degrees of freedom in regressions 1
and 2 are based on the size (number of rows) of table d
(see the output of the program below), but this table is
just a way of summarizing the data. The degrees of
freedom in regressions 1 and 2 are not based on the
actual number of cases (people) examined, which is n=433.


I first encountered this phenomenon 25 years ago when using GLIM. The  
answer from my statistical betters was that the deviance is actually  
only established up to a constant and that it is only differences in  
deviance that can be properly interpreted. The same situation exists  
with indefinite integrals in calculus.


I understand that no matter the way of entering the data
in glm(), we are always analyzing the same data, which
are those presented in table format on Dalgaard's page
229 (these data are in data.frame d in the program below).
So I understand that the three ways of entering data
in glm() should produce the same results.


The differences between equivalent nested models should remain the  
same (up to numerical accuracy).


 411.42   on 432  degrees of freedom
-398.92   on 429
---------
  12.5    on   3

 14.1259  on 7  degrees of freedom
- 1.6184  on 4
---------
 12.5075  on 3
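
The arithmetic above can be checked directly in R. What follows is only a minimal sketch, not part of the original exchange: it rebuilds the table from the post, fits the grouped (one row per table cell) and ungrouped (one row per person) models, and compares the drop from null to residual deviance. The names `grouped`, `ungrouped`, `cases`, `idx`, `drop.grouped` and `drop.ungrouped` are made up for illustration.

```r
## Data from the table quoted in the original post (Dalgaard, ISwR, p. 229)
no.yes  <- c("No", "Yes")
smoking <- gl(2, 1, 8, no.yes)
obesity <- gl(2, 2, 8, no.yes)
snoring <- gl(2, 4, 8, no.yes)
n.tot   <- c(60, 17, 8, 2, 187, 85, 51, 23)
n.hyp   <- c(5, 2, 1, 0, 35, 13, 15, 8)

## Grouped fit: one row per cell of the table (8 rows)
grouped <- glm(cbind(n.hyp, n.tot - n.hyp) ~ smoking + obesity + snoring,
               family = binomial)

## Ungrouped fit: one row per person (433 rows), 0/1 response
idx   <- rep(seq_along(n.tot), n.tot)
cases <- data.frame(
  smoking = smoking[idx],
  obesity = obesity[idx],
  snoring = snoring[idx],
  ## within each cell: n.hyp ones followed by (n.tot - n.hyp) zeros
  hyp     = rep(rep(c(1, 0), 8), times = as.vector(rbind(n.hyp, n.tot - n.hyp)))
)
ungrouped <- glm(hyp ~ smoking + obesity + snoring,
                 family = binomial, data = cases)

## The deviances themselves differ, but their differences agree
drop.grouped   <- grouped$null.deviance   - grouped$deviance
drop.ungrouped <- ungrouped$null.deviance - ungrouped$deviance
c(grouped = drop.grouped, ungrouped = drop.ungrouped)
c(grouped = grouped$df.null, ungrouped = ungrouped$df.null)
```

The two fits use different saturated models as their reference (8 cells versus 433 individuals), which is why the residual deviance and its degrees of freedom are not comparable across the two data layouts, while the comparison between nested models is.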



Secondarily, why are the std.errors in regression 3 slightly
different from those calculated in regressions 1 and 2?


You mean the differences 4 places to the right of the decimal???



I am using R version 2.11.1 on Windows XP.

Thank you very much.

Paulo Barata

##== begin =

## data in: P. Dalgaard, Introductory Statistics with R,
## 2nd. edition, Springer, 2008
## logistic regression - example in Dalgaard's Section 13.2,
## page 229

rm(list=ls())


Personally, I am rather annoyed when people post this particular line  
without commenting it out. It is basically saying that your code is  
terribly much more important than whatever else might be in a user's  
workspace.




## data provided on Dalgaard's page 229:
no.yes <- c("No","Yes")
smoking <- gl(2,1,8,no.yes)
obesity <- gl(2,2,8,no.yes)
snoring <- gl(2,4,8,no.yes)
n.tot <- c(60,17,8,2,187,85,51,23)
n.hyp <- c(5,2,1,0,35,13,15,8)

d <- data.frame(smoking,obesity,snoring,n.tot,n.hyp)
## d is the data to be analyzed, in table format
## d is the first table on Dalgaard's page 229
## n.tot = total number of cases
## n.hyp = people with hypertension
d

## regression 1: Dalgaard's page 230
## response variable entered in glm() as a matrix of
## successes/failures
hyp.tbl <- cbind(n.hyp,n.tot-n.hyp)
regression1 <- glm(hyp.tbl~smoking+obesity+snoring,
                   family=binomial("logit"))

## regression 2: Dalgaard's page 230
## response variable entered in glm() as proportions
prop.hyp <- n.hyp/n.tot
regression2 <- glm(prop.hyp~smoking+obesity+snoring,
                   weights=n.tot,family=binomial("logit"))

## regression 3 (well below): data entered in glm()
## by means of 'reconstructed' variables
## variables with names beginning with 'r' are
## 'reconstructed' from data in data.frame d.
## The objective is to reconstruct the original
## data from which the table on Dalgaard's page 229
## has been produced

rsmoking <- c(rep('No',d[1,4]),rep('Yes',d[2,4]),
 

Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2010-07-03 Thread Duncan Temple Lang
Hi Ryusuke

 I would use the encoding parameter of htmlParse() and 
 download and parse the content in one operation:

 htmlParse("http://home.sina.com", encoding = "UTF-8")

 If you want to use getURL() in RCurl, use the .encoding parameter

  You didn't tell us the output of Sys.getlocale()
  or how your terminal/console is configured, so the above
  may vary under your configuration, but works on various
  machines for me with different settings.

D.


Ryusuke Kenji wrote:
 
 Hi All,
 
 First method:-
 library(XML)
 
 theurl <- "http://home.sina.com"
 download.file(theurl, "tmp.html")
 
 txt <- readLines("tmp.html")
 
 txt <- htmlTreeParse(txt, error=function(...){}, useInternalNodes = TRUE)
 
 g <- xpathSApply(txt, "//p", function(x) xmlValue(x))
 
 head(grep(" ", g, value=T))
 
 
 [1] ?? | ?? | ENGLISH   
 ??? ???
 [3] ??? ?? ??(???)  
 ?? 
 [5]  ???? ??! 
 ? ??! !  
 
 
 
 SecondMethod:-
 library(RCurl)
 
 theurl <- getURL("http://home.sina.com",encoding='GB2312')
 
 Encoding(theurl)
 
 [1] "unknown"
 
 txt <- readLines(con=textConnection(theurl),encoding='GB2312')
 txt[5:10] #show the lines which occurred encoding problem.
 [1] meta http-equiv=\Content-Type\ content=\text/html; charset=utf-8\ 
 /
 [2] titleSINA.com US ? -??/title
 [3] meta name=\Keywords\ content=\, ???, 
 ???, ??,, SINA, US, News, Chinese, 
 Asia\ /
 [4] meta name=\Description\ 
 content=\???, 
 ???24, , 
 , ??, , ?BBS, 
 ???.\ /
 [5] 
   
   

 [6] link rel=\stylesheet\ type=\text/css\ 
 href=\http://ui.sina.com/assets/css/style_home.css\; /
 
 I am trying to read data from a Chinese-language website, but the Chinese 
 characters are always unreadable. Is there a good way to cope with such an 
 encoding problem in RCurl and XML?
 
 
 Regards,
 Ryusuke
 


-- 
There are men who can think no deeper than a fact - Voltaire


Duncan Temple Lang                dun...@wald.ucdavis.edu
Department of Statistics  work:  (530) 752-4782
4210 Mathematical Sciences Bldg.  fax:   (530) 752-7099
One Shields Ave.
University of California at Davis
Davis, CA 95616, USA







Re: [R] logistic regression - glm() - example in Dalgaard's book ISwR

2010-07-03 Thread Marc Schwartz
On Jul 3, 2010, at 9:00 AM, David Winsemius wrote:

 
 On Jul 2, 2010, at 11:33 PM, Paulo Barata wrote:
 
 
 Dear R-list members,
 
 I would like to pose a question about the use and results
 of the glm() function for logistic regression calculations.
 
 The question is based on an example provided on p. 229
 in P. Dalgaard, Introductory Statistics with R, 2nd. edition,
 Springer, 2008. By means of this example, I was trying to
 practice the different ways of entering data in glm().
 
 In his book, Dalgaard provides data from a case-based study
 about hypertension summarized in the form of a table. He shows
 two ways of entering the response (dependent) variable data
 in glm(): (1) as a matrix of successes/failures (diseased/
 healthy); (2) as the proportion of people diseased for each
 combination of independent variable's categories.
 
 I tried to enter the response variable in glm() in a third
 way: by reconstructing, from the table, the original data
 in a case-based format, that is, a data frame in which
 each row shows the data for one person. In this situation,
 the response variable would be coded as a numeric 0/1 vector,
 0=failure, 1=success, and so it would be entered in glm() as
 a numeric 0/1 vector.
 
 The program below presents the calculations for each of the
 three ways of entering data - the first and second methods
 were just copied from Dalgaard's book.
 
 The three methods produce the same results with regard
 to the estimated coefficients, when the output is seen
 with five decimals (although regression 3 seems to have
 produced slightly different std.errors).
 
 My main question is: Why are the residual deviance, its
 degrees of freedom and the AIC produced by regression 3
 completely different when compared to those produced by
 regressions 1 and 2 (which seem to produce equal results)?
 It seems that the degrees of freedom in regressions 1
 and 2 are based on the size (number of rows) of table d
 (see the output of the program below), but this table is
 just a way of summarizing the data. The degrees of
 freedom in regressions 1 and 2 are not based on the
 actual number of cases (people) examined, which is n=433.
 
 I first encountered this phenomenon 25 years ago when using GLIM. The answer 
 from my statistical betters was that the deviance is actually only 
 established up to a constant and that it is only differences in deviance that 
 can be properly interpreted. The same situation exists with indefinite 
 integrals in calculus.
 
 I understand that no matter the way of entering the data
 in glm(), we are always analyzing the same data, which
 are those presented in table format on Dalgaard's page
 229 (these data are in data.frame d in the program below).
 So I understand that the three ways of entering data
 in glm() should produce the same results.
 
 The differences between equivalent nested models should remain the same (up 
 to numerical accuracy).
 
  411.42   on 432  degrees of freedom
 -398.92   on 429
 ---------
   12.5    on   3
 
  14.1259  on 7  degrees of freedom
 - 1.6184  on 4
 ---------
  12.5075  on 3
 
 
 Secondarily, why are the std.errors in regression 3 slightly
 different from those calculated in regressions 1 and 2?
 
 You mean the differences 4 places to the right of the decimal???
 
 
 I am using R version 2.11.1 on Windows XP.
 
 Thank you very much.
 
 Paulo Barata
 
 ##== begin =
 
 ## data in: P. Dalgaard, Introductory Statistics with R,
 ## 2nd. edition, Springer, 2008
 ## logistic regression - example in Dalgaard's Section 13.2,
 ## page 229
 
 rm(list=ls())
 
 Personally, I am rather annoyed when people post this particular line without 
 commenting it out. It is basically saying that your code is terribly much 
 more important than whatever else might be in a user's workspace.
 
 
 ## data provided on Dalgaard's page 229:
 no.yes <- c("No","Yes")
 smoking <- gl(2,1,8,no.yes)
 obesity <- gl(2,2,8,no.yes)
 snoring <- gl(2,4,8,no.yes)
 n.tot <- c(60,17,8,2,187,85,51,23)
 n.hyp <- c(5,2,1,0,35,13,15,8)
 
 d <- data.frame(smoking,obesity,snoring,n.tot,n.hyp)
 ## d is the data to be analyzed, in table format
 ## d is the first table on Dalgaard's page 229
 ## n.tot = total number of cases
 ## n.hyp = people with hypertension
 d
 
 ## regression 1: Dalgaard's page 230
 ## response variable entered in glm() as a matrix of
 ## successes/failures
 hyp.tbl <- cbind(n.hyp,n.tot-n.hyp)
 regression1 <- glm(hyp.tbl~smoking+obesity+snoring,
                    family=binomial("logit"))
 
 ## regression 2: Dalgaard's page 230
 ## response variable entered in glm() as proportions
 prop.hyp <- n.hyp/n.tot
 regression2 <- glm(prop.hyp~smoking+obesity+snoring,
                    weights=n.tot,family=binomial("logit"))
 
 ## regression 3 (well below): data entered in glm()
 ## by means of 'reconstructed' variables
 ## variables with names beginning with 'r' are
 ## 'reconstructed' from data in data.frame d.
 ## The objective is to 

Re: [R] XML and RCurl: problem with encoding (htmlTreeParse)

2010-07-03 Thread Ryusuke Kenji

Hi Prof,

Thank you for your reply. Sorry that I left out the information below.
Sys.getlocale()
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"

I have just noticed that traditional Chinese characters cause the encoding 
problem, while simplified Chinese works fine.

library(RCurl)
theurl <- getURL("http://home.sina.com",encoding='utf8')
#Encoding(theurl)
#[1] "latin1"
txt <- readLines(con=textConnection(theurl),encoding='utf8')
write.table(file='D:/fileas.txt',txt)

When I open fileas.txt in Notepad, the traditional Chinese characters are 
readable, but when I try to read the file into Rgui:
 smple <- scan('D:/fileas.txt',what='')
the characters are unrecognisable again. I wonder whether Rgui 
supports traditional Chinese characters at present.
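
One thing that may be worth trying here is declaring the encoding when the file is read back, since `scan()` and `readLines()` otherwise assume the session's native encoding (Windows-1252 in the locale shown above). The following is only a sketch using base R, not a confirmed fix for this case: the two-character sample string and the temporary file are stand-ins for the real page content and for `D:/fileas.txt`.

```r
## Sample text: two traditional Chinese characters, given as Unicode escapes
txt <- "\u7e41\u9ad4"
f   <- tempfile(fileext = ".txt")   # hypothetical stand-in for D:/fileas.txt

## Write the string as UTF-8 bytes, independent of the session locale
con <- file(f, open = "wb")
writeLines(enc2utf8(txt), con, useBytes = TRUE)
close(con)

## Read it back and declare the encoding of the bytes
back <- readLines(f, encoding = "UTF-8")
Encoding(back)
identical(back, txt)

## scan() takes the same 'encoding' argument
smple <- scan(f, what = "", encoding = "UTF-8", quiet = TRUE)
```

Whether the characters then display correctly is a separate matter: it depends on the console and on a font that contains the glyphs, so a string can be stored correctly in R and still print as escapes or question marks.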

I think I need to look for a way of converting between the Chinese character sets.
Thank you.


Best,
Ryusuke



Re: [R] Assigning entries to categories

2010-07-03 Thread Charles C. Berry

On Sat, 3 Jul 2010, LogLord wrote:



Thanks for your help!
You are right it is not one-to-one assigned that would be indeed very
easy... its more like assigning 1000 entries to 60 categories...

Unfortunately, the ?match and ?merge did not help me a lot... I am a newbie
to such programming stuff in R.

It would be great if you could help me again to set this up.


Then you need to observe this:

   PLEASE do read the posting guide
   http://www.R-project.org/posting-guide.html and provide
   commented, minimal, self-contained, reproducible code.

If you provide a _reproducible example_ that properly mimics the features 
of the problem you need to solve, the chance that someone will either 
solve it for you or point you in the right direction will be better.



[stuff deleted]

Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



Re: [R] Double Integration

2010-07-03 Thread Hans W Borchers
Bogaso Christofer bogaso.christofer at gmail.com writes:
 
 Hi Ravi, your suggestion helped me a lot as well. If I look into
 that function, I see that it calls another function:

 .Call("doCubature", as.integer(fDim), body(f.check), 
 as.double(lowerLimit), as.double(upperLimit),
 as.integer(maxEval), 
 as.double(absError), as.double(tol), new.env(), PACKAGE =
 "cubature")

 How can I see the interior of this doCubature?


Find the original code for the 'cubature' package at

http://ab-initio.mit.edu/wiki/index.php/Cubature

plus information why the 'adapt' package had to be abandoned and that
'cubature' is based on the same original algorithm of Genz and Malik,
but using free and GPLed software.

We should not bemoan the loss of the 'adapt' package, 'cubature' and
'R2cuba' are worthy successors for adaptive quadrature.

Hans Werner



Re: [R] Best way to compute a sum

2010-07-03 Thread Bernardo Rangel Tura
On Fri, 2010-07-02 at 21:23 -0700, Roger Deangelis wrote:
 
   Although it does not apply to your series and is impractical,  it seems to
 me that the most accurate algorithm might be to add all the rational numbers
 whose sum and components can be represented  without error in binary first,
 ie 2.5 + .5 or 1/16 + 1/16 + 1/8.
 
   You could also get very clever and investigate a sum that should have an
 exact binary representation when the individual components do not, ie .1 +
 .2 + .2 = .5 and correct the sum.
 
 Roger

Roger, I think you should read "What Every Computer Scientist Should Know
About Floating-Point Arithmetic"
( http://docs.sun.com/source/806-3568/ncg_goldberg.html )

I think your question, and others like it, are answered in this
paper.
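
A few lines of R illustrate the point about which values are exactly representable in binary. This is a small sketch for illustration and not taken from the paper: sums of powers of two are exact, while decimal fractions such as 0.1 are not, so comparisons should use a tolerance.

```r
## Decimal fractions are not exactly representable in binary floating point
0.1 + 0.2 == 0.3                     # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))    # TRUE: equal within a tolerance

## Sums of powers of two are exact
1/16 + 1/16 + 1/8 == 0.25            # TRUE
2.5 + 0.5 == 3                       # TRUE

## Printing extra digits shows the stored value of 0.1
sprintf("%.20f", 0.1)
```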
-- 
Bernardo Rangel Tura, M.D,MPH,Ph.D
National Institute of Cardiology
Brazil



Re: [R] Change the frequency of a ts?

2010-07-03 Thread Nicholas R Frazier
Thanks, Stefan.  I'm sure the difference between ts() and as.ts() seemed
simple to everyone on the list, but I'd been staring at the help files for a
long time and never made the connection.  ts's make more sense now.
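
For anyone else who trips over this, here is a short sketch of the difference, using made-up data rather than the Filings.100K column. The default method of `as.ts()` only coerces and silently ignores a `frequency` argument, whereas `ts()` constructs the series with the frequency it is given.

```r
x <- runif(36)                  # three years of made-up monthly values

a <- as.ts(x, frequency = 12)   # 'frequency' is swallowed by '...'
frequency(a)                    # 1

b <- ts(x, frequency = 12)      # constructor honours 'frequency'
frequency(b)                    # 12
tsp(b)                          # start, end and frequency of the series
```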

Nick Frazier


On Sat, Jul 3, 2010 at 8:10 AM, Stefan Grosse singularit...@gmx.net wrote:

 Am 03.07.2010 13:55, schrieb Nicholas R Frazier:
  I'm trying to convert a column of a table into a ts object.  The data is
  monthly, so I want the ts frequency to be 12.
 
  I did this ...
 
  filings.ts = as.ts(Filings.100K, frequency=12)

 try:

 filings.ts <- ts(Filings.100K, frequency=12)

 example:
 test <- runif(312)
 test.ts <- ts(test, frequency=12)
 tsp(test.ts)
 plot(test.ts)

 Oh I am late, Achim was faster...

 Cheers
 Stefan




[R] How to generate longitudinal data using R

2010-07-03 Thread ZZY ZYBOYS
How can I generate longitudinal data whose errors have an independent,
exchangeable, or AR(1) correlation structure? Can someone provide
some sample code?
Much appreciated!

Thanks much,
Yi



Re: [R] PDFfontNumber bugs in devPS.c (Re: plain text in Chinese can not be set)

2010-07-03 Thread Paul Murrell

Hi

Thanks very much for the report, diagnosis, and patch!

I have implemented your fix in the development version of R.

Paul

Jinsong Zhao wrote:

On 2010-7-1 15:24, Jinsong Zhao wrote:

Read the source again more carefully. I think I get the solution:

Change the following line in PDFfontNumber function in devPS.c:

num = 1000 + (cidfontIndex - 1)*5 + 1 + face;
to
num = 1000 + (cidfontIndex - 1)*5 + face;

It appears two times in the function.

However, I don't know how to compile the whole R distribution on Windows
platform. Would anyone here like to give a test? Thanks in advance!

Regards,
Jinsong


I have compiled R 2.11.1 on a Linux machine and confirmed that the 
PDFfontNumber function in devPS.c (grDevices package) has a bug that 
makes the plain face of CID fonts inaccessible when CID fonts 
are used together with the default font family in pdf().


the following is the patch.

--- devPS_orig.c    Sun Apr 25 06:10:04 2010
+++ devPS.c         Fri Jul 02 09:46:55 2010
@@ -7267,7 +7267,7 @@
          * Use very high font number for CID fonts to avoid
          * Type 1 fonts
          */
-        num = 1000 + (cidfontIndex - 1)*5 + 1 + face;
+        num = 1000 + (cidfontIndex - 1)*5 + face;
     else {
         /*
          * Check whether the font is loaded and, if not,
@@ -7303,7 +7303,7 @@
         } else /* (isCIDFont(family, PDFFonts)) */ {
             if (addPDFDeviceCIDfont(cidfontfamily, pd,
                                     cidfontIndex)) {
-                num = 1000 + (cidfontIndex - 1)*5 + 1 + face;
+                num = 1000 + (cidfontIndex - 1)*5 + face;
             } else {
                 cidfontfamily = NULL;

Regards,
Jinsong


--
Dr Paul Murrell
Department of Statistics
The University of Auckland
Private Bag 92019
Auckland
New Zealand
64 9 3737599 x85392
p...@stat.auckland.ac.nz
http://www.stat.auckland.ac.nz/~paul/



[R] interval censored grouped data

2010-07-03 Thread Tims Corbett
Hi All,

 I have data in the following format:

Inspection    #failures
Month/Year
01/99         5
02/99         20
06/99         3
01/02         3

for 11 years. The probability of failure on demand per month (pfd) is
total failures / sample size (= total components * 11 years * 12 months).
I am required to cross-check the pfd by other means.

Can I fit a Weibull or lognormal distribution to this count data and find the
failure probability or reliability from it? Exact failure times are
not known; only the number of failures in each month is known. Please
suggest an appropriate method / R code.
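
To make the question concrete, one possible approach is sketched below: treat each monthly count as interval-censored (the failure happened somewhere in that month) and maximise a Weibull likelihood with `optim()`. Everything beyond the four rows shown above is an assumption made for illustration: the number of components (`n.comp`), that all components enter service at month 0, that failed components are not repaired or replaced, and that survivors are right-censored after 11 years.

```r
## Months counted from the start of observation (01/99 = month 1)
month    <- c(1, 2, 6, 37)     # 01/99, 02/99, 06/99, 01/02
failures <- c(5, 20, 3, 3)
n.comp   <- 1000               # hypothetical number of components
n.months <- 11 * 12            # length of the observation window

## Crude rate from the post: failures per component-month
pfd <- sum(failures) / (n.comp * n.months)

## Negative log-likelihood; parameters are on the log scale so that
## shape and scale stay positive during the search
negloglik <- function(par) {
  shape <- exp(par[1])
  scale <- exp(par[2])
  ## probability of failing within each observed month (interval-censored)
  p.fail <- pweibull(month, shape, scale) - pweibull(month - 1, shape, scale)
  ## probability of surviving the whole window (right-censored)
  p.surv <- pweibull(n.months, shape, scale, lower.tail = FALSE)
  -(sum(failures * log(p.fail)) +
      (n.comp - sum(failures)) * log(p.surv))
}

fit <- optim(c(0, log(1000)), negloglik)
est <- exp(fit$par)
names(est) <- c("shape", "scale")
est

## Fitted reliability (survival probability) at 12 months
pweibull(12, est["shape"], est["scale"], lower.tail = FALSE)
```

With the real data, every month in the 11-year record (including months with zero failures) would go into `month` and `failures`, and a lognormal fit would follow the same pattern with `plnorm()` in place of `pweibull()`.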

Thanks,
Tims



Re: [R] How to generate longitudinal data using R

2010-07-03 Thread Charles C. Berry

On Sat, 3 Jul 2010, ZZY ZYBOYS wrote:


How can I generate longitudinal data whose errors have an independent,
exchangeable, or AR(1) correlation structure? Can someone provide
some sample code?



See

http://cran.r-project.org/web/views/Distributions.html
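
For orientation, a base-R sketch of one common way to do this is given below: build the working correlation matrix, then multiply independent normal draws by its Cholesky factor. It is only an illustration under assumed settings (normal errors, `n.subj`, `n.time`, `rho` and the linear time trend are all made up); the packages listed in the task view offer ready-made alternatives.

```r
set.seed(123)
n.subj <- 500    # number of subjects (assumed)
n.time <- 4      # measurements per subject (assumed)
rho    <- 0.6    # correlation parameter (assumed)

## Working correlation matrices for the three structures
R.ind  <- diag(n.time)
R.exch <- matrix(rho, n.time, n.time); diag(R.exch) <- 1
R.ar1  <- rho^abs(outer(1:n.time, 1:n.time, "-"))

## Draw n rows of N(0, R) errors: since t(U) %*% U = R for U = chol(R),
## each row of Z %*% U has covariance matrix R
sim.errors <- function(n, R) {
  Z <- matrix(rnorm(n * ncol(R)), nrow = n)
  Z %*% chol(R)
}

## Long-format data: a linear time trend plus AR(1) errors
e   <- sim.errors(n.subj, R.ar1)
dat <- data.frame(id   = rep(seq_len(n.subj), each  = n.time),
                  time = rep(seq_len(n.time), times = n.subj))
dat$y <- 1 + 0.5 * dat$time + as.vector(t(e))

round(cor(e), 2)   # empirical correlations, to compare with R.ar1
```

Swapping `R.ar1` for `R.ind` or `R.exch` in the call to `sim.errors()` gives the other two structures.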



Great appreciation!

Thanks much,
Yi




Charles C. Berry(858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu   UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901



[R] help with predict.lda

2010-07-03 Thread Changbin Du
Hi, dear community,

I am using linear discriminant analysis to build a model and make new
predictions:

> dim(train)  # training data
[1] 1272   22
> dim(valid)  # validation data
[1] 140  22


# model fitting of linear discriminant analysis on training data
lda.fit <- lda(out ~ ., data=train, na.action=na.omit, CV=TRUE)

> predict(lda.fit, valid)   # make prediction on validation data
Error in UseMethod("predict") :
  no applicable method for 'predict' applied to an object of class "list"

Can anyone help with this?
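
A likely cause, shown here as a sketch on the built-in iris data rather than the poster's `train` and `valid`: with `CV=TRUE`, `lda()` in MASS returns the leave-one-out results as a plain list (components `class` and `posterior`) instead of a fitted "lda" object, so there is no `predict` method for it. Fitting without `CV=TRUE` gives an object that `predict()` accepts. The split into `train` and `valid` below is made up for the example.

```r
library(MASS)

set.seed(42)
in.train <- sample(nrow(iris), 100)
train <- iris[in.train, ]
valid <- iris[-in.train, ]

## With CV = TRUE the result is a list of leave-one-out predictions
cv.fit <- lda(Species ~ ., data = train, CV = TRUE)
class(cv.fit)
names(cv.fit)

## Without CV = TRUE the result is an "lda" object that predict() accepts
lda.fit <- lda(Species ~ ., data = train)
pred    <- predict(lda.fit, valid)
table(predicted = pred$class, actual = valid$Species)
```

The cross-validated classifications are still available from the first fit as `cv.fit$class`, for example to tabulate against the training labels.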

Thanks so much!

-- 
Sincerely,
Changbin
--
