Re: [R] EOF within quoted string

2017-08-11 Thread Mohan.Radhakrishnan
Yes, I tried that already. It is not straightforward.

data <- read.csv("20_newsgroups.csv", fill = TRUE, as.is = TRUE, header = FALSE,
                 quote = "", sep = ",", encoding = "UTF-8")

This reads the file, but haphazardly: the email text is split across multiple 
columns, and several columns contain only NA, giving 202 columns in total.

I then removed the all-NA columns and concatenated the remaining text, which 
finally gave me what I wanted.

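# Drop the columns that contain only NA
munged <- data[, unlist(lapply(data, function(x) !all(is.na(x))))]
# Drop the first row (presumably the file's header line, as header = FALSE)
munged <- munged[-1, ]
# Paste the text fragments (column 3 onwards) back into one string per row
munged$text <- apply(munged[, 3:ncol(munged)], 1, paste0, collapse = " ")

munged <- munged[, c("V1", "V2", "text")]

print(head(munged$text))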
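An alternative to pasting the split columns back together is to read the raw lines and split each one at its first two commas only, so that commas inside the message text survive. This is a sketch under assumptions not verified against the actual file: every record sits on one physical line, and only the first two fields are metadata (as with V1 and V2 above). The helper `read_three_fields()` and the sample lines are hypothetical; a header line, if present, would appear as the first row, and quoted fields would keep their quote characters.

```r
# Hypothetical helper: split each line into two leading fields plus the rest,
# keeping every comma that appears inside the message text.
read_three_fields <- function(path) {
  lines <- readLines(path, encoding = "UTF-8")
  # Two comma-free fields, then everything else on the line
  pattern <- "^([^,]*),([^,]*),(.*)$"
  lines <- lines[grepl(pattern, lines)]   # skip lines with fewer than 3 fields
  data.frame(
    V1   = sub(pattern, "\\1", lines),
    V2   = sub(pattern, "\\2", lines),
    text = sub(pattern, "\\3", lines),
    stringsAsFactors = FALSE
  )
}

# Usage on a small made-up file
tmp <- tempfile(fileext = ".csv")
writeLines(c('1,alt.atheism,He said "hi, there',
             '2,sci.space,no commas here'), tmp)
df <- read_three_fields(tmp)
df$text
```

Because no quote processing happens at all, an unbalanced double quote in the text (like the one in the first sample line) cannot swallow the rest of the file.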

Mohan

From: Adams, Jean [mailto:jvad...@usgs.gov]
Sent: Thursday, August 10, 2017 8:03 PM
To: Radhakrishnan, Mohan (Cognizant) 
Cc: R help 
Subject: Re: [R] EOF within quoted string

You might want to try some of the suggestions mentioned in this post: 
https://stackoverflow.com/q/17414776/2140956

Jean

This e-mail and any files transmitted with it are for the sole use of the 
intended recipient(s) and may contain confidential and privileged information. 
If you are not the intended recipient(s), please reply to the sender and 
destroy all copies of the original message. Any unauthorized review, use, 
disclosure, dissemination, forwarding, printing or copying of this email, 
and/or any action taken in reliance on the contents of this e-mail is strictly 
prohibited and may be unlawful. Where permitted by applicable law, this e-mail 
and other e-mail communications sent to and from Cognizant e-mail addresses may 
be monitored.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To 
UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] EOF within quoted string

2017-08-10 Thread Mohan.Radhakrishnan
Hi,

Reading http://ssc.wisc.edu/~ahanna/20_newsgroups.csv, after downloading it, with

data <- read.csv("20_newsgroups.csv",header=TRUE)

gives this warning:

Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  EOF within quoted string

For example, the first line in the file is shown below. This column contains 
only text like this. Is there a way to read it?

From: cub...@garnet.berkeley.edu () Subject: Re: Cubs behind Marlins? How? 
Article-I.D.: agate.1pt592$f9a Organization: University of California, Berkeley 
Lines: 12 NNTP-Posting-Host: garnet.berkeley.edu   gajar...@pilot.njin.net 
writes:  morgan and guzman will have era's 1 run higher than last year, and  
the cubs will be idiots and not pitch harkey as much as hibbard.  castillo 
won't be good (i think he's a stud pitcher) This season so far, Morgan 
and Guzman helped to lead the Cubsat top in ERA, even better than THE 
rotation at Atlanta.Cubs ERA at 0.056 while Braves at 0.059. We know it 
is earlyin the season, we Cubs fans have learned how to enjoy the   
 short triumph while it is still there.
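The warning means scan() met a quote character that was never closed before the end of the file, so the rest of the file may have been read as part of a single field. A quick diagnostic, sketched here with base R's count.fields(), is to count the comma-separated fields on each line with quoting and comment handling switched off; lines with an unexpected count show where commas inside the text break the layout. The helper name and the sample lines are made up; the file name in the last comment is the one from the question.

```r
# Hypothetical helper: number of comma-separated fields on each line,
# with quoting and comment characters disabled so every comma counts.
field_counts <- function(path) {
  count.fields(path, sep = ",", quote = "", comment.char = "")
}

# Usage on a small made-up file
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,group,text",
             "1,rec.sport.baseball,plain text",
             "2,rec.sport.baseball,text, with, commas"), tmp)
nf <- field_counts(tmp)
table(nf)             # how many lines have each field count
which(nf != nf[1])    # lines whose count differs from the first line

# For the real file: nf <- field_counts("20_newsgroups.csv")
```

comment.char = "" matters for free text: with count.fields()'s default of "#", anything after a "#" in a message would be ignored.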

Thanks,
Mohan


[R] Extract XMLAtrributeValue

2017-07-10 Thread Mohan.Radhakrishnan
Hi,

I am trying to extract an attribute value such as this one:

(e.g.) class="whQuestion"

The 'extract' function prints the output below, but I am not sure how to get 
"whQuestion" from it. The type of 'x' in extract is "character".

[1] "XMLAttributeValue"
   class
"whQuestion"
attr(,"class")

library(XML)

extract <- function(x){
  print(x)
}

filteredclasses <- function(){
  classes <- xpathSApply(doc = posts, path = "/*/Posts/Post/@class", extract)
}
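A possible way to get the plain string, sketched with the XML package on a made-up document that mimics the XPath in the question. The printed output suggests each value is a character vector carrying the "XMLAttributeValue" class and a `class` name; `as.character()` drops both. Another option is to select the `Post` elements themselves and read the attribute with `xmlGetAttr()`. That the real `posts` document behaves the same way is an assumption.

```r
library(XML)

# Made-up document with the same /*/Posts/Post structure as in the question
xml_text <- '<root><Posts>
  <Post class="whQuestion">Where is it?</Post>
  <Post class="Statement">It is here.</Post>
</Posts></root>'
posts <- xmlParse(xml_text, asText = TRUE)

# Option 1: as.character() strips the "XMLAttributeValue" class and the name
classes1 <- xpathSApply(posts, "/*/Posts/Post/@class", as.character)

# Option 2: select the elements and read the attribute directly
classes2 <- xpathSApply(posts, "/*/Posts/Post", xmlGetAttr, "class")

print(classes1)
print(classes2)
```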

Thanks,
Mohan


Re: [R] Operating on RC in a list

2017-06-08 Thread Mohan.Radhakrishnan
I am replying to my own question.
AFAIK, dplyr works only with data frames.

So I flattened the RCs like this. I think a pure OO approach and a functional 
representation of it are at loggerheads.

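library(plyr)   # ddply(); loaded before dplyr so dplyr's verbs are not masked
library(dplyr)  # %>% and bind_rows()
library(purrr)  # keep()

# Keep only the measurements that belong to the current subject
filteredmeasurements <- keep(measurements, function(x) {
  x$getid() == subject$getid()
})

# Flatten each RC object into a one-row data frame and stack the rows
# (bind_rows() replaces the deprecated rbind_all())
groupedmeasurements <- filteredmeasurements %>%
  lapply(function(x) {
    m <- x$getmeasurement()   # local variable; '<<-' would assign in an enclosing environment
    data.frame(visit    = m$getvisit(),
               location = x$getlocation()$getlocation(),
               amount   = m$getquantity()$amount)
  }) %>%
  bind_rows()

# Sum 'amount' within each visit/location combination
dataColumns <- c('amount')
ddply(groupedmeasurements, c('visit', 'location'),
      function(x) colSums(x[dataColumns]))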
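For comparison, here is a self-contained sketch of the same flatten-then-summarise idea that uses dplyr verbs instead of plyr::ddply(). The `Reading` class, its `as_row()` method and the values are hypothetical stand-ins for the real Subject/Measurement hierarchy: each object converts itself to a one-row data frame, after which filter(), group_by() and summarise() apply as usual. The `.groups = "drop"` argument assumes dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical minimal reference class standing in for the real hierarchy
Reading <- setRefClass("Reading",
  fields = list(subjectid = "numeric", visit = "numeric",
                location = "character", amount = "numeric"),
  methods = list(
    # Convert this object into a one-row data frame
    as_row = function() {
      data.frame(subjectid = subjectid, visit = visit,
                 location = location, amount = amount,
                 stringsAsFactors = FALSE)
    }
  )
)

readings <- list(
  Reading$new(subjectid = 1, visit = 1, location = "A", amount = 10),
  Reading$new(subjectid = 1, visit = 1, location = "A", amount = 5),
  Reading$new(subjectid = 1, visit = 2, location = "B", amount = 7),
  Reading$new(subjectid = 2, visit = 1, location = "A", amount = 3)
)

# Call a method on each object, stack the rows, then use ordinary dplyr verbs
totals <- readings %>%
  lapply(function(r) r$as_row()) %>%
  bind_rows() %>%
  filter(subjectid == 1) %>%
  group_by(visit, location) %>%
  summarise(amount = sum(amount), .groups = "drop")

print(totals)
```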



Thanks,
Mohan



[R] Operating on RC in a list

2017-06-07 Thread Mohan.Radhakrishnan
Hi,

I have a hierarchy of such classes. Subject has a list of measurements. Let's 
assume I have a list of such 'Subject' RCs.

Can I use dplyr to navigate from Subject to the list of measurement RCs and to 
filter and group the data? Can dplyr call the methods on these RCs to operate 
on the data structure?

I tried, unsuccessfully, to coerce the list of RCs into a data frame. But dplyr 
should be able to work with lists too, right?


# Assumes the "Measurement" and "Location" reference classes are defined first
Subject <- setRefClass("Subject",
  fields = list(id          = "numeric",
                measurement = "Measurement",
                location    = "Location"),
  methods = list(
    getmeasurement = function() {
      measurement
    },
    getid = function() {
      id
    },
    getlocation = function() {
      location
    },
    # Implement other summary methods in the appropriate objects,
    # as per their responsibilities
    summary = function() {
      paste("Subject summary ID [", id, "] Location [", location$summary(), "]")
    },
    show = function() {
      cat("Subject summary ID [", id, "] Location [", location$summary(), "]\n")
    }
  )
)


Thanks,
Mohan


[R] RC class composition

2017-05-22 Thread Mohan.Radhakrishnan
Hi,

The last line should give me the value of 'amount'. Is the syntax wrong?

Measurement <- setRefClass("Measurement",
  fields = list(subject = Subject,
 quantity = Quantity))

s <- Subject$new(id = 100)

u <- CompoundUnit$new(  micrograms = 100,
  cubicmeter = 1 )

q <- Quantity$new(amount = 100,
 units = u )

m <- Measurement$new(subject = s,
  quantity = q)
print( m$quantity$amount )
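One likely cause, offered as an assumption since the thread does not confirm it: `fields = list(subject = Subject, quantity = Quantity)` passes the generator objects rather than class names. The setRefClass() documentation expects class-name strings in the fields list and treats a function there as an accessor function, and generator objects are functions in recent R versions. The sketch below names the classes as strings; `Subject` and `Quantity` are cut-down hypothetical versions of the classes in the question, and CompoundUnit is left out.

```r
# Cut-down, hypothetical versions of the classes in the question
Subject  <- setRefClass("Subject",  fields = list(id = "numeric"))
Quantity <- setRefClass("Quantity", fields = list(amount = "numeric"))

# Field types given as class-name strings, not as generator objects
Measurement <- setRefClass("Measurement",
  fields = list(subject  = "Subject",
                quantity = "Quantity"))

s <- Subject$new(id = 100)
q <- Quantity$new(amount = 100)
m <- Measurement$new(subject = s, quantity = q)
print(m$quantity$amount)   # prints [1] 100
```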

Thanks,
Mohan


[R] Task estimation - Monte Carlo

2015-07-06 Thread Mohan.Radhakrishnan
Hi

I am trying to simulate task estimates in person-days using R code like the 
one below, but I am not sure about the reasoning behind it. Should the 
distribution be beta, triangular or something else? How do we get the values 
of mu, z and s here? Are there any explanations available, e.g. sections of 
a book?

I have a book about Monte Carlo analysis using R, but that looks like the next 
step for me. I am at a preliminary stage.


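taskestimation <- function(low, high, ci = 0.9, n = 1) {
  mu <- mean(c(low, high))        # centre: midpoint of low and high
  z  <- qnorm(1 - (1 - ci) / 2)   # normal quantile for a central 'ci' interval
  s  <- (high - mu) / z           # sd that makes [low, high] cover 'ci' of the mass
  rnorm(n, mu, s)
}

# Draw many simulated durations; the default n = 1 gives only a single draw
result <- taskestimation(10, 80, n = 10000)


# Calculate the percentage of cases below a certain number of days
mean(result < 50)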
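To see where mu, z and s come from, the snippet below recomputes them for low = 10, high = 80 and ci = 0.9 and checks that the resulting normal distribution puts 90% of its probability between 10 and 80; in other words, the code treats the low and high estimates as the ends of a central 90% interval. It also compares the exact normal probability of finishing under 50 days with a large simulated sample. Variable names follow the question.

```r
low <- 10; high <- 80; ci <- 0.9

mu <- mean(c(low, high))        # midpoint of the two estimates: 45
z  <- qnorm(1 - (1 - ci) / 2)   # upper quantile of a central 90% interval (about 1.645)
s  <- (high - mu) / z           # sd that places 'high' exactly z sds above mu

# Probability that N(mu, s) assigns to [low, high]; equals ci by construction
coverage <- pnorm(high, mu, s) - pnorm(low, mu, s)

# Exact probability of finishing in under 50 days versus a simulated estimate
p_exact <- pnorm(50, mu, s)
set.seed(1)
p_sim <- mean(rnorm(100000, mu, s) < 50)

print(c(coverage = coverage, p_exact = p_exact, p_sim = p_sim))
```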

I am able to plot a density curve showing the percentage of simulated 
completions below 50 days.
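On the choice of distribution: a normal distribution is symmetric and can put some probability below zero days, whereas task durations are bounded below and often skewed to the right. Two bounded alternatives commonly used in schedule estimation are the triangular distribution and the PERT (rescaled beta) distribution, both specified by a minimum, a most likely value and a maximum. The sketch below samples both in base R, the triangular one by inverting its CDF; the most likely value of 30 days is an assumed input, not something from the question.

```r
# Triangular distribution on [a, b] with mode m, sampled by inverting its CDF
rtri <- function(n, a, m, b) {
  u  <- runif(n)
  fc <- (m - a) / (b - a)   # CDF value at the mode
  ifelse(u < fc,
         a + sqrt(u * (b - a) * (m - a)),
         b - sqrt((1 - u) * (b - a) * (b - m)))
}

# PERT distribution: a beta distribution rescaled to [a, b] with mode m
rpert <- function(n, a, m, b) {
  alpha <- 1 + 4 * (m - a) / (b - a)
  beta  <- 1 + 4 * (b - m) / (b - a)
  a + (b - a) * rbeta(n, alpha, beta)
}

set.seed(1)
tri  <- rtri(100000, 10, 30, 80)    # assumed most likely value: 30 days
pert <- rpert(100000, 10, 30, 80)

# Share of simulated outcomes below 50 days under each assumption
c(triangular = mean(tri < 50), pert = mean(pert < 50))
```

The PERT version puts more weight near the most likely value than the triangular one, so its tails are thinner for the same three inputs.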

Thanks,
Mohan


[R] Simple monte carlo

2015-06-22 Thread Mohan.Radhakrishnan
Hi,
I am a developer and I write R code. We have some project tasks with durations 
(expected: 50% - average case, and 90% - worst case), and I am trying to 
understand how a Monte Carlo simulation of this would help. Most of the 
websites deal with either the math or some commercial package. I don't want to 
use Excel because I use the Eclipse StatET environment.

What kind of distribution should I use? Is there a simpler explanation of a 
practical schedule distribution calculation?
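One simple way to use 50% and 90% estimates directly, offered as an assumption rather than a standard answer: treat each task's duration as lognormal (positive and skewed to the right), choose its parameters so that the median equals the 50% estimate and the 90th percentile equals the 90% estimate, then add the simulated task durations and read off percentiles of the total. The task names and numbers below are made up, and the tasks are assumed to be independent.

```r
# Lognormal parameters whose median is p50 and whose 90th percentile is p90
fit_lognormal <- function(p50, p90) {
  list(meanlog = log(p50),
       sdlog   = (log(p90) - log(p50)) / qnorm(0.9))
}

# Made-up tasks: 50% and 90% estimates in person-days
tasks <- data.frame(task = c("design", "build", "test"),
                    p50  = c(5, 20, 8),
                    p90  = c(9, 35, 15))

# Simulate the total duration n times, assuming tasks are independent
simulate_schedule <- function(tasks, n = 10000) {
  per_task <- lapply(seq_len(nrow(tasks)), function(i) {
    fit <- fit_lognormal(tasks$p50[i], tasks$p90[i])
    rlnorm(n, fit$meanlog, fit$sdlog)
  })
  Reduce(`+`, per_task)   # element-wise sum across tasks
}

set.seed(42)
totals <- simulate_schedule(tasks)
quantile(totals, c(0.5, 0.9))   # 50% and 90% estimates for the whole project
```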

Thanks,
Mohan


Re: [R] Error bars and CI

2015-06-18 Thread Mohan.Radhakrishnan
Hi Dennis,
I have copied the R-help list. Could you explain why we can't compute a CI and 
error bars from this data set?
The graph generated has equal-sized error bars and a 99% confidence band. 
Groups are not needed here. The error bar and CI calculations may be 
incorrect, but I am able to draw this.

  V1 IDX
1  0.796   1
2  0.542   2
3  0.510   3
4  0.617   4
5  0.482   5
6  0.387   6
7  0.272   7
8  0.536   8
9  0.498   9
10 0.402  10
11 0.328  11
12 0.542  12
13 0.299  13
14 0.647  14
15 0.291  15
16 0.815  16
17 0.680  17
18 0.363  18
19 0.560  19
20 0.334  20

Assume the dataframe is 'jc'.

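library(ggplot2)

print(summary(jc$V1))
# Half-width of a 99% t-based confidence interval for the overall mean of V1
error <- qt(0.995, df = length(jc$V1) - 1) * sd(jc$V1) / sqrt(length(jc$V1))
error1 <- mean(jc$V1) - error
error2 <- mean(jc$V1) + error
print(error1)
print(error2)

# The error bars use the same overall sd(jc$V1) at every point, so they all
# have the same size; the ribbon is the CI of the overall mean
q <- qplot(geom = "line", jc$IDX, jc$V1, colour = 'red') +
  geom_errorbar(aes(x = jc$IDX, ymin = jc$V1 - sd(jc$V1),
                    ymax = jc$V1 + sd(jc$V1)), width = 0.25) +
  geom_ribbon(aes(x = jc$IDX, y = jc$V1, ymin = error1, ymax = error2),
              fill = "ivory2", alpha = 0.4) +
  xlab('Iterations') + ylab("Java Collections") + theme_bw()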


Thanks,
Mohan

-Original Message-
From: Dennis Murphy [mailto:djmu...@gmail.com]
Sent: Wednesday, June 17, 2015 8:42 PM
To: Radhakrishnan, Mohan (Cognizant)
Subject: Re: [R] Error bars and CI

Q: How do you expect to get error bars when you plot "groups" having samples of 
size 1? If you "are not grouping", then what is the point of trying to 
manufacture variation where none exists? I'd suggest you think a little more 
deeply about what you can achieve with the available data.

This plot visualizes the data you posted. Every point is accounted for. I named 
the input data frame DF.

ggplot(DF, aes(x = IDX, y = V1)) +
   geom_line() + geom_point()

If you don't have replicate data at each unique x-value you want to plot, you 
cannot legitimately plot error bars, confidence intervals or any other visual 
that describes a (summary of) a distribution. If the values of V1 are supposed 
to represent averages that come from another data set, then you should have a 
corresponding column of standard deviations/standard errors, and *then* you can 
plot error bars, CIs, etc. Without a legitimate measure of variation in your 
input data frame, I don't see how you can possibly generate a line graph with 
accompanying error bars/CIs.

Dennis
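Following Dennis's point, one defensible summary of these 20 values, if the iterations are treated as a single sample, is a 99% confidence interval for their overall mean. It can be drawn as one horizontal band behind the raw line instead of as per-point error bars. This is a sketch using the values posted in the thread; t.test() returns the same t-based interval as the qt(0.995, ...) formula used above.

```r
library(ggplot2)

# The 20 values of V1 posted in the thread
jc <- data.frame(
  V1  = c(0.796, 0.542, 0.510, 0.617, 0.482, 0.387, 0.272, 0.536, 0.498, 0.402,
          0.328, 0.542, 0.299, 0.647, 0.291, 0.815, 0.680, 0.363, 0.560, 0.334),
  IDX = 1:20
)

# 99% t-based confidence interval for the mean of all 20 iterations
ci <- t.test(jc$V1, conf.level = 0.99)$conf.int

p <- ggplot(jc, aes(x = IDX, y = V1)) +
  # band for the CI of the overall mean, spanning the whole x range
  annotate("rect", xmin = -Inf, xmax = Inf, ymin = ci[1], ymax = ci[2],
           fill = "ivory2", alpha = 0.6) +
  geom_hline(yintercept = mean(jc$V1), linetype = "dashed") +  # overall mean
  geom_line() +
  geom_point() +
  xlab("Iterations") + ylab("Java Collections") + theme_bw()
print(p)
```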


Re: [R] Error bars and CI

2015-06-17 Thread Mohan.Radhakrishnan
I think it could be something like this, but the mean is for the entire set, 
not for groups.
I get a graph with this code, but the error bars are not there.


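library(ggplot2)  # mean_cl_normal() additionally requires the Hmisc package

# Argument names as in ggplot2 3.3 and later (fun, fun.args)
p <- ggplot(jc, aes(IDX, V1, colour = V1))
p <- p + stat_summary(fun = mean, geom = "point")
p <- p + stat_summary(fun = mean, geom = "line")
# Extra arguments for the summary function are passed through fun.args
p <- p + stat_summary(fun.data = mean_cl_normal, fun.args = list(conf.int = .99),
                      geom = "errorbar", width = 0.2)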
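Why the bars come out empty, as Dennis explains: stat_summary() summarises the rows that share an x value, and here each IDX has exactly one V1 value. A one-value sample has no standard deviation, so the interval limits are NA and the error bars are dropped. The check below uses ggplot2's mean_se(), which does not need Hmisc, on the first posted value; mean_cl_normal() should behave the same way for the same reason. The three-value vector at the end is a hypothetical set of replicates for illustration only.

```r
library(ggplot2)

# A single observation, as each IDX has in the posted data
x <- 0.796

sd(x)          # NA: the standard deviation of one value is undefined
mean_se(x)     # y = 0.796, but ymin and ymax are NA

# With (hypothetical) replicates at the same x value the limits are finite
mean_se(c(0.796, 0.542, 0.510))
```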


Thanks,
Mohan

-Original Message-
From: Dennis Murphy [mailto:djmu...@gmail.com]
Sent: Tuesday, June 16, 2015 1:18 AM
To: Radhakrishnan, Mohan (Cognizant)
Subject: Re: [R] Error bars and CI

Hi:

Firstly, your dplyr code to generate the summary data frame is unnecessary and 
distracting, particularly since you didn't provide the input data set; you are 
asked to provide a *minimal* reproducible example, which you could easily have 
done with a built-in data set.
That said, to get what I perceive you want, I used the InsectSprays data from 
the autoloaded datasets package.

# Function to compute standard error of a mean
sem <- function(x) sqrt(var(x)/length(x))

## Use InsectSprays data for illustration
## Compute mean and SE of count for each level of spray

library(dplyr)
library(ggplot2)

insectSumm <- InsectSprays %>%
  group_by(spray) %>%
  summarise(mean = mean(count), se = sem(count))


# Since the x-variable is a factor, need to map group = 1 to
# draw lines between factor levels. geom_pointrange() can be
# used to produce the 99% CIs per factor level, geom_errorbar()
# for the mean +/- SE. I ordered the geoms so that the errorbar
# is last, but if you want it (mostly) overwritten, put the
# geom_pointrange() call last.

ggplot(insectSumm, aes(x = spray, y = mean)) +
   theme_bw() +
   geom_line(aes(group = 1), size = 1, color = "darkorange") +
   geom_pointrange(aes(ymin = mean - qt(.995, 11) * se,
  ymax = mean + qt(.995, 11) * se),
   size = 1.5, color = "firebrick") +
   geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2,
   size = 1)

Clearly, you can pipe all the way through the ggplot() call, but I wanted to 
check the contents of the summary data frame first.

Dennis


Re: [R] Error bars and CI

2015-06-17 Thread Mohan.Radhakrishnan
Your sample code works, but I am missing the logic when my dataset is 
involved.

My full dataset is below. It is the V1 column I am interested in. I am not 
'grouping' here.

  V1 IDX
1  0.796   1
2  0.542   2
3  0.510   3
4  0.617   4
5  0.482   5
6  0.387   6
7  0.272   7
8  0.536   8
9  0.498   9
10 0.402  10
11 0.328  11
12 0.542  12
13 0.299  13
14 0.647  14
15 0.291  15
16 0.815  16
17 0.680  17
18 0.363  18
19 0.560  19
20 0.334  20

Thanks,
Mohan


[R] Error bars and CI

2015-06-15 Thread Mohan.Radhakrishnan
Hi,

I want to plot a line graph of this data, with IDX on the x-axis and V1 on the 
y-axis. I also want standard error bars and the 99% CI to be shown. My code is 
given below; the section that plots the graph is the problem. I don't see all 
the points in the line graph with error bars. How can I also show the 99% CI in 
the graph?

  V1 IDX
1  0.987  21
2  0.585  22
3  0.770  23
4  0.711  24

library(stringr)
library(dplyr)
library(ggplot2)

data <- read.table("D:\\jmh\\jmh.txt",sep="\t")

final <- data %>%
  select(V1) %>%
  filter(grepl("^Iteration", V1)) %>%
  mutate(V1 = str_extract(V1, "\\d+\\.\\d*"))

final <- mutate(final, IDX = 1:n())

jc <- final %>%
  filter(IDX < 21)


#Convert to numeric
jc <- data.frame(sapply(jc, function(x) as.numeric(as.character(x))))

print(jc)

# The following section is the problem.

sem <- function(x){
   sd(x)/sqrt(length(x))
}

meanvalue <- apply(jc,2,mean)
semvalue <- apply(jc, 2, sem)

mean_sem <- data.frame(mean= meanvalue, sem= semvalue, group=names(jc))

#larger font
theme_set(theme_gray(base_size = 20))

#plot using ggplot
p <- ggplot(mean_sem, aes(x=group, y=mean)) +
  geom_line(stat='identity') +
  geom_errorbar(aes(ymin=mean-sem, ymax=mean+sem),
   width=.2)
print(p)

Thanks,
Mohan