[R] Assistance with httr package with R version 3.3.0

2016-05-09 Thread Luca Meyer
Hello,

I am trying to run code that I have been using for a few years, and after
installing the new R version 3.3.0 I get the following error:

> rm(list=ls())
> library(httr)
>
> # load the data from Google Spreadsheets
> url <- "https://docs.google.com/spreadsheets/d/102-jJ7x1YfIe4Kkvb9olQ4chQ_TS90jxoU0vAbFZewc/pubhtml?gid=0=true"
> readSpreadsheet <- function(url, sheet = 1){
+   r <- GET(url)
+   html <- content(r)
+   sheets <- readHTMLTable(html, header=FALSE, stringsAsFactors=FALSE)
+   df <- sheets[[sheet]]
+   dfClean <- function(df){
+     nms <- t(df[1,])
+     names(df) <- nms
+     df <- df[-1,-1]
+     row.names(df) <- seq(1,nrow(df))
+     df
+   }
+   dfClean(df)
+ }
> dati <- readSpreadsheet(url)
Error in (function (classes, fdef, mtable)  :
  unable to find an inherited method for function ‘readHTMLTable’ for
signature ‘"xml_document"’
> rm(readSpreadsheet,url)

Can anyone suggest a solution to it?

Thanks,

Luca
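
One possible workaround, offered as a sketch rather than a confirmed fix: the error message suggests that `content(r)` now returns an xml2 `xml_document`, for which `readHTMLTable()` from the XML package has no method. Asking httr for the page as text and parsing it with `XML::htmlParse()` first should sidestep that. The inline HTML string below is a made-up stand-in for the spreadsheet page so that the example runs offline, and the helper name `readTablesFromText` is hypothetical.

```r
library(XML)

# Hypothetical helper: parse HTML supplied as text and return its tables.
readTablesFromText <- function(html_text) {
  doc <- htmlParse(html_text, asText = TRUE)
  readHTMLTable(doc, header = FALSE, stringsAsFactors = FALSE)
}

# Made-up stand-in for the text returned by content(GET(url), as = "text").
html_text <- paste0(
  "<html><body><table>",
  "<tr><td>id</td><td>name</td></tr>",
  "<tr><td>1</td><td>a</td></tr>",
  "</table></body></html>"
)

sheets <- readTablesFromText(html_text)
df <- sheets[[1]]
```

In the posted function this would amount to replacing `html <- content(r)` with `content(r, as = "text")` and passing the result through `htmlParse(..., asText = TRUE)` before `readHTMLTable()`. Note also that the posted script loads only httr, while `readHTMLTable()` lives in the XML package.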


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Problem while predicting in regression trees

2016-05-09 Thread Max Kuhn
I've brought this up numerous times... you shouldn't use `predict.rpart`
(or whatever modeling function) from the `finalModel` object. That object
has no idea what was done to the data prior to its invocation.

The issue here is that `train(formula)` converts the factors to dummy
variables. `rpart` does not require that and the `finalModel` object has no
idea that that happened. Using `predict.train` works just fine so why not
use it?

> table(predict(tr_m, newdata = testPFI))

-2617.42857142857 -1786.76923076923 -1777.583   -1217.3
3 3 6 3
-886.6667  -408.375-375.7 -240.307692307692
5 1 4 5
-201.612903225806 -19.6071428571429  30.80833  43.9
   307266 9
151.5  209.647058823529
628
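
To illustrate the point about dummy variables, here is a small base-R sketch. It assumes that the formula method of `train` expands factors roughly the way `model.matrix()` does; the toy data frame and its values are invented for illustration.

```r
# Toy data with one factor predictor (invented values).
toy <- data.frame(
  project_delay = c(-10, 0, 5, 20),
  sector = factor(c("Defense", "Hospitals", "Housing", "Hospitals")),
  capital_value = c(6.7, 5.8, 21.8, 24.2)
)

# Expanding the formula replaces `sector` with dummy columns.
expanded <- model.matrix(project_delay ~ sector + capital_value, data = toy)
colnames(expanded)

# A model fitted on `expanded` expects columns such as "sectorHospitals",
# which a raw data frame like `toy` does not have.
missing_in_raw <- setdiff(colnames(expanded), names(toy))
missing_in_raw
```

This is the mismatch described above: the `train` object knows how to redo the expansion for new data, whereas the `finalModel` object only sees the already expanded column names.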

On Mon, May 9, 2016 at 2:46 PM, Muhammad Bilal <
muhammad2.bi...@live.uwe.ac.uk> wrote:

> Please find the sample dataset attached along with R code pasted below to
> reproduce the issue.
>
>
> #Loading the data frame
>
> pfi <- read.csv("pfi_data.csv")
>
> #Splitting the data into training and test sets
> split <- sample.split(pfi, SplitRatio = 0.7)
> trainPFI <- subset(pfi, split == TRUE)
> testPFI <- subset(pfi, split == FALSE)
>
> #Cross validating the decision trees
> tr.control <- trainControl(method="repeatedcv", number=20)
> cp.grid <- expand.grid(.cp = (0:10)*0.001)
> tr_m <- train(project_delay ~ project_lon + project_lat + project_duration
> + sector + contract_type + capital_value, data = trainPFI, method="rpart",
> trControl=tr.control, tuneGrid = cp.grid)
>
> #Displaying the train results
> tr_m
>
> #Fetching the best tree
> best_tree <- tr_m$finalModel
>
> #Plotting the best tree
> prp(best_tree)
>
> #Using the best tree to make predictions *[This command raises the error]*
> best_tree_pred <- predict(best_tree, newdata = testPFI)
>
> #Calculating the SSE
> best_tree_pred.sse <- sum((best_tree_pred - testPFI$project_delay)^2)
>
> #
> tree_pred.sse
>
> ...
>
> Many Thanks and
>
>
> Kind Regards
>
>
>
> --
> Muhammad Bilal
> Research Fellow and Doctoral Researcher,
> Bristol Enterprise, Research, and Innovation Centre (BERIC),
> University of the West of England (UWE),
> Frenchay Campus,
> Bristol,
> BS16 1QY
>
> *muhammad2.bi...@live.uwe.ac.uk* 
>
>
> --
> *From:* Max Kuhn 
> *Sent:* 09 May 2016 17:22:22
> *To:* Muhammad Bilal
> *Cc:* Bert Gunter; r-help@r-project.org
>
> *Subject:* Re: [R] Problem while predicting in regression trees
>
> It is extremely difficult to tell what the issue might be without a
> reproducible example.
>
> The only thing that I can suggest is to use the non-formula interface to
> `train` so that you can avoid creating dummy variables.
>
> On Mon, May 9, 2016 at 11:23 AM, Muhammad Bilal <
> muhammad2.bi...@live.uwe.ac.uk> wrote:
>
>> Hi Bert,
>>
>> Thanks for the response.
>>
>> I checked the datasets; however, the Hospitals level appears in both of
>> them. See the output below:
>>
>> > sqldf("SELECT sector, count(*) FROM trainPFI GROUP BY sector")
>>             sector count(*)
>> 1          Defense        9
>> 2        Hospitals      101
>> 3          Housing       32
>> 4           Others       99
>> 5 Public Buildings       39
>> 6          Schools      148
>> 7      Social Care       10
>> 8   Transportation       27
>> 9            Waste       26
>> > sqldf("SELECT sector, count(*) FROM testPFI GROUP BY sector")
>>             sector count(*)
>> 1          Defense        5
>> 2        Hospitals       47
>> 3          Housing       11
>> 4           Others       44
>> 5 Public Buildings       18
>> 6          Schools       69
>> 7      Social Care        9
>> 8   Transportation        8
>> 9            Waste       12
>>
>> Anything else to try?
>>
>> --
>> Muhammad Bilal
>> Research Fellow and Doctoral Researcher,
>> Bristol Enterprise, Research, and Innovation Centre (BERIC),
>> University of the West of England (UWE),
>> Frenchay Campus,
>> Bristol,
>> BS16 1QY
>>
>> muhammad2.bi...@live.uwe.ac.uk
>>
>>
>> 
>> From: Bert Gunter 
>> Sent: 09 May 2016 01:42:39
>> To: Muhammad Bilal
>> Cc: r-help@r-project.org
>> Subject: Re: [R] Problem while predicting in regression trees
>>
>> It seems that the data that you used for prediction contained a level
>> "Hospitals" for the sector factor that did not appear in the training
>> data (or maybe it's the other way round). Check this.
>>
>> Cheers,
>> Bert
>>
>>
>> Bert Gunter
>>
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>>
>>
>> On Sun, May 8, 2016 at 4:14 PM, Muhammad 

[R-es] LaTeX and graphics

2016-05-09 Thread Javier Marcuzzi
Dear all,

There are several ways to use LaTeX within R, for example to put a
mathematical equation inside a plot.

For one particular case, though, I would like to draw lines that connect parts
of plots. For example, suppose there are two bars, each with three or four
divisions of, say, 25% each; from one of the divisions of the first bar I would
like to draw an arrow to the second bar.

LaTeX offers some options such as xy-pic. I could use that inside an Rnw file,
placing the plot in a LaTeX matrix, but someone has probably written about this
already, and I would like to read about other people's experience so as not to
repeat the same mistakes, or to find alternatives to the solutions I have in
mind.

Thanks

Javier Rubén Marcuzzi
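
As one possible alternative that stays inside R, the arrow can be drawn with base graphics instead of xy-pic. This is only a sketch under assumptions: the data are invented (two bars with four 25% divisions each), and the choice of which divisions to connect is arbitrary.

```r
# Two stacked bars, each split into four 25% segments (invented data).
shares <- matrix(25, nrow = 4, ncol = 2,
                 dimnames = list(paste0("part", 1:4), c("A", "B")))

# barplot() returns the x positions of the bar midpoints.
mids <- barplot(shares, ylim = c(0, 100))

# Arrow from the middle of the second segment of bar A (y = 37.5)
# to the middle of the third segment of bar B (y = 62.5).
arrows(x0 = mids[1], y0 = 37.5, x1 = mids[2], y1 = 62.5, length = 0.1)
```

Because the plot is ordinary R graphics, it can be included in an Rnw file as a normal figure chunk, with no LaTeX matrix needed to position the arrow.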



___
R-help-es mailing list
R-help-es@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-help-es


[R] ggplot scale_colour_distiller legend to display all values?

2016-05-09 Thread Mike Smith
Is there a way to get ggplot scale_colour_distiller to display all values in 
the legend? 

Currently using this code.

thanks!

mike

library(ggplot2)
#Input data: insert the filename for raw data
data <- read.csv("http://www.lecturematerials.co.uk/data/learning_bands.csv",header=T)

ggplot(data,aes(x=multiplier,y=factor)) +
  geom_point(aes(colour=band), size=8, shape=15) +
  scale_colour_distiller(palette = "Spectral", direction=-1, guide="legend",
                         name="Order") +
  ggtitle("Times Tables Learning Bands") +
  scale_x_continuous(name="Multiplier", limits=c(2, 12),
                     breaks=c(2,4,6,8,10,12)) +
  scale_y_continuous(name="Factor", limits=c(2, 12), breaks=c(2,4,6,8,10,12)) +
  geom_abline(intercept = 0, slope = 1, size=1) +
  coord_fixed()
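
One option that may help, sketched under assumptions: passing explicit `breaks` to `scale_colour_distiller()` should make the legend show one key per listed value. The data frame below is invented, since the contents of `learning_bands.csv` are not shown here, and the sketch assumes that `band` is numeric.

```r
library(ggplot2)

# Invented stand-in for the CSV: every multiplier/factor pair gets a band 1-6.
toy <- expand.grid(multiplier = 2:12, factor = 2:12)
toy$band <- (toy$multiplier + toy$factor) %% 6 + 1

# One legend key per distinct band value.
band_values <- sort(unique(toy$band))

p <- ggplot(toy, aes(x = multiplier, y = factor)) +
  geom_point(aes(colour = band), size = 8, shape = 15) +
  scale_colour_distiller(palette = "Spectral", direction = -1,
                         guide = "legend", name = "Order",
                         breaks = band_values) +
  coord_fixed()
```

Printing `p` draws the plot. If `band` in the real data is a factor rather than a number, a discrete scale such as `scale_colour_brewer()` would be the more natural choice.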




---
Mike Smith



Re: [R] Problem while predicting in regression trees

2016-05-09 Thread Muhammad Bilal
Hi Bill,


Many thanks for highlighting the issue. It worked when I predicted using
tr_m rather than tr_m$finalModel. I'm extremely grateful for the insight.


Thanks to all who gave me earlier guidance as well.


--
Muhammad Bilal
Research Fellow and Doctoral Researcher,
Bristol Enterprise, Research, and Innovation Centre (BERIC),
University of the West of England (UWE),
Frenchay Campus,
Bristol,
BS16 1QY

muhammad2.bi...@live.uwe.ac.uk



From: William Dunlap 
Sent: 09 May 2016 20:27:14
To: Muhammad Bilal
Cc: Max Kuhn; r-help@r-project.org
Subject: Re: [R] Problem while predicting in regression trees

Why are you predicting from tr_m$finalModel instead of from tr_m?

Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: [R] Problem while predicting in regression trees

2016-05-09 Thread William Dunlap via R-help
Why are you predicting from tr_m$finalModel instead of from tr_m?

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, May 9, 2016 at 11:46 AM, Muhammad Bilal <
muhammad2.bi...@live.uwe.ac.uk> wrote:

> Please find the sample dataset attached along with R code pasted below to
> reproduce the issue.
>
>
> #Loading the data frame
>
> pfi <- read.csv("pfi_data.csv")
>
> #Splitting the data into training and test sets
> split <- sample.split(pfi, SplitRatio = 0.7)
> trainPFI <- subset(pfi, split == TRUE)
> testPFI <- subset(pfi, split == FALSE)
>
> #Cross validating the decision trees
> tr.control <- trainControl(method="repeatedcv", number=20)
> cp.grid <- expand.grid(.cp = (0:10)*0.001)
> tr_m <- train(project_delay ~ project_lon + project_lat + project_duration
> + sector + contract_type + capital_value, data = trainPFI, method="rpart",
> trControl=tr.control, tuneGrid = cp.grid)
>
> #Displaying the train results
> tr_m
>
> #Fetching the best tree
> best_tree <- tr_m$finalModel
>
> #Plotting the best tree
> prp(best_tree)
>
> #Using the best tree to make predictions [This command raises the error]
> best_tree_pred <- predict(best_tree, newdata = testPFI)
>
> #Calculating the SSE
> best_tree_pred.sse <- sum((best_tree_pred - testPFI$project_delay)^2)
>
> #
> tree_pred.sse
>
> ...
>
>
> Many Thanks and
>
>
> Kind Regards
>
>
>
> --
> Muhammad Bilal
> Research Fellow and Doctoral Researcher,
> Bristol Enterprise, Research, and Innovation Centre (BERIC),
> University of the West of England (UWE),
> Frenchay Campus,
> Bristol,
> BS16 1QY
>
> muhammad2.bi...@live.uwe.ac.uk

Re: [R] Problem while predicting in regression trees

2016-05-09 Thread Muhammad Bilal
The dataset could also be downloaded from the following link:

https://www.dropbox.com/s/kkiwm32jxfk7jac/pfi_data.csv?dl=0


--
Muhammad Bilal
Research Fellow and Doctoral Researcher,
Bristol Enterprise, Research, and Innovation Centre (BERIC),
University of the West of England (UWE),
Frenchay Campus,
Bristol,
BS16 1QY

muhammad2.bi...@live.uwe.ac.uk



From: Max Kuhn 
Sent: 09 May 2016 17:22:22
To: Muhammad Bilal
Cc: Bert Gunter; r-help@r-project.org
Subject: Re: [R] Problem while predicting in regression trees

It is extremely difficult to tell what the issue might be without a 
reproducible example.

The only thing that I can suggest is to use the non-formula interface to 
`train` so that you can avoid creating dummy variables.

On Sun, May 8, 2016 at 4:14 PM, Muhammad Bilal wrote:
> Hi All,
>
> I have the following script, that raises error at the last command. I am new 
> to R and require some clarification on what is going wrong.
>
> #Creating the training and testing data sets
> splitFlag <- sample.split(pfi_v3, SplitRatio = 0.7)
> trainPFI <- subset(pfi_v3, splitFlag==TRUE)
> testPFI <- subset(pfi_v3, splitFlag==FALSE)
>
>
> #Structure of the trainPFI data frame
>> str(trainPFI)
> ***
> 'data.frame': 491 obs. of  16 variables:
>  $ project_id       : int  1 2 3 6 7 9 10 12 13 14 ...
>  $ project_lat      : num  51.4 51.5 52.2 51.9 52.5 ...
>  $ project_lon      : num  -0.642 -1.85 0.08 -0.401 -1.888 ...
>  $ sector           : Factor w/ 9 levels "Defense","Hospitals",..: 4 4 4 6 6 6 6 6 6 6 ...
>  $ contract_type    : chr  "Turnkey" "Turnkey" "Turnkey" "Turnkey" ...
>  $ project_duration : int  1826 3652 121 730 730 790 522 819 998 372 ...
>  $ project_delay    : int  -323 0 -60 0 0 0 -91 0 0 7 ...
>  $ capital_value    : num  6.7 5.8 21.8 24.2 40.7 10.7 70 24.5 60.5 78 ...
>  $ project_delay_pct: num  -17.7 0 -49.6 0 0 0 -17.4 0 0 1.9 ...
>  $ delay_type       : Ord.factor w/ 9 levels "7 months early & beyond"<..: 1 5 3 5 5 5 2 5 5 6 ...
>
> library(caret)
> library(e1071)
>
> set.seed(100)
>
> tr.control <- trainControl(method="cv", number=10)
> cp.grid <- expand.grid(.cp = (0:10)*0.001)
>
> #Fitting the model using regression tree
> tr_m <- train(project_delay ~ project_lon + project_lat + project_duration + 
> sector + contract_type + capital_value, data = trainPFI, method="rpart", 
> trControl=tr.control, tuneGrid = cp.grid)
>
> tr_m
>
> CART
> 491 samples
> 15 predictor
> No pre-processing
> Resampling: Cross-Validated (10 fold)
> Summary of sample sizes: 443, 442, 441, 442, 441, 442, ...
> Resampling results across tuning 

Re: [R] Problem while predicting in regression trees

2016-05-09 Thread Muhammad Bilal
Please find the sample dataset attached along with R code pasted below to 
reproduce the issue.


#Loading the data frame

pfi <- read.csv("pfi_data.csv")

#Splitting the data into training and test sets
split <- sample.split(pfi, SplitRatio = 0.7)
trainPFI <- subset(pfi, split == TRUE)
testPFI <- subset(pfi, split == FALSE)

#Cross validating the decision trees
tr.control <- trainControl(method="repeatedcv", number=20)
cp.grid <- expand.grid(.cp = (0:10)*0.001)
tr_m <- train(project_delay ~ project_lon + project_lat + project_duration + 
sector + contract_type + capital_value, data = trainPFI, method="rpart", 
trControl=tr.control, tuneGrid = cp.grid)

#Displaying the train results
tr_m

#Fetching the best tree
best_tree <- tr_m$finalModel

#Plotting the best tree
prp(best_tree)

#Using the best tree to make predictions [This command raises the error]
best_tree_pred <- predict(best_tree, newdata = testPFI)

#Calculating the SSE
best_tree_pred.sse <- sum((best_tree_pred - testPFI$project_delay)^2)

#
tree_pred.sse

...


Many Thanks and


Kind Regards



--
Muhammad Bilal
Research Fellow and Doctoral Researcher,
Bristol Enterprise, Research, and Innovation Centre (BERIC),
University of the West of England (UWE),
Frenchay Campus,
Bristol,
BS16 1QY

muhammad2.bi...@live.uwe.ac.uk



From: Max Kuhn 
Sent: 09 May 2016 17:22:22
To: Muhammad Bilal
Cc: Bert Gunter; r-help@r-project.org
Subject: Re: [R] Problem while predicting in regression trees

It is extremely difficult to tell what the issue might be without a 
reproducible example.

The only thing that I can suggest is to use the non-formula interface to 
`train` so that you can avoid creating dummy variables.


Re: [R] Clean method to convert date and time between time zones keeping it in POSIXct format

2016-05-09 Thread MacQueen, Don
I think setting the attribute is the best way to "convert", and the
following will hopefully explain why. (And I would tend to agree with
William Dunlap that a function to set the attribute might help userRs.)



R always stores POSIXct objects internally in seconds since an origin in
UTC. I would not think in terms of converting; I would think in terms of
displaying.


t1 <- as.POSIXct("2016-05-09 10:00:00", tz="America/New_York")
t2 <- t1
attributes(t2)$tzone <- 'UTC'

> print(t1)
[1] "2016-05-09 10:00:00 EDT"
> print(t2)
[1] "2016-05-09 14:00:00 UTC"

> as.numeric(t1)
[1] 1462802400
> as.numeric(t2)
[1] 1462802400



The actual value of t2 is the same as t1; it has not been "converted"
(because to me conversion implies change, and there has been no change in
the value of t2). R has merely been told to display it in UTC when
printed.
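
A helper along the lines William Dunlap suggested might look like the sketch below. The name `show_in_tz` is hypothetical, not an existing base R function; it only changes the `tzone` attribute and leaves the stored seconds untouched.

```r
# Hypothetical helper: return a copy of x that prints in time zone tz.
show_in_tz <- function(x, tz) {
  stopifnot(inherits(x, "POSIXct"))
  attr(x, "tzone") <- tz
  x
}

ny  <- as.POSIXct("2016-05-09 10:00:00", tz = "America/New_York")
utc <- show_in_tz(ny, "UTC")

format(utc, usetz = TRUE)          # "2016-05-09 14:00:00 UTC"
as.numeric(ny) == as.numeric(utc)  # TRUE: same instant, different display
```

Returning a modified copy, rather than changing the attribute in place, keeps the original object available for display in its own time zone.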


Similarly, when a character string is converted to POSIXct, R has to be
told what timezone to use to convert it to seconds since the origin in
UTC. If a timezone is not specified, the user's local (default) timezone
is used.

> t3 <- as.POSIXct("2016-05-09 10:00:00")
> t3
[1] "2016-05-09 10:00:00 PDT"
> as.numeric(t3)
[1] 1462813200


The number of seconds for t3 is different than for t1 and t2, and this is
because I am not in the America/New_York timezone.

And, in fact,

> (as.numeric(t3) - as.numeric(t1))/3600
[1] 3


Indicating that PDT and America/New_York are three hours apart, as indeed
they are.

To me, this version
  t4 <- as.POSIXct(format(t1, tz="UTC"), tz="UTC")
doesn't recognize the distinction between internal storage and display,
and the fact that there is no real conversion.

> identical(t2, t4)
[1] TRUE




To go a little further,

> attributes(t3)
$class
[1] "POSIXct" "POSIXt"

$tzone
[1] ""


If a timezone is not specified when the object is created, then the tzone
attribute is set to "" and R displays using the local timezone.

> t3
[1] "2016-05-09 10:00:00 PDT"

I can change the local timezone:

> Sys.setenv(TZ='UTC')
> t3
[1] "2016-05-09 17:00:00 UTC"

t3 has not changed, but how it is displayed has changed.

Given all this, it's helpful that the format() function for POSIXt objects
has a tz argument.

> format(t3, tz='US/Pacific')
[1] "2016-05-09 10:00:00"


Indicating again that t3 has not changed; I've just manipulated the rules
for how it is displayed.
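
The same idea can be sketched for several time zones at once. The zone names below are ordinary Olson names chosen for illustration, and the sketch assumes they are available in the local time zone database.

```r
# One fixed instant, created in UTC.
instant <- as.POSIXct("2016-05-09 14:00:00", tz = "UTC")

zones <- c("UTC", "America/New_York", "America/Los_Angeles")

# format() renders the same instant under each zone's display rules.
shown <- vapply(zones, function(z) format(instant, tz = z, usetz = TRUE),
                character(1))
shown
```

Each element of `shown` is a character string for the same underlying number of seconds; only the rendering differs from zone to zone.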


-Don

-- 
Don MacQueen

Lawrence Livermore National Laboratory
7000 East Ave., L-627
Livermore, CA 94550
925-423-1062





On 5/9/16, 6:24 AM, "R-help on behalf of Arnaud Mosnier"
 wrote:

>Dear UseRs,
>
>I know two ways to convert dates and time from one time zone to another but
>I am pretty sure that there is a better (cleaner) way to do that.
>
>
>Here are the methods I know:
>
>
>## The longest way ...
>
>T1 <- as.POSIXct("2016-05-09 10:00:00", format="%Y-%m-%d %H:%M:%S",
>tz="America/New_York")
>
>print(T1)
>
>T2 <- as.POSIXct(format(T1, tz="UTC"), tz="UTC") # format convert it to
>character, so I have to convert it back to POSIXct afterward.
>
>print(T2)
>
>
>
>## The shortest but probably not the cleanest ...
>
>attributes(T1)$tzone <- "UTC"
>
>print(T1)
>


[R] Revolutions blog: April 2016 roundup

2016-05-09 Thread David Smith
Since 2008, Microsoft (formerly Revolution Analytics) staff and guests have 
written about R every weekday at the
Revolutions blog: http://blog.revolutionanalytics.com
and every month I post a summary of articles from the previous month of 
particular interest to readers of r-help.

And in case you missed them, here are some articles related to R from the month 
of April:

Lukasz Piwek recreates classic graphs from Tufte's 'The Visual Display of 
Quantitative Information' in R:
http://blog.revolutionanalytics.com/2016/04/tufte-style-graphics-in-r.html

A preview of upcoming R conferences in Europe:
http://blog.revolutionanalytics.com/2016/04/r-conferences-europe-2016.html

Andrie de Vries updates the data on R package growth on CRAN,
http://blog.revolutionanalytics.com/2016/04/cran-package-growth.html and finds 
a segmented regression model with
break-points in 2007 and 2011 fits the data well:
http://blog.revolutionanalytics.com/2016/04/a-segmented-model-of-cran-package-growth.html

A Microsoft data scientist compares R, Microsoft R Open and Microsoft R Server:
http://blog.revolutionanalytics.com/2016/04/data-scientist-perspective.html

A webinar on data visualization with Microsoft R Open, presented by Naomi and 
Joyce Robbins:
http://blog.revolutionanalytics.com/2016/04/webinar-april-28-effective-graphs.html

Microsoft R Open 3.2.4 now available for Windows, Mac and Linux:
http://blog.revolutionanalytics.com/2016/04/mro-324-available.html

A preview of R/Finance 2016, May 20-21 in Chicago:
http://blog.revolutionanalytics.com/2016/04/get-ready-for-rfinance-2016.html

Julia Silge releases a CRAN package with the text of Jane Austen's novels, and 
uses the syuzhet package to map 
sentiment in the narratives: 
http://blog.revolutionanalytics.com/2016/04/pride-and-prejudice-and-z-scores.html

Modeling tips paid to taxi drivers in NYC with Microsoft R Server running on 
HDInsight Hadoop:
http://blog.revolutionanalytics.com/2016/04/mrs-nyc-taxi.html

A webinar recording (with slides) shows how to scale Microsoft R Server to very 
large data sets on HDInsight with Apache
Spark: http://blog.revolutionanalytics.com/2016/04/scalable-ds-platform.html

News on recent grants to community projects by the R Consortium (proposals for 
the next round are due July 10):
http://blog.revolutionanalytics.com/2016/04/get-involved-with-the-r-consortium.html

The Microsoft Data Science Virtual Machine, which packages many data science 
tools including R, is now available as a
Linux VM: http://blog.revolutionanalytics.com/2016/04/microsoft-ds-vm-linux.html

Buzzfeed used R to visualize the activity of surveillance aircraft used by the 
US government:
http://blog.revolutionanalytics.com/2016/04/the-fbis-aerial-surveillance-program-visualized-with-r.html

Using Azure ML and R to predict the quality of wine:
http://blog.revolutionanalytics.com/2016/04/predicting-wine-quality.html

A review of the book 'Graphical Data Analysis with R' by Antony Unwin:
http://blog.revolutionanalytics.com/2016/04/graphical-data-analysis-with-r.html

Montgomery County, MD opened its traffic violation data, and Srini Kumar used 
SQL Server and R to visualize it:
http://blog.revolutionanalytics.com/2016/04/an-analysis-of-traffic-violation-data-with-sql-server-and-r.html

80% of Airbnb's data scientists use R, and share methods and tools via an 
internal R package:
http://blog.revolutionanalytics.com/2016/04/airbnb-uses-r.html 

Microsoft sponsors a competition using R to evaluate a treatment for brain 
injury and infer vision from brain waves:
http://blog.revolutionanalytics.com/2016/04/connected-brains.html

General interest stories (not related to R) in the past month included stories 
about: the new Thunderbirds
(http://blog.revolutionanalytics.com/2016/04/thunderbirds-are-go.html), 
Australian abbreviations
(http://blog.revolutionanalytics.com/2016/04/abbreviated-discource.html), 
anamorphic illusions
(http://blog.revolutionanalytics.com/2016/04/witness-brusspup-illusions.html), 
the jet streams of Earth and Jupiter
(http://blog.revolutionanalytics.com/2016/04/because-its-friday-jet-stream.html),
 and never-seen YouTube videos
(http://blog.revolutionanalytics.com/2016/04/petit-tube.html).

Meeting times for local R user groups 
(http://blog.revolutionanalytics.com/local-r-groups.html) can be found on the
updated R Community Calendar at: 
http://blog.revolutionanalytics.com/calendar.html .
If you're looking for more articles about R, you can find summaries from 
previous months at
http://blog.revolutionanalytics.com/roundups/. You can receive daily blog posts 
via email using services like
blogtrottr.com.

As always, thanks for the comments and please keep sending suggestions to me at 
david...@microsoft.com or via Twitter
(I'm @revodavid).

Cheers,
# David

-- 
David M Smith 
R Community Lead, Microsoft  
Tel: +1 (312) 9205766 (Chicago IL, USA)
Twitter: @revodavid | Blog:  http://blog.revolutionanalytics.com


Re: [R] Problem while predicting in regression trees

2016-05-09 Thread Max Kuhn
It is extremely difficult to tell what the issue might be without a
reproducible example.

The only thing that I can suggest is to use the non-formula interface to
`train` so that you can avoid creating dummy variables.
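
The dummy-variable expansion behind this advice can be seen with base R alone. The sketch below is only an illustration: the `toy` data frame and its values are made up (only the column name `sector` mirrors the thread), and `model.matrix()` stands in for what a formula interface typically does to a factor before the model sees it.

```r
# Toy data, invented for illustration; not the poster's data.
toy <- data.frame(
  sector = factor(c("Defense", "Hospitals", "Schools", "Hospitals")),
  capital_value = c(6.7, 5.8, 21.8, 24.2)
)

# A formula-style expansion turns the factor into 0/1 indicator columns
# named <variable><level>.
mm <- model.matrix(~ sector + capital_value, data = toy)
print(colnames(mm))

# The original data frame has no such columns, so a model fitted on the
# expanded matrix cannot find them in the raw data at prediction time.
missing_in_raw <- setdiff(colnames(mm), c("(Intercept)", names(toy)))
print(missing_in_raw)
```

Passing predictors and outcome separately (the `x`/`y` form of `train`) should keep `sector` as a single factor column, which is why the non-formula interface sidesteps the mismatch.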

On Mon, May 9, 2016 at 11:23 AM, Muhammad Bilal <
muhammad2.bi...@live.uwe.ac.uk> wrote:

> Hi Bert,
>
> Thanks for the response.
>
> I checked the datasets, however, the Hospitals level appears in both of
> them. See the output below:
>
> > sqldf("SELECT sector, count(*) FROM trainPFI GROUP BY sector")
>             sector count(*)
> 1          Defense        9
> 2        Hospitals      101
> 3          Housing       32
> 4           Others       99
> 5 Public Buildings       39
> 6          Schools      148
> 7      Social Care       10
> 8   Transportation       27
> 9            Waste       26
> > sqldf("SELECT sector, count(*) FROM testPFI GROUP BY sector")
>             sector count(*)
> 1          Defense        5
> 2        Hospitals       47
> 3          Housing       11
> 4           Others       44
> 5 Public Buildings       18
> 6          Schools       69
> 7      Social Care        9
> 8   Transportation        8
> 9            Waste       12
>
> Anything else to try?
>
> --
> Muhammad Bilal
> Research Fellow and Doctoral Researcher,
> Bristol Enterprise, Research, and Innovation Centre (BERIC),
> University of the West of England (UWE),
> Frenchay Campus,
> Bristol,
> BS16 1QY
>
> muhammad2.bi...@live.uwe.ac.uk
>
>
> 
> From: Bert Gunter 
> Sent: 09 May 2016 01:42:39
> To: Muhammad Bilal
> Cc: r-help@r-project.org
> Subject: Re: [R] Problem while predicting in regression trees
>
> It seems that the data that you used for prediction contained a level
> "Hospitals" for the sector factor that did not appear in the training
> data (or maybe it's the other way round). Check this.
>
> Cheers,
> Bert
>
>
> Bert Gunter
>
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>
>
> On Sun, May 8, 2016 at 4:14 PM, Muhammad Bilal
>  wrote:
> > Hi All,
> >
> > I have the following script, that raises error at the last command. I am
> > new to R and require some clarification on what is going wrong.
> >
> > #Creating the training and testing data sets
> > splitFlag <- sample.split(pfi_v3, SplitRatio = 0.7)
> > trainPFI <- subset(pfi_v3, splitFlag==TRUE)
> > testPFI <- subset(pfi_v3, splitFlag==FALSE)
> >
> >
> > #Structure of the trainPFI data frame
> >> str(trainPFI)
> > ***
> > 'data.frame': 491 obs. of  16 variables:
> >  $ project_id        : int  1 2 3 6 7 9 10 12 13 14 ...
> >  $ project_lat       : num  51.4 51.5 52.2 51.9 52.5 ...
> >  $ project_lon       : num  -0.642 -1.85 0.08 -0.401 -1.888 ...
> >  $ sector            : Factor w/ 9 levels "Defense","Hospitals",..: 4 4 4 6 6 6 6 6 6 6 ...
> >  $ contract_type     : chr  "Turnkey" "Turnkey" "Turnkey" "Turnkey" ...
> >  $ project_duration  : int  1826 3652 121 730 730 790 522 819 998 372 ...
> >  $ project_delay     : int  -323 0 -60 0 0 0 -91 0 0 7 ...
> >  $ capital_value     : num  6.7 5.8 21.8 24.2 40.7 10.7 70 24.5 60.5 78 ...
> >  $ project_delay_pct : num  -17.7 0 -49.6 0 0 0 -17.4 0 0 1.9 ...
> >  $ delay_type        : Ord.factor w/ 9 levels "7 months early & beyond"<..: 1 5 3 5 5 5 2 5 5 6 ...
> >
> > library(caret)
> > library(e1071)
> >
> > set.seed(100)
> >
> > tr.control <- trainControl(method="cv", number=10)
> > cp.grid <- expand.grid(.cp = (0:10)*0.001)
> >
> > #Fitting the model using regression tree
> > tr_m <- train(project_delay ~ project_lon + project_lat +
> >               project_duration + sector + contract_type + capital_value,
> >               data = trainPFI, method="rpart", trControl=tr.control,
> >               tuneGrid = cp.grid)
> >
> > tr_m
> >
> > CART
> > 491 samples
> > 15 predictor
> > No pre-processing
> > Resampling: Cross-Validated (10 fold)
> > Summary of sample sizes: 443, 442, 441, 442, 441, 442, ...
> > Resampling results across tuning parameters:
> >   cp RMSE  Rsquared
> >   0.000  441.1524  0.5417064
> >   0.001  439.6319  0.5451104
> >   0.002  437.4039  0.5487203
> >   0.003  432.3675  0.551
> >   0.004  434.2138  0.5519964
> >   0.005  431.6635  0.551
> >   0.006  436.6163  0.5474135
> >   0.007  440.5473  0.5407240
> >   0.008  441.0876  0.5399614
> >   0.009  441.5715  0.5401718
> >   0.010  441.1401  0.5407121
> > RMSE was used to select the optimal model using  the smallest value.
> > The final value used for the model was cp = 0.005.
> >
> > #Fetching the best tree
> > best_tree <- tr_m$finalModel
> >
> > Alright, all the aforementioned commands worked fine.
> >
> > Except the subsequent command raises error, when the developed model is
> > used to make predictions:
> > best_tree_pred <- predict(best_tree, newdata = testPFI)
> > 

Re: [R] Problem while predicting in regression trees

2016-05-09 Thread Muhammad Bilal
Hi Bert,

Thanks for the response.

I checked the datasets, however, the Hospitals level appears in both of them. 
See the output below:

> sqldf("SELECT sector, count(*) FROM trainPFI GROUP BY sector")
            sector count(*)
1          Defense        9
2        Hospitals      101
3          Housing       32
4           Others       99
5 Public Buildings       39
6          Schools      148
7      Social Care       10
8   Transportation       27
9            Waste       26
> sqldf("SELECT sector, count(*) FROM testPFI GROUP BY sector")
            sector count(*)
1          Defense        5
2        Hospitals       47
3          Housing       11
4           Others       44
5 Public Buildings       18
6          Schools       69
7      Social Care        9
8   Transportation        8
9            Waste       12

Anything else to try?

--
Muhammad Bilal
Research Fellow and Doctoral Researcher,
Bristol Enterprise, Research, and Innovation Centre (BERIC),
University of the West of England (UWE),
Frenchay Campus,
Bristol,
BS16 1QY

muhammad2.bi...@live.uwe.ac.uk



From: Bert Gunter 
Sent: 09 May 2016 01:42:39
To: Muhammad Bilal
Cc: r-help@r-project.org
Subject: Re: [R] Problem while predicting in regression trees

It seems that the data that you used for prediction contained a level
"Hospitals" for the sector factor that did not appear in the training
data (or maybe it's the other way round). Check this.

Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sun, May 8, 2016 at 4:14 PM, Muhammad Bilal
 wrote:
> Hi All,
>
> I have the following script, that raises error at the last command. I am new 
> to R and require some clarification on what is going wrong.
>
> #Creating the training and testing data sets
> splitFlag <- sample.split(pfi_v3, SplitRatio = 0.7)
> trainPFI <- subset(pfi_v3, splitFlag==TRUE)
> testPFI <- subset(pfi_v3, splitFlag==FALSE)
>
>
> #Structure of the trainPFI data frame
>> str(trainPFI)
> ***
> 'data.frame': 491 obs. of  16 variables:
>  $ project_id        : int  1 2 3 6 7 9 10 12 13 14 ...
>  $ project_lat       : num  51.4 51.5 52.2 51.9 52.5 ...
>  $ project_lon       : num  -0.642 -1.85 0.08 -0.401 -1.888 ...
>  $ sector            : Factor w/ 9 levels "Defense","Hospitals",..: 4 4 4 6 6 6 6 6 6 6 ...
>  $ contract_type     : chr  "Turnkey" "Turnkey" "Turnkey" "Turnkey" ...
>  $ project_duration  : int  1826 3652 121 730 730 790 522 819 998 372 ...
>  $ project_delay     : int  -323 0 -60 0 0 0 -91 0 0 7 ...
>  $ capital_value     : num  6.7 5.8 21.8 24.2 40.7 10.7 70 24.5 60.5 78 ...
>  $ project_delay_pct : num  -17.7 0 -49.6 0 0 0 -17.4 0 0 1.9 ...
>  $ delay_type        : Ord.factor w/ 9 levels "7 months early & beyond"<..: 1 5 3 5 5 5 2 5 5 6 ...
>
> library(caret)
> library(e1071)
>
> set.seed(100)
>
> tr.control <- trainControl(method="cv", number=10)
> cp.grid <- expand.grid(.cp = (0:10)*0.001)
>
> #Fitting the model using regression tree
> tr_m <- train(project_delay ~ project_lon + project_lat + project_duration + 
> sector + contract_type + capital_value, data = trainPFI, method="rpart", 
> trControl=tr.control, tuneGrid = cp.grid)
>
> tr_m
>
> CART
> 491 samples
> 15 predictor
> No pre-processing
> Resampling: Cross-Validated (10 fold)
> Summary of sample sizes: 443, 442, 441, 442, 441, 442, ...
> Resampling results across tuning parameters:
>   cp RMSE  Rsquared
>   0.000  441.1524  0.5417064
>   0.001  439.6319  0.5451104
>   0.002  437.4039  0.5487203
>   0.003  432.3675  0.551
>   0.004  434.2138  0.5519964
>   0.005  431.6635  0.551
>   0.006  436.6163  0.5474135
>   0.007  440.5473  0.5407240
>   0.008  441.0876  0.5399614
>   0.009  441.5715  0.5401718
>   0.010  441.1401  0.5407121
> RMSE was used to select the optimal model using  the smallest value.
> The final value used for the model was cp = 0.005.
>
> #Fetching the best tree
> best_tree <- tr_m$finalModel
>
> Alright, all the aforementioned commands worked fine.
>
> Except the subsequent command raises error, when the developed model is used 
> to make predictions:
> best_tree_pred <- predict(best_tree, newdata = testPFI)
> Error in eval(expr, envir, enclos) : object 'sectorHospitals' not found
>
> Can someone guide me what to do to resolve this issue.
>
> Any help will be highly appreciated.
>
> Many Thanks and
>
> Kind Regards
>
> --
> Muhammad Bilal
> Research Fellow and Doctoral Researcher,
> Bristol Enterprise, Research, and Innovation Centre (BERIC),
> University of the West of England (UWE),
> Frenchay Campus,
> Bristol,
> BS16 1QY
>
> muhammad2.bi...@live.uwe.ac.uk
>
>

Re: [R] Clean method to convert date and time between time zones keeping it in POSIXct format

2016-05-09 Thread William Dunlap via R-help
I think as.POSIXct will just pass through a POSIXct object without any
changes.  E.g.,
  > dput(as.POSIXct(structure(list(quote(foo)), class=c("POSIXct","POSIXt"))))
  structure(list(foo), class = c("POSIXct", "POSIXt"))

If as.POSIXct( POSIXctObject, tz="ZONE") changed the time zone then a fair
bit of code would have to be changed from
t <- as.POSIXct(t)
to
if (!is.POSIXct(t)) {
t <- as.POSIXct(t)
}
so that existing POSIXct objects would not have their time zones changed.

Having a tzone<- or tz<- function could be handy.
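
A minimal sketch of what such a setter could look like follows. The replacement function `tzone<-` defined here is hypothetical (as far as I know base R does not ship one), and it only rewrites the "tzone" attribute, which affects how the time is displayed, not the stored instant.

```r
# Hypothetical replacement function, defined here for illustration only.
`tzone<-` <- function(x, value) {
  stopifnot(inherits(x, "POSIXct"))
  attr(x, "tzone") <- value  # changes the display zone, not the instant
  x
}

t1 <- as.POSIXct("2016-05-09 10:00:00", tz = "America/New_York")
t2 <- t1
tzone(t2) <- "UTC"

print(t1)
print(t2)

# Compare the underlying seconds since the epoch; both objects hold
# the same number.
same_instant <- as.numeric(t1) == as.numeric(t2)
print(same_instant)
```

The class check uses `inherits()` so the sketch depends on nothing outside base R.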

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Mon, May 9, 2016 at 6:37 AM, Ivan Calandra 
wrote:

> I don't have an answer, but actually, I would have expected
> as.POSIXct(T1, tz="UTC")
> to work...
>
> Looks like as.POSIXct cannot convert from class "POSIXct"
>
> Ivan
>
> --
> Ivan Calandra, PhD
> Scientific Mediator
> University of Reims Champagne-Ardenne
> GEGENAA - EA 3795
> CREA - 2 esplanade Roland Garros
> 51100 Reims, France
> +33(0)3 26 77 36 89
> ivan.calan...@univ-reims.fr
> --
> https://www.researchgate.net/profile/Ivan_Calandra
> https://publons.com/author/705639/
>
>
> On 09/05/2016 at 15:24, Arnaud Mosnier wrote:
>
>> Dear UseRs,
>>
>> I know two ways to convert dates and times from one time zone to another, but
>> I am pretty sure that there is a better (cleaner) way to do that.
>>
>>
>> Here are the methods I know:
>>
>>
>> ## The longest way ...
>>
>> T1 <- as.POSIXct("2016-05-09 10:00:00", format="%Y-%m-%d %H:%M:%S",
>> tz="America/New_York")
>>
>> print(T1)
>>
>> # format() converts it to character, so I have to convert it back to
>> # POSIXct afterward.
>> T2 <- as.POSIXct(format(T1, tz="UTC"), tz="UTC")
>>
>> print(T2)
>>
>>
>>
>> ## The shortest but probably not the cleanest ...
>>
>> attributes(T1)$tzone <- "UTC"
>>
>> print(T1)

Re: [R] Clean method to convert date and time between time zones keeping it in POSIXct format

2016-05-09 Thread Gabor Grothendieck
This involves mucking with the internals as well but it is short:

   structure(T1, tzone = "UTC")
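
One practical difference between this and the round trip through `format()` is sketched below, assuming default options (in particular, `digits.secs` unset). The fractional timestamp is an invented example value; the point is that `structure()` returns a relabelled copy of the same number, while formatting to text and parsing back can drop sub-second precision.

```r
# Invented example value with fractional seconds.
T1 <- as.POSIXct("2016-05-09 10:00:00.75", tz = "America/New_York")

# Relabel the time zone on a copy; T1 itself is left untouched.
via_structure <- structure(T1, tzone = "UTC")

# Round trip through character, as in the "longest way" from the question.
via_format <- as.POSIXct(format(T1, tz = "UTC"), tz = "UTC")

print(attr(T1, "tzone"))
print(as.numeric(via_structure) - as.numeric(T1))
print(as.numeric(via_format) - as.numeric(T1))
```

The differences are taken on `as.numeric()` values so that the comparison looks only at the stored seconds and ignores the zone labels.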

On Mon, May 9, 2016 at 9:24 AM, Arnaud Mosnier  wrote:
> Dear UseRs,
>
> I know two ways to convert dates and times from one time zone to another, but
> I am pretty sure that there is a better (cleaner) way to do that.
>
>
> Here are the methods I know:
>
>
> ## The longest way ...
>
> T1 <- as.POSIXct("2016-05-09 10:00:00", format="%Y-%m-%d %H:%M:%S",
> tz="America/New_York")
>
> print(T1)
>
> # format() converts it to character, so I have to convert it back to
> # POSIXct afterward.
> T2 <- as.POSIXct(format(T1, tz="UTC"), tz="UTC")
>
> print(T2)
>
>
>
> ## The shortest but probably not the cleanest ...
>
> attributes(T1)$tzone <- "UTC"
>
> print(T1)



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] Clean method to convert date and time between time zones keeping it in POSIXct format

2016-05-09 Thread Ivan Calandra

I don't have an answer, but actually, I would have expected
as.POSIXct(T1, tz="UTC")
to work...

Looks like as.POSIXct cannot convert from class "POSIXct"

Ivan

--
Ivan Calandra, PhD
Scientific Mediator
University of Reims Champagne-Ardenne
GEGENAA - EA 3795
CREA - 2 esplanade Roland Garros
51100 Reims, France
+33(0)3 26 77 36 89
ivan.calan...@univ-reims.fr
--
https://www.researchgate.net/profile/Ivan_Calandra
https://publons.com/author/705639/

On 09/05/2016 at 15:24, Arnaud Mosnier wrote:

> Dear UseRs,
>
> I know two ways to convert dates and times from one time zone to another, but
> I am pretty sure that there is a better (cleaner) way to do that.
>
> Here are the methods I know:
>
> ## The longest way ...
>
> T1 <- as.POSIXct("2016-05-09 10:00:00", format="%Y-%m-%d %H:%M:%S",
> tz="America/New_York")
>
> print(T1)
>
> # format() converts it to character, so I have to convert it back to
> # POSIXct afterward.
> T2 <- as.POSIXct(format(T1, tz="UTC"), tz="UTC")
>
> print(T2)
>
> ## The shortest but probably not the cleanest ...
>
> attributes(T1)$tzone <- "UTC"
>
> print(T1)



[R] Clean method to convert date and time between time zones keeping it in POSIXct format

2016-05-09 Thread Arnaud Mosnier
Dear UseRs,

I know two ways to convert dates and times from one time zone to another, but
I am pretty sure that there is a better (cleaner) way to do that.


Here are the methods I know:


## The longest way ...

T1 <- as.POSIXct("2016-05-09 10:00:00", format="%Y-%m-%d %H:%M:%S",
tz="America/New_York")

print(T1)

# format() converts it to character, so I have to convert it back to
# POSIXct afterward.
T2 <- as.POSIXct(format(T1, tz="UTC"), tz="UTC")

print(T2)



## The shortest but probably not the cleanest ...

attributes(T1)$tzone <- "UTC"

print(T1)



Re: [R] with vs. attach

2016-05-09 Thread Hadley Wickham
On Mon, May 9, 2016 at 7:12 AM, peter dalgaard  wrote:
>
> On 09 May 2016, at 02:46 , Bert Gunter  wrote:
>
>> ... To be clear, Hadley or anyone else should also feel free to set me
>> straight, preferably publicly, but privately if you prefer.
>
> Not really to "set anyone straight", but there are some subtleties with mode 
> call objects versus expression objects and formulas to be aware of.
>
> E.g.,
>
>> a <- 2
>> do.call("print", list(a*pi))
> [1] 6.283185
>> do.call("print", list(quote(a*pi)))
> [1] 6.283185
>> do.call("print", list(expression(a*pi)))
> expression(a * pi)
>> do.call("print", list(~a*pi))
> ~a * pi
>
> Thing is, if you insert a call object into a parse tree, nothing is there to 
> preserve its nature as an unevaluated expression. Similarly, in
>
>> call("print", quote(a*pi))
> print(a * pi)
>
> the result is identical to quote(print(a * pi)), so when evaluated, quoting 
> is not seen by print().
>
> As far as I understand, this is also the reason that for math in ggplot, you 
> may need as.expression(bquote()).
>
> In general, I think that a number of things in R would have been more cleanly 
> implemented using formulas/expression objects than using substitution and 
> lazy evaluation, notably subset and offset arguments in lm/glm. It would have 
> been so much cleaner to have
>
> lm(math ~ age, data = foo, subset = ~ sex=="1")
>
> than the current situation where lm internally chops its own head off and 
> substitutes with model.frame, then evaluates the call to model.frame() which 
> in turn does eval(substitute(subset), data, env). Of course, at the time, ~ 
> was intended specifically for Wilkinson Rogers type formulas; "abusing" it 
> for other kinds of expressions is something of an afterthought.

Yeah, to my mind, the cool thing about formulas is that they provide a
concise way to capture an environment and an expression, and then
Wilkinson Rogers are just a special case.
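
A small sketch of that idea, using only base R: the helper `make_filter()` and the data frame `dat` are made-up names for illustration. A one-sided formula created inside a function remembers that function's environment, so its right-hand side can later be evaluated against a data frame while still finding local variables.

```r
# Hypothetical helper: returns a one-sided formula that refers to a
# variable local to this function call.
make_filter <- function(threshold) {
  ~ x > threshold
}

f <- make_filter(2)
dat <- data.frame(x = 1:4)

# f[[2]] is the expression on the right-hand side of the one-sided formula.
# Look up names in the data first, then in the formula's own environment,
# which is where `threshold` lives.
keep <- eval(f[[2]], dat, environment(f))
print(keep)
print(dat[keep, , drop = FALSE])
```

Nothing here relies on lazyeval; it only shows the expression-plus-environment pairing that a formula already carries.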

It's obviously impossible to go back and change how lm() etc. work now,
but I'm reasonably confident that lazyeval provides a strong
foundation going forward. The quasiquotation stuff is particularly
important - and unquote-splice makes it possible to do things that are
impossible with bquote().  (Of course, unquote-splice could be added
to bquote(), but I think you'll still run into issues with
environments)

Hadley


-- 
http://hadley.nz



Re: [R] with vs. attach

2016-05-09 Thread Hadley Wickham
On Sun, May 8, 2016 at 7:28 PM, Bert Gunter  wrote:
> Jeff:
>
> That's easy to do already with substitute(), since you can pass around
> an unevaluated expression (a parse tree) however you like. As I read
> it (admittedly quickly), its main feature is that it allows you
> more control over the environment in which the expression is finally
> evaluated -- as well as permitting nested expression evaluation fairly
> easily.
>
> But maybe we're saying the same thing ...  IMHO I think Hadley has
> gone overboard here, worrying about rarely important issues, as you
> seem to be intimating also.

These are absolutely critical issues that crop up as soon as other
people want to write functions that use your functions that use NSE.
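
A tiny base-R illustration of the kind of problem meant here, with made-up function names (`label_of` and `wrapper` are not from any package): a function that captures its argument with `substitute()` behaves differently once someone else wraps it.

```r
# Captures the code of its argument rather than the value.
label_of <- function(x) deparse(substitute(x))

# A second author wraps the first function in the obvious way.
wrapper <- function(y) label_of(y)

direct  <- label_of(a + b)  # sees the caller's expression
wrapped <- wrapper(a + b)   # sees only the wrapper's argument name

print(direct)
print(wrapped)
```

Neither `a` nor `b` needs to exist, because the argument is never evaluated; the wrapped call simply loses the original expression.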

Hadley

-- 
http://hadley.nz



Re: [R] with vs. attach

2016-05-09 Thread peter dalgaard

On 09 May 2016, at 02:46 , Bert Gunter  wrote:

> ... To be clear, Hadley or anyone else should also feel free to set me
> straight, preferably publicly, but privately if you prefer.

Not really to "set anyone straight", but there are some subtleties with mode 
call objects versus expression objects and formulas to be aware of. 

E.g.,

> a <- 2
> do.call("print", list(a*pi))
[1] 6.283185
> do.call("print", list(quote(a*pi)))
[1] 6.283185
> do.call("print", list(expression(a*pi)))
expression(a * pi)
> do.call("print", list(~a*pi))
~a * pi

Thing is, if you insert a call object into a parse tree, nothing is there to 
preserve its nature as an unevaluated expression. Similarly, in

> call("print", quote(a*pi))
print(a * pi)

the result is identical to quote(print(a * pi)), so when evaluated, quoting is 
not seen by print().
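
The point can be checked directly in base R. In this sketch the use of `identity()` as the outer function is just a convenient choice for illustration: a quoted call inserted as an argument becomes ordinary code, while an expression object stays an object.

```r
a <- 2

# Inserting a call object: it melts into the surrounding call.
cl <- call("print", quote(a * pi))
same <- identical(cl, quote(print(a * pi)))
print(same)

# The argument is plain code again, so evaluating it gives a number.
val <- eval(cl[[2]])
print(val)

# Inserting an expression object: it is passed through unevaluated.
protected <- call("identity", expression(a * pi))
res <- eval(protected)
print(is.expression(res))
```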

As far as I understand, this is also the reason that for math in ggplot, you 
may need as.expression(bquote()).

In general, I think that a number of things in R would have been more cleanly 
implemented using formulas/expression objects than using substitution and lazy 
evaluation, notably subset and offset arguments in lm/glm. It would have been 
so much cleaner to have

lm(math ~ age, data = foo, subset = ~ sex=="1")

than the current situation where lm internally chops its own head off and 
substitutes with model.frame, then evaluates the call to model.frame() which in 
turn does eval(substitute(subset), data, env). Of course, at the time, ~ was 
intended specifically for Wilkinson Rogers type formulas; "abusing" it for 
other kinds of expressions is something of an afterthought. 

-pd

> 
> Cheers,
> Bert
> Bert Gunter
> 
> "The trouble with having an open mind is that people keep coming along
> and sticking things into it."
> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> 
> 
> On Sun, May 8, 2016 at 5:28 PM, Bert Gunter  wrote:
>> Jeff:
>> 
>> That's easy to do already with substitute(), since you can pass around
>> an unevaluated expression (a parse tree) however you like. As I read
>> it (admittedly quickly), its main feature is that it allows you
>> more control over the environment in which the expression is finally
>> evaluated -- as well as permitting nested expression evaluation fairly
>> easily.
>> 
>> But maybe we're saying the same thing ...  IMHO I think Hadley has
>> gone overboard here, worrying about rarely important issues, as you
>> seem to be intimating also.
>> 
>> Feel free to set me straight... or ignore.
>> 
>> Cheers,
>> Bert
>> Bert Gunter
>> 
>> "The trouble with having an open mind is that people keep coming along
>> and sticking things into it."
>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
>> 
>> 
>> On Sun, May 8, 2016 at 4:02 PM, Jeff Newmiller  
>> wrote:
>>> The lazyeval package addresses the problem of how to delay evaluation even 
>>> when the function you want to do the evaluation in is buried two or more 
>>> function calls below where the original call was made. If you are not 
>>> building nested function calls with delayed evaluation then you probably 
>>> don't need that package.
>>> --
>>> Sent from my phone. Please excuse my brevity.
>>> 
>>> On May 8, 2016 3:30:16 PM PDT, Spencer Graves 
>>>  wrote:
>>>> Hi, Hadley et al.:
>>>>
>>>>   Hadley's link requires his development version of "lazyeval",
>>>> which can be obtained as follows:
>>>>
>>>> library(devtools)
>>>> install_github("hadley/lazyeval")
>>>>
>>>>   Hadley's link describes real problems with elegant solutions.
>>>>
>>>>   However, David's solution solved my immediate problem, and it's
>>>> not immediately obvious to me how to use his "expr_text" function (or
>>>> other functions in "lazyeval") to produce a better solution.
>>>>
>>>>   Thanks again to David, Peter and Hadley for their replies.
>>>>
>>>>   Spencer Graves
>>>>
>>>> On 5/6/2016 5:08 PM, Hadley Wickham wrote:
> You may want to read http://rpubs.com/hadley/157957, which captures
 my
> latest thinking (and tooling) around this problem. Feedback is much
> appreciated.
> 
> Hadley
> 
> On Fri, May 6, 2016 at 2:14 PM, David Winsemius
  wrote:
>>> On May 6, 2016, at 5:47 AM, Spencer Graves
  wrote:
>>> 
>>> 
>>> 
>>> On 5/6/2016 6:46 AM, peter dalgaard wrote:
 On 06 May 2016, at 02:43 , David Winsemius
  wrote:
 
>> On May 5, 2016, at 5:12 PM, Spencer Graves
  wrote:
>> 
>> I want a function to evaluate one argument
>> in the environment of a data.frame supplied
>> as another argument.  "attach" works for
>> this, but "with" does not.  Is there a way