Re: [R] Best way to merge 300+ .5MB dataframes?

2014-08-12 Thread David Winsemius

On Aug 11, 2014, at 8:01 PM, John McKown wrote:

 On Mon, Aug 11, 2014 at 9:43 PM, Thomas Adams tea...@gmail.com wrote:
 Grant,
 
 Assuming all your filenames are something like file1.txt,
 file2.txt, file3.txt, ... and using the Mac OS X Terminal app (after you cd to
 the directory where your files are located):
 
 This will strip off the first line of each file, that is, your header lines:
 
 for file in *.txt; do
   # macOS (BSD) sed: -i takes a backup suffix; '' means no backup copy
   sed -i '' '1d' "${file}"
 done
 
 Then, do this:
 
 cat *.txt > newfilename.txt
 
 Doing both should only take a few seconds, depending on your file sizes.
 
 Cheers!
 Tom
 
 
 Using sed hadn't occurred to me. I guess I'm just awk-ward <grin/>.
 A slightly different way would be:
 
 for file in file*.txt; do
   sed '1d' "${file}"
 done > newfilename.txt
 
 That way the original files are not modified. But it strips out the
 header of the first file as well. Not a big deal, but the read.table
 call will need to be changed to accommodate that. Also, it creates an
 otherwise unnecessary intermediate file, newfilename.txt. To keep the
 first file's header, the script could be:
 
 head -1 file1.txt > newfilename.txt
 for file in file*.txt; do
   sed '1d' "${file}"
 done >> newfilename.txt
 
 I really like having multiple answers to a given problem. Especially
 since I have a poorly implemented version of awk on one of my
 systems. It is the vendor's awk and conforms exactly to the POSIX
 definition with no additions. So I don't have the FNR built-in
 variable. Your implementation would work well on that system. Well, if
 there were a version of R for it. It is a branded UNIX system which
 was designed to be totally __and only__ POSIX compliant, with few
 (maybe no) extensions at all. IOW, it stinks. No, it can't be
 replaced. It is the z/OS system from IBM which is EBCDIC based and
 runs on the big iron mainframe, system z.
 
 -- 

On the Mac the awk equivalent is gawk. Within R you would use `system()`,
possibly using `paste0()` to construct the command string.
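A minimal sketch of that suggestion, assuming a POSIX shell with head and sed (Linux or macOS, not a stock Windows install). The helper name merge_with_shell and the temporary files are hypothetical. It follows McKown's header-once approach but lists the input files explicitly, so the output file cannot be caught by a *.txt glob.

```r
# Hypothetical helper: merge text files that share one header line,
# keeping only the first file's header. Requires head and sed.
merge_with_shell <- function(files, out) {
  quoted <- shQuote(files)
  cmd <- paste0(
    "head -n 1 ", quoted[1], " > ", shQuote(out), "; ",  # header from first file
    "for f in ", paste(quoted, collapse = " "), "; do ",
    "sed '1d' \"$f\"; ",                                   # drop every file's header
    "done >> ", shQuote(out)
  )
  status <- system(cmd)   # system() returns the shell's exit status
  if (status != 0) stop("shell command failed with status ", status)
  invisible(out)
}

# Usage with two small temporary files
files <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
write.csv(data.frame(a = 1:2, b = c("x", "y")), files[1], row.names = FALSE)
write.csv(data.frame(a = 3L, b = "z"), files[2], row.names = FALSE)
out <- tempfile(fileext = ".txt")
merge_with_shell(files, out)
merged <- read.csv(out)
print(merged)
```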

-- 



David Winsemius
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Best way to merge 300+ .5MB dataframes?

2014-08-12 Thread Prof Brian Ripley

On 12/08/2014 07:07, David Winsemius wrote:


On the Mac the awk equivalent is gawk. Within R you would use `system()` 
possibly using paste0() to construct a string to send.


For historical reasons this is actually part of R's configuration: see 
the AWK entry in R_HOME/etc/Makeconf.  (There is an SED entry too: not 
all sed's in current OSes are POSIX-compliant.)


Using system2() rather than system() is recommended for new code.
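A sketch of the same header-once merge written with system2(), under these assumptions: sed is on the PATH (checked with Sys.which()), the helper name merge_with_system2 is hypothetical, and each file is small enough to pass through R's memory. Because system2() pastes args onto the command line without quoting them, file names go through shQuote().

```r
# Hypothetical helper: header from the first file, bodies of all files via sed.
merge_with_system2 <- function(files, out) {
  sed <- Sys.which("sed")
  if (!nzchar(sed)) stop("sed was not found on the PATH")
  con <- file(out, open = "w")
  on.exit(close(con))
  writeLines(readLines(files[1], n = 1), con)   # keep only the first header
  for (f in files) {
    # stdout = TRUE returns sed's output to R as a character vector
    body <- system2(sed, args = c("1d", shQuote(f)), stdout = TRUE)
    writeLines(body, con)
  }
  invisible(out)
}

files <- c(tempfile(fileext = ".csv"), tempfile(fileext = ".csv"))
write.csv(data.frame(a = 1:2, b = c("x", "y")), files[1], row.names = FALSE)
write.csv(data.frame(a = 3L, b = "z"), files[2], row.names = FALSE)
out <- tempfile(fileext = ".csv")
merge_with_system2(files, out)
merged <- read.csv(out)
print(merged)
```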

--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK



[R] Prediction intervals (i.e. not CI of the fit) for monotonic loess curve using bootstrapping

2014-08-12 Thread Jan Stanstrup

Hi,

I am trying to find a way to estimate prediction intervals (PI) for a 
monotonic loess curve using bootstrapping.


At the moment my approach is to use the boot function from the boot 
package to bootstrap my loess model, which consists of loess + monoproc 
from the monoproc package (to force the fit to be monotonic, which gives 
me much improved results with my particular data). The output from the 
monoproc package is simply the fitted y values at each x value.
I then use boot.ci (again from the boot package) to get confidence 
intervals. The problem is that this gives me confidence intervals (CI) 
for the fit (is there a proper term for this?) and not a prediction 
interval. The interval is thus far too optimistic to give me an idea of 
the interval around a predicted value.


For linear models predict.lm can give a PI instead of a CI by setting 
interval = "prediction". Further discussion of that here:

http://stats.stackexchange.com/questions/82603/understanding-the-confidence-band-from-a-polynomial-regression
http://stats.stackexchange.com/questions/44860/how-to-prediction-intervals-for-linear-regression-via-bootstrapping.

However I don't see a way to do that for boot.ci. Does there exist a way 
to get PIs after bootstrapping? If some sample code is required I am 
more than happy to supply it but I thought the question was general 
enough to be understandable without it.
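One common way to get a prediction (rather than confidence) band from a bootstrap is to add a resampled residual back onto each bootstrapped curve. A minimal sketch on simulated data, assuming plain stats::loess() without the monoproc step, roughly constant error variance, and 499 resamples; all object names are illustrative.

```r
set.seed(1)
dat <- data.frame(x = seq(1, 10, length.out = 100))
dat$y <- log(dat$x) + rnorm(100, sd = 0.2)      # simulated example data
x_new <- data.frame(x = c(2.5, 5, 7.5))

fit <- loess(y ~ x, data = dat)
resid_c <- residuals(fit) - mean(residuals(fit)) # centred residuals
B <- 499
curve_draws <- matrix(NA_real_, B, nrow(x_new))  # resampled curve (CI)
pred_draws  <- matrix(NA_real_, B, nrow(x_new))  # curve + new error (PI)

for (b in seq_len(B)) {
  dat_b <- dat
  dat_b$y <- fitted(fit) + sample(resid_c, replace = TRUE)  # bootstrap response
  fit_b <- loess(y ~ x, data = dat_b)
  curve_draws[b, ] <- predict(fit_b, newdata = x_new)
  # A fresh resampled residual represents a new observation; this is
  # what widens the confidence band into a prediction band.
  pred_draws[b, ] <- curve_draws[b, ] +
    sample(resid_c, nrow(x_new), replace = TRUE)
}

ci_band <- apply(curve_draws, 2, quantile, probs = c(0.025, 0.975))
pi_band <- apply(pred_draws, 2, quantile, probs = c(0.025, 0.975))
print(ci_band)
print(pi_band)
```

In Jan's setting the loess() calls would be replaced by the loess + monoproc fit. Refinements such as rescaling the residuals for leverage are left out to keep the sketch short.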



Any hints are highly appreciated.


--
Jan Stanstrup
Postdoc

Metabolomics
Food Quality and Nutrition
Fondazione Edmund Mach



[R] script to data clear

2014-08-12 Thread Maicel Monzón Pérez
Hello List,

I wrote this script to clean data after import (I don't know whether it is OK). After
running it, the factor levels and value labels were lost. Could someone explain how to
reassign the levels in the script (with the new "depurate" values)? 

Best regards

Maicel Monzon MD, PHD

Center of Cybernetic Apply to Medicine

# data cleaning script

library(stringr)

for (i in 1:length(data)) {
  if (is.factor(data[[i]])) {
    for (j in 1:sum(str_detect(data[, i], "  "))) {
      data[[i]] <- str_replace_all(data[[i]], "  ", " ")
    }
  }
  data[[i]] <- str_trim(data[[i]], side = "both")
  data[[i]] <- tolower(data[[i]])
}

Note: "  " is two blank spaces and " " is only one.

 



--
Never say never; say instead: thank you, excuse me, pardon me.

This message has reached you through the email service that Infomed 
provides to support the missions of the National Health System. The 
sender agrees to use the service for those purposes and to comply with 
the established regulations.

Infomed: http://www.sld.cu/






[R] Multivariate tobit regression

2014-08-12 Thread Vera Miguéis
Dear R-users,
 
I would like to run a multivariate tobit model in R. Is there any package
available to perform this task?
 
Best regards,
Printil
  


Re: [R] Superimposing graphs

2014-08-12 Thread Naser Jamil
Dear Richard and Duncan,
your suggestions do exactly what I need. However, I would like the x-axis
to extend to 30 instead of 20. Do you have any suggestions on that?

Many thanks for your kind help.

Regards,

Jamil.


On 12 August 2014 01:22, Duncan Mackay dulca...@bigpond.com wrote:

 Hi

 If you want a 1 package and 1 function approach try this

 xyplot(conc ~ time | factor(subject, levels = c(2,1,3)), data = data.d,
 par.settings = list(strip.background = list(col = "transparent")),
 layout = c(3,1),
 aspect = 1,
 type   = c("b","g"),
 scales = list(alternating = FALSE),
 panel = function(x,y,...){

   panel.xyplot(x,y,...)

   # f1 <- function(x,v,cl,t)
   # (x/v)*exp(-(cl/v)*t) f1(0.5,0.5,0.06,t),
   panel.curve((0.5/0.5)*exp(-(0.06/0.5)*x),0,30)

 }
  )

 # par.settings: transparent strip backgrounds show the text better if you are publishing
 # factor(): omit the levels argument if you want the panels in the order 1:3
 # this approach can do more things than groupedData, as Doug Bates has said

 Regards

 Duncan Mackay
 Department of Agronomy and Soil Science
 University of New England
 Armidale NSW 2351
 Email: home: mac...@northnet.com.au

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Naser Jamil
 Sent: Monday, 11 August 2014 19:06
 To: R help
 Subject: [R] Superimposing graphs

 Dear R-user,
 May I seek your help to sort out a little problem? I have the following
 code to draw two graphs, and I want to superimpose the second graph on
 each panel of the first.

 

 library(nlme)
 subject <- c(1,1,1,2,2,2,3,3,3)
 time <- c(0.0,5.4,21.0,0.0,5.4,21.0,0.0,5.4,21.0)
 con.cohort <- c(1.10971703,0.54535512,0.07176724,0.75912539,0.47825282,
 0.10593292,1.20808375,0.47638394,0.02808967)

 data.d <- data.frame(subject=subject,time=time,conc=con.cohort)
 grouped.data <- groupedData(formula=conc~time | subject, data =data.d)

 plot(grouped.data)

 ##

 f1 <- function(x,v,cl,t) {
 (x/v)*exp(-(cl/v)*t)
   }
 t <- seq(0,30, .01)
 plot(t,f1(0.5,0.5,0.06,t),type="l",pch=18, ylim=c(), xlab="time",
 ylab="conc")


 ###

 Any suggestion will really be helpful.


 Regards,

 Jamil.



Re: [R] bugs and misfeatures in polr(MASS).... fixed!

2014-08-12 Thread tjb
The official maintainers were dismissive when I suggested there were some
problems I could fix in the then-current implementation of polr. I haven't looked
at it since, sorry.


On Tue, Aug 12, 2014 at 7:44 PM, Guido Biele [via R] 
ml-node+s789695n4695392...@n4.nabble.com wrote:

 I modified (where necessary) the file polr.R of the current MASS package
 (7.3-33) following the fixes in fixed-polr.R* and it is still working.
 The original polr.R file had implemented some of Tim's suggestions, but not
 the new method to generate starting values for the optimization.

 Does anybody know why polr was only partially fixed?

 Regards - Guido

 *http://r.789695.n4.nabble.com/attachment/4647403/0/fixed-polr.R






-
Tim J. Benham
--


Re: [R] Superimposing graphs

2014-08-12 Thread Richard M. Heiberger
Yes, use xlim=c(0, 30) in your definition of P1
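Applied to Duncan's one-function lattice version, the same suggestion amounts to adding xlim = c(0, 30) to the xyplot() call. A sketch, rebuilding data.d from Naser's posted values; the string arguments that the archive stripped of their quotes are restored here.

```r
library(lattice)

# Data from Naser's original post
data.d <- data.frame(
  subject = rep(1:3, each = 3),
  time    = rep(c(0.0, 5.4, 21.0), times = 3),
  conc    = c(1.10971703, 0.54535512, 0.07176724, 0.75912539, 0.47825282,
              0.10593292, 1.20808375, 0.47638394, 0.02808967)
)

p <- xyplot(conc ~ time | factor(subject, levels = c(2, 1, 3)), data = data.d,
  par.settings = list(strip.background = list(col = "transparent")),
  layout = c(3, 1),
  aspect = 1,
  type   = c("b", "g"),
  scales = list(alternating = FALSE),
  xlim   = c(0, 30),  # extend the x-axis to 30 in every panel
  panel  = function(x, y, ...) {
    panel.xyplot(x, y, ...)
    # f1(0.5, 0.5, 0.06, t) = (0.5/0.5) * exp(-(0.06/0.5) * t)
    panel.curve((0.5 / 0.5) * exp(-(0.06 / 0.5) * x), from = 0, to = 30)
  })
print(p)
```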

On Tue, Aug 12, 2014 at 7:26 AM, Naser Jamil jamilnase...@gmail.com wrote:


Re: [R] Best way to merge 300+ .5MB dataframes?

2014-08-12 Thread Grant Rettke
Thank you all kindly.
Grant Rettke | ACM, AMA, COG, IEEE
gret...@acm.org | http://www.wisdomandwonder.com/
“Wisdom begins in wonder.” --Socrates
((λ (x) (x x)) (λ (x) (x x)))
“Life has become immeasurably better since I have been forced to stop
taking it seriously.” --Thompson


On Tue, Aug 12, 2014 at 1:07 AM, David Winsemius dwinsem...@comcast.net wrote:



Re: [R] Superimposing graphs

2014-08-12 Thread Naser Jamil
That's perfect! Many thanks.


On 12 August 2014 14:32, Richard M. Heiberger r...@temple.edu wrote:



Re: [R] script to data clear

2014-08-12 Thread Jeff Newmiller
Without a representative sample of data, it is very hard to understand your 
question or to be specific about suggestions. See [1] for some ideas about how 
to communicate questions online.

Note that clearing data would usually mean deleting it, as in rm(data). From 
context I assume you mean cleaning, where invalid characters need to be 
removed.

Also assuming that you have a data frame with some columns that are categorical 
data:

1) If the values are contaminated or incomplete (don't have rows representing 
every possible category) then it is almost always better to delay converting to 
factor until after the data are cleaned. The read.table family of functions includes 
a stringsAsFactors=FALSE option that will prevent automatic conversion of 
columns with unknown types into factors. This is also useful for contaminated 
numeric columns. Only after the vector of character data is clean and as 
complete as it can be should you convert to factor.

Note that most data sets have a variety of column types, and even after 
resolving the issues discussed here your function is not necessarily going to work 
with every input data file that you encounter. Specifically, not every column 
of data should be converted to factor. With this in mind, it can be helpful to 
look for ways to confirm that the data you are processing are what you expect them 
to be. Often this is implemented by confirming that specific columns have 
specific kinds of data in them. That is, using a loop may be TOO flexible... 
apply this cleaning loop cautiously.

2) Most functions in R can process whole vectors of data at once, so your inner 
loop should not be necessary. Specifically, the line

data[[i]] <- gsub(" +", " ", data[[i]])

would replace all sequences of one or more spaces in every element of the 
vector with a single space.

(Your j loop also runs too many times... str_replace_all(data[[i]], "  ", " ") 
already affects the whole column, but you repeat it unnecessarily.)

3) I don't know what a depurate value is.

4) You should be able to convert your cleaned character column to factor with 
the factor function... like

data[[i]] <- factor(data[[i]])

Note that if you know certain levels should be possible but not all of them are 
actually present (e.g. "Small", "Medium", and "Large", but no data with "Small" 
are present) then you will need to specify the levels as a parameter to the 
factor function. See the help file ?factor.

5) You have several lines of code at the end that appear to execute regardless 
of whether the column is a factor or not. They should be within the braces of 
the if statement.

6) Please read the Posting Guide mentioned at the end of this and every post on 
this list, specifically regarding posting in plain text. Your code was 
partially damaged by the HTML email format.
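A minimal base-R sketch of points 1), 2), 4) and 5) combined, assuming the columns to clean are the factor or character ones. The function name clean_columns and the example data frame raw are hypothetical, and trimws() (base R) stands in for stringr::str_trim().

```r
# Hypothetical helper: clean text columns in a vectorised way, then
# rebuild factors from the cleaned values so the levels are consistent.
clean_columns <- function(data) {
  for (i in seq_along(data)) {
    col <- data[[i]]
    if (is.factor(col) || is.character(col)) {
      was_factor <- is.factor(col)
      txt <- as.character(col)      # work on the labels, not the codes
      txt <- gsub(" +", " ", txt)   # collapse runs of spaces to one
      txt <- trimws(txt)            # drop leading/trailing spaces
      txt <- tolower(txt)
      # pass levels = ... here if some valid categories are absent
      data[[i]] <- if (was_factor) factor(txt) else txt
    }
  }
  data
}

raw <- data.frame(size = factor(c("  Small", "small ", "Large  Box", "LARGE box")),
                  n = 1:4)
cleaned <- clean_columns(raw)
print(levels(cleaned$size))
```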

[1] 
http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
---
Jeff Newmiller
DCN: jdnew...@dcn.davis.ca.us
Research Engineer (Solar/Batteries/Software/Embedded Controllers)
--- 
Sent from my phone. Please excuse my brevity.

On August 12, 2014 5:42:13 AM PDT, Maicel Monzón Pérez 
mai...@infomed.sld.cu wrote:


Re: [R] Prediction intervals (i.e. not CI of the fit) for monotonic loess curve using bootstrapping

2014-08-12 Thread David Winsemius

On Aug 12, 2014, at 12:23 AM, Jan Stanstrup wrote:


Why not use the quantreg package to estimate the quantiles of interest to you? 
That way you would not be depending on Normal theory assumptions which you 
apparently don't trust. I've used it with the `cobs` function from the package 
of the same name to implement the monotonic constraint. I think there is a 
worked example in the quantreg package, but since I bought Koenker's book, I 
may be remembering from there.
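A minimal sketch of that idea on simulated data: fitting the 5% and 95% conditional quantiles with quantreg::rq() gives an approximate 90% band for individual observations without Normal-theory assumptions. The quadratic formula is an arbitrary choice for illustration, and no monotonic constraint is imposed (the cobs package mentioned above is one way to add one). The code only runs if quantreg is installed.

```r
if (requireNamespace("quantreg", quietly = TRUE)) {
  set.seed(2)
  dat <- data.frame(x = seq(1, 10, length.out = 200))
  dat$y <- log(dat$x) + rnorm(200, sd = 0.2)   # simulated example data
  # One fit per tau; together the two curves bound ~90% of new observations
  fit <- quantreg::rq(y ~ x + I(x^2), tau = c(0.05, 0.95), data = dat)
  band <- predict(fit, newdata = data.frame(x = c(2.5, 5, 7.5)))
  print(band)   # one row per new x, one column per tau
}
```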
-- 

David Winsemius
Alameda, CA, USA



Re: [R] Prediction intervals (i.e. not CI of the fit) for monotonic loess curve using bootstrapping

2014-08-12 Thread Bert Gunter
PIs of what? Future individual values or mean values?

I assume quantreg provides quantiles for the latter, not the former.
(See ?predict.lm for a terse explanation of the difference). Both are
obtainable from bootstrapping but the details depend on what you are
prepared to assume. Consult references or your local statistician for
help if needed.
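To see the difference in the linear-model case, compare the two interval types from predict.lm on simulated data (all numbers are illustrative). Both share the same point estimate, but the prediction interval also includes the residual variance, so at the same level it is always wider.

```r
set.seed(3)
dat <- data.frame(x = 1:30)
dat$y <- 2 + 0.5 * dat$x + rnorm(30, sd = 2)   # simulated example data
fit <- lm(y ~ x, data = dat)
new <- data.frame(x = c(5, 15, 25))

ci_lm <- predict(fit, newdata = new, interval = "confidence")  # mean response
pi_lm <- predict(fit, newdata = new, interval = "prediction")  # new individual value
print(cbind(ci_width = ci_lm[, "upr"] - ci_lm[, "lwr"],
            pi_width = pi_lm[, "upr"] - pi_lm[, "lwr"]))
```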

-- Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
Clifford Stoll




On Tue, Aug 12, 2014 at 8:20 AM, David Winsemius dwinsem...@comcast.net wrote:

 On Aug 12, 2014, at 12:23 AM, Jan Stanstrup wrote:

 Hi,

 I am trying to find a way to estimate prediction intervals (PI) for a 
 monotonic loess curve using bootstrapping.

 At the moment my approach is to use the boot function from the boot package 
 to bootstrap my loess model, which consist of loess + monoproc from the 
 monoproc package (to force the fit to be monotonic which gives me much 
 improved results with my particular data). The output from the monoproc 
 package is simply the fitted y values at each x-value.
 I then use boot.ci (again from the boot package) to get confidence 
 intervals. The problem is that this gives me confidence intervals (CI) for 
 the fit (is there a proper way to specify this?) and not a prediction 
 interval. The interval is thus way too optimistic to give me an idea of the 
 confidence interval of a predicted value.

 For linear models predict.lm can give PI instead of CI by setting interval = 
 prediction. Further discussion of that here:
 http://stats.stackexchange.com/questions/82603/understanding-the-confidence-band-from-a-polynomial-regression
 http://stats.stackexchange.com/questions/44860/how-to-prediction-intervals-for-linear-regression-via-bootstrapping.

 However I don't see a way to do that for boot.ci. Does there exist a way to 
 get PIs after bootstrapping? If some sample code is required I am more than 
 happy to supply it but I thought the question was general enough to be 
 understandable without it.


 Why not use the quantreg package to estimate the quantiles of interest to 
 you? That way you would not be depending on Normal theory assumptions which 
 you apparently don't trust. I've used it with the `cobs` function from the 
 package of the same name to implement the monotonic constraint. I think there 
 is a worked example in the quantreg package, but since I bought Koenker's 
 book, I may be remembering from there.
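Following David's suggestion, here is a minimal sketch of a monotone quantile band built with the cobs package. It assumes cobs is installed from CRAN. The data are simulated, and the object names (fit_lo, band, ...) are illustrative only. The 5% and 95% conditional quantile curves should bracket roughly 90% of new observations, which makes this a prediction band rather than a confidence band for the fitted curve. The three curves are fitted separately, so where data are sparse they could in principle cross.

```r
# Requires the CRAN package 'cobs' (install.packages("cobs"))
library(cobs)

# Hypothetical increasing relationship with noise
set.seed(1)
x <- sort(runif(300, 0, 10))
y <- log1p(x) + rnorm(300, sd = 0.2)

# Monotone (increasing) quantile-regression splines for three quantiles
fit_lo  <- cobs(x, y, constraint = "increase", tau = 0.05)
fit_med <- cobs(x, y, constraint = "increase", tau = 0.50)
fit_hi  <- cobs(x, y, constraint = "increase", tau = 0.95)

# Evaluate on a grid inside the observed range of x
grid <- seq(min(x), max(x), length.out = 50)
band <- data.frame(
  x      = grid,
  lower  = predict(fit_lo,  grid)[, "fit"],
  median = predict(fit_med, grid)[, "fit"],
  upper  = predict(fit_hi,  grid)[, "fit"]
)
head(band)
```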
 --

 David Winsemius
 Alameda, CA, USA

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



[R] pass vector binding to DBI parameter (rsqlite)

2014-08-12 Thread Dan Muresan
Hi, is there a way to bind vectors to DBI query parameters? The
following tells me that vectors are sent as separate values:

 library(RSQLite)
 c <- dbConnect(SQLite())
 dbGetQuery(c, "create table tst (x int, y int)")
 dbGetQuery(c, "insert into tst values (?, ?)", data.frame(x=c(1,2,1,2),
            y=c(3, 4, 5, 6)))
 dbReadTable(c, "tst")
  x y
1 1 3
2 2 4
3 1 5
4 2 6
 dbGetQuery(c, "select * from tst where y not in (?)", c(7,6))
  x y
1 1 3
2 2 4
3 1 5
4 2 6
5 1 3
6 2 4
7 1 5

This looks like 2 result sets (4 + 3 entries), not one.

Is there a way to send multiple values to a '?' binding? Is this at all
possible using the R DBI interface (not necessarily with rsqlite)?



Re: [R] how to process multiple data files using R loop

2014-08-12 Thread Fix Ace
Thank you very much for all the replies :) Here is my working code:

for(i in ls(pattern="P_")){print(head(get(i),2))}



On Monday, August 11, 2014 11:04 AM, Greg Snow 538...@gmail.com wrote:
 


In addition to the solution and comments that you have already
received, here are a couple of additional comments:

This is a variant of FAQ 7.21; had you found that FAQ, it would
have told you about the get function.

The most important part of the answer in FAQ 7.21 is the last part,
where it says that it is better to use a list. If all the objects of
interest are related and you want to do the same or similar things to
each one, then having them all stored in a single list can simplify
things for the future. You can collect all the objects into a single
list using the mget function, e.g.:

P_objects <- mget(ls(pattern='P_'))

Now that they are in a list you can do the equivalent of your loop,
but simpler with the lapply function, e.g.:

lapply( P_objects, head, 2 )

And if you want to do other things with all these objects, such as
save them, plot them, do a regression analysis on them, delete them,
etc. then you can do that using lapply/sapply as well in a simpler way
than looping.
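Here is a self-contained sketch of both versions discussed in this thread. It uses two small stand-in data frames named after the poster's objects; their contents are made up. Inside a for loop nothing is auto-printed, which is why the loop needs an explicit print(). The pattern "^P_" anchors the match at the start of each name.

```r
# Two small, hypothetical stand-ins for the poster's P_ data frames
P_3_utr_source_data <- data.frame(V1 = c(1, 1, 2))
P_5_utr_source_data <- data.frame(V1 = c(1, 1, 3))

# Loop version: get() turns each name into its object; print() is required
# because values are not auto-printed inside a for loop
for (nm in ls(pattern = "^P_")) print(head(get(nm), 2))

# List version: collect the objects once, then apply head() to each
P_objects <- mget(ls(pattern = "^P_"))
first_two <- lapply(P_objects, head, 2)
str(first_two)
```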



On Fri, Aug 8, 2014 at 12:25 PM, Fix Ace ace...@rocketmail.com wrote:
 I have 16 files and would like to check the information in their first two
 lines. Here is what I did:


 ls(pattern="P_")
  [1] "P_3_utr_source_data"               "P_5_utr_source_data"
  [3] "P_exon_per_gene_cds_source_data"   "P_exon_per_gene_source_data"
  [5] "P_exon_source_data"                "P_first_exon_oncds_source_data"
  [7] "P_first_intron_oncds_source_data"  "P_first_intron_ongene_source_data"
  [9] "P_firt_exon_ongene_source_data"    "P_gene_cds_source_data"
 [11] "P_gene_source_data"                "P_intron_source_data"
 [13] "P_last_exon_oncds_source_data"     "P_last_exon_ongene_source_data"
 [15] "P_last_intron_oncds_source_data"   "P_last_intron_ongene_source_data"



 for(i in ls(pattern="P_")){head(i, 2)}

 It obviously does not work, since nothing is printed.

 What I would like to see for the output is:

 head(P_3_utr_source_data,2)
   V1
 1  1
 2  1
 head(P_5_utr_source_data,2)
   V1
 1  1
 2  1

 .

 .
 .



 Could anybody help me with this?

 Thank you very much for your time:)




-- 
Gregory (Greg) L. Snow Ph.D.
538...@gmail.com


Re: [R] pass vector binding to DBI parameter (rsqlite)

2014-08-12 Thread John McKown
I don't really _know_ much, but what I would try would be something like:

dbGetQuery(c, "select * from tst where y not in (?)", paste(c(7,6), collapse=","))

The paste(c(7,6),collapse=',') results in the string "7,6". You could
always subject yourself to a SQL injection attack by doing:

dbGetQuery(c, paste0("select * from tst where y not in (",
                     paste(c(7,6), collapse=","), ")"))

If you do this and use a variable instead of the c(7,6), make sure you
cleanse the contents of the variable, such as making sure that there
is no bare semicolon in it, and other things that don't come to
mind offhand.

Hmm, perhaps better:

values <- c(7,6)
dbGetQuery(c, paste0("select * from tst where y not in (",
                     paste(rep("?", length(values)), collapse=","),
                     ")"),
           values)

As you can see, this dynamically adjusts the number of ? marks in the
SELECT statement, based on the number of elements in the values
variable.
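Here is a sketch of the same dynamic-placeholder idea with the current DBI interface. It assumes recent DBI and RSQLite versions, in which dbGetQuery() accepts a params list; the 2014-era RSQLite used a different bind-data argument. Each list element binds one "?", so the query runs once and returns a single result set. The table and values mirror the thread.

```r
# Requires the DBI and RSQLite packages
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "tst", data.frame(x = c(1, 2, 1, 2), y = c(3, 4, 5, 6)))

values <- c(7, 6)
# One "?" placeholder per element, e.g. "?, ?" for two values
placeholders <- paste(rep("?", length(values)), collapse = ", ")
sql <- paste0("select * from tst where y not in (", placeholders, ")")

# params: one list element per placeholder, so a single query is executed
res <- dbGetQuery(con, sql, params = as.list(values))
print(res)
dbDisconnect(con)
```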

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! 
John McKown



[R] generating a sequence of seconds

2014-08-12 Thread Erin Hodgess
Hello!

If I would like to generate a sequence of seconds for a date, I would do
the following:

x <- seq(from=as.POSIXct("2014-08-12 00:00:00"),
         to=as.POSIXct("2014-08-12 23:59:59"), by="secs")

What if I just want the seconds vector without the date, please?  Is there
a convenient way to create such a vector, please?

thanks,
Erin


-- 
Erin Hodgess
Associate Professor
Department of Mathematical and Statistics
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com



[R] Validation of the Markov chain assumption

2014-08-12 Thread Kaptue Tchuente, Armel
Hi,
I 'm modelling the occurrence of daily rainfall with a first order Markov chain.
I would like to know if there is a statistical test implemented in R that could 
allow me to assess whether the observed rainfall time series satisfies the Markov 
assumption.
Thanks
P.S. My apologies for cross-posting; I sent this question by mistake to an 
inappropriate R mailing list.
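This check is not from the thread. One common way to test the first-order assumption is a likelihood-ratio test of a first-order chain against a second-order chain, with both fitted to the same transitions. Below is a base-R sketch, assuming the series is coded 0 = dry, 1 = wet. markov_order_lrt() is a hypothetical helper, not a function from an existing package. Under the first-order hypothesis, the statistic is approximately chi-squared with s(s-1)^2 = 2 degrees of freedom for s = 2 states. A small p-value suggests that the day before yesterday still carries information.

```r
# Likelihood-ratio test: first-order vs. second-order Markov chain for a
# binary series (0 = dry, 1 = wet). Hypothetical helper, not a package API.
markov_order_lrt <- function(x) {
  x <- as.integer(x)
  n <- length(x)
  lev <- 0:1
  # Triplet counts n_ijk for (x[t-2], x[t-1], x[t]) as a 2 x 2 x 2 table
  trip <- table(factor(x[1:(n - 2)], levels = lev),
                factor(x[2:(n - 1)], levels = lev),
                factor(x[3:n], levels = lev))
  n_ij <- apply(trip, c(1, 2), sum)  # second-order conditioning counts
  n_jk <- apply(trip, c(2, 3), sum)  # first-order transition counts
  n_j  <- apply(trip, 2, sum)        # first-order conditioning counts
  loglik_ratio <- 0
  for (i in 1:2) for (j in 1:2) for (k in 1:2) {
    nijk <- trip[i, j, k]
    if (nijk > 0) {
      p2 <- nijk / n_ij[i, j]        # second-order MLE of P(k | i, j)
      p1 <- n_jk[j, k] / n_j[j]      # first-order MLE of P(k | j)
      loglik_ratio <- loglik_ratio + nijk * (log(p2) - log(p1))
    }
  }
  stat <- 2 * loglik_ratio
  df <- 2  # s * (s - 1)^2 with s = 2 states
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

# Usage on data simulated from a first-order chain (hypothetical parameters)
set.seed(123)
n_days <- 3000
rain <- integer(n_days)
p_wet <- c(0.2, 0.6)  # P(wet tomorrow | dry today), P(wet tomorrow | wet today)
for (t in 2:n_days) rain[t] <- rbinom(1, 1, p_wet[rain[t - 1] + 1])
res <- markov_order_lrt(rain)
res
```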





[R] t.test of matching columns from two datasets using plyr

2014-08-12 Thread Felipe Carrillo
Hi,
I have two datasets, df1 and df2, with 3 matching columns. I need to do a t.test
comparing sp1, sp2 and sp3 with var1, var2 and var3 where the year, month and
location match. 
I can do it with sapply or mapply, but I want the end result to be a data.frame. 
I prefer to do it with plyr or dplyr, as I have been using these packages
throughout this project. My final data frame should have the t.test statistic
and the p.value.

Sample datasets
first dataframe
df1 - structure(list(Year = c(1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 
1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 
1995L, 1995L, 1995L, 1995L, 1995L, 1995L), month = c(Feb, Mar, 
Mar, Mar, Mar, Mar, Mar, Mar, Mar, Mar, Mar, 
Mar, Mar, Mar, Mar, Apr, Apr, Apr, Apr, Apr, 
Apr), location = structure(c(5L, 5L, 5L, 5L, 5L, 5L, 2L, 4L, 
4L, 1L, 4L, 4L, 3L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 2L), .Label = c(Far West, 
North, Other, South, West), class = factor), var1 = c(111.6, 
0, 0, 0, 0, 0, 0, 14, 0, 0, 0, 31.4, 245.9, 46.3, 59.8, 206.1, 
200.3, 88, 73.4, 33.9, 7.1), var2 = c(0, 4.7, 4.4, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 159.8, 0, 0, 142.2, 94.3, 0, 0, 0, 0), var3 = c(180.2, 
14.1, 123.7, 17.4, 5.5, 12.9, 39.3, 21, 66.6, 12.2, 13.6, 15.7, 
36.9, 0, 143.5, 35.5, 235.6, 51.3, 230.6, 81.3, 190.9)), .Names = c(Year, 
month, location, var1, var2, var3), row.names = 17093:17113, class = 
data.frame)
second dataframe
df2 - structure(list(Year = c(1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 
1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 1995L, 
1995L, 1995L, 1995L, 1995L, 1995L, 1995L), month = c(Apr, Apr, 
Apr, Apr, Apr, Apr, Apr, Apr, May, May, May, 
May, May, May, May, May, May, May, May, May, 
May), location = structure(c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
1L, 1L, 1L, 1L, 1L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c(Far West, 
North, South, West), class = factor), sp1 = c(853.0055629, 
147.7158909, 160.1536518, 65.01652491, 2332.609706, 701.4706852, 
11.36420842, 0, 2645.671425, 2769.409257, 523.4284249, 135.1274855, 
72.22498557, 35.07497333, 572.087043, 150.4768424, 111.5881472, 
61.21848041, 392.0651906, 0, 771.0337355), sp2 = c(10.27717546, 
0, 0, 0, 0, 10.16624181, 0, 0, 0, 307.7121397, 52.34284249, 19.30392649, 
24.07499519, 0, 35.75544018, 42.99338354, 0, 40.81232027, 0, 
90.9210806, 622.7580172), sp3 = c(92.49457911, 128.0204387, 203.8319205, 
175.5446173, 120.6522262, 71.1636927, 107.95998, 57.14456898, 
43.37166271, 153.8560698, 104.685685, 77.21570598, 96.29998075, 
187.0665244, 0, 0, 111.5881472, 163.2492811, 26.13767938, 45.4605403, 
207.5860057)), .Names = c(Year, month, location, sp1, 
sp2, sp3), row.names = 30:50, class = data.frame)
Thank you much.


Re: [R] generating a sequence of seconds

2014-08-12 Thread Marc Schwartz

Erin,

Do you want just the numeric vector of seconds, with the first value being 0, 
incrementing by 1 to the final value?

x <- seq(from = as.POSIXct("2014-08-12 00:00:00"), 
         to = as.POSIXct("2014-08-12 23:59:59"), 
         by = "secs")

 head(x)
[1] "2014-08-12 00:00:00 CDT" "2014-08-12 00:00:01 CDT"
[3] "2014-08-12 00:00:02 CDT" "2014-08-12 00:00:03 CDT"
[5] "2014-08-12 00:00:04 CDT" "2014-08-12 00:00:05 CDT"

 tail(x)
[1] "2014-08-12 23:59:54 CDT" "2014-08-12 23:59:55 CDT"
[3] "2014-08-12 23:59:56 CDT" "2014-08-12 23:59:57 CDT"
[5] "2014-08-12 23:59:58 CDT" "2014-08-12 23:59:59 CDT"


 head(as.numeric(x - x[1]))
[1] 0 1 2 3 4 5

 tail(as.numeric(x - x[1]))
[1] 86394 86395 86396 86397 86398 86399


Regards,

Marc Schwartz



Re: [R] generating a sequence of seconds

2014-08-12 Thread William Dunlap
 What if I just want the seconds vector without the date, please?  Is there
 a convenient way to create such a vector, please?

Why do you want such a thing?  E.g., do you want it to print the time
of day without the date?  Or are you trying to avoid numeric problems
when you do regressions with the seconds-since-1970 numbers around
1414918800?  Or is there another problem you want solved?

Note that the number of seconds in a day depends on the day and the
time zone.  In US/Pacific time I get:

   length(seq(from=as.POSIXct("2014-08-12 00:00:00"),to=as.POSIXct("2014-08-12 23:59:59"), by="secs"))
  [1] 86400
   length(seq(from=as.POSIXct("2014-03-09 00:00:00"),to=as.POSIXct("2014-03-09 23:59:59"), by="secs"))
  [1] 82800
   length(seq(from=as.POSIXct("2014-11-02 00:00:00"),to=as.POSIXct("2014-11-02 23:59:59"), by="secs"))
  [1] 90000
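Here is a sketch that reproduces these counts with the time zone passed explicitly instead of taken from the R session. America/Los_Angeles is the canonical tz database name for US/Pacific. day_length() is a hypothetical helper. On the March transition day the clock skips an hour, and on the November day it repeats one.

```r
# Number of one-second steps in a calendar day, counted as above with seq(),
# but in an explicit time zone rather than the session's default
day_length <- function(day, tz = "America/Los_Angeles") {
  start <- as.POSIXct(paste(day, "00:00:00"), tz = tz)
  end   <- as.POSIXct(paste(day, "23:59:59"), tz = tz)
  length(seq(from = start, to = end, by = "sec"))
}

days <- c("2014-08-12", "2014-03-09", "2014-11-02")
lens <- sapply(days, day_length)
print(lens)   # 86400 82800 90000
```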

Bill Dunlap
TIBCO Software
wdunlap tibco.com




Re: [R] generating a sequence of seconds

2014-08-12 Thread Erin Hodgess
What I would like to do is to look at several days and determine activities
that happened at times on those days.  I don't really care which days, I
just care about what time.

Thank you!




-- 
Erin Hodgess
Associate Professor
Department of Mathematical and Statistics
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com



Re: [R] generating a sequence of seconds

2014-08-12 Thread William Dunlap
If your activities of interest are mainly during the workday, then
seconds-since-3am might give good results, avoiding most daylight
saving time issues.  If they are more biologically oriented, then
something like seconds before or after sunrise or sunset might be
better.  Both can be expressed as differences between POSIXct times.
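Here is a sketch of the seconds-since-3am idea, computed as a difference between POSIXct times. secs_since_3am() is a hypothetical helper, and the time zone is an assumption. On the 2014 DST-transition days, noon is still 9 hours after 03:00, whereas the elapsed time since midnight is 11 or 13 hours.

```r
# Seconds elapsed since 03:00 local time on the same calendar day
# (hypothetical helper; the default time zone is an assumption)
secs_since_3am <- function(t, tz = "America/Los_Angeles") {
  day    <- format(t, "%Y-%m-%d", tz = tz)
  origin <- as.POSIXct(paste(day, "03:00:00"), tz = tz)
  as.numeric(difftime(t, origin, units = "secs"))
}

# Noon on an ordinary day and on the two 2014 DST-transition days
noon <- as.POSIXct(c("2014-08-12 12:00:00", "2014-03-09 12:00:00",
                     "2014-11-02 12:00:00"), tz = "America/Los_Angeles")
secs_since_3am(noon)   # 32400 (9 hours) on all three days

# For comparison: elapsed seconds since midnight shift on DST days
midnight <- as.POSIXct(paste(format(noon, "%Y-%m-%d"), "00:00:00"),
                       tz = "America/Los_Angeles")
as.numeric(difftime(noon, midnight, units = "secs"))   # 43200 39600 46800
```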
Bill Dunlap
TIBCO Software
wdunlap tibco.com




Re: [R] generating a sequence of seconds

2014-08-12 Thread John McKown
I'm a bit confused by this request. The definition of a POSIXct is:
Class POSIXct represents the (signed) number of seconds since the
beginning of 1970 (in the UTC time zone) as a numeric vector.

So I don't really know what you mean by the seconds portion. There
are 24*60*60 or 86,400 seconds in a day. Those seconds are from +0 at
00:00:00 to +86399 for 23:59:59. Is this what you were asking?

seconds_vector <- 0:86399  # the simple way to get the above

By the definition given above, there is no such thing as a POSIXct
value without a date portion. Any numeric value converts to a
date+time, much like an SQL timestamp (as opposed to a time) column.

If you want to display the seconds_vector as HH:MM:SS for some
reason, the simple way is:

character_time <- sprintf("%02d:%02d:%02d",              # C-style formatting string
                          seconds_vector %/% 3600L,          # hour value
                          (seconds_vector %% 3600L) %/% 60L, # minute value
                          seconds_vector %% 60L)             # second value

You can simply make that a function:

getTimePortion <- function(POSIXct_value) {
  # POSIXct counts seconds since 1970-01-01 UTC, so %% 86400 leaves the
  # seconds since midnight (UTC)
  value_in_seconds <- as.integer(POSIXct_value) %% 86400L
  sprintf("%02d:%02d:%02d",                      # C-style formatting string
          value_in_seconds %/% 3600L,            # hour value
          (value_in_seconds %% 3600L) %/% 60L,   # minute value
          value_in_seconds %% 60L)               # second value
}

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! 
John McKown



Re: [R] generating a sequence of seconds

2014-08-12 Thread Marc Schwartz
Erin,

Is a sequential resolution of seconds required, as per your original post?

If so, then using my approach and specifying the start and end dates and times 
will work, with the coercion of the resultant vector to numeric as I included. 
The method I used (subtracting the first value) will also give you the starting 
second as 0, or you can alter the math to adjust the origin of the vector as 
you desire.

As Bill notes, there will be some days where the number of seconds in the day 
will be something other than 86,400. In Bill's example, this is because he chose 
the start and end dates of daylight saving time in a relevant time zone. Thus, 
his second date is short an hour, while the third has an extra hour.

Regards,

Marc




Re: [R] generating a sequence of seconds

2014-08-12 Thread John McKown
On Tue, Aug 12, 2014 at 2:40 PM, John McKown
john.archie.mck...@gmail.com wrote:
snip
 You can simply make that a function

 getTimePortion <- function(POSIXct_value) {
 value_in_seconds=as.integer(POSIXct_value);
 sprintf("%02d:%02d:%02d", # C-style formatting string
  seconds_vector/3600, # hour value
  (seconds_vector%%3600)/60, #minute value
  seconds_vector%%60); #second value
   };


Sorry, I cut'n'pasted that incorrectly:

getTimePortion <- function(POSIXct_value) {
  # seconds since midnight, UTC (POSIXct counts seconds since 1970-01-01 UTC)
  value_in_seconds <- as.integer(POSIXct_value) %% 86400L
  sprintf("%02d:%02d:%02d",                      # C-style formatting string
          value_in_seconds %/% 3600L,            # hour value
          (value_in_seconds %% 3600L) %/% 60L,   # minute value
          value_in_seconds %% 60L)               # second value
}

And the above is vectorized and will work if the argument has multiple
values in it.
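Here is a self-contained check of the working function against base R's format(). POSIXct counts seconds since 1970-01-01 UTC, so the arithmetic version returns UTC clock time. format(x, "%H:%M:%S") instead uses the time zone attached to x, so the two agree for UTC times like these but can differ for local times.

```r
getTimePortion <- function(POSIXct_value) {
  # Remainder modulo 86400 = seconds since midnight *in UTC*
  value_in_seconds <- as.integer(POSIXct_value) %% 86400L
  sprintf("%02d:%02d:%02d",
          value_in_seconds %/% 3600L,            # hour value
          (value_in_seconds %% 3600L) %/% 60L,   # minute value
          value_in_seconds %% 60L)               # second value
}

x <- as.POSIXct(c("2014-08-12 00:00:00", "2014-08-12 13:23:00",
                  "2014-08-12 23:59:59"), tz = "UTC")
getTimePortion(x)       # "00:00:00" "13:23:00" "23:59:59"
format(x, "%H:%M:%S")   # same here; format() follows x's time zone
```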

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! 
John McKown



Re: [R] pass vector binding to DBI parameter (rsqlite)

2014-08-12 Thread Dan Muresan
Yes, of course, that's an obvious work-around, thanks. Another one is
to use temporary tables.

But I'd like to know if binding a vector to an SQL parameter is
possible in rsqlite (or even in the DBI API or with other drivers --
it seems to me it isn't). This seems like a nasty shortcoming
(especially in light of SQL injection, but there are other
considerations).
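Here is a sketch of the temporary-table workaround Dan mentions. It assumes current DBI/RSQLite, where dbWriteTable() accepts temporary = TRUE. The table name excluded_y is illustrative. The values travel as table data rather than as SQL text, so nothing has to be escaped, and the temporary table disappears when the connection closes.

```r
# Requires the DBI and RSQLite packages
library(DBI)
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "tst", data.frame(x = c(1, 2, 1, 2), y = c(3, 4, 5, 6)))

# Store the vector of values in a temporary table (dropped on disconnect)
values <- c(7, 6)
dbWriteTable(con, "excluded_y", data.frame(y = values), temporary = TRUE)

# The IN list becomes a subquery, so no values are pasted into the SQL
res <- dbGetQuery(con, "select * from tst where y not in (select y from excluded_y)")
print(res)
dbDisconnect(con)
```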



Re: [R] generating a sequence of seconds

2014-08-12 Thread John McKown
And some people wonder why I absolutely abhor daylight saving time.
I'm not really fond of leap years and leap seconds either. Somebody
needs to fix the Earth's rotation and orbit!

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! 
John McKown



Re: [R] generating a sequence of seconds

2014-08-12 Thread Marc Schwartz

On Aug 12, 2014, at 2:49 PM, John McKown john.archie.mck...@gmail.com wrote:

 And some people wonder why I absolutely abhor daylight saving time.
 I'm not really fond of leap years and leap seconds either. Somebody
 needs to fix the Earth's rotation and orbit!


I have been a longtime proponent of slowing the rotation of the Earth on its 
axis, so that we could have longer days to be more productive.

Unfortunately, so far, my wish has gone unfulfilled...at least as it is 
relevant within human lifetimes.

;-)

Regards,

Marc


 
 On Tue, Aug 12, 2014 at 2:14 PM, William Dunlap wdun...@tibco.com wrote:
 What if I just want the seconds vector without the date, please?  Is there
 a convenient way to create such a vector, please?
 
 Why do you want such a thing?  E.g., do you want it to print the time
 of day without the date?  Or are you trying to avoid numeric problems
 when you do regressions with the seconds-since-1970 numbers around
 1414918800?  Or is there another problem you want solved?
 
 Note that the number of seconds in a day depends on the day and the
 time zone.  In US/Pacific time I get:
 
 length(seq(from=as.POSIXct("2014-08-12 00:00:00"),
            to=as.POSIXct("2014-08-12 23:59:59"), by="secs"))
  [1] 86400
 length(seq(from=as.POSIXct("2014-03-09 00:00:00"),
            to=as.POSIXct("2014-03-09 23:59:59"), by="secs"))
  [1] 82800
 length(seq(from=as.POSIXct("2014-11-02 00:00:00"),
            to=as.POSIXct("2014-11-02 23:59:59"), by="secs"))
  [1] 90000
 
 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com
 
 
 On Tue, Aug 12, 2014 at 11:51 AM, Erin Hodgess erinm.hodg...@gmail.com 
 wrote:
 Hello!
 
 If I would like to generate a sequence of seconds for a date, I would do
 the following:
 
 x <- seq(from=as.POSIXct("2014-08-12 00:00:00"),
          to=as.POSIXct("2014-08-12 23:59:59"), by="secs")
 
 What if I just want the seconds vector without the date, please?  Is there
 a convenient way to create such a vector, please?
 
 thanks,
 Erin
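Bill Dunlap's counts can be checked with a small helper that counts the one-second steps between local midnight and 23:59:59. This is a minimal sketch assuming the Olson zone name "America/Los_Angeles" as the equivalent of his US/Pacific setting, and that the system's time zone database knows it; `secs_in_day` is a hypothetical helper, not a base R function.

```r
# Count the one-second steps from local 00:00:00 to 23:59:59 on a given day
secs_in_day <- function(day, tz) {
  from <- as.POSIXct(paste(day, "00:00:00"), tz = tz)
  to   <- as.POSIXct(paste(day, "23:59:59"), tz = tz)
  length(seq(from = from, to = to, by = "secs"))
}

days <- c("2014-08-12",  # ordinary summer day
          "2014-03-09",  # spring forward: 23-hour day
          "2014-11-02")  # fall back: 25-hour day

counts <- sapply(days, secs_in_day, tz = "America/Los_Angeles")
print(counts)
```

With `tz = "UTC"` all three days give 86400, because UTC has no daylight saving shifts.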



Re: [R] generating a sequence of seconds

2014-08-12 Thread John McKown
On Tue, Aug 12, 2014 at 2:26 PM, Erin Hodgess erinm.hodg...@gmail.com
wrote:
 What I would like to do is to look at several days and determine
activities
 that happened at times on those days.  I don't really care which days, I
 just care about what time.

 Thank you!


Ah! A light dawns. You want to subset your data based on some part of the
time. Such as between 13:23:00 and 15:10:01 of each day in the sample.
Ignoring the DST issue, which I shouldn't. It is left as an exercise for
the reader. But usually 13:23:00 is 13*3600+23*60 = 48180 seconds after
midnight, and 15:10:01 is 15*3600+10*60+1 = 54601 seconds after midnight.
Suppose you have a data.frame() in a variable called myData. Further
suppose that the POSIXct variable in this data.frame is called when. You
want to subset this into another data.frame() and call it subsetMyData.

subsetMyData <- myData[as.integer(myData$when) %% 86400 >= 48180 &
                       as.integer(myData$when) %% 86400 <= 54601, ];

Yes, this is ugly. You might make it look nicer, and be easier to
understand, by:

# start on or after 1:23 p.m.
startTime <- as.integer(as.difftime("13:23:00", units="secs"));
# end on or before 3:10:01 p.m.
endTime <- as.integer(as.difftime("15:10:01", units="secs"));
# convert to seconds and eliminate the date portion
testTime <- as.integer(myData$when) %% 86400;
subsetMyData <- myData[testTime >= startTime & testTime <= endTime, ];

This will work best if myData$when is in GMT instead of local time. Why? No
DST worries. Again, in my opinion, all time and date data should be recorded in GMT.
Only convert to local time when displaying the data to an ignorant user who
can't handle GMT. Personally, I love to tell people something like: it is
13:59:30 zulu. In my time zone, today, that is 08:59:30 a.m.
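A self-contained version of the time-of-day filter above, as a sketch: `myData` here is a hypothetical three-row data frame whose `when` column is stored in UTC (so that `%% 86400` really gives seconds after midnight). The `format = "%H:%M:%S"` argument is spelled out rather than relying on the `"%X"` default of `as.difftime()`.

```r
# Hypothetical data: three events on different days, timestamps stored in UTC
myData <- data.frame(
  when  = as.POSIXct(c("2014-08-10 13:30:00",
                       "2014-08-11 09:00:00",
                       "2014-08-12 15:10:01"), tz = "UTC"),
  value = c(1, 2, 3)
)

# Window boundaries as seconds after midnight
startTime <- as.integer(as.difftime("13:23:00", format = "%H:%M:%S", units = "secs"))
endTime   <- as.integer(as.difftime("15:10:01", format = "%H:%M:%S", units = "secs"))

# Seconds since the epoch modulo one day = seconds after midnight (valid for UTC only)
testTime <- as.integer(myData$when) %% 86400L

subsetMyData <- myData[testTime >= startTime & testTime <= endTime, ]
print(subsetMyData)
```

Both boundaries are inclusive, so the event at exactly 15:10:01 is kept while the 09:00:00 event is dropped.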

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! 
John McKown




Re: [R] generating a sequence of seconds

2014-08-12 Thread Bert Gunter
Marc:

You just need to be more patient -- this is already happening:

http://en.wikipedia.org/wiki/Tidal_acceleration

Cheers,
Bert

Bert Gunter
Genentech Nonclinical Biostatistics
(650) 467-7374

Data is not information. Information is not knowledge. And knowledge
is certainly not wisdom.
Clifford Stoll




On Tue, Aug 12, 2014 at 1:10 PM, Marc Schwartz marc_schwa...@me.com wrote:

 On Aug 12, 2014, at 2:49 PM, John McKown john.archie.mck...@gmail.com wrote:

 And some people wonder why I absolutely abhor daylight saving time.
 I'm not really fond of leap years and leap seconds either. Somebody
 needs to fix the Earth's rotation and orbit!


 I have been a longtime proponent of slowing the rotation of the Earth on its 
 axis, so that we could have longer days to be more productive.

 Unfortunately, so far, my wish has gone unfulfilled...at least as it is 
 relevant within human lifetimes.

 ;-)

 Regards,

 Marc





Re: [R] generating a sequence of seconds

2014-08-12 Thread William Dunlap
  Again, in my opinion, all time and date data should be recorded in GMT.

It depends on context.  If you are studying traffic flow or
electricity usage, then you want local time with all its warts
(perhaps stated as time since 3am so any daylight savings time
problems are confined to a small portion of the data), perhaps along
with time since sunrise and time since sunset.

If you are studying astronomy, then UTC is appropriate.


Bill Dunlap
TIBCO Software
wdunlap tibco.com
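Bill Dunlap's suggestion of local time measured from 3 a.m. might look like the following sketch. It assumes the study's zone is "America/Los_Angeles" (a placeholder) and uses the local wall-clock fields of `POSIXlt`. Because US clocks change at 2 a.m., starting the cycle at 3 a.m. pushes the repeated or missing hour to the end of each 24-hour cycle; `secs_since_3am` is an illustrative name.

```r
tz <- "America/Los_Angeles"  # assumed study time zone

# Hypothetical event times, interpreted as local clock times
when <- as.POSIXct(c("2014-08-12 13:23:00", "2014-08-12 01:00:00"), tz = tz)

# POSIXlt exposes local hour/min/sec fields (it picks up the "tzone" attribute)
lt <- as.POSIXlt(when)
secs_of_day <- lt$hour * 3600 + lt$min * 60 + lt$sec

# Shift the origin to 03:00 local; times before 3 a.m. wrap to the end of the cycle
secs_since_3am <- (secs_of_day - 3 * 3600) %% 86400
print(secs_since_3am)
```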




Re: [R] generating a sequence of seconds

2014-08-12 Thread John McKown
On Tue, Aug 12, 2014 at 3:23 PM, William Dunlap wdun...@tibco.com wrote:

   Again, in my opinion, all time and date data should be recorded in GMT.

 It depends on context.  If you are studying traffic flow or
 electricity usage, then you want local time with all its warts
 (perhaps stated as time since 3am so any daylight savings time
 problems are confined to a small portion of the data), perhaps along
 with time since sunrise and time since sunset.


I see your point. But if my data is in GMT, each timestamp is a unique
value. Given that, along with location information, I can then
generate a local time for human activity, e.g. when do people go
to lunch? Another plus is that there is no confusion during "fall
back" about whether this is the 1st or 2nd instance of something like 01:27:00.
Long ago, I worked for a city government. They recorded everything on the
machine in local time, including police log entries. It always made me wonder
why some lawyer didn't exploit that window of confusion if something
allegedly happened on time change day and was logged as 01:30:00.



 If you are studying astronomy, then UTC is appropriate.


 Bill Dunlap
 TIBCO Software
 wdunlap tibco.com
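John McKown's point that a UTC timestamp plus a location is enough to recover local clock time can be sketched with `format()` and its `tz` argument. The zone "America/Chicago" is an assumption chosen because it reproduces his "13:59:30 zulu is 08:59:30" example for August. The second part shows two distinct UTC instants that print as the same local time on the 2014 US fall-back day.

```r
# One UTC instant rendered as local clock time for an assumed location
zulu <- as.POSIXct("2014-08-12 13:59:30", tz = "UTC")
local_view <- format(zulu, "%H:%M:%S", tz = "America/Chicago")
print(local_view)

# Two UTC instants one hour apart on the US fall-back day (2014-11-02)
t1 <- as.POSIXct("2014-11-02 06:30:00", tz = "UTC")
t2 <- as.POSIXct("2014-11-02 07:30:00", tz = "UTC")

# Both show 01:30:00 local time; only the zone abbreviation tells them apart
print(format(t1, "%H:%M:%S %Z", tz = "America/Chicago"))
print(format(t2, "%H:%M:%S %Z", tz = "America/Chicago"))
```

Stored as UTC, the two events stay one hour apart; stored as local wall-clock text, they would collide.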


-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! 
John McKown




Re: [R] pass vector binding to DBI parameter (rsqlite)

2014-08-12 Thread John McKown
On Tue, Aug 12, 2014 at 2:46 PM, Dan Muresan danm...@gmail.com wrote:

 Yes, of course, that's an obvious work-around, thanks. Another one is
 to use temporary tables.

 But I'd like to know if binding a vector to an SQL parameter is
 possible in rsqlite (or even in the DBI API or with other drivers --
 it seems to me it isn't). This seems like a nasty shortcoming
 (especially in light of SQL injection, but there are other
 considerations).


That type of binding seems to be something that was overlooked when the API
was being designed. Or, as some vendor might say: we considered that, but
decided to reject it due to the difficulty of implementation and lack of
need in the vast majority of cases.

-- 
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! 
John McKown




[R] A basic statistics question

2014-08-12 Thread Ron Michael
Hi,

I need a clarification on a fairly fundamental statistical property; I hope 
the expeRts here won't mind my posting it here.

I learnt that the variance-covariance matrix of the standardized data is equal to 
the correlation matrix of the unstandardized data, so I tried it on the following data.

Data <- structure(c(7L, 5L, 9L, 7L, 8L, 7L, 6L, 6L, 5L, 7L, 8L, 6L, 7L,  7L, 
6L, 7L, 7L, 6L, 8L, 6L, 7L, 7L, 7L, 8L, 7L, 9L, 8L, 7L, 7L,  0L, 10L, 10L, 10L, 
7L, 6L, 8L, 5L, 5L, 6L, 6L, 7L, 11L, 9L, 10L,  0L, 13L, 13L, 10L, 7L, 7L, 7L, 
10L, 7L, 5L, 8L, 7L, 10L, 10L,  10L, 6L, 7L, 6L, 6L, 8L, 8L, 7L, 7L, 7L, 7L, 
8L, 7L, 8L, 6L,  6L, 8L, 7L, 4L, 7L, 7L, 10L, 10L, 6L, 7L, 7L, 12L, 12L, 8L, 
5L,  5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 5L, 4L, 5L, 5L, 5L, 6L,  7L, 5L, 
7L, 5L, 7L, 7L, 7L, 7L, 8L, 7L, 6L, 7L, 7L, 6L, 7L, 7L,  6L, 4L, 4L, 6L, 6L, 
7L, 8L, 7L, 11L, 10L, 8L, 7L, 6L, 6L, 11L,  5L, 4L, 6L, 6L, 6L, 7L, 8L, 7L, 
12L, 4L, 4L, 2L, 5L, 6L, 7L,  6L, 6L, 5L, 6L, 5L, 7L, 7L, 7L, 6L, 5L, 6L, 6L, 
5L, 5L, 6L, 6L,  4L, 4L, 5L, 10L, 10L, 7L, 7L, 6L, 4L, 6L, 10L, 7L, 4L, 6L, 6L, 
 6L, 8L, 8L, 8L, 7L, 8L, 9L, 10L, 7L, 6L, 6L, 8L, 6L, 8L, 3L,  3L, 4L, 5L, 5L, 
6L, 5L, 5L, 6L, 4L, 8L, 7L, 3L, 5L, 6L, 9L, 8L,  9L, 10L, 8L, 9L, 8L, 9L, 8L, 
8L, 9L, 11L, 10L, 9L, 9L, 13L,
 13L,  10L, 7L, 7L, 7L, 9L, 8L, 7L, 6L, 10L, 8L, 7L, 8L, 8L, 3L, 4L,  3L, 7L, 
6L, 6L, 6L, 6L, 5L, 6L, 6L, 6L, 2L, 5L, 7L, 9L, 8L, 9L,  10L, 8L, 8L, 9L, 9L, 
11L, 11L, 11L, 10L, 9L, 9L, 11L, 2L, 3L,  2L, 2L, 2L, 1L, 4L, 4L, 2L, 2L, 1L, 
1L, 1L, 3L, 3L, 4L, 6L, 4L,  5L, 2L, 3L, 5L, 4L, 4L, 2L, 4L, 4L, 5L, 4L, 2L, 
7L, 3L, 3L, 10L,  13L, 11L, 9L, 9L, 7L, 8L, 9L, 6L, 7L, 6L, 5L, 3L, 13L, 3L, 
3L,  0L, 1L, 4L, 5L, 3L, 3L, 0L, 2L, 20L, 3L, 2L, 6L, 5L, 5L, 5L,  2L, 2L, 5L, 
5L, 5L, 4L, 3L, 4L, 4L, 3L, 4L, 10L, 10L, 9L, 8L,  4L, 4L, 8L, 7L, 10L, 3L, 1L, 
9L, 5L, 11L, 9L), .Dim = c(45L,  8L), .Dimnames = list(NULL, c("V1", "V7", 
"V13", "V19", "V25", "V31", "V37", "V43"))) 


Data_Normalized <- apply(Data, 2, function(x) return((x - mean(x))/sd(x))) 

(t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]



The point is that I am not getting exactly the correlation matrix. Can somebody 
point out what I am missing here?

Thanks for your pointer. 



Re: [R] generating a sequence of seconds

2014-08-12 Thread Erin Hodgess
Great!

Thank you!

I think the function with the C-like function should do the trick.








-- 
Erin Hodgess
Associate Professor
Department of Mathematical and Statistics
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com




Re: [R] A basic statistics question

2014-08-12 Thread Ted Harding
On 12-Aug-2014 19:57:29 Ron Michael wrote:
 I learnt that variance-covariance matrix of the standardized data is equal to
 the correlation matrix for the unstandardized data. So I used following data.
 
 SNIP
 
 (t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]
 
 Point is that I am not getting exact CORR matrix. Can somebody point me
 what I am missing here?

Try:
  Data_Normalized <- apply(Data, 2, function(x) return((x - mean(x))/sd(x)))
  (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)

and compare the result with

  cor(Data)

And why? Look at

  ?sd

and note that:

  Details:
 Like 'var' this uses denominator n - 1.

Hoping this helps,
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 12-Aug-2014  Time: 22:32:26
This message was sent by XFMail



Re: [R] A basic statistics question

2014-08-12 Thread Rolf Turner

On 13/08/14 07:57, Ron Michael wrote:

Hi,

I would need to get a clarification on a quite fundamental statistics property, 
hope expeRts here would not mind if I post that here.

I learnt that variance-covariance matrix of the standardized data is equal to 
the correlation matrix for the unstandardized data. So I used following data.


SNIP


(t(Data_Normalized) %*% Data_Normalized)/dim(Data_Normalized)[1]



Point is that I am not getting exact CORR matrix. Can somebody point me what I 
am missing here?


You are using a denominator of n in calculating your covariance 
matrix for your normalized data.  But these data were normalized using 
the sd() function which (correctly) uses a denominator of n-1, so that 
its square (the sample variance) is an unbiased estimator of the population variance.


If you calculated

   (t(Data_Normalized) %*% Data_Normalized)/(dim(Data_Normalized)[1]-1)

then you would get the same result as you get from cor(Data) (to within 
about 1e-15).
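This can be checked numerically. Below is a minimal sketch on simulated toy data (not Ron's data set), using `scale()`, which centres each column and divides by `sd()`, and therefore uses the n - 1 denominator:

```r
set.seed(1)
X <- matrix(rnorm(45 * 3), nrow = 45)  # toy data with 45 rows, like Ron's
n <- nrow(X)

Z <- scale(X)                 # centre, then divide each column by sd() (n - 1)

R_n1 <- crossprod(Z) / (n - 1)  # same as t(Z) %*% Z / (n - 1)
R_n  <- crossprod(Z) / n        # Ron's version, denominator n

all.equal(R_n1, cor(X), check.attributes = FALSE)
all.equal(R_n, cor(X) * (n - 1) / n, check.attributes = FALSE)
```

The two versions differ by exactly the factor (n - 1)/n, so with 45 rows every entry, including the diagonal, comes out at 44/45 (about 0.978) of the corresponding entry of `cor()`.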


cheers,

Rolf Turner

--
Rolf Turner
Technical Editor ANZJS



Re: [R] A basic statistics question

2014-08-12 Thread Ted Harding
On 12-Aug-2014 21:41:52 Rolf Turner wrote:
 On 13/08/14 07:57, Ron Michael wrote:
 SNIP
 
 You are using a denominator of n in calculating your covariance 
 matrix for your normalized data.  But these data were normalized using 
 the sd() function which (correctly) uses a denominator of n-1 ...
 SNIP
 
 cheers,
 Rolf Turner

One could argue about (correctly)!

From the descriptive statistics point of view, if one is given a single
number x, then this dataset has no variation, so one could say that
sd(x) = 0. And this is what one would get with a denominator of n.

But if the single value x is viewed as sampled from a distribution
(with positive dispersion), then the value of x gives no information
about the SD of the distribution. If you use denominator (n-1) then
sd(x) = NA, i.e. is indeterminate (as it should be in this application).

The important thing when using pre-programmed functions is to know
which is being used. R uses (n-1), and this can be found from
looking at

  ?sd

or (with more detail) at

  ?cor

Ron had assumed that the denominator was n, apparently not being aware
that R uses (n-1).
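The contrast between the two denominators can be sketched on a toy vector. The divide-by-n version is computed by hand because base R's `sd()` only offers the n - 1 version; `sd_n` and `sd_n1` are illustrative names.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # toy data, mean 5
n <- length(x)

sd_n1 <- sd(x)                           # R's sd(): denominator n - 1
sd_n  <- sqrt(sum((x - mean(x))^2) / n)  # descriptive version: denominator n

c(n_minus_1 = sd_n1, n = sd_n)

# A single observation: nothing to describe, nothing to estimate from
sd(5)                            # NA with the n - 1 denominator
sqrt(sum((5 - mean(5))^2) / 1)   # 0 with the n denominator
```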

Just a few thoughts ...
Ted.

-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 12-Aug-2014  Time: 23:22:09
This message was sent by XFMail



Re: [R] pass vector binding to DBI parameter (rsqlite)

2014-08-12 Thread Jeff Newmiller
I am not quite sure what you are complaining about. The ODBC interface 
definition is not vectorized, and that has nothing to do with R... that applies 
across all platforms I have seen. The DBI API is consistent with that. There 
are some proprietary APIs that implement bulk data transfers, but then you are 
stuck with that API.
It might be appropriate to discuss this on R-sig-db if you have better 
information than I do.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

On August 12, 2014 12:46:30 PM PDT, Dan Muresan danm...@gmail.com wrote:
Yes, of course, that's an obvious work-around, thanks. Another one is
to use temporary tables.

But I'd like to know if binding a vector to an SQL parameter is
possible in rsqlite (or even in the DBI API or with other drivers --
it seems to me it isn't). This seems like a nasty shortcoming
(especially in light of SQL injection, but there are other
considerations).

On 8/12/14, John McKown john.archie.mck...@gmail.com wrote:
 On Tue, Aug 12, 2014 at 10:55 AM, Dan Muresan danm...@gmail.com
wrote:
 Hi, is there a way to bind vectors to DBI query parameters? The
 following tells me that vectors are sent as separate values:

 library(RSQLite)
 c <- dbConnect(SQLite())
 dbGetQuery(c, "create table tst (x int, y int)")
 dbGetQuery(c, "insert into tst values (?, ?)",
            data.frame(x=c(1,2,1,2), y=c(3, 4, 5, 6)))
 dbReadTable(c, "tst")
   x y
 1 1 3
 2 2 4
 3 1 5
 4 2 6
 dbGetQuery(c, "select * from tst where y not in (?)", c(7,6))
   x y
 1 1 3
 2 2 4
 3 1 5
 4 2 6
 5 1 3
 6 2 4
 7 1 5

 This looks like 2 result sets (4 + 3 entries), not one.

 Is there a way to send multiple values to a '?' binding? Is this at all
 possible using the R DBI interface (not necessarily with rsqlite)?

 I don't really _know_ much, but what I would try would be something like:
 
 dbGetQuery(c, "select * from tst where y not in (?)",
            paste(c(7,6), collapse=','));
 
 The paste(c(7,6),collapse=',') results in the string "7,6". You could
 always subject yourself to a SQL injection attack by doing:
 
 dbGetQuery(c, paste0("select * from tst where y not in (",
                      paste(c(7,6), collapse=','), ")"));
 
 If you do this and use a variable instead of the c(7,6), make sure you
 cleanse the contents of the variable, such as making sure that there
 is no bare semi-colon in it, and other things that don't come to
 mind off hand.

 Hum, perhaps better:

 values <- c(7,6);
 dbGetQuery(c, paste("select * from tst where y not in (",
                     paste(rep('?', length(values)), collapse=','),
                     ")"),
            as.data.frame(t(values)));  # one row, one column per '?'

 As you can see, this dynamically adjusts the number of ? marks in the
 SELECT statement, based on the number of elements in the values
 variable.

 --
 There is nothing more pleasant than traveling and meeting new people!
 Genghis Khan

 Maranatha! 
 John McKown
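The dynamically generated placeholder list from John's last suggestion can also be combined with the `params` argument that current versions of DBI's `dbGetQuery()` accept (this argument postdates the 2014 RSQLite used in this thread). This is a sketch assuming the DBI and RSQLite packages; the database part only runs if both are installed, and the in-memory table `tst` mirrors the toy table from Dan's example.

```r
values <- c(7, 6)

# One "?" per value, e.g. "(?, ?)" for two values
placeholders <- paste(rep("?", length(values)), collapse = ", ")
sql <- paste0("select * from tst where y not in (", placeholders, ")")
print(sql)

if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "tst", data.frame(x = c(1, 2, 1, 2), y = c(3, 4, 5, 6)))
  # Each list element is bound to one "?", so the values never enter the SQL text
  res <- DBI::dbGetQuery(con, sql, params = as.list(values))
  print(res)  # the three rows with y = 3, 4, 5
  DBI::dbDisconnect(con)
}
```

Only the number of placeholders is pasted into the statement; the values themselves still travel as bound parameters, which keeps the injection concern out of the picture.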




[R] populating matrix with binary variable after matching data from data frame

2014-08-12 Thread Adrian Johnson
Hi,
Sorry, I have a basic question.

I have a data frame with two columns:
 x1
      V1       V2
1   AKT3    TCL1A
2  AKTIP    VPS41
3  AKTIP    PDPK1
4  AKTIP   GTF3C1
5  AKTIP    HOOK2
6  AKTIP    POLA2
7  AKTIP KIAA1377
8  AKTIP FAM160A2
9  AKTIP    VPS16
10 AKTIP    VPS18


I have a 1211x1211 matrix x whose row and column names are drawn from some of
the elements of x1$V1 and x1$V2. For every pair listed in x1 I want a 1 in x
(for example, AKT3 - TCL1A gets 1, whereas AKT3 - VPS41 gets 0).
How can I map these binary relations into x?


x
       TCLA1 VPS41 ABCA13 ABCA4
AKT3       0     0      0     0
AKTIP      0     0      0     0
ABCA13     0     0      0     0
ABCA4      0     0      0     0


dput output:

x = structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
              .Dim = c(4L, 4L),
              .Dimnames = list(c("AKT3", "AKTIP", "ABCA13", "ABCA4"),
                               c("TCLA1", "VPS41", "ABCA13", "ABCA4")))

x1 = structure(list(V1 = c("AKT3", "AKTIP", "AKTIP", "AKTIP", "AKTIP",
"AKTIP", "AKTIP", "AKTIP", "AKTIP", "AKTIP"), V2 = c("TCL1A",
"VPS41", "PDPK1", "GTF3C1", "HOOK2", "POLA2", "KIAA1377", "FAM160A2",
"VPS16", "VPS18")), .Names = c("V1", "V2"), row.names = c(NA,
10L), class = "data.frame")



Thanks
Adrian




Re: [R] populating matrix with binary variable after matching data from data frame

2014-08-12 Thread arun
You could try:
x1$V2[1] <- "TCLA1"  # match the "TCLA1" column name used in x


  x[outer(rownames(x), colnames(x), FUN=paste) %in%
    as.character(interaction(x1, sep=" "))] <- 1
x
   TCLA1 VPS41 ABCA13 ABCA4
AKT3   1 0  0 0
AKTIP  0 1  0 0
ABCA13 0 0  0 0
ABCA4  0 0  0 0
A.K.
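An alternative sketch indexes `x` directly by row and column names with a two-column character matrix. It assumes, as in arun's fix, that the names in `x1` are spelled exactly like the dimnames of `x` (TCLA1 rather than TCL1A). Pairs whose names are absent from `x` are dropped first, because indexing by an unknown name is an error. Only the first three rows of Adrian's `x1` are used here.

```r
x <- matrix(0, nrow = 4, ncol = 4,
            dimnames = list(c("AKT3", "AKTIP", "ABCA13", "ABCA4"),
                            c("TCLA1", "VPS41", "ABCA13", "ABCA4")))
x1 <- data.frame(V1 = c("AKT3", "AKTIP", "AKTIP"),
                 V2 = c("TCLA1", "VPS41", "PDPK1"),
                 stringsAsFactors = FALSE)

# Keep only pairs whose row name and column name both exist in x
keep <- x1$V1 %in% rownames(x) & x1$V2 %in% colnames(x)

# A character matrix with one column per dimension is matched against dimnames
x[cbind(x1$V1[keep], x1$V2[keep])] <- 1
print(x)
```

This avoids building all row/column name combinations with `outer()`, which may matter for a 1211x1211 matrix.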




Re: [R] Cox regression model for matched data with replacement

2014-08-12 Thread John Pura
I am curious about this problem as well. How do you go about creating the 
weights for each pair, and are you suggesting that we can just incorporate a 
weight statement in the model as opposed to the strata statement? And Dr. 
Therneau, let's say I have 140 cases matched with replacement to 2 controls. Is 
my id variable the number of cases?

Thanks,
John

