date:20120806


Hello,

Works for me:

d1 <- data.frame(V1 = 1:3,
V2 = c("some text = 9", "some tèxt = 9", "some other text = 9"))

regexpr("some text = 9", d1$V2)
[1]  1 -1 -1
attr(,"match.length")
[1] 13 -1 -1
regexpr("some tèxt = 9", d1$V2)
[1] -1  1 -1
attr(,"match.length")
[1] -1 13 -1
d1$V1[regexpr("some text = 9", d1$V2) > 0] <- 9
d1$V1[regexpr("some tèxt = 9", d1$V2) > 0] <- 9
d1
  V1                  V2
1  9       some text = 9
2  9       some tèxt = 9
3  3 some other text = 9

What do you mean by "it did not work"? What were the contents of 'd1'?

sessionInfo()
R version 2.15.1 (2012-06-22)
Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Portuguese_Portugal.1252 LC_CTYPE=Portuguese_Portugal.1252
[3] LC_MONETARY=Portuguese_Portugal.1252 LC_NUMERIC=C
[5] LC_TIME=Portuguese_Portugal.1252

attached base packages:
[1] stats graphics  grDevices utils datasets  methods base

loaded via a namespace (and not attached):
[1] fortunes_1.5-0

Hope this helps,

Rui Barradas

On 06-08-2012 06:55, Luca Meyer wrote:

Hello,

I have built an expression to find out whether a given substring is included
in a larger string; it works like this:

d1$V1[regexpr("some text = 9", d1$V2) > 0] <- 9

This works fine as long as "some text" contains only standard ASCII
characters. However, it does not work when accents are included, as in the
following:

d1$V1[regexpr("some tèxt = 9", d1$V2) > 0] <- 9

I have tried to substitute è with several wildcards but it did not work. Can
anyone suggest how to have the pattern match the string while ignoring the accent?

Thank you in advance,

Luca
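One possible way to match "ignoring the accent", sketched here with base R only, is to put the accepted variants of the letter into a character class. The vectors v1 and v2 below are hypothetical stand-ins for d1$V1 and d1$V2, and the set of accents in the class is an assumption; extend it as needed.

```r
# Hypothetical stand-ins for the two columns of d1.
v1 <- 1:3
v2 <- c("some text = 9", "some t\u00e8xt = 9", "some other text = 9")

# The character class accepts a plain e as well as e-grave, e-acute
# and e-circumflex (written as Unicode escapes to avoid encoding issues).
hit <- grepl("some t[e\u00e8\u00e9\u00ea]xt = 9", v2)

# Same update as in the question: set V1 to 9 wherever the pattern matches.
v1[hit] <- 9
```

grepl() returns a logical vector directly, so the comparison regexpr(...) > 0 is not needed.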

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] regexpr with accents

Hi,

It works for me.  I am using R 2.15 on Ubuntu 12.04.

d1 <- data.frame(V1 = 1:5, V2 = c("some text = 9", "some téxt=9", "sóme tèxt=9",
"söme text=9", "some têxt=9"))
d1
#  V1            V2
#1  1 some text = 9
#2  2   some téxt=9
#3  3   sóme tèxt=9
#4  4   söme text=9
#5  5   some têxt=9

d1$V1[regexpr("some téxt=9", d1$V2) > 0] <- 9
d1$V1[regexpr("söme text=9", d1$V2) > 0] <- 9
d1$V1[regexpr("some têxt=9", d1$V2) > 0] <- 9
d1$V1[regexpr("sóme tèxt=9", d1$V2) > 0] <- 9
d1$V1[regexpr("some text = 9", d1$V2) > 0] <- 9

d1
#  V1            V2
#1  9 some text = 9
#2  9   some téxt=9
#3  9   sóme tèxt=9
#4  9   söme text=9
#5  9   some têxt=9

A.K.





Re: [R] deleting columns from a dataframe where NA is more than 15 percent of the column length

2012-08-06 Thread Faz Jones

Thank you. It works great.

Sent from my iPhone

On Aug 5, 2012, at 9:08 PM, Jorge I Velez jorgeivanve...@gmail.com wrote:

 Hi Faz,
 
 Here is one way of doing it where x is your data frame:
 
 x[, colMeans(is.na(x)) <= .15]
 
 HTH,
 Jorge.-
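A small self-contained sketch of the same idea, using a made-up data frame (the column names a, b, c are hypothetical). colMeans(is.na(x)) gives the share of NA per column, and drop = FALSE is added here as a precaution so the result stays a data frame even if only one column survives.

```r
# Hypothetical data: column a has 10% NA, b has 20% NA, c has none.
x <- data.frame(a = c(1, NA, 3, 4, 5, 6, 7, 8, 9, 10),
                b = c(NA, NA, 3, 4, 5, 6, 7, 8, 9, 10),
                c = 1:10)

# Share of missing values in each column.
na_share <- colMeans(is.na(x))

# Keep only the columns with at most 15% NA.
kept <- x[, na_share <= 0.15, drop = FALSE]
names(kept)
```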
 
 
 On Sun, Aug 5, 2012 at 9:04 PM, Faz Jones  wrote:
 I have a dataframe of 10 different columns (length of each column is
 the same). I want to eliminate any column that has 'NA' greater than
 15% of the column length. Do I first need to make a function for
 calculating the percentage of NA for each column and then make another
 dataframe where I apply the function? What's the best way to do this?
 

Re: [R] deleting columns from a dataframe where NA is more than 15 percent of the column length

2012-08-06 Thread Faz Jones

Thank you. It was very informative and helpful. It works.

Sent from my iPhone

On Aug 5, 2012, at 10:21 PM, arun smartpink...@yahoo.com wrote:

 HI,
 
 Try this:
 dat1 <- data.frame(x=c(NA,NA,rnorm(6,15),NA), y=c(NA,rnorm(8,15)), z=c(rnorm(7,15),NA,NA))
 dat1[which(colMeans(is.na(dat1)) <= .15)]
          y
 1       NA
 2 13.53085
 3 12.89453
 4 15.02625
 5 14.00387
 6 15.34618
 7 15.69293
 8 15.62377
 9 14.76479
 
 #You can also use apply, sapply etc.
 dat2 <- data.frame(x=c(NA,NA,rnorm(6,15),NA), y=c(NA,rnorm(8,15)), z=c(rnorm(7,15),NA,NA), u=c(rnorm(9,15)))
 dat2[apply(dat2, 2, function(x) mean(is.na(x)) <= .15)]
 
 #dat2[sapply(dat2, function(x) mean(is.na(x)) <= .15)]
 #dat2[which(colMeans(is.na(dat2)) <= .15)]
 
          y        u
 1       NA 14.56278
 2 16.49940 16.25761
 3 14.11368 14.08768
 4 14.95139 14.01923
 5 14.99517 15.91936
 6 14.46359 14.07573
 7 15.09702 13.94888
 8 15.99967 14.97171
 9 15.51924 15.59981
 
 A.K.
 
 
 
 
 

Re: [R] Memory limit for Windows 64bit build of R



On Aug 5, 2012, at 3:52 PM, alan.x.simp...@nab.com.au wrote:


Dear all

I have a Windows Server 2008 R2 Enterprise machine, with 64bit R installed,
running on 2 x Quad-core Intel Xeon 5500 processors with 24GB DDR3 1066 MHz
RAM.  I am seeking to analyse very large data sets (perhaps as much as
10GB), without the additional coding overhead of a package such as
bigmemory.


It may depend in part on how that number is arrived at. And what you  
plan on doing with it. (Don't consider creating a dist-object.)


My question is this - if we were to increase the RAM on the machine to
(say) 128GB, would this become a possibility?  I have read the
documentation on memory limits and it seems so, but would like some
additional confirmation before investing in any extra RAM.


The typical advice is that you will need memory that is 3 times as large
as a large dataset, and I find that even more headroom is needed. I
have 32GB and my larger datasets occupy 5-6 GB and I generally have
few problems. I had quite a few problems with 18 GB, so I think the
ratio should be 4-5 x your 10GB object.  I predict you could get by
with 64GB. (Please send a check for half the difference in cost between
64GB and 128 GB.)


--
David.



Kind regards

Alan

Alan Simpson
Technical Lead, Retail Model Development
Retail Models Project
National Australia Bank

Level 15, 500 Bourke St, Melbourne VIC
Tel: +61 (0) 3 8697 7135  |  Mob: +61 (0) 412 975 955
Email: alan.x.simp...@nab.com.au




David Winsemius, MD
Alameda, CA, USA


Re: [R] how to identify values from a column of a dataframe, and insert them in other data.frame with the corresponding id?

2012-08-06 Thread Nerea Lezama

Thank you very much John, can you read it now?

Hello,

I would like to do the following; could you please help me?
I have a csv called datuak with an id called calee_id and a column
called poids.
I have another csv called datuak2 with the same id called calee_id
(although there are calee_id values that are in datuak but not in datuak2,
and vice versa), and a column called kg_totales in which the values are
repeated for each calee_id because they are the sum of the column kg for
each row.

Here are the tables datuak and datuak2:

Datuak (in the example the calee_id is the same, but there are many):

   poids   calee_id   maree_id
   10      1.27E+12   0.3013157
   20      1.27E+12   0.05726046
   20      1.27E+12   0.73631699
   25      1.27E+12   0.74492002
   3       1.27E+12   0.74492002
   27      1.27E+12   0.31776439
   43      1.27E+12   0.31776439


Datuak2:

     calee_id    maree_id  kg_totales  effectif
1 1.33959e+12 0.782835873       129.7        30
2 1.33959e+12 0.782835873       129.7        40
3 1.33959e+12 0.782835873       129.7        10
4 1.33959e+12 0.782835873       129.7         5
5 1.33959e+12 0.782835873       129.7       1.7
6 1.33959e+12 0.782835873       129.7        20
7 1.33959e+12 0.782835873       129.7        20
8 1.33959e+12 0.782835873       129.7         1
9 1.33959e+12 0.782835873       129.7         2

I would like to identify in datuak2 the calee_id values that are also
in datuak, and create a new column in datuak with the kg_totales value
for each calee_id, without repeating them.
So the final table would be datuak, with calee_id, poids, and the
new column kg_totales with its corresponding value for each row.

Thank you very much,
Nerea

-Original Message-
From: John Kane [mailto:jrkrid...@inbox.com] 
Sent: 03 August 2012 20:17
To: Nerea Lezama; r-help@r-project.org
Subject: RE: [R] how to identify values from a column of a dataframe, and
insert them in other data.frame with the corresponding id?

Hi Nerea,

For some reason your post is badly garbled and close to impossible to
read.
Perhaps you need to check your text encoding?

Also to send sample data it is better to use the dput() command.
Do dput(myfile) and then paste the results into your email
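As a rough illustration of why dput() is recommended: its output is plain R code that rebuilds the object exactly, so a reader can paste it into a session. The small data frame below is made up for the example, and the round trip through capture.output() and parse() only stands in for copying the text into an email and back.

```r
# Hypothetical sample data, standing in for a few rows of the real file.
sample_df <- data.frame(poids = c(10, 20),
                        calee_id = c("A1", "A1"),
                        stringsAsFactors = FALSE)

# dput() prints R code that recreates the object; capture it as text,
# as if it had been pasted into an email.
txt <- paste(capture.output(dput(sample_df)), collapse = "\n")

# The recipient evaluates the pasted text to get the same object back.
rebuilt <- eval(parse(text = txt))
all.equal(sample_df, rebuilt)
```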

Sorry not to be of more help.

John Kane
Kingston ON Canada


 -Original Message-
 From: nlez...@azti.es
 Sent: Fri, 3 Aug 2012 12:34:07 +0200
 To: r-help@r-project.org
 Subject: [R] how to identify values from a column of a dataframe, and 
 insert them in other data.frame with the corresponding id?
 
 
 
 Hello,
 
 I would like to do the following; could you please help me?
 I have a csv called "datuak" with an id called "calee_id" and a column
 called "poids".
 I have another csv called "datuak2" with the same id called "calee_id"
 (although there are "calee_id" values that are in "datuak" but not in
 "datuak2", and vice versa), and a column called "kg_totales" in which the
 values are repeated for each calee_id because they are the sum of the
 column "kg" for each row.
 
 Here are the tables "datuak" and "datuak2":
 
 Datuak (in the example the calee_id is the same, but there are many):
 
    poids   calee_id   maree_id
    10      1.27E+12   0.3013157
    20      1.27E+12   0.05726046
    20      1.27E+12   0.73631699
    25      1.27E+12   0.74492002
    3       1.27E+12   0.74492002
    27      1.27E+12   0.31776439
    43      1.27E+12   0.31776439
 
 Datuak2:
 
      calee_id    maree_id  kg_totales  effectif
 1 1.33959e+12 0.782835873       129.7        30
 2 1.33959e+12 0.782835873       129.7        40
 3 1.33959e+12 0.782835873       129.7        10
 4 1.33959e+12 0.782835873       129.7         5
 5 1.33959e+12 0.782835873       129.7       1.7
 6 1.33959e+12 0.782835873       129.7        20
 7 1.33959e+12 0.782835873       129.7        20
 8 1.33959e+12 0.782835873       129.7         1
 9 1.33959e+12 0.782835873       129.7         2
 
 I would like to identify in "datuak2" the "calee_id" values that are also
 in "datuak", and create a new column in "datuak" with the "kg_totales"
 value for each "calee_id", without repeating them.
 
 So the final table would be "datuak", with "calee_id", "poids", and the
 new column "kg_totales" with its corresponding value for each row.
 
 
 
 Thank you very much,
 
 Nerea
 
 
 
 
 
 
 --
 
 
 
 

Re: [R] find date between two other dates

2012-08-06 Thread penguins

Thanks arun and Rui; 3 fantastic suggestions. 

The Season interval is not always a month so arun's suggestion works better
for this dataset. I couldn't get the as.between function to work on arun's
second suggestion, it only returned NAs. 

However, arun's first suggestion worked a treat!

Many thanks 



--
View this message in context: 
http://r.789695.n4.nabble.com/find-date-between-two-other-dates-tp4639231p4639253.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Memory limit for Windows 64bit build of R

2012-08-06 Thread Uwe Ligges








10Gb objects should be fine, but note that a vector/array/matrix cannot
exceed 2^31-1 elements, which for doubles / reals means at most about 17Gb
per vector/matrix/array.
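The figures in this thread can be checked with a little arithmetic. The sketch below assumes 8 bytes per double and uses the 3x to 5x headroom multipliers quoted in the discussion; it is only a back-of-the-envelope estimate, not a statement about any particular R version's allocator.

```r
# Largest vector length quoted in this thread (R 2.15.1).
max_elements <- 2^31 - 1

# A double (numeric) element takes 8 bytes.
bytes_per_double <- 8
max_bytes <- max_elements * bytes_per_double

# Size of the largest possible double vector in decimal GB and in GiB.
max_gb  <- max_bytes / 1e9
max_gib <- max_bytes / 2^30

# Rough RAM estimates for a 10 GB dataset under the 3x, 4x and 5x
# rules of thumb mentioned above.
data_gb <- 10
ram_needed_gb <- data_gb * c(3, 4, 5)
```

So the "17Gb" limit corresponds to just under 16 GiB, and the rules of thumb suggest somewhere between 30 and 50 GB of RAM for a 10 GB object.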


Best,
Uwe Ligges


Re: [R] Memory limit for Windows 64bit build of R

2012-08-06 Thread Prof Brian Ripley


On 06/08/2012 09:42, Uwe Ligges wrote:



On 06.08.2012 09:34, David Winsemius wrote:



The typical advice is that you will need memory that is 3 times as large as
a large dataset, and I find that even more headroom is needed. I have


The advice is 'at least 3 times'.  It all depends what you are doing 
(and how slow your swap is -- on Windows it is likely to be slow; on a 
Linux box with a fast SSD it can be viable to use swap).



32GB and my larger datasets occupy 5-6 GB and I generally have few
problems. I had quite a few problems with 18 GB, so I think the ratio
should be 4-5 x your 10GB object.  I predict you could get by with 64GB.


But 3 x 18GB > 32GB!


(Please send a check for half the difference in cost between 64GB and 128
GB.)




10Gb objects should be fine, but note that a vector/array/matrix cannot
exceed 2^31-1 elements, which for doubles / reals means at most about 17Gb
per vector/matrix/array.


That's true for R 2.15.1, but not the development version.  Further, 
R-devel makes substantially fewer copies of objects, most of which 
improvements have been ported to R-patched.


dist() is one example of substantial improvements.



--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Re: [R] low resolution word map

2012-08-06 Thread Ray Brownrigg


On 2/08/2012 8:06 a.m., Thomas Steiner wrote:

Hi Ray,

2012/7/31 Ray Brownrigg ray.brownr...@ecs.vuw.ac.nz:

On 07/28/12 23:46, Thomas Steiner wrote:

Hi,
I'd like to have a low resolution world map in R.
The maps package has this option, but if I use the argument, the map
stops making sense: Russia and Australia become empty, etc.

library(maps)
m=map(col="skyblue",fill=TRUE,plot=TRUE,resolution=10)
length(m$x)

If I drop the fill=TRUE, the effect of resolutaion=1 is lost,

I don't know why this is so, I hadn't realised that effect of resolution
with fill=TRUE.

I don't understand either, but I left this observation to the developers ;-)


ie no change. Is there any other package or could I use the resolution
argument differently?

Well, you could say:
m=map(col=0,fill=TRUE, resolution=10)

Is that what you want?

no, if you look at the output (ie the map) you know why: little
islands like hawaii do still exist, but brazil is a rectangle of say 8
points...
Unfortunately, I don't know why.  What exactly do you mean by low 
resolution map?  A point is a point at whatever resolution you choose.  
What do you expect to see?


Perhaps you need to read the posting guide again, and provide 
reproducible code (resolutaion=1 is not a valid option to map(), 
and does not match your earlier resolution=10).


Ray Brownrigg

Ray


Thanks,
Thomas




[R] Odp: Problem with segmented function

2012-08-06 Thread Petr PIKAL

Hi
 
 Hi,
 
 I appreciate your help with the segmented function. I am relatively new to
 R. I followed the introduction of the 'segmented' package by Vito Muggeo,
 but still it does not work.
 Here are the lines I wrote:
 
 data_test <- data.frame(x=c(1:10),y=c(1,1,1,1,1,2,3,4,5,6))
 lr_test <- lm(y~x,data_test)
 seg_test <- segmented(lr_test,seg.Z~x,psi=1)

You did not read the help page correctly. seg.Z is a named argument in which
you specify a formula without a left-hand side. psi should be an x value near
the expected slope change.

seg_test <- segmented(lr_test, seg.Z = ~x, psi = 5)

works correctly.

Regards
Petr

 
 
 /error in segmented.lm(lr_test, seg.Z ~ x, psi = 1) : 
  A wrong number of terms in `seg.Z' or `psi'/
 
 Thank you very much,
 Stella
 
 
 
 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Problem-with-segmented-function-tp4639227.html
 Sent from the R help mailing list archive at Nabble.com.
 

Re: [R] how to identify values from a column of a dataframe, and insert them in other data.frame with the corresponding id?

2012-08-06 Thread Petr PIKAL

Hi

It is better to use dput() for presenting data to others. You probably want
?merge.

Something like

merge(datuak, datuak2, by = "calee_id", all.x = TRUE)

However, calee_id seems to be a floating point number and it may be rounded,
so you should be careful when matching on it.
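A minimal sketch of how this could look, using made-up data. The ids "A1", "B2" and "C3" are hypothetical and are stored as character strings, which is one way to sidestep the rounding concern; reducing datuak2 to one row per calee_id before merging is an added assumption, so that the repeated kg_totales values do not multiply the rows of datuak.

```r
# Hypothetical miniature versions of the two tables.
datuak <- data.frame(poids = c(10, 20, 3),
                     calee_id = c("A1", "A1", "B2"),
                     stringsAsFactors = FALSE)
datuak2 <- data.frame(calee_id = c("A1", "A1", "C3"),
                      kg_totales = c(129.7, 129.7, 50),
                      effectif = c(30, 40, 5),
                      stringsAsFactors = FALSE)

# kg_totales is repeated within each calee_id, so keep one row per id.
totals <- unique(datuak2[, c("calee_id", "kg_totales")])

# Left join: every row of datuak is kept; ids missing from datuak2 get NA.
result <- merge(datuak, totals, by = "calee_id", all.x = TRUE)
result
```

With real data, reading the id column as character (for example through the colClasses argument of read.csv) keeps long ids from being turned into rounded floating point numbers.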

Regards
Petr

 

Re: [R] no font could be found for family Arial

2012-08-06 Thread tibr

I hope the original poster fixed this a long time ago, but I had the same
problem and here is how I fixed it:

- open the Font Book application
- check whether the Arial font has duplicates, and delete them, even if they are
set to Off
- restart the computer.




emmats wrote
 
 I was re-running some code that I hadn't run in a couple of months to make
 barplots in R.  I didn't change a single thing in the script, but the
 plots wouldn't work this time around.  The plot itself (the bars and axes)
 will graph in the window, but no text appears.  In the console it says I
 have a number of errors, all of which say no font could be found for
 family 'Arial'.  
 I have not knowingly changed anything in R and I would like to be able to
 make barplots with labels and titles again.  Does anyone know how to fix
 this?
 
 




--
View this message in context: 
http://r.789695.n4.nabble.com/no-font-could-be-found-for-family-Arial-tp3233322p4639257.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] line type lty

2012-08-06 Thread ravibioinfo

http://students.washington.edu/mclarkso/documents/line%20styles%20Ver2.pdf



--
View this message in context: 
http://r.789695.n4.nabble.com/line-type-lty-tp3466345p4639258.html
Sent from the R help mailing list archive at Nabble.com.


[R] LDA for topic modeling in R

2012-08-06 Thread davyfeng

Hi, All,

I am using the supervised LDA function (slda) from the 'lda' package in R for
topic modeling (http://cran.r-project.org/web/packages/lda/index.html). My
data is a collection of documents, each of which has a label.
There are about 97 different categories and 18K documents in total. I tried
to use slda to train a model on the data (both the whole dataset and
a subset), but it failed with a strange problem during the procedure:
Iteration 0
Iteration 1
Iteration 2
Iteration 3
Error in structure(.Call(collapsedGibbsSampler, documents, as.integer(K),  :
  Numerical problems (-789.682, 0.0454545).

I am using R 2.15.0 and lda 1.3.1. Does anyone have any idea?  Thanks very much!





--
View this message in context: 
http://r.789695.n4.nabble.com/LDA-for-topic-modeling-in-R-tp4639263.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Package to remove collinear variables

2012-08-06 Thread Gabor Grothendieck

On Sat, Aug 4, 2012 at 11:27 PM, Roberto rmosce...@unitus.it wrote:
 Hi,
 I need to remove collinear variables from my Near-Infrared table of spectra.

 What package can I use?

 Something simple, because I am a novice in statistics.



There are many methods of assessing multicollinearity, but to pick one
that has a good help page, try vif in the HH package. (There are also
other packages that have implemented vif or variations of it.)
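To show what a variance inflation factor measures, here is a sketch that computes it by hand in base R rather than through the HH package: each predictor is regressed on the others and VIF = 1 / (1 - R^2). The simulated variables x1, x2 and x3 are made up for the example, with x2 deliberately built as a near copy of x1.

```r
set.seed(1)

# Simulated predictors: x2 is almost identical to x1, x3 is unrelated.
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)
x3 <- rnorm(100)
X <- data.frame(x1, x2, x3)

# For each variable, regress it on all the others and turn the
# R-squared of that regression into a variance inflation factor.
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  fit <- lm(reformulate(others, response = v), data = X)
  1 / (1 - summary(fit)$r.squared)
})
vif_manual
```

Large values (x1 and x2 here) flag variables that are nearly redundant given the rest; a common but informal practice is to drop one variable from such a group and recompute.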


-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


[R] regexpr with accents

2012-08-06 Thread Luca Meyer

Sorry but my previous email did not go through properly. Instead of the ? you 
should really read an egrave or #232 according to 
http://www.lookuptables.com/.

So there are extended ASCII characters I need to deal with.

I have tried

d1$V1[regexpr("some tegravext = 9", d1$V2) > 0] <- 9
and 

d1$V1[regexpr("some t#232xt = 9", d1$V2) > 0] <- 9

without success...

Thanks,
Luca





Re: [R] regexpr with accents

2012-08-06 Thread Luca Meyer

Thanks Arun,

It works all right, I just found out that my problem was not with accents but 
with the correct spelling of some text.

Kind regards,

Luca

On 06 Aug 2012, at 15:01, arun wrote:

 
 
 Hi,
 
 Here, the string within the quotes is read exactly as written.  So, you may 
 have to use the symbol itself instead of the friendly or numeric code from the 
 link.  Or you have to convert those.
 
 d1 <- data.frame(V1 = 1:4,
 V2 = c("some text = 9", "some t&egrave;xt = 9", "some tèxt = 9",
 "some t&#232;xt = 9"))
 
 d1$V1[regexpr("some t&egrave;xt = 9", d1$V2) > 0] <- 9
 d1$V1[regexpr("some t&#232;xt = 9", d1$V2) > 0] <- 9
 d1$V1[regexpr("some tèxt = 9", d1$V2) > 0] <- 9
 
 d1
   V1                   V2
 1  1        some text = 9
 2  9 some t&egrave;xt = 9
 3  9        some tèxt = 9
 4  9   some t&#232;xt = 9
 
 A.K.
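
For readers who want to reproduce the accented match without depending on how a mail client or file encodes the character, here is a minimal sketch. It assumes the data really contain the character è (not an HTML entity) and writes it as the Unicode escape \u00e8; fixed = TRUE is an addition for this example so the pattern is matched literally rather than as a regular expression.

```r
# Sample data: the accented string is written with a Unicode escape
# so the example does not depend on how this file is encoded.
d1 <- data.frame(V1 = 1:3,
                 V2 = c("some text = 9", "some t\u00e8xt = 9",
                        "some other text = 9"),
                 stringsAsFactors = FALSE)

# fixed = TRUE matches the pattern literally, not as a regular expression.
hit <- regexpr("some t\u00e8xt = 9", d1$V2, fixed = TRUE) > 0
d1$V1[hit] <- 9
print(d1)
```

If only a TRUE/FALSE answer is needed, grepl() with the same pattern would probably read more directly than comparing the regexpr() position with 0.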
 
 

Re: [R] Memory limit for Windows 64bit build of R

2012-08-06 Thread Jay Emerson

Alan,

More RAM will definitely help.  But if you have an object needing more than
2^31-1 ~ 2 billion elements, you'll hit a wall regardless.  This could be
particularly limiting for matrices.  It is less limiting for data.frame
objects (where each column could be 2 billion elements).  But many R
analytics under the hood use matrices, so you may not know up front where
you could hit a limit.
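
To put numbers on the limit mentioned above, here is a small sketch. It only restates the 2^31-1 figure from this message using R's maximum integer value; the helper max_rows is a name invented for the example, and later R releases may relax the limit, so treat this as describing the situation discussed in this thread.

```r
# Largest value of R's integer type, which is 2^31 - 1.
max_len <- .Machine$integer.max
print(max_len)

# With that many elements available in total, a matrix with k columns
# can have at most floor(max_len / k) rows (hypothetical helper).
max_rows <- function(k) max_len %/% k
print(max_rows(100))

# Memory needed for one double-precision vector of that length, in GiB.
print(max_len * 8 / 2^30)
```

So a 100-column matrix would top out at a little over 21 million rows under this limit, however much RAM is installed.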

Jay

 Original message 
I have a Windows Server 2008 R2 Enterprise machine, with 64bit R installed
running on 2 x Quad-core Intel Xeon 5500 processor with 24GB DDR3 1066 Mhz
RAM.  I am seeking to analyse very large data sets (perhaps as much as
10GB), without the additional coding overhead of a package such as
bigmemory.

My question is this - if we were to increase the RAM on the machine to
(say) 128GB, would this become a possibility?  I have read the
documentation on memory limits and it seems so, but would like some
additional confirmation before investing in any extra RAM.
-


-- 
John W. Emerson (Jay)
Associate Professor of Statistics, Adjunct, and Acting Director of Graduate
Studies
Department of Statistics
Yale University
http://www.stat.yale.edu/~jay


[R] issue with nzchar() ?

Dear all
I'm a bit surprised by the results output from nzchar(). The help page
says: "nzchar is a fast way to find out if elements of a character
vector are *non-empty strings*" (my emphasis). However, if you do
> x <- c(letters, NA, '')
> nzchar(x)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[25]  TRUE  TRUE  TRUE FALSE
> any(is.na(x))
[1] TRUE

the NA value in the character vector will be considered as a non-empty
string, something that I find strange. At best NA is the equivalent of
an empty string. In this sense, if you Hmisc::describe() the vector
you get, as I would expect, that in the context of character vectors
NA and '' values are considered together:
> require(Hmisc)
> describe(x)
x
  n missing  unique
 26   2  26

lowest : a b c d e, highest: v w x y z

So is this a bug in the function or in the help page? Regards
Liviu


-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail


Re: [R] issue with nzchar() ?

On Mon, Aug 6, 2012 at 4:48 PM, Liviu Andronic landronim...@gmail.com wrote:
 string, something that I find strange. At best NA is the equivalent of
 an empty string. In this sense, if you Hmisc::describe() the vector
 you get, as I would expect, that in the context of character vectors
 NA and '' values are considered together:


By the way, same question holds for nchar(): Should NA values be
reported as 2-char strings, or as 0-char empty/missing values?
 x - c(letters, NA, '')
 nchar(x)
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 0


Liviu


[R] Identify points that lie within polygon

2012-08-06 Thread Ally

I have a complex 2D polygon with thousands of vertices, and I'd like to be
able to identify points from a large set contained within the polygon, and
was wondering if there might be an efficient way of doing this?  Any advice
would be useful!  Here is a small example of what I mean:

# make polygon
v1 <- c(0,1,1,2,1,3,6,7)
v2 <- c(1,3,3,5,6,7,8,9)
plot(v1, v2, type = "n")
polygon(v1, v2, lwd = 2, col = "red")

# plot a set of candidate grid points
grid <- seq(0, 10, length.out = 30)
pts <- expand.grid(grid, grid)
points(pts, pch = 19, col = 1, cex = 1)

Many thanks!

Alastair




--
View this message in context: 
http://r.789695.n4.nabble.com/Identify-points-that-lie-within-polygon-tp4639289.html
Sent from the R help mailing list archive at Nabble.com.


Re: [R] Parallel runs of an external executable with snow in local

2012-08-06 Thread Xavier Portell/UPC

Thanks Uwe but, actually, I did so.

Since "filetorun.exe" looks in the current folder for "input.txt", I tried 
moving all needed files to a newly created temporary folder "tmp.id" (say, 
tmp.1) and running the executable. This works fine when done directly from the 
Windows command line but not when done from R, since using:

#
system("C:/Users/…/currentworkdirectory/temp.1/filetorun.exe")
#

causes "filetorun.exe" to look for "input.txt" in 
"C:/Users/…/currentworkdirectory". So there's no point in moving files to a 
folder; it seems that the input file must be located in the current R working 
directory. Does anybody know how to avoid this behaviour?
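
One possible way around this, sketched under the assumption that the executable simply reads from whatever directory it is started in: a program launched with system() inherits R's working directory, so the directory can be switched just for the duration of the call. The helper name run_in_dir is invented for this example, and getwd() stands in for the real executable so the sketch runs anywhere.

```r
# Hypothetical helper: evaluate fun() with `dir` as the working directory,
# then restore the previous working directory even if fun() fails.
run_in_dir <- function(dir, fun) {
  old <- setwd(dir)
  on.exit(setwd(old), add = TRUE)
  fun()
}

# Demonstration with a harmless function instead of a real executable.
before <- getwd()
seen   <- run_in_dir(tempdir(), getwd)
after  <- getwd()

# For the case in this thread one might call, for example:
# run_in_dir("temp.1", function() system("filetorun.exe"))
```

With snow, each worker is a separate R process with its own working directory, so calling such a helper inside the function passed to parLapply() should keep the runs from reading each other's input files.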

I hope I've explained that clearly,

Xavier Portell Canal, PhD candidate.
Department of Agri-food Engineering,
Universitat Politècnica de Catalunya

-Uwe Ligges lig...@statistik.tu-dortmund.de wrote: -

To: Xavier Portell/UPC xavier.port...@upc.edu
From: Uwe Ligges lig...@statistik.tu-dortmund.de
Date: 05/08/2012 07:46PM
Cc: r-help@r-project.org
Subject: Re: [R] Parallel runs of an external executable with snow in local



On 03.08.2012 19:21, Xavier Portell/UPC wrote:
 Hi everyone,

 I'm aiming to run an external executable (say filetorun.EXE) in parallel. The 
 external executable collect needed data from a file, say input.txt and, in 
 turn,generates several output files, say output.txt. I need to generate 
 input.txt, run the executable and keep input.txt and output.txt. I'm 
 using Windows 7, R version 2.15.1 (2012-06-22) on RStudio and platform: 
 i386.pc.mingw32/i386 (32-bit).

 My first attempt was an R script which, by using
 system("filetorun.EXE", intern = FALSE, ignore.stdout = FALSE,
        ignore.stderr = FALSE, wait = TRUE, input = NULL,
        show.output.on.console = TRUE, minimized = FALSE, invisible = TRUE)
 , ran the executable and kept the required files in a conveniently named folder. 
 After that I changed my previous R script so I could use the function 
 lapply(). This script apparently worked fine.

 Finally, I tried to parallelize the problem by using snow and parLapply(). 
 The resulting script looks like this:

 ## Not run
 #
 library(snow)
 cl <- makeCluster(3, type = "SOCK")
 clusterExport(cl, list('param.esp','copy.files','for12.template','program.executor'))
 parLapply(cl, a.list, a.function)
 stopCluster(cl)
 #
 ##End not run

 Although it runs, the parallelized version is messing up the input parameters 
 to pass to the executable (see table below, where parameters P1 and P2 are 
 considered. ".s" comes from the serial code and ".p" from the parallelized 
 one):
    s r P1.s P2.s P1.p P2.p
  1 1 1  1.0 3.00  2.0 3.00
  2 2 1  1.5 3.00  2.0 3.75
  3 3 1  2.0 3.00  2.0 3.00
  4 4 1  1.0 3.75  1.5 3.00
  5 5 1  1.5 3.75  1.5 3.00
  6 6 1  2.0 3.75  2.0 3.75

 My first thought to avoid the described behaviour was creating a temporary 
 file, say tmp.id with id being an identification run number, and copying 
 filetorun.EXE and Input.txt to tmp.id. However, while doing so, I 
 realised that although running the correct filetorun.EXE copy (i.e., the 
 one in tmp.id) R looks for input.txt in the work directory.


Not sure about the real setup, but you can actually specify the path, 
not only filenames.

Uwe Ligges




 I've been looking thoroughly for a solution but I got nothing.

 Thanks for any help in advance,


 Xavier Portell Canal

 PhD candidate
 Department of Agri-food engineering,
 Universitat Politècnica de Catalunya


Re: [R] Identify points that lie within polygon


Hello,

With your example, run the following and click on 10 black points inside the area.

ploc <- locator(n = 10)
points(ploc$x, ploc$y, pch = 19, col = "green", cex = 1)

Hope this helps,

Rui Barradas


Re: [R] issue with nzchar() ?

On Mon, Aug 6, 2012 at 9:53 AM, Liviu Andronic landronim...@gmail.com wrote:
 On Mon, Aug 6, 2012 at 4:48 PM, Liviu Andronic landronim...@gmail.com wrote:
 string, something that I find strange. At best NA is the equivalent of
 an empty string.

Certainly not to my mind, unless you think that zero and NA should be
the same for integers and doubles as well. NA (in whatever form) is,
to my mind, _unknown_ which is very different than knowing 0.

 In this sense, if you Hmisc::describe() the vector
 you get, as I would expect, that in the context of character vectors
 NA and '' values are considered together:


 By the way, same question holds for nchar(): Should NA values be
 reported as 2-char strings, or as 0-char empty/missing values?
 x - c(letters, NA, '')
 nchar(x)
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 0


I'm not sure why that's the case, but it's documented on the help page
(under value):

 For ‘nchar’, an integer vector giving the sizes of each element,
 currently always ‘2’ for missing values (for ‘NA’).

so I don't see any bug.

My guess is that it's this way for backward compatibility from a time when
there probably wasn't a proper NA_character_ (that's the parser
literal for a character NA) and they really were just "NA" (the
string) -- perhaps in some far distant R 3.0 we'll see
nchar(NA_character_) = NA_integer_

Best,
Michael


 Liviu


[R] bibtex::read.bib -- extracting bibentry keys

2012-08-06 Thread Michael Friendly

I have two versions of a bibtex database which have gotten badly out of 
sync. I need to find all the entries in bib2 which are not contained 
in bib1, according to their bibtex keys.
But I can't figure out how to extract a list of the bibentry keys in 
these databases.


A minor question: Is there some way to prevent read.bib from ignoring 
entries that do not contain all required fields?


A suggestion: it would be nice if bibtex provided some extractor 
functions for bibentry fields.


> bib1 <- read.bib("C:/localtexmf/bibtex/bib/timeref.bib")
ignoring entry 'Donoho-etal:1988' (line 40) because :
A bibentry of bibtype ‘InCollection’ has to correctly specify the field(s): booktitle

... snipping other similar warnings ...

> length(bib1)
[1] 628

> bib2 <- read.bib("W:/texmf/bibtex/bib/timeref.bib")
ignoring entry 'Donoho-etal:1988' (line 57) because :
A bibentry of bibtype ‘InCollection’ has to correctly specify the field(s): booktitle

... snipping other similar warnings ...

> length(bib2)
[1] 611

# The first bibentry:
> bib1[[1]]
Godfrey EH (1918). “History and Development of Statistics in Canada.” In 
Koren J (ed.), pp. 179-198. Macmillan, New York.
> str(bib1[[1]])
Class 'bibentry' hidden list of 1
$ :List of 9
..$ author :Class 'person' hidden list of 1
.. ..$ :List of 5
.. .. ..$ given : chr [1:2] "Ernest" "H."
.. .. ..$ family : chr "Godfrey"
.. .. ..$ role : NULL
.. .. ..$ email : NULL
.. .. ..$ comment: NULL
..$ title : chr "History and Development of Statistics in {Canada}"
..$ booktitle: chr "History of Statistics, their Development and Progress in Many Countries"
..$ publisher: chr "Macmillan"
..$ year : chr "1918"
..$ editor :Class 'person' hidden list of 1
.. ..$ :List of 5
.. .. ..$ given : chr "John"
.. .. ..$ family : chr "Koren"
.. .. ..$ role : NULL
.. .. ..$ email : NULL
.. .. ..$ comment: NULL
..$ pages : chr "179--198"
..$ address : chr "New York"
..$ crossref : chr "Koren:1918"
..- attr(*, "bibtype")= chr "InCollection"
..- attr(*, "key")= chr "Godfrey:1918"

So, I try to get the key attribute for this entry, but it returns 
NULL, and I don't understand why.

> attr(bib1[[1]], "key")
NULL
> attr(bib1[1], "key")
NULL

-Michael

--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele StreetWeb:   http://www.datavis.ca
Toronto, ONT  M3J 1P3 CANADA


Re: [R] Identify points that lie within polygon

2012-08-06 Thread Jeff Newmiller

This is off-topic (not about R), and a quick Web search for "test within 
polygon" yields many results, and adding "R" to the search when using Google 
provides hints about applying the algorithms in R.
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.


Re: [R] issue with nzchar() ?

2012-08-06 Thread Bert Gunter

Liviu:

Well, as usual, to a certain extent this is arbitrary and the only
issue is whether it is documented correctly.

To me, NA (of whatever mode) means "indeterminate" or "unknown", so
since "" is known and of length 0, I would have expected NA as a
return. But the point is, not what our particular tastes are ("You say
'tomayto', I say 'tomahto,'" an old song goes), but what the docs say.
And in both cases, they tell you exactly what you'll get.

For nchar(): "an integer vector giving the sizes of each element,
currently always 2 for missing values (for NA)"

and for nzchar(): "a logical vector of the same length as x, true if and
only if the element has non-zero length." (note the 'only if').

So I see no error or inconsistencies anywhere.
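
For anyone who does want NA and '' treated together, as Hmisc::describe() does, a small sketch follows. It combines is.na() with nzchar() and relies only on the documented behaviour quoted above; the variable names are invented for the example.

```r
x <- c(letters, NA, "")

# TRUE only for elements that are neither missing nor empty.
has_text <- !is.na(x) & nzchar(x)
print(tail(has_text, 3))

# Count the two kinds of "no content" separately.
n_missing <- sum(is.na(x))
n_empty   <- sum(!is.na(x) & !nzchar(x))
```

Testing is.na() first keeps the result independent of how nzchar() or nchar() happen to report missing values in a given R version.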

-- Bert




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: [R] Identify points that lie within polygon

2012-08-06 Thread Ally

Thanks for the suggestion, got exactly what I needed from

library(splancs)
?pip
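
For reference, the kind of test such functions perform can be sketched in base R with the even-odd (ray casting) rule. This is not the splancs implementation; the function name points_in_polygon is invented for the example, and points lying exactly on the boundary may be classified either way.

```r
# Hypothetical helper: TRUE for each point (px, py) inside the polygon with
# vertices (vx, vy), using the even-odd (ray casting) rule.
points_in_polygon <- function(px, py, vx, vy) {
  n <- length(vx)
  inside <- rep(FALSE, length(px))
  j <- n  # index of the previous vertex; the last vertex joins the first
  for (i in seq_len(n)) {
    # Does edge j -> i straddle the horizontal line through each point?
    straddles <- (vy[i] > py) != (vy[j] > py)
    # x coordinate where the edge meets that horizontal line
    x_cross <- (vx[j] - vx[i]) * (py - vy[i]) / (vy[j] - vy[i]) + vx[i]
    crossing <- straddles & (px < x_cross)
    crossing[is.na(crossing)] <- FALSE  # horizontal edges never count
    inside <- xor(inside, crossing)
    j <- i
  }
  inside
}

# Polygon and candidate grid from the original post.
v1 <- c(0, 1, 1, 2, 1, 3, 6, 7)
v2 <- c(1, 3, 3, 5, 6, 7, 8, 9)
grid <- seq(0, 10, length.out = 30)
pts <- expand.grid(x = grid, y = grid)
pts$inside <- points_in_polygon(pts$x, pts$y, v1, v2)
```

The loop runs once per polygon edge and is vectorised over the points, so the cost grows with the number of vertices times the number of points; for very large inputs a packaged routine is likely the better choice.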

Alastair


Re: [R] bibtex::read.bib -- extracting bibentry keys

2012-08-06 Thread Achim Zeileis


On Mon, 6 Aug 2012, Michael Friendly wrote:

I have two versions of a bibtex database which have gotten badly out of 
sync. I need to find all the entries in bib2 which are not 
contained in bib1, according to their bibtex keys. But I can't figure 
out how to extract a list of the bibentry keys in these databases.


read.bib() returns a bibentry object so you can simply do this as usual
for bibentry objects with $key:

x <- read.bib(...)
x$key

or maybe

unlist(x$key)

Whatever is more convenient for you. See ?bibentry for more details.
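
A minimal sketch of the comparison itself, using two small bibentry objects built by hand with utils::bibentry() so that it runs without any .bib files. The helper mk and the keys "a", "b", "c" are made up for the example. Judging from the str() output in the question, the key attribute sits on the inner list element rather than on the bibentry object, which would explain why attr() on the object returned NULL.

```r
# Two small bibentry objects built by hand; read.bib() returns the same class.
mk <- function(key) {
  bibentry(bibtype = "Misc", key = key, title = paste("Title", key),
           author = person("A.", "Author"), year = "2012")
}
bib1 <- c(mk("a"), mk("b"))
bib2 <- c(mk("a"), mk("b"), mk("c"))

keys1 <- unlist(bib1$key)
keys2 <- unlist(bib2$key)

# Keys present in bib2 but not in bib1, and the matching entries.
only_in_2 <- setdiff(keys2, keys1)
missing_entries <- bib2[keys2 %in% only_in_2]
print(only_in_2)
```

Swapping the arguments of setdiff() gives the entries that exist only in bib1.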

A minor question: Is there some way to prevent read.bib from ignoring 
entries that do not contain all required fields?


Also not really an issue with read.bib itself. read.bib() wants to return 
a bibentry object but bibentry() only allows creating objects that are 
valid BibTeX, i.e., have all required fields.


A suggestion: it would be nice if bibtex provided some extractor functions 
for bibentry fields.


So that only a subset of fields is read as opposed to all fields?

If you read all fields, you can easily subset afterwards (again using 
$-notation).


hth,
Z



[R] test if elements of a character vector contain letters

Dear all
I'm pretty sure that I'm approaching the problem in a wrong way.
Suppose the following character vector:
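> (x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
 [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"
> x
 [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"  "k"   "l"   "m"   "n"
[15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
[29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
[43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"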


How do you test whether the elements of the vector contain at least
one letter (or at least one digit) and obtain a logical vector of the
same dimension? I came up with the following awkward function:
is_letter <- function(x, pattern=c(letters, LETTERS)){
    sapply(x, function(y){
        any(sapply(pattern, function(z) grepl(z, y, fixed=TRUE)))
    })
}

> is_letter(x)
  a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
    5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   20    21    22    23    24    25    26
FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> is_letter(x, 0:9)  ##function slightly misnamed
  a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
    p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
    5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
   20    21    22    23    24    25    26
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE


Is there a nicer way to do this? Regards
Liviu
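
One possibly nicer way, sketched here with a comparable vector (fixed suffixes instead of the random sample() so the result is reproducible): grepl() is already vectorised over its second argument, and the POSIX character classes [[:alpha:]] and [[:digit:]] match any letter or any digit.

```r
x <- c(letters, 1:26)
x[1:10] <- paste0(x[1:10], 1:10)  # "a1", "b2", ..., "j10"

has_letter <- grepl("[[:alpha:]]", x)  # at least one letter
has_digit  <- grepl("[[:digit:]]", x)  # at least one digit

print(table(has_letter, has_digit))
```

Both results are plain logical vectors of the same length as x, so they can be combined with & and | or used directly for subsetting.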


-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail


Re: [R] how to assing unique ID in a table and do regression


Sorry, forgot to Cc the list.

On 06-08-2012 17:29, Rui Barradas wrote:

Hello,

I'm glad it helped.

The result of function cut() is a factor variable, so you can coerce it 
to integer, giving more "normal" names, or, if you want to keep track 
of the intervals the adjusted r2 values belong to, go straight to the last 
two lines in the following code.



#dat1$groups <- as.integer( cut( ...etc... ) )

[...rest of your code... ]

adj <- summary(lin.temp1)$adj.r.squared
class(adj) <- "list"


That's it. It has as names the intervals produced by cut that appear 
in the output you've posted.
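
If a plain table with one row per group is wanted, one alternative is sketched below. Note that it swaps nlme's lmList() for base R (split() plus lm()) so that it is self-contained, and it uses simulated data in place of the real dat1; only the column names LATITUDE, mean_temp, S and groups are taken from this thread.

```r
set.seed(42)
# Simulated stand-in for the poster's data; column names follow the thread.
dat1 <- data.frame(LATITUDE = runif(300, -10, 10), mean_temp = rnorm(300))
dat1$S <- 2 + 0.5 * dat1$mean_temp + rnorm(300)
dat1$groups <- cut(dat1$LATITUDE, seq(-10, 10, by = 5))

# One lm() per group, then one row of the table per fit.
fits <- lapply(split(dat1, dat1$groups), function(d) lm(S ~ mean_temp, data = d))
r2_table <- data.frame(
  group         = names(fits),
  r.squared     = sapply(fits, function(f) summary(f)$r.squared),
  adj.r.squared = sapply(fits, function(f) summary(f)$adj.r.squared),
  row.names     = NULL
)
print(r2_table)
```

The group column keeps the interval labels produced by cut(), so each row can be traced back to its latitude band.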


Rui Barradas

On 06-08-2012 17:07, Kristi Glover wrote:




Dear Rui,
Thanks for the help. I really appreciate it. It helped me out.
I modified some of the script you gave me because I found that the package 
'nlme' can also do it. But I do use the script you gave me to split 
the data:

dat1$groups <- cut(dat1$LATITUDE, seq(-56, 79, by = 2.5))
lin.temp1 <- lmList(S ~ mean_temp | groups, data = dat1)

Could you please give me an idea of how I can extract the adjusted R2 values 
and put them in a table?
I called summary(), which gave me the adjusted R2 for each 
group, but I don't know how to put the adjusted R2 values in a table (like: 
group, R2, adjusted R2).

summary(lin.temp1)$adj.r.squared

(-56,-53.5] :
[1] 0.2565786
(-53.5,-51] :
[1] 0.0715485
(-51,-48.5] :
[1] 0.2265334

Thanks
Kristi


Date: Sat, 4 Aug 2012 16:15:57 +0100
From: ruipbarra...@sapo.pt
To: kristi.glo...@hotmail.com
CC: r-help@r-project.org
Subject: Re: [R] how to assing unique ID in a table and do regression

Hello,

Try the following.


id.groups <- with(dat, cut(ID, breaks = 0:ceiling(max(ID))))
sp <- split(dat, id.groups)
regressors <- grep("en", names(dat))
models <- lapply(sp, function(.df)
  lapply(regressors, function(x) lm(.df[["S"]] ~ .df[[x]])))

mod.summ <- lapply(models, function(x) lapply(x, summary))
# First R2
mod.r2 <- lapply(mod.summ, function(x) lapply(x, `[[`, "r.squared"))
mod.r2

# Now p-values
mod.coef <- lapply(mod.summ, function(x) lapply(x, coef))
mod.pvalue <- lapply(mod.coef,  function(x) lapply(x, `[`, , 4))
# p-values in matrix form, columns are 'en2', 'en3', etc
#lapply(mod.pvalue, function(x) do.call(cbind, x))

Hope this helps,

Rui Barradas

On 04-08-2012 15:22, Kristi Glover wrote:

Hi R users,
I have a very big data set (5000 rows). I want to make classes 
based on a column of that table (that column holds continuous 
data). After converting it into classes, the class 
would be a unique ID. I want to run a regression for each ID.

For example I have a data set
For example I have a data set

dput(dat)

structure(list(ID = c(0.1, 0.8, 0.1, 1.5, 1.1, 0.9, 1.8, 2.5,
2, 2.5, 2.8, 3, 3.1, 3.2, 3.9, 1, 4, 4.7, 4.3, 4.9, 2.1, 2.4),
  S = c(4L, 7L, 9L, 10L, 10L, 8L, 8L, 8L, 17L, 18L, 13L, 13L,
  11L, 1L, 10L, 20L, 22L, 20L, 18L, 16L, 7L, 20L), en2 = c(-2.5767,
  -2.5767, -2.5767, -2.5767, -2.5767, -2.5767, -2.5767, -2.5347,
  -2.5347, -2.5347, -2.5347, -2.5347, -2.5347, -2.4939, -2.4939,
  -2.4939, -2.4939, -2.4939, -2.4939, -2.4939, -2.4543, -2.4543
  ), en3 = c(-1.1785, -0.6596, -0.6145, -0.6437, -0.6593, -0.7811,
  -1.1785, -1.1785, -1.1785, -0.6596, -0.6145, -0.6437, -0.6593,
  -1.1785, -0.1342, -0.2085, -0.4428, -0.5125, -0.8075, -1.1785,
  -1.1785, -0.1342), en4 = c(-1.4445, -1.3645, -1.1634, -0.7735,
  -0.6931, -1.1105, -1.4127, -1.5278, -1.4445, -1.3645, -1.1634,
  -0.7735, -0.6931, -1.0477, -0.8655, -0.1759, 0.1203, -0.2962,
  -0.4473, -1.0436, -0.9705, -0.8953), en5 = c(-0.4783, -0.3296,
  -0.2026, -0.3579, -0.5154, -0.5726, -0.6415, -0.3996, -0.4529,
  -0.5762, -0.561, -0.6891, -0.7408, -0.6287, -0.4337, -0.4586,
  -0.5249, -0.6086, -0.7076, -0.7114, -0.4952, 0.1091)), .Names 
= c(ID,
S, en2, en3, en4, en5), class = data.frame, row.names = 
c(NA,

-22L))

Here ID is continuous. I want to make groups with the values 0-1,
1-2, 2-3 and 3-4 from the column ID, and then run a regression with
S (dependent variable) on en2 (independent variable), then a
regression of S on en3, and so on.

After that, I want to have a table with the r2 and p-value of each
regression.

Could you help me with how to do this? I tried doing it manually,
but it took far too much time, so I thought I would write to ask
for your help.


Thanks for your help.
Kristi





__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.







Re: [R] more efficient way to parallel

2012-08-06 Thread Jie

After searching online, I found that clusterCall or foreach might be the
solution.

Best wishes,
Jie

On Sun, Aug 5, 2012 at 10:23 PM, Jie jimmycl...@gmail.com wrote:

 Dear All,

 Suppose I have a program like the one below. The outer loop runs a
 simulation (with randomly generated data); inside it there are several
 sapply() calls (10~100) over the data, plus some other work, and these
 sapply() calls have to run sequentially. Each sapply() is not very
 expensive (a few seconds only), so one iteration of the outer loop takes
 minutes. I guess the better way is to parallelize the outer loop rather
 than the sapply() calls, but I have no idea how to modify it. Here is
 some simple code, with only two sapply() calls for simplicity. The logic
 inside the sapply() calls is not important.
 Thank you for your attention and suggestions.

 library(parallel)
 library(MASS)
 result.seq=c()
 Maxi <- 100
 for (i in 1:Maxi)
 {
 ## initialization, not of interest
 Sigmahalf <- matrix(sample(1:1,size = 1,replace =T ),  100)
 Sigma <- t(Sigmahalf)%*%Sigmahalf
 x <- mvrnorm(n=1000, rep(0, 10), Sigma)
 xlist <- list()
 for (j in 1:1000)
 {
 xlist[[j]] <- list(X = matrix( x [j, ],5))
 }
 ## end of initialization

 dd1 <- sapply(xlist,function(s) {min(abs((eigen(s$X))$values))})
 ##
 sumdd1=sum(dd1)
 for (j in 1:1000)
 {
 xlist[[j]]$dd1 <- dd1[j]/sumdd1
 }
 ## Assume dd2 and dd1 can not be combined in one sapply()
 dd2 <- sapply(xlist, function(s){min(abs((eigen(s$X))$values))+s$dd1})
 result.seq[i] <- sum(dd1*dd2)

 }




Re: [R] test if elements of a character vector contain letters

2012-08-06 Thread Bert Gunter

nzchar(x) & !is.na(x)

No?

-- Bert

On Mon, Aug 6, 2012 at 9:25 AM, Liviu Andronic landronim...@gmail.com wrote:
 Dear all
 I'm pretty sure that I'm approaching the problem in a wrong way.
 Suppose the following character vector:
 (x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
  [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"
 x
  [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"  "k"   "l"   "m"   "n"
 [15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
 [29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
 [43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"


 How do you test whether the elements of the vector contain at least
 one letter (or at least one digit) and obtain a logical vector of the
 same dimension? I came up with the following awkward function:
 is_letter <- function(x, pattern=c(letters, LETTERS)){
     sapply(x, function(y){
         any(sapply(pattern, function(z) grepl(z, y, fixed=T)))
     })
 }

 is_letter(x)
   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
     p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    20    21    22    23    24    25    26
 FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 is_letter(x, 0:9)  ##function slightly misnamed
   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
     p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    20    21    22    23    24    25    26
  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE


 Is there a nicer way to do this? Regards
 Liviu


 --
 Do you know how to read?
 http://www.alienetworks.com/srtest.cfm
 http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
 Do you know how to write?
 http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail




-- 

Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm


Re: [R] more efficient way to parallel

2012-08-06 Thread John Kerpel

Not that I've had a chance to really look at the problem, but I've removed
outer loops using parLapply from the parallel package.  Works great.
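
A minimal sketch of that pattern, assuming the body of the outer loop has been moved into a function. The function name one_iteration, its toy body and the worker count of 2 are illustrative and not taken from the original program.

```r
library(parallel)

# Hypothetical stand-in for one iteration of the outer simulation loop.
one_iteration <- function(i, n) {
  x <- matrix(rnorm(n * n), n)
  min(abs(eigen(x, only.values = TRUE)$values))
}

cl <- makePSOCKcluster(2)        # two worker processes; adjust to the machine
clusterSetRNGStream(cl, 123)     # reproducible random numbers on the workers
res <- parLapply(cl, 1:8, one_iteration, n = 5)
stopCluster(cl)                  # always release the workers

result.seq <- unlist(res)
result.seq
```

A socket cluster starts fresh R processes, so any packages or objects the function needs have to be loaded or exported on the workers (for example with clusterEvalQ() or clusterExport()).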


Re: [R] more efficient way to parallel

2012-08-06 Thread Martin Morgan


On 08/06/2012 09:41 AM, Jie wrote:

After searching online, I found that clusterCall or foreach might be the
solution.


Rewrite your outer loop as an lapply, then on non-Windows systems use
parallel::mclapply; on Windows, use makePSOCKcluster and parLapply. I
ended up with


library(parallel)
library(MASS)
Maxi <- 10
Maxj <- 1000

doit <- function(i, Maxi, Maxj)
{
    ## initialization, not of interest
    ## 100 x 100 entries, so that Sigma matches the 100-dimensional mean below
    Sigmahalf <- matrix(sample(10000, replace=TRUE),  100)
    Sigma <- t(Sigmahalf) %*% Sigmahalf
    x <- mvrnorm(n=Maxj, rep(0, 100), Sigma)
    xlist <- lapply(seq_len(nrow(x)), function(i, x) matrix(x[i,], 10), x)
    ## end of initialization

    fun <- function(x) {
        v <- eigen(x, symmetric=FALSE, only.values=TRUE)$values
        min(abs(v))
    }
    dd1 <- sapply(xlist, fun)
    dd2 <- dd1 + dd1 / sum(dd1)
    sum(dd1 * dd2)
}

 system.time(lapply(1:8, doit, Maxi, Maxj))
   user  system elapsed
  6.677   0.016   6.714
 system.time(mclapply(1:64, doit, Maxi, Maxj, mc.cores=8))
   user  system elapsed
 68.857   1.032  10.398

The extra arguments to eigen are important, as is avoiding unnecessary
repeated calculations. The strategy of allocate-and-grow
(result.vec = numeric(); result.vec[i] <- ...) is very inefficient
(result.vec is copied in its entirety for each new value of i); better to
preallocate-and-fill (result.vec = integer(Maxi); result.vec[i] <- ...)
or to let lapply manage the allocation.
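
The three allocation strategies can be compared side by side. This is a minimal sketch with a made-up computation (i^2) standing in for the real loop body; the names grow, fill and applied are illustrative.

```r
n <- 1000

# Allocate-and-grow: the vector is extended on every iteration.
grow <- numeric()
for (i in seq_len(n)) grow[i] <- i^2

# Preallocate-and-fill: the full length is reserved once, up front.
fill <- numeric(n)
for (i in seq_len(n)) fill[i] <- i^2

# Let the apply family manage the allocation.
applied <- vapply(seq_len(n), function(i) i^2, numeric(1))

# All three give the same answer; only the cost differs.
stopifnot(identical(grow, fill), identical(fill, applied))
```

numeric(n) is used here rather than integer(n) because the stored values are doubles.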


Martin







--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


Re: [R] test if elements of a character vector contain letters


Hello,

Fun as an exercise in vectorization. 30 times faster. Don't look, guess.

Gave it up? Ok, here it is.


is_letter <- function(x, pattern=c(letters, LETTERS)){
    sapply(x, function(y){
        any(sapply(pattern, function(z) grepl(z, y, fixed=TRUE)))
    })
}
# test ascii codes, just one loop.
has_letter <- function(x){
    sapply(x, function(y){
        y <- as.integer(charToRaw(y))
        any((65 <= y & y <= 90) | (97 <= y & y <= 122))
    })
}

x <- c(letters, 1:26)
x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
x <- rep(x, 1e3)

t1 <- system.time(is_letter(x))
t2 <- system.time(has_letter(x))
rbind(t1, t2, t1/t2)
   user.self sys.self elapsed user.child sys.child
t1     15.69        0   15.74         NA        NA
t2      0.50        0    0.50         NA        NA
       31.38      NaN   31.48         NA        NA



Re: [R] test if elements of a character vector contain letters

2012-08-06 Thread Martin Morgan


On 08/06/2012 09:51 AM, Rui Barradas wrote:

Hello,

Fun as an exercise in vectorization. 30 times faster. Don't look, guess.


 system.time(res0 <- grepl("[[:alpha:]]", x))
   user  system elapsed
  0.060   0.000   0.061
 system.time(res1 <- has_letter(x))
   user  system elapsed
  3.728   0.008   3.747
 all.equal(res0, res1, check.attributes=FALSE)
[1] TRUE






--
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


Re: [R] test if elements of a character vector contain letters

2012-08-06 Thread Marc Schwartz

Perhaps I am missing something, but why use sapply() when grepl() is already 
vectorized?

is.letter <- function(x) grepl("[[:alpha:]]", x)
is.number <- function(x) grepl("[[:digit:]]", x)

x <- c(letters, 1:26)

x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')

x <- rep(x, 1e3)

 str(x)
 chr [1:52000] "a2" "b10" "c8" "d3" "e6" "f1" "g5" ...

 system.time(is.letter(x))
   user  system elapsed 
  0.011   0.000   0.010 

 system.time(is.number(x))
   user  system elapsed 
  0.010   0.000   0.011 
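
A small check of how the two character classes behave on edge cases. This is a sketch on a made-up vector; the names x_small, any_letter and any_digit are illustrative.

```r
# Made-up test vector with a mixed element, an empty string and a missing value.
x_small <- c("a10", "b", "7", "", NA)

# The POSIX class must sit inside a bracket expression: [[:alpha:]], not [:alpha:].
any_letter <- grepl("[[:alpha:]]", x_small)
any_digit  <- grepl("[[:digit:]]", x_small)

# grepl() returns FALSE, not NA, for missing input.
data.frame(x_small, any_letter, any_digit)
```

Because grepl() is vectorized over x, no sapply() is needed, and the result has the same length as the input.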


Regards,

Marc Schwartz


Re: [R] issue with nzchar() ?

2012-08-06 Thread David L Carlson

It would be nice if an argument to the function could make NA inputs
return NA, but you can easily get that result:

 ifelse(is.na(x), NA, nzchar(x))
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[25]  TRUE  TRUE    NA FALSE
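
The same idea as a self-contained sketch, on a short made-up vector (x_na and the result names are illustrative), together with the variant from elsewhere in this thread that treats NA and "" alike:

```r
# Made-up vector: a normal string, a missing value, an empty string.
x_na <- c("a", NA, "")

# NA stays NA; everything else is tested for non-zero length.
keep_na <- ifelse(is.na(x_na), NA, nzchar(x_na))

# Treat missing and empty alike: TRUE only for real, non-empty strings.
non_empty <- nzchar(x_na) & !is.na(x_na)

keep_na    # TRUE NA FALSE
non_empty  # TRUE FALSE FALSE
```

Which of the two is appropriate depends on whether a missing value should count as "unknown" or as "empty" in the analysis at hand.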

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
 project.org] On Behalf Of Bert Gunter
 Sent: Monday, August 06, 2012 10:43 AM
 To: Liviu Andronic
 Cc: r-help@r-project.org Help
 Subject: Re: [R] issue with nzchar() ?
 
 Liviu:
 
 Well, as usual, to a certain extent this is arbitrary and the only
 issue is whether it is documented correctly.
 
 To me, NA (of whatever mode) means "indeterminate" or "unknown", so
 since "" is known and of length 0, I would have expected NA as a
 return. But the point is, not what our particular tastes are ("You say
 'tomayto', I say 'tomahto,'" an old song goes), but what the docs say.
 And in both cases, they tell you exactly what you'll get.
 
 For nchar(): "an integer vector giving the sizes of each element,
 currently always 2 for missing values (for NA)"
 
 and for nzchar(): "a logical vector of the same length as x, true if and
 only if the element has non-zero length." (note the 'only if').
 
 So I see no error or inconsistencies anywhere.
 
 -- Bert
 
 On Mon, Aug 6, 2012 at 7:53 AM, Liviu Andronic landronim...@gmail.com
 wrote:
  On Mon, Aug 6, 2012 at 4:48 PM, Liviu Andronic
 landronim...@gmail.com wrote:
  string, something that I find strange. At best NA is the equivalent
 of
  an empty string. In this sense, if you Hmisc::describe() the vector
  you get, as I would expect, that in the context of character vectors
  NA and '' values are considered together:
 
 
  By the way, same question holds for nchar(): Should NA values be
  reported as 2-char strings, or as 0-char empty/missing values?
  x <- c(letters, NA, '')
  nchar(x)
   [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 0
 
 
  Liviu
 
 
 
 
 --
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 
 Internal Contact Info:
 Phone: 467-7374
 Website:
 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-
 biostatistics/pdb-ncb-home.htm
 

[R] Tukey HSD not fully displayed in R console

2012-08-06 Thread ulrike

Dear all,

I would like to test the differences in a dependent variable X depending on
2 grouping variables with 10 levels each.
I do this with a 2-way ANOVA, followed by a Tukey HSD test (TukeyHSD(x)).
However, since 2 grouping variables with 10 levels each allow a lot of
combinations, the result of the Tukey test is not fully displayed in
the console.

I tried to write it out as a table (write.table()) and open it afterwards in
Notepad, and to print e.g. only the first 30 rows of the result, but both
without success ...

Does anyone have an idea how I can deal with this problem?

Many thanks,

Ulrike
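
One possible approach is to pull the comparison matrices out of the TukeyHSD object and handle them as an ordinary data frame, which can then be printed in pieces or written to a file. This is a minimal sketch, assuming the built-in warpbreaks data in place of the real data set; the object names fit, tk and tab are illustrative.

```r
# Two-way ANOVA on a built-in data set (stand-in for the real data).
fit <- aov(breaks ~ wool + tension, data = warpbreaks)
tk  <- TukeyHSD(fit)

# tk is a list with one matrix of comparisons per model term;
# stack them into a single data frame.
tab <- do.call(rbind, lapply(names(tk), function(nm)
  data.frame(term = nm, comparison = rownames(tk[[nm]]), tk[[nm]],
             row.names = NULL)))

head(tab, 30)                       # first rows only
out <- file.path(tempdir(), "tukey.csv")
write.csv(tab, out, row.names = FALSE)
```

The console limit itself is controlled by options(max.print = ...), and the which argument of TukeyHSD() restricts the output to selected terms.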




[R] OO code organization

2012-08-06 Thread Michael Meyer

Greetings,
 
I am using S4 classes and the code is organized by putting each class into a
separate source file in a separate folder.

In base/base.R, which defines class base, I do

setGeneric("showSelf",
    function(this) standardGeneric("showSelf")
)
setMethod("showSelf",
    signature(this="base"),
    definition=
    function(this){ ... }
)

For j=1,...,n, the file derived_j/derived_j.R defines class derived_j and does

setMethod("showSelf",
    signature(this="derived_j"),
    definition=
    function(this){ ... }
)

Finally, in tests/tests.R we do

source("../base/base.R")
source("../derived_1/derived_1.R")
source("../derived_2/derived_2.R")

source("../derived_n/derived_n.R")

Now we check which showSelf methods are known at this point:

showMethods("showSelf")

and get

Function: showSelf (package .GlobalEnv)
this="base"
this="derived_n"

The methods with signature
this="derived_j",  j<n
are not known.
 
Needless to say this makes the code useless.
How can I remedy this evil?
 
Many thanks,
 
Michael Meyer
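
One possible cause, offered as an assumption because the files themselves are not shown, is that the generic gets defined again after some methods were already registered (for example if each derived_j.R sources base.R itself). A minimal sketch of a guard that defines the generic only once follows; the class definitions and method bodies are made up, and everything is inlined instead of being spread over files.

```r
library(methods)

# Made-up classes standing in for base and derived_j.
setClass("base", representation(name = "character"))
setClass("derived_1", contains = "base")
setClass("derived_2", contains = "base")

# "base.R": create the generic only if it does not exist yet.
if (!isGeneric("showSelf"))
  setGeneric("showSelf", function(this) standardGeneric("showSelf"))
setMethod("showSelf", signature(this = "base"), function(this) "base")

# "derived_1.R": the guard makes a repeated definition harmless.
if (!isGeneric("showSelf"))
  setGeneric("showSelf", function(this) standardGeneric("showSelf"))
setMethod("showSelf", signature(this = "derived_1"), function(this) "derived_1")

# "derived_2.R"
if (!isGeneric("showSelf"))
  setGeneric("showSelf", function(this) standardGeneric("showSelf"))
setMethod("showSelf", signature(this = "derived_2"), function(this) "derived_2")

showMethods("showSelf")
```

With the guard in place, the methods for base, derived_1 and derived_2 are all listed, whatever the order in which the files are sourced.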

Re: [R] find date between two other dates

Hi,

I ran the second block of code (is.between()) from my earlier mail again. It
works fine for me. I am using R 2.15 on Ubuntu 12.04.

sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C  
 [3] LC_TIME=en_US.UTF-8    LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=C LC_NAME=C 
 [9] LC_ADDRESS=C   LC_TELEPHONE=C    
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base 

other attached packages:
[1] stringr_0.6   reshape_0.8.4 plyr_1.7.1   


is.between <- function(x, a, b) {
  x < a & x >= b
}
ddate <- c("29/12/1998 20:00:33", "02/01/1999 05:20:44", "02/01/1999 06:18:36",
           "02/02/1999 07:06:59", "02/03/1999 07:10:56", "02/03/1999 07:57:18")
ddate <- as.POSIXct(strptime(ddate, "%d/%m/%Y %H:%M:%S"), "GMT")
ddate1 <- data.frame(date = ddate)
date2 <- c("01/12/1998 00:00:00", "31/12/1998 23:59:59", "01/01/1999 00:00:00",
           "31/01/1999 23:59:59", "01/02/1999 00:00:00", "28/02/1999 23:59:59",
           "01/03/1999 00:00:00", "31/03/1999 23:59:59")
date3 <- as.POSIXct(strptime(date2, "%d/%m/%Y %H:%M:%S"), "GMT")
ddate1[is.between(ddate1$date, date3[2], date3[1]), "Season"] <- 1
ddate1[is.between(ddate1$date, date3[4], date3[3]), "Season"] <- 2
ddate1[is.between(ddate1$date, date3[6], date3[5]), "Season"] <- 3
ddate1[is.between(ddate1$date, date3[8], date3[7]), "Season"] <- 4
ddate1
                 date Season
1 1998-12-29 20:00:33      1
2 1999-01-02 05:20:44      2
3 1999-01-02 06:18:36      2
4 1999-02-02 07:06:59      3
5 1999-03-02 07:10:56      4
6 1999-03-02 07:57:18      4
#
I am not sure how you are getting NA. One possibility is that you used
date2 (which is not converted) instead of date3 (the result of
date3 <- as.POSIXct(...)).
If you did this:

ddate1[is.between(ddate1$date, date2[2], date2[1]), "Season"] <- 1
ddate1[is.between(ddate1$date, date2[4], date2[3]), "Season"] <- 2
ddate1[is.between(ddate1$date, date2[6], date2[5]), "Season"] <- 3
ddate1[is.between(ddate1$date, date2[8], date2[7]), "Season"] <- 4
ddate1
                 date Season
1 1998-12-29 20:00:33     NA
2 1999-01-02 05:20:44     NA
3 1999-01-02 06:18:36     NA
4 1999-02-02 07:06:59     NA
5 1999-03-02 07:10:56     NA
6 1999-03-02 07:57:18     NA
A.K.
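
When the seasons follow one another without gaps, the four assignments can be collapsed into a single lookup on the season start times. This is a sketch under that assumption, using findInterval() in place of is.between(); the names obs, starts and season are illustrative.

```r
# Observation times and season start times, all in GMT.
obs <- as.POSIXct(c("1998-12-29 20:00:33", "1999-01-02 05:20:44",
                    "1999-02-02 07:06:59", "1999-03-02 07:10:56"), tz = "GMT")
starts <- as.POSIXct(c("1998-12-01", "1999-01-01",
                       "1999-02-01", "1999-03-01"), tz = "GMT")

# findInterval() returns, for each time, the index of the last start <= it.
season <- findInterval(as.numeric(obs), as.numeric(starts))
data.frame(date = obs, Season = season)
```

Times before the first start get 0, so they are easy to spot. Seasons with gaps between them would still need explicit start and end bounds as in the code above.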





- Original Message -
From: penguins cat...@bas.ac.uk
To: r-help@r-project.org
Cc: 
Sent: Monday, August 6, 2012 4:13 AM
Subject: Re: [R] find date between two other dates

Thanks arun and Rui; 3 fantastic suggestions. 

The Season interval is not always a month, so arun's suggestion works better
for this dataset. I couldn't get the is.between function to work in arun's
second suggestion; it only returned NAs.

However, arun's first suggestion worked a treat!

Many thanks 




Re: [R] regexpr with accents



Hi,

Here, the string within the quotes is matched exactly as written. So you may
have to use the character itself instead of the "friendly" or numeric code
from the link, or you have to convert those codes first.

d1 <- data.frame(V1 = 1:4,
    V2 = c("some text = 9", "some t&egrave;xt = 9", "some tèxt = 9",
           "some t&#232;xt = 9"))

d1$V1[regexpr("some t&egrave;xt = 9", d1$V2) > 0] <- 9
d1$V1[regexpr("some t&#232;xt = 9", d1$V2) > 0] <- 9
d1$V1[regexpr("some tèxt = 9", d1$V2) > 0] <- 9

d1
  V1                   V2
1  1        some text = 9
2  9 some t&egrave;xt = 9
3  9        some tèxt = 9
4  9   some t&#232;xt = 9

A.K.
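
To make the match ignore the accent, as the original question asked, two options are a character class in the pattern or replacing the accented letters before matching. This is a minimal sketch on made-up strings; the names v2, hit_class, plain and hit_plain are illustrative, and the accented letters are written as Unicode escapes to avoid depending on the file encoding.

```r
# Made-up strings; "\u00e8" is e-grave.
v2 <- c("some text = 9", "some t\u00e8xt = 9", "some other text = 9")

# Option 1: accept either form of the letter in the pattern.
hit_class <- grepl("some t[e\u00e8]xt = 9", v2)

# Option 2: strip the accents first, then match the plain string literally.
plain <- gsub("[\u00e8\u00e9]", "e", v2)
hit_plain <- regexpr("some text = 9", plain, fixed = TRUE) > 0

hit_class
hit_plain
```

Both approaches flag the first two strings and leave the third alone. Option 2 only handles the letters listed in the gsub() pattern, so the list has to be extended for other accents.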


- Original Message -
From: Luca Meyer lucam1...@gmail.com
To: r-help@r-project.org
Cc: 
Sent: Monday, August 6, 2012 8:25 AM
Subject: [R]  regexpr with accents

Sorry, but my previous email did not go through properly. Instead of the ? you
should really read an &egrave; or &#232; according to
http://www.lookuptables.com/.

So there are extended ASCII characters I need to deal with.

I have tried

d1$V1[regexpr("some t&egrave;xt = 9", d1$V2) > 0] <- 9

and

d1$V1[regexpr("some t&#232;xt = 9", d1$V2) > 0] <- 9

without success...

Thanks,
Luca




    [[alternative HTML version deleted]]


[R] program of matrix

2012-08-06 Thread hafida

Hi,

Can anybody help me to program this formula?

g[l,l'] = i[l,l'] - sum_{j=1}^{k} c[l,j] c[l',j] A[j]^-1

where

i[l,l'] = (1/n) sum_{i=1}^{n} z[i,l] z[i,l']

c[l,j] and c[l',j] are matrices, and A[j]^-1 is an invertible diagonal
matrix.

n, k and m are given, with j = 1, ..., k and l, l' = 1, ..., m.

It is complicated for me; I hope you can help me.
Thank you a lot.
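
One possible reading of the formula, sketched under assumptions that are not stated in the post: z is an n x m data matrix, the c[l,j] are the entries of an m x k matrix (called C here to avoid masking base::c), and A is a vector holding the k non-zero diagonal entries. All data below are made up.

```r
set.seed(1)
n <- 20; m <- 4; k <- 3
z <- matrix(rnorm(n * m), n, m)   # z[i, l]  (assumed n x m)
C <- matrix(rnorm(m * k), m, k)   # c[l, j]  (assumed m x k)
A <- runif(k, 1, 2)               # A[j], assumed diagonal entries, all non-zero

Imat <- crossprod(z) / n          # i[l, l'] = (1/n) * sum_i z[i, l] * z[i, l']
G <- Imat - C %*% diag(1 / A, nrow = k) %*% t(C)   # g[l, l'] for all l, l'

# element-wise check of one entry against the formula as written
g12 <- Imat[1, 2] - sum(C[1, ] * C[2, ] / A)
all.equal(G[1, 2], g12)
```

The matrix form avoids explicit loops over l, l' and j; if A[j] is meant to be a full matrix for each j rather than a scalar, this sketch does not apply.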




[R] cannot find function simpleRDA2

2012-08-06 Thread Lindsey Leigh Sloat

Hi,

I am trying to run the command forward.sel.par, however I receive
the error message: Error: could not find function 'simpleRDA2'. I
have the vegan library loaded. The documentation on varpart has not
helped me to understand why I cannot call this function. Maybe I am
missing something obvious because I am still an 'R' novice.

Below is a reproducible example for you.

Thank you always for all of your help.
Lindsey

example:

X=matrix(rnorm(30),10,3)
Y=matrix(rnorm(50),10,5)

forward.sel.par <- function(Y, X, alpha = 0.05, K = nrow(X)-1,
    R2thresh = 0.99, R2more = 0.001, adjR2thresh = 0.99, Yscale = FALSE,
    verbose = TRUE)
##
## Parametric forward selection of explanatory variables in regression and RDA.
## Y is the response, X is the table of explanatory variables.
##
## If Y is univariate, this function implements FS in regression.
## If Y is multivariate, this function implements FS using the F-test described
## by Miller and Farr (1971). This test requires that
##   -- the Y variables be standardized,
##   -- the error in the response variables be normally distributed
##      (to be verified by the user).
##
## This function uses 'simpleRDA2' and 'RsquareAdj' developed for
## 'varpart' in 'vegan'.
##
##                Pierre Legendre & Guillaume Blanchet, May 2007
##
## Arguments --
##
## Y           Response data matrix with n rows and m columns containing
##             quantitative variables.
## X           Explanatory data matrix with n rows and p columns
##             containing quantitative variables.
## alpha       Significance level. Stop the forward selection procedure
##             if the p-value of a variable is higher than alpha. The
##             default is 0.05.
## K           Maximum number of variables to be selected. The default
##             is one minus the number of rows.
## R2thresh    Stop the forward selection procedure if the R-square of
##             the model exceeds the stated value. This parameter can vary
##             from 0.001 to 1.
## R2more      Stop the forward selection procedure if the difference in
##             model R-square with the previous step is lower than R2more.
##             The default setting is 0.001.
## adjR2thresh Stop the forward selection procedure if the adjusted
##             R-square of the model exceeds the stated value. This
##             parameter can take any value (positive or negative) smaller
##             than 1.
## Yscale      Standardize the variables in table Y to variance 1. The
##             default setting is FALSE. The setting is automatically
##             changed to TRUE if Y contains more than one variable. This
##             is a validity condition for the parametric test of
##             significance (Miller and Farr 1971).
##
## Reference:
## Miller, J. K., and S. D. Farr. 1971. Bimultivariate redundancy: a
##    comprehensive measure of interbattery relationship. Multivariate
##    Behavioral Research 6: 313-324.

{
  require(vegan)
  FPval <- function(R2cum,R2prev,n,mm,p)
  ## Compute the partial F and p-value after adding a single
  ## explanatory variable to the model.
  ## In FS, the number of df of the numerator of F is always 1. See
  ## Sokal & Rohlf 1995, eq 16.14.
  ##
  ## The amendment, based on Miller and Farr (1971), consists in
  ## multiplying the numerator and denominator df by 'p', the number of
  ## variables in Y, when computing the p-value.
  ##
  ##                Pierre Legendre, May 2007
  {
    df2 <- (n-1-mm)
    Fstat <- ((R2cum-R2prev)*df2) / (1-R2cum)
    pval <- pf(Fstat,1*p,df2*p,lower.tail=FALSE)
    return(list(Fstat=Fstat,pval=pval))
  }

  Y <- as.matrix(Y)
  X <- apply(as.matrix(X),2,scale,center=TRUE,scale=TRUE)
  var.names = colnames(as.data.frame(X))
  n <- nrow(X)
  m <- ncol(X)
  if(nrow(Y) != n) stop("Numbers of rows not the same in Y and X")
  p <- ncol(Y)
  if(p > 1) {
    Yscale = TRUE
    if(verbose) cat("The variables in response matrix Y have been standardized",'\n')
  }
  Y <- apply(Y,2,scale,center=TRUE,scale=Yscale)
  SS.Y <- sum(Y^2)

  X.out <- c(1:m)

  ## Find the first variable X to include in the model
  R2prev <- 0
  R2cum <- 0
  for(j in 1:m) {
    toto <- simpleRDA2(Y,X[,j],SS.Y)
    if(toto$Rsquare > R2cum) {
      R2cum <- toto$Rsquare
      no.sup <- j
    }
  }
  mm <- 1
  FP <- FPval(R2cum,R2prev,n,mm,p)
  if(FP$pval <= alpha) {
    adjRsq <- RsquareAdj(R2cum,n,mm)
    res1 <- var.names[no.sup]
    res2 <- no.sup
    res3 <- R2cum
    res4 <- R2cum
    res5 <- adjRsq
    res6 <- FP$Fstat
    res7 <- FP$pval
    X.out[no.sup] <- 0
    delta <- R2cum
  } else {
    stop("Procedure stopped (alpha criterion): pvalue for variable ",
         no.sup," is ",FP$pval)
  }

  ## Add variables X to the model
  while((FP$pval <= alpha) && (mm <= K) && (R2cum <= R2thresh) &&
        (delta >= R2more) && (adjRsq <= adjR2thresh)) {
    mm <- mm+1
    R2prev <- R2cum
    R2cum <- 0
    for(j in 1:m) {
      if(X.out[j] != 0) {
        toto <- simpleRDA2(Y,X[,c(res2,j)],SS.Y)
        if(toto$Rsquare > R2cum) {
          R2cum <- toto$Rsquare
          no.sup <- j
        }
      }
    }
    FP <- FPval(R2cum,R2prev,n,mm,p)
    delta <- R2cum-R2prev
    adjRsq <- RsquareAdj(R2cum,n,mm)
    res1 <- c(res1,var.names[no.sup])
    res2 <- c(res2,no.sup)
    res3 <- c(res3,delta)
    res4 <- c(res4,R2cum)
Re: [R] test if elements of a character vector contain letters

Hi,

Not sure whether this is what you wanted.
x <- letters
(x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
x1 <- c(x, 1:26)


x1
 [1] "a4"  "b3"  "c5"  "d2"  "e9"  "f6"  "g1"  "h8"  "i10" "j7"  "k"   "l"  
[13] "m"   "n"   "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"  
[25] "y"   "z"   "1"   "2"   "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10" 
[37] "11"  "12"  "13"  "14"  "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22" 
[49] "23"  "24"  "25"  "26" 


grepl("^[[:alpha:]][[:digit:]]", x1)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE

A.K.



- Original Message -
From: Liviu Andronic landronim...@gmail.com
To: r-help@r-project.org Help r-help@r-project.org
Cc: 
Sent: Monday, August 6, 2012 12:25 PM
Subject: [R] test if elements of a character vector contain letters

Dear all
I'm pretty sure that I'm approaching the problem in the wrong way.
Suppose the following character vector:
> (x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
 [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"
> x
 [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"  "k"   "l"   "m"   "n"
[15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
[29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
[43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"


How do you test whether the elements of the vector contain at least
one letter (or at least one digit) and obtain a logical vector of the
same dimension? I came up with the following awkward function:
is_letter <- function(x, pattern=c(letters, LETTERS)){
    sapply(x, function(y){
        any(sapply(pattern, function(z) grepl(z, y, fixed=T)))
    })
}

> is_letter(x)
  a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
    5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
   20    21    22    23    24    25    26
FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> is_letter(x, 0:9)  ##function slightly misnamed
  a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
    p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
    5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
   20    21    22    23    24    25    26
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE


Is there a nicer way to do this? Regards
Liviu


-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail


[R] How to convert data to 'normal' if they are in the form of standard scientific notations?

2012-08-06 Thread HJ YAN

Dear R users

I read two csv data files into R and called them Tem1 and Tem5.

For the first column, the data in Tem1 have 13 digits, whereas in Tem5 there
are 14 digits for each observation.

Originally they are 'numeric', as can be seen in my code below. But how
can I display/convert them in a form other than scientific notation,
which seems to be the standard/default?

I want them to be in a form like '20110911001084', but I'm very confused
about why the 'as.factor' call works for my 'Tem1' but not for
'Tem5'...??


Many thanks!

HJ

> Tem1[1:5,1]
[1] 2.10004e+12 2.10004e+12 2.10004e+12 2.10004e+12 2.10004e+12
> Tem5[1:5,1]
[1] 2.011091e+13 2.011091e+13 2.011091e+13 2.011091e+13 2.011091e+13
> class(Tem1[1:5,1])
[1] "numeric"
> class(Tem5[1:5,1])
[1] "numeric"
> as.factor(Tem1[1:5,1])
[1] 2.10004e+12 2.10004e+12 2.10004e+12 2.10004e+12 2.10004e+12
Levels: 2.10004e+12
> as.factor(Tem5[1:5,1])
[1] 20110911001084 20110911001084 20110911001084 20110911001084 20110911001084
Levels: 20110911001084
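
For reference, a minimal sketch of two base R ways to get fixed (non-scientific) notation as character strings. The values in x are made up; they only mimic the 13- and 14-digit identifiers described above.

```r
x <- c(2100040000123, 20110911001084)   # made-up 13- and 14-digit values

# sprintf() with zero decimals prints the full integer part
s1 <- sprintf("%.0f", x)

# format() with scientific = FALSE does the same; trim = TRUE drops padding
s2 <- format(x, scientific = FALSE, trim = TRUE)

s1
s2
```

Both return character vectors, so they suit display or use as labels. options(scipen = ...) is another route if only the printing of numeric values needs to change.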


Re: [R] test if elements of a character vector contain letters

2012-08-06 Thread Marc Schwartz


On Aug 6, 2012, at 12:06 PM, Marc Schwartz marc_schwa...@me.com wrote:

 Perhaps I am missing something, but why use sapply() when grepl() is already
 vectorized?

 is.letter <- function(x) grepl("[:alpha:]", x)
 is.number <- function(x) grepl("[:digit:]", x)

Sorry, typos in the above from my CP. Should be:

is.letter <- function(x) grepl("[[:alpha:]]", x)
is.number <- function(x) grepl("[[:digit:]]", x)
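
A small usage sketch of the corrected functions on a made-up vector; because both return logical vectors of the same length as the input, they can be combined directly.

```r
is.letter <- function(x) grepl("[[:alpha:]]", x)
is.number <- function(x) grepl("[[:digit:]]", x)

x <- c("a10", "k", "7", "b2c", "")   # made-up test values

has_l <- is.letter(x)    # at least one letter
has_d <- is.number(x)    # at least one digit

x[has_l & has_d]         # elements containing both a letter and a digit
```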

Marc

 
 x <- c(letters, 1:26)

 x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')

 x <- rep(x, 1e3)

 str(x)
 chr [1:52000] "a2" "b10" "c8" "d3" "e6" "f1" "g5" ...

 system.time(is.letter(x))
   user  system elapsed 
  0.011   0.000   0.010 

 system.time(is.number(x))
   user  system elapsed 
  0.010   0.000   0.011 
 
 
 Regards,
 
 Marc Schwartz
 
 On Aug 6, 2012, at 11:51 AM, Rui Barradas ruipbarra...@sapo.pt wrote:
 
 Hello,
 
 Fun as an exercise in vectorization. 30 times faster. Don't look, guess.

 Gave it up? Ok, here it is.


 is_letter <- function(x, pattern=c(letters, LETTERS)){
   sapply(x, function(y){
     any(sapply(pattern, function(z) grepl(z, y, fixed=T)))
   })
 }
 # test ascii codes, just one loop.
 has_letter <- function(x){
   sapply(x, function(y){
     y <- as.integer(charToRaw(y))
     any((65 <= y & y <= 90) | (97 <= y & y <= 122))
   })
 }

 x <- c(letters, 1:26)
 x[1:10] <- paste(x[1:10], sample(1:10, 10), sep='')
 x <- rep(x, 1e3)

 t1 <- system.time(is_letter(x))
 t2 <- system.time(has_letter(x))
 rbind(t1, t2, t1/t2)
    user.self sys.self elapsed user.child sys.child
 t1     15.69        0   15.74         NA        NA
 t2      0.50        0    0.50         NA        NA
        31.38      NaN   31.48         NA        NA
 

[R] Splitting Data Into Different Series

2012-08-06 Thread Henrique Andrade

Dear R Community,

I'm trying to write a loop to split my data into different series. I
need to make a new matrix (or series) according to the series code.

For instance, every time the code column assumes the value 433 I need to
save date, value, and code into the dados433 matrix.

Please take a look at the following example:

dados <- matrix(c(
    "2012-01-01","2012-02-01","2012-03-01","2012-04-01","2012-05-01","2012-06-01",
    "2012-01-01","2012-02-01","2012-03-01","2012-04-01","2012-05-01","2012-06-01",
    "2012-01-01","2012-02-01","2012-03-01","2012-04-01","2012-05-01","2012-06-01",
    0.56,0.45,0.21,0.64,0.36,0.08,152136,153081,155872,158356,162157,166226,
    33.47,34.48,35.24,38.42,35.33,34.43,433,433,433,433,433,433,2005,2005,2005,
    2005,2005,2005,3939,3939,3939,3939,3939,3939),
    nrow=18, ncol=3, byrow=FALSE,
    dimnames=list(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18),
                  c("date", "value", "code")))

dados433 <- matrix(data = NA, nrow = 6, ncol = 3, byrow= FALSE)
dados2005 <- matrix(data = NA, nrow = 6, ncol = 3, byrow= FALSE)
dados3939 <- matrix(data = NA, nrow = 6, ncol = 3, byrow= FALSE)

for(i in seq(along=dados[,3])) {
    if(dados[i,3] == 433) {dados433[i,1:3] <- dados[i,1:3]}
}

for(i in seq(along=dados[,3])) {
    if(dados[i,3] == 2005) {dados2005[i,1:3] <- dados[i,1:3]}
}

for(i in seq(along=dados[,3])) {
    if(dados[i,3] == 3939) {dados3939[i,1:3] <- dados[i,1:3]}
}

Best regards,
Henrique Andrade
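
As an alternative to the three loops above, a sketch using split(), which divides a data frame into one piece per code. It assumes the data are held in a data frame rather than a character matrix; the rows below are a shortened, made-up subset of the example.

```r
dados <- data.frame(
  date  = rep(c("2012-01-01", "2012-02-01"), times = 3),
  value = c(0.56, 0.45, 152136, 153081, 33.47, 34.48),
  code  = rep(c(433, 2005, 3939), each = 2)
)

# one data frame per distinct value of 'code', collected in a named list
series <- split(dados, dados$code)

names(series)
series[["433"]]     # the rows whose code is 433
```

Keeping the pieces in a list avoids creating one object per code by hand, and the same line works however many codes there are.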


Re: [R] no font could be found for family Arial

On Aug 6, 2012, at 2:01 AM, tibr wrote:

I hope the original poster fixed this a long time ago, but I had the same
problem and here is how I fixed it:

- go to the application Fontbook
- check if the Arial font has duplicates, and delete them, even if they are
  set to Off
- restart the computer.

Which you would have found with a search of the SIG-Mac list had you
followed the advice that Prof. Ripley gave at the time. I do not
delete both copies of the duplicated fonts when this has occurred on
my machine ... only the ones that were defective.

emmats wrote

I was re-running some code that I hadn't run in a couple of months to make
barplots in R. I didn't change a single thing in the script, but the
plots wouldn't work this time around. The plot itself (the bars and axes)
will graph in the window, but no text appears. In the console it says I
have a number of errors, all of which say no font could be found for
family 'Arial'.
I have not knowingly changed anything in R and I would like to be able to
make barplots with labels and titles again. Does anyone know how to fix
this?

--
View this message in context:
http://r.789695.n4.nabble.com/no-font-could-be-found-for-family-Arial-tp3233322p4639257.html
Sent from the R help mailing list archive at Nabble.com.

Please realize that there are multiple R mailing lists and that
posting to this list is the wrong one. The Nabble interface obscures
that fact among many other facts, including where the real Archives are.

David Winsemius, MD
Alameda, CA, USA

Re: [R] trouble with looping for effect of sampling interval increase

2012-08-06 Thread Jean V Adams

You would make it much easier for R-help readers to solve your problem if 
you provided a small example data set with your code, so that we could 
reproduce your results and troubleshoot the issues.

Jean


Naidraug white@wright.edu wrote on 08/05/2012 09:08:25 AM:
 
I've looked everywhere and tinkered for three days now, so I figure asking
might be good.

So here's a general rundown of what I am trying to get my code to do. I am
giving you the whole rundown because I need a solution that retains certain
ways of doing things, because they give me the information I need.

I want to examine the effect of increasing my sampling interval on my data.
Example: what if, instead of sampling every hour, I sampled every two? Oh
yeah, how about every three? Etc., ad nauseam. How I want to do this is to
take the data I have now and add an index to it that contains counters.
Those counters will look something like 1,2,1,2,... for the first one and
1,2,3,1,2,3,... for the next one. I have a lot of them, say a thousand.
Then, for each column in the index, my loops should start in the first
column, run only the ones, store that, then run the twos, and store that in
the same column of output in a different row. Then move to the next column,
run the ones, store in the next column of output, run the twos, store in the
next row of that column, run the threes, and so on until there are no more.

I want to use this index for a number of reasons. The first is that after
this I will be going back through and using a different method for
sub-sampling but keeping all else the same, so all I have to do there is
change the way I generate the index. The second is that it allows me to run
many subsamples and see their range.

So the code I have made generates my index and does the heavy lifting all
correctly, as well as my averages and quartiles, but a look at the head() of
my key output (IntervalBetas) shows that something has gone amiss. You have
to look closely to catch it. The values generated for each row of output are
identical. This should not be the case, as row one of the first output
column should be generated from all values indexed by a one in the first
column, whereas in column two there are different values indexed by the
number one. I've checked about everything I can think of, done print() on my
loop sequence variables (those little i and j) and wiggled about everything.
I am flummoxed. I think the bit that is messing up is in here:
#Here is the loop for betas from sampling interval increase
c <- WHOLESIZE[2]-1
for (i in 1:c)
{
  x <- length(unique(index[,i]))

  for (j in 1:x)
  {

    data <- WHOLE[WHOLE[,x]==j,1]
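
The counter-index scheme described above can be sketched on a small scale as follows. This is not a fix for the posted code, only a self-contained illustration: the series is made up, max_int is a hypothetical cap on the interval, and mean() stands in for the spectral slope computed in the post.

```r
set.seed(42)
series  <- rnorm(24)   # made-up data standing in for the real series
max_int <- 4           # largest sampling interval to try (hypothetical)

# one counter column per interval: 1,2,1,2,... then 1,2,3,1,2,3,... and so on
index <- sapply(2:max_int, function(a) rep(1:a, length.out = length(series)))

# rows = counter value j, columns = interval; cells that are not used stay NA
out <- matrix(NA_real_, nrow = max_int, ncol = ncol(index))
for (i in seq_len(ncol(index))) {
  for (j in seq_len(max(index[, i]))) {
    sub <- series[index[, i] == j]   # the observations tagged j in counter column i
    out[j, i] <- mean(sub)           # placeholder statistic
  }
}
out
```

Pre-allocating the output matrix with NA avoids the row-padding bookkeeping, and each cell depends only on i and j, which makes it easier to check that different columns really use different subsamples.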
 
But also, here is the whole code, in case I am wrong that that is the
problem area:

#loop for making index

#clean dataset of empty cells
dataset <- na.omit(datasetORIGINAL)
#how messed up was the data?
holeyDATA <- datasetORIGINAL - dataset

D <- dim(dataset)

#what is the smallest sample?
tinysample <- 100

#how long is the dataset?
datalength <- length(dataset)

#MD - how many divisions
MD <- datalength/tinysample

#clear things up for the index loop
WHOLE <- NULL
index <- NULL
#do the index loop
for (a in 1:MD)
{
  index <- cbind(index, rep(1:a, length = D[1]))
}
index <- subset(index, select = -c(1))

#merge dataset and index loop
WHOLE <- cbind(dataset, index)

WHOLESIZE <- dim(WHOLE)

#Housekeeping before loops
IntervalBetas <- NULL

IntervalBetas <- c(NA,NA)
IntervalBetas <- as.data.frame(IntervalBetas)
IntervalLowerQ <- NULL
IntervalUpperQ <- NULL
IntervalMean <- NULL
IntervalMedian <- NULL

#Here is the loop for betas from sampling interval increase
c <- WHOLESIZE[2]-1
for (i in 1:c)
{
  x <- length(unique(index[,i]))

  for (j in 1:x)
  {

    data <- WHOLE[WHOLE[,x]==j,1]

    #get power spectral density
    PSDPLOT <- spectrum(data, detrend = TRUE, plot = FALSE)
    frequency <- PSDPLOT$freq
    PSD <- PSDPLOT$spec
    #log transform the power spectral density
    Logfrequency <- log(frequency)
    LogPSD <- log(PSD)
    #fit my line to the data
    Line <- lm(LogPSD ~ Logfrequency)
    #store the slope of the line
    Betas <- rbind(Betas, -coef(Line)[2])

    #Get values on the curve shape
    BSkew <- skew(Betas)
    BMean <- mean(Betas)
    BMedian <- median(Betas)
    Q <- quantile(Betas)

    #store curve shape values
    IntervalLowerQ <- rbind(IntervalLowerQ, Q[2])
    IntervalUpperQ <- rbind(IntervalUpperQ, Q[4])
    IntervalSkew <- rbind(IntervalSkew, BSkew)
    IntervalMean <- rbind(IntervalMean, BMean)
    IntervalMedian <- rbind(IntervalMedian, BMedian)

    #Store the Betas
    #This is a pain

    BetaSave <- Betas
    no.r <- nrow(IntervalBetas)
    l.v <- length(BetaSave)
    difer <- no.r - l.v
    difers <- abs(difer)
    if (no.r < l.v){
      IntervalBetas <- rbind(IntervalBetas,rep(NA,difers))
    }
    else {
      (BetaSave <- rbind(BetaSave,rep(NA,difers)))
    }

    IntervalBetas <- cbind(IntervalBetas, BetaSave)

Re: [R] test if elements of a character vector contain letters

2012-08-06 Thread David L Carlson

Only an extra set of brackets:

is.letter <- function(x) grepl("[[:alpha:]]", x)
is.number <- function(x) grepl("[[:digit:]]", x)

Without them, the functions are fast, but wrong.

> x
 [1] "a8"  "b5"  "c10" "d1"  "e6"  "f2"  "g4"  "h3"  "i7"  "j9"  "k"   "l"  
[13] "m"   "n"   "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"  
[25] "y"   "z"   "1"   "2"   "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10" 
[37] "11"  "12"  "13"  "14"  "15"  "16"  "17"  "18"  "19"  "20"  "21"  "22" 
[49] "23"  "24"  "25"  "26" 
> is.letter <- function(x) grepl("[:alpha:]", x)
> is.letter(x)
 [1]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
[13] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE
> is.letter <- function(x) grepl("[[:alpha:]]", x)
> is.letter(x)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[13]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[25]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE FALSE FALSE 
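
The pattern of TRUE values in the wrong version (a, h, l and p only) is consistent with "[:alpha:]" being read as an ordinary bracket expression, that is, the set of characters ':', 'a', 'l', 'p', 'h'. A small sketch, assuming R's default regular expression engine (not perl = TRUE) and a made-up test vector:

```r
x <- c("a8", "h3", "k", "l", "p", "z", "12", "a:b")   # made-up test values

single <- grepl("[:alpha:]", x)     # single brackets: a set of literal characters
spelt  <- grepl("[ahlp:]", x)       # the same set written out explicitly
double <- grepl("[[:alpha:]]", x)   # POSIX class inside a bracket expression

identical(single, spelt)
double
```

So the outer brackets open the bracket expression and [:alpha:] names the class inside it, which is why both pairs are needed.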

--
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352


Re: [R] bibtex::read.bib -- extracting bibentry keys

2012-08-06 Thread Michael Friendly


On 8/6/2012 11:54 AM, Achim Zeileis wrote:

On Mon, 6 Aug 2012, Michael Friendly wrote:

I have two versions of a bibtex database which have gotten badly out 
of sync. I need to find all the entries in bib2 which are not 
contained in bib1, according to their bibtex keys. But I can't figure 
out how to extract a list of the bibentry keys in these databases.


read.bib() returns a bibentry object so you can simply do this as usual
for bibentry objects with $key:
One thing that was confusing was that read.bib returns a bibentry 
object, all of whose elements are also bibentry objects.


x <- read.bib(...)
x$key

or maybe

unlist(x$key)

Whatever is more convenient for you. See ?bibentry for more details.
That is what I was missing -- it would have helped to find a link to 
utils::bibentry in the [rather scanty] documentation for read.bib. 
I'm now a happy camper in this regard. What I wanted is given by:

bib1 <- read.bib("C:/localtexmf/bibtex/bib/timeref.bib")
length(bib1)
keys1 <- unlist(bib1$key)

bib2 <- read.bib("W:/texmf/bibtex/bib/timeref.bib")
length(bib2)
keys2 <- unlist(bib2$key)


 which(! keys1 %in% keys2)
 [1] 133 249 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628

 keys1[which(! keys1 %in% keys2)]
 [1] "Langren:1646"             "Fisher:1915a"             "Stigler:2012"
 [4] "Wainer:2011"              "Minard:1860a"             "CNAM:1906"
 [7] "Wainer:2012"              "Wainer-Ramsay:2010"       "Stephenson-Galneder:1969"
[10] "Waters:1964"              "Agathe:1988"              "Gascoigne:2007"
[13] "Krzywinski:2009"          "Bolle:1929"               "Balbi:1829"
[16] "Bills-Li:2005"            "Lewi:2006"                "Fletcher:1851"
[19] "Perrot:1976"


As a side note, I searched extensively for bibtex tools that would help 
me resolve the differences between two related bibtex files, but none 
was as simple as this, once I could get the keys. Thanks to Roman for 
providing this infrastructure!

So, ignoring for now differences in the contents of the bibentries, a 
useful tool for my purpose is bibdiff(),


bibdiff <- function(bib1, bib2) {
    keys1 <- unlist(bib1$key)
    keys2 <- unlist(bib2$key)
    only1 <- keys1[which(! keys1 %in% keys2)]
    only2 <- keys2[which(! keys2 %in% keys1)]
    cat("Only in bib1:\n")
    print(only1)
    cat("Only in bib2:\n")
    print(only2)
}

 bibdiff(bib1, bib2)
Only in bib1:
 [1] "Langren:1646"             "Fisher:1915a"             "Stigler:2012"
 [4] "Wainer:2011"              "Minard:1860a"             "CNAM:1906"
 [7] "Wainer:2012"              "Wainer-Ramsay:2010"       "Stephenson-Galneder:1969"
[10] "Waters:1964"              "Agathe:1988"              "Gascoigne:2007"
[13] "Krzywinski:2009"          "Bolle:1929"               "Balbi:1829"
[16] "Bills-Li:2005"            "Lewi:2006"                "Fletcher:1851"
[19] "Perrot:1976"
Only in bib2:
[1] "Langren:1644"  "Quetelet:1842"


which gives me the complete answer, as far as it goes.
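
For readers without the two .bib files at hand, the key comparison can be sketched with base R alone. This is only an illustration under stated assumptions: the entries below are placeholders built with utils::bibentry() (the titles are made up, only the keys are borrowed from the output above), and setdiff() is swapped in for the which(! ... %in% ...) idiom.

```r
# Placeholder databases; in practice these would come from read.bib().
# The titles are hypothetical, only the keys echo the thread.
bib_a <- c(
  bibentry("Misc", key = "Langren:1646", title = "Placeholder A"),
  bibentry("Misc", key = "Fisher:1915a", title = "Placeholder B")
)
bib_b <- c(
  bibentry("Misc", key = "Fisher:1915a", title = "Placeholder B"),
  bibentry("Misc", key = "Quetelet:1842", title = "Placeholder C")
)

# $key gives a list for a multi-entry bibentry object, hence unlist()
keys_a <- unlist(bib_a$key)
keys_b <- unlist(bib_b$key)

only_a <- setdiff(keys_a, keys_b)  # keys present only in bib_a
only_b <- setdiff(keys_b, keys_a)  # keys present only in bib_b
print(only_a)
print(only_b)
```

setdiff() drops duplicated keys, which may or may not be wanted when hunting for inconsistencies; the %in% version keeps them.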



A minor question: Is there some way to prevent read.bib from ignoring 
entries that do not contain all required fields?


Also not really an issue with read.bib itself. read.bib() wants to 
return a bibentry object but bibentry() only allows creating 
objects that are valid BibTeX, i.e., have all required fields.


It turns out that read.bib seems to be pickier than bibtex itself -- it 
does not accommodate crossref= fields, used for InCollection items; 
these resolve correctly using bibtex.
For some books in my database, the publisher is unknown. bibtex generates
warnings (I think) and does include the references. It would be nicer if 
there was an argument to read.bib, e.g., strict = {T/F}, where 
strict=FALSE would allow entries not containing all required fields. 
But perhaps that's buried too deep in the implementation.

 bib1 <- read.bib("C:/localtexmf/bibtex/bib/timeref.bib")
ignoring entry 'Donoho-etal:1988' (line 40) because :
A bibentry of bibtype ‘InCollection’ has to correctly specify the 
field(s): booktitle


ignoring entry 'Martonne:1919:map' (line 90) because :
A bibentry of bibtype ‘InCollection’ has to correctly specify the 
field(s): booktitle, publisher, year


ignoring entry 'Touraine:2002' (line 5423) because :
A bibentry of bibtype ‘Book’ has to correctly specify the field(s): 
publisher


ignoring entry 'Cotes:1722' (line 6004) because :
A bibentry of bibtype ‘Book’ has to correctly specify the field(s): 
publisher


ignoring entry 'Quetelet:1842' (line 6605) because :
A bibentry of bibtype ‘Book’ has to correctly specify the field(s): 
publisher


ignoring entry 'Wenzlick:1950' (line 6663) because :
A bibentry of bibtype ‘Unpublished’ has to correctly specify the 
field(s): note


ignoring entry 'Verniquet:1791' (line 6695) because :
A bibentry of bibtype ‘Book’ has to correctly specify the field(s): 
publisher


 length(bib1)
[1] 628


A suggestion: it would be nice if bibtex provided some extractor 
functions for bibentry fields.


So that only a subset of fields is read as opposed to all fields?

If you read all fields, you can easily subset afterwards (again using 
$-notation).


No, it was only lack of documentation, and perhaps an example or two for 
read.bib, that caused me to stumble.


hth,
Z



--
Michael Friendly Email: friendly AT yorku DOT ca
Professor, Psychology Dept.
York University  Voice: 416 736-2100 x66249 Fax: 416 736-5814
4700 Keele Street

Re: [R] sapply() and by()

2012-08-06 Thread Jean V Adams

Dominic,

It's great that you provided some example data, but a much smaller data 
frame would have sufficed.  For example, 10 randomly selected rows from 
your data ...

LF <- structure(list(Serra.da.Foladoira = c(27.335652173913, 
25.4632608695652, 24.464652173913, 22.550652173913, 22.2177826086956, 
29.3744782608695, 24.1317826086956, 25.5464782608695, 27.7517391304348, 
25.172), Santiago = c(32.6199565217391, 27.9597826086956, 
32.7863913043478, 25.2136086956521, 23.7573043478261, 32.6199565217391, 
28.6671304347826, 27.9597826086956, 29.7489565217391, 23.5492608695652), 
Sergude = c(31.7877826086956, 27.4604782608695, 26.1706086956521, 
25.8377391304348, 26.5034782608695, 33.2856956521739, 30.4979130434782, 
30.7059565217391, 30.8307826086956, 31.9542173913043), 
Rio.Do.Sol = c(30.3730869565217, 25.7545217391304, 25.421652173913, 
24.1317826086956, 23.4660434782608, 31.1220434782608, 25.8377391304348, 
25.8793478260869, 30.7059565217391, 24.464652173913), 
V5 = c(10L, 2L, 2L, 11L, 3L, 8L, 8L, 3L, 8L, 6L)), 
.Names = c("Serra.da.Foladoira", "Santiago", "Sergude", "Rio.Do.Sol", 
"V5"), row.names = c(1017L, 778L, 400L, 1403L, 86L, 1311L, 598L, 1536L, 
605L, 520L), class = "data.frame")

Try this code to calculate the mean of each of the first four columns for 
each value of the fifth column ...

aggregate(LF[, 1:4], list(month=LF$V5), mean)

The sapply() approach doesn't have a built-in "by"-type argument for grouping.
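
To make the grouping idea concrete without the full data, here is a minimal sketch on a made-up four-row data frame (the name LF_small and its values are hypothetical, chosen so the means are easy to check by hand). It shows aggregate() for several columns at once and tapply() for a single column.

```r
# Toy data: two station columns and a month column (values are made up)
LF_small <- data.frame(
  Santiago = c(10, 20, 30, 40),
  Sergude  = c(1, 3, 5, 7),
  V5       = c(1L, 1L, 2L, 2L)
)

# Mean of every station column within each month
by_month <- aggregate(LF_small[, 1:2], list(month = LF_small$V5), mean)
print(by_month)

# For one column at a time, tapply() returns a named vector instead
santiago_means <- tapply(LF_small$Santiago, LF_small$V5, mean)
print(santiago_means)
```

aggregate() returns a data frame with one row per group, which is usually the more convenient shape for further work.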

Jean


Dominic Roye dominic.r...@gmail.com wrote on 08/06/2012 09:34:58 AM:
 
 Hello everyone,
 
 
 I have a dataset with 5 columns (4 columns with thresholds of weather
 stations and one with month - data of 5 years). Now I would like to
 calculate the average for each month.
 
 I tried this unsuccessfully:
 
 lf.med <- sapply(LF[,1:4],mean,LF[,5])
 
 Error in mean.default(X[[1L]], ...) :
   'trim' must be numeric and have length 1
 
 
 With
 
 lf.med <- by(LF[,1:4],LF[,5],mean)
 
 it works, but it's deprecated.
 
 
 
 Any help is greatly appreciated!!!  Thanks everybody!!
 
 Dominic
 
  dput(LC)
 structure(list(Serra.da.Foladoira = c(21.1359565217391, 
21.7184782608695,
 23.5492608695652, 23.4660434782608, 23.6740869565217, 21.1775652173913,

 SNIPPED 

 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L,
 12L, 12L)), row.names = c(NA, -1826L), .Names = c("Serra.da.Foladoira",
 "Santiago", "Sergude", "Rio.Do.Sol", "V5"), class = "data.frame")


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] sapply() and by()



On Aug 6, 2012, at 7:34 AM, Dominic Roye wrote:


Hello everyone,


I have a dataset with 5 columns (4 columns with thresholds of weather
stations and one with month - data of 5 years). Now I would like to
calculate the average for each month.

I tried this unsuccessfully:

lf.med <- sapply(LF[,1:4],mean,LF[,5])


If you want to group calculations within categories then sapply is not  
the right function to turn to immediately. Use one of 'aggregate',  
'tapply' or 'ave'.



Error in mean.default(X[[1L]], ...) :
 'trim' must be numeric and have length 1


It is telling you that the unnamed third argument was matched to the  
'trim' parameter of the function 'mean'.
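
The argument matching described here can be checked on a small vector. This is a minimal sketch (the vector x and the grouping values are made up): extra arguments given to sapply() are passed on to the function after its first argument, so an unnamed extra value is matched positionally to the second formal of mean.default(), which is 'trim'.

```r
x <- c(1, 2, 3, 100)

# The first two formal arguments of mean.default() are 'x' and 'trim'
arg_names <- names(formals(mean.default))[1:2]
print(arg_names)

# Passing a grouping vector in second position therefore makes it 'trim',
# and a 'trim' of length > 1 is rejected with an error
res <- tryCatch(mean(x, c(1, 1, 2, 2)), error = function(e) e)
print(inherits(res, "error"))
print(conditionMessage(res))
```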



Perhaps:

aggregate( LF[,1:4], list(LF[,5]), mean)




With

lf.med <- by(LF[,1:4],LF[,5],mean)

It works, but it's deprecated.


Actually what is deprecated is the function `mean.data.frame`.
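
If the by() form is preferred, one way around mean.data.frame is to apply colMeans() to each group instead of mean(). This is a sketch on made-up data (LF_small is a hypothetical stand-in for the poster's LF), not a claim about what the poster ended up using.

```r
# Toy stand-in for LF: two station columns and a month column
LF_small <- data.frame(
  Santiago = c(10, 20, 30, 40),
  Sergude  = c(1, 3, 5, 7),
  V5       = c(1L, 1L, 2L, 2L)
)

# colMeans() works on the data-frame subset that by() hands to FUN
lf_med <- by(LF_small[, 1:2], LF_small$V5, colMeans)
print(lf_med)

# Each element is a named numeric vector of column means for one month
first_month <- lf_med[[1]]
print(first_month)
```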




Any help is greatly appreciated!!!  Thanks everybody!!



Minimal example. PLEASE.


Dominic


dput(LC)


Please do note that you offered an object 'LC' but your code referred  
to 'LF'.


structure(list(Serra.da.Foladoira = c(21.1359565217391,  
21.7184782608695,
23.5492608695652, 23.4660434782608, 23.6740869565217,  
21.1775652173913,
19.8460869565217, 23.3412173913043, 22.8835217391304,  
24.3398260869565,


snipped 1800+ length vector








--
David Winsemius, MD
Alameda, CA, USA


Re: [R] How to convert data to 'normal' if they are in the form of standard scientific notations?

2012-08-06 Thread Jean V Adams

HJ,

You don't provide any reproducible code, so I had to make up my own. 

dat <- data.frame(a=letters[1:5], x=c(20110911001084, 20110911001084, 
20110911001084, 20110911001084, 20110911001084),
y=c(2.10004e+12, 2.10004e+12, 2.10004e+12, 2.10004e+12, 
2.10004e+12))

In my example, the long numbers print out without scientific notation.

dat
  a              x             y
1 a 20110911001084 2100040000000
2 b 20110911001084 2100040000000
3 c 20110911001084 2100040000000
4 d 20110911001084 2100040000000
5 e 20110911001084 2100040000000

I can make it print with scientific notation using the digits argument to 
the print() function.

print(dat, digits=3)
  a        x       y
1 a 2.01e+13 2.1e+12
2 b 2.01e+13 2.1e+12
3 c 2.01e+13 2.1e+12
4 d 2.01e+13 2.1e+12
5 e 2.01e+13 2.1e+12

What is your default number of digits?
getOption("digits")
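
Independently of the digits setting, a long number can be turned into its plain decimal form explicitly. The following is a small sketch using the value from the question; format(), sprintf() and the scipen option are standard base R, the variable names are just for illustration.

```r
x <- 20110911001084

# Character representations without scientific notation
fixed_chr   <- format(x, scientific = FALSE)
sprintf_chr <- sprintf("%.0f", x)
print(fixed_chr)
print(sprintf_chr)

# For printing whole objects, a large positive 'scipen' penalises
# scientific notation; the old setting is restored afterwards
old <- options(scipen = 100)
print(x)
options(old)
```

If the values are really identifiers rather than quantities, reading the column as character in the first place (e.g. via colClasses in read.csv) may be the safer route.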

Jean


HJ YAN yhj...@googlemail.com wrote on 08/06/2012 11:14:17 AM:
 
 Dear R users
 
 I read two csv data files into R and called them Tem1 and Tem5.
 
 For the first column, data in Tem1 has 13 digits whereas in Tem5 there are
 14 digits for each observation.
 
 Originally they are 'numeric', as can be seen in my code below.  But how
 can I display/convert them in a form other than scientific
 notation, which seems to be the standard/default?
 
 I want them to be in a form like '20110911001084', but I'm very confused
 why, when I used the 'as.factor' call, it works for my 'Tem5' but not for
 'Tem1'...??
 
 
 Many thanks!
 
 HJ
 
 Tem1[1:5,1]
 [1] 2.10004e+12 2.10004e+12 2.10004e+12 2.10004e+12 2.10004e+12
 Tem5[1:5,1]
 [1] 2.011091e+13 2.011091e+13 2.011091e+13 2.011091e+13 2.011091e+13
 class(Tem1[1:5,1])
 [1] "numeric"
 class(Tem5[1:5,1])
 [1] "numeric"
 as.factor(Tem1[1:5,1])
 [1] 2.10004e+12 2.10004e+12 2.10004e+12 2.10004e+12 2.10004e+12
 Levels: 2.10004e+12
 as.factor(Tem5[1:5,1])
 [1] 20110911001084 20110911001084 20110911001084 20110911001084 20110911001084
 Levels: 20110911001084


Re: [R] program of matrix



On Aug 6, 2012, at 8:04 AM, hafida wrote:


Hi,
can anybody help me to program this formula?

c[lj]  and  c[l'j]  are matrices

A[j]^-1  is an invertible diagonal matrix

g[ll'] = i[ll'] - sum (from j=1 to k)  c[lj] c[l'j] A[j]^-1

where

i[ll'] = 1/n  sum (from i=1 to n)  z[il] z[il']

n, k, m  are given;  j = 1...k;  l, l' = 1...m.

It is complicated for me; I hope you can help me.
Thank you a lot.
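
The post leaves the dimensions open, so the following is only a sketch under loudly stated assumptions: z is taken to be an n x m numeric matrix, the c[lj] are taken to be scalar entries of an m x k matrix (called C_mat here), and the A[j] are taken to be nonzero scalars forming the diagonal of A. All object names and the random data are hypothetical. Under those assumptions the whole m x m matrix g can be computed in one step and checked against the element-wise formula.

```r
set.seed(1)
n <- 20; m <- 3; k <- 2

z     <- matrix(rnorm(n * m), n, m)   # assumed: z[i, l], n x m
C_mat <- matrix(rnorm(m * k), m, k)   # assumed: c[l, j], m x k
A     <- c(2, 5)                      # assumed: A[j], nonzero diagonal entries

# i[l, l'] = (1/n) * sum_i z[i, l] * z[i, l']
I_mat <- crossprod(z) / n

# g[l, l'] = i[l, l'] - sum_j c[l, j] * c[l', j] / A[j]
G <- I_mat - C_mat %*% diag(1 / A, nrow = k) %*% t(C_mat)

# Check one entry against the formula written out directly
l <- 1; lp <- 3
g_direct <- mean(z[, l] * z[, lp]) - sum(C_mat[l, ] * C_mat[lp, ] / A)
print(all.equal(G[l, lp], g_direct))
```

If the c[lj] are themselves matrices, as the wording might also mean, the same structure would need block operations instead, and the original poster would have to clarify the dimensions.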





David Winsemius, MD
Alameda, CA, USA


Re: [R] Splitting Data Into Different Series


Hello,

Try the following.

split(data.frame(dados), dados[, "code"])

Also, it's better to have data like 'dados' in a data.frame. That way 
you would have dates of class Date, and numbers of classes numeric or 
integer:



# stringsAsFactors = FALSE keeps the columns as character, so the
# conversions below act on the values rather than on factor codes
dados2 <- data.frame(dados, stringsAsFactors = FALSE)
dados2$date <- as.Date(dados2$date)
dados2$value <- as.numeric(dados2$value)
dados2$code <- as.integer(dados2$code)

#See the STRucture
str(dados2)

The code above would be simplified to split(dados2, dados2$code)

And it's also better to keep the result in a list, they are all in one 
place and you can access the components as


result[[ "433" ]]  # etc.
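
A compact, self-contained version of the same idea, on a made-up four-row data frame (the values echo the poster's example, but the reduced data set is an assumption made for brevity):

```r
# Reduced stand-in for 'dados2': two series codes, two dates each
dados2 <- data.frame(
  date  = as.Date(c("2012-01-01", "2012-02-01", "2012-01-01", "2012-02-01")),
  value = c(0.56, 0.45, 152136, 153081),
  code  = c(433L, 433L, 2005L, 2005L)
)

# One data frame per series code, collected in a named list
result <- split(dados2, dados2$code)
print(names(result))

# Components are accessed by name, i.e. with the code as a string
serie_433 <- result[["433"]]
print(serie_433)
```

Note that the list names are character strings, so result[[433]] (a number) would look for the 433rd element rather than the series with code 433.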

Hope this helps

Rui Barradas

On 06-08-2012 18:06, Henrique Andrade wrote:

Dear R Community,

I'm trying to write a loop to split my data into different series. I
need to make a
new matrix (or series) according to the series code.

For instance, every time the code column assumes the value 433 I need to
save date, value, and code into the dados433 matrix.

Please take a look at the following example:

dados <- 
matrix(c("2012-01-01","2012-02-01","2012-03-01","2012-04-01","2012-05-01","2012-06-01",
"2012-01-01","2012-02-01","2012-03-01","2012-04-01","2012-05-01","2012-06-01",
"2012-01-01","2012-02-01","2012-03-01","2012-04-01","2012-05-01","2012-06-01",
0.56,0.45,0.21,0.64,0.36,0.08,152136,153081,155872,158356,162157,166226,
33.47,34.48,35.24,38.42,35.33,34.43,433,433,433,433,433,433,2005,2005,2005,
 2005,2005,2005,3939,3939,3939,3939,3939,3939),
nrow=18, ncol=3, byrow=FALSE,
dimnames=list(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18),
 c("date", "value", "code")))

dados433 <- matrix(data = NA, nrow = 6, ncol = 3, byrow= FALSE)
dados2005 <- matrix(data = NA, nrow = 6, ncol = 3, byrow= FALSE)
dados3939 <- matrix(data = NA, nrow = 6, ncol = 3, byrow= FALSE)

for(i in seq(along=dados[,3])) {
 if(dados[i,3] == 433) {dados433[i,1:3] <- dados[i,1:3]}
}

for(i in seq(along=dados[,3])) {
 if(dados[i,3] == 2005) {dados2005[i,1:3] <- dados[i,1:3]}
}

for(i in seq(along=dados[,3])) {
 if(dados[i,3] == 3939) {dados3939[i,1:3] <- dados[i,1:3]}
}

Best regards,
Henrique Andrade


Re: [R] program of matrix

Well,  I just posted the fourth copy to the list which I apologize  
for. (I meant to delete the response I wrote.)


Re-re-posting an unclear message seems unwise on your part, 'hafida'.  
You are not following the advice in the footer of all messages and you  
are not following the advice in the Posting Guide, so it is no  
surprise that people are not responding.


Read the Posting Guide. From there you should take away these lessons:

Learn to use a shift key.
Learn to post your real name and academic or professional affiliation.  
This is a technical mailing list and anonymity will lower people's  
level of willingness to offer advice.

Learn to post R code that constructs a data example.
Learn to provide background (i.e., what are you really trying to do?).
Learn that providing the background regarding why you want to do this  
will reduce the concern that this is just a homework problem.  
(Homework submissions are generally ignored.)


--
David.




David Winsemius, MD
Alameda, CA, USA


Re: [R] Tukey HSD not fully displayed in R console

2012-08-06 Thread John Kane

Would just saving the results to an object and then writing that object to a file work?

From the help page example

summary(fm1 <- aov(breaks ~ wool + tension, data = warpbreaks))
myresults <- TukeyHSD(fm1, "tension", ordered = TRUE)

write.table(myresults, file = "ksksk")
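
TukeyHSD() returns a list with one matrix per model term, and write.table() works on a matrix or data frame, so it may be safer to write out a single component. The sketch below uses the warpbreaks data from the help page; the temporary file and the object names tukey_tab and back are only for illustration.

```r
fm1 <- aov(breaks ~ wool + tension, data = warpbreaks)
myresults <- TukeyHSD(fm1, "tension", ordered = TRUE)

# The comparisons for one term are stored as a matrix
tukey_tab <- myresults$tension
print(is.matrix(tukey_tab))

# Write that matrix to a (temporary) text file and read it back
out_file <- tempfile(fileext = ".txt")
write.table(tukey_tab, file = out_file)
back <- read.table(out_file, header = TRUE)
print(back)

# Only the first rows, if the full table is too long for the console
print(head(tukey_tab, 2))
```

With two grouping variables there would be one such matrix per term (and one for the interaction, if fitted), each of which can be written or subset in the same way.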



John Kane
Kingston ON Canada


 -Original Message-
 From: ulrikebraeck...@hotmail.com
 Sent: Mon, 6 Aug 2012 07:19:09 -0700 (PDT)
 To: r-help@r-project.org
 Subject: [R] Tukey HSD not fully displayed in R console
 
 Dear all,
 
 I would like to test the differences in dependent variable X depending on 2
 grouping variables with 10 levels each.
 I do this with a 2-way ANOVA, followed by a Tukey HSD test (TukeyHSD(x)).
 However, since a lot of combinations are possible with 2 grouping variables,
 each with 10 levels, the result of the Tukey test is not fully displayed in
 the console.
 
 I tried to print it as a table (write.table()) and open it afterwards in
 Notepad, or to print e.g. only the first 30 rows of the result, but both
 without success ...
 
 Does anyone have an idea how I can deal with this problem?
 
 Many thanks,
 
 Ulrike
 
 
 

Re: [R] test if elements of a character vector contain letters

On Mon, Aug 6, 2012 at 6:42 PM, Bert Gunter gunter.ber...@gene.com wrote:
 nzchar(x) & !is.na(x)

 No?


It doesn't work for what I need:
 x
 [1] "a10" "b8"  "c9"  "d2"  "e3"  "f4"  "g1"  "h7"  "i6"  "j5"  "k"   "l"   "m"   "n"
[15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
[29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
[43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"
 nzchar(x) & !is.na(x)
 [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[18] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[35] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[52] TRUE


I need to have TRUE when an element contains a letter, and FALSE when
an element contains only numbers. The above returns TRUE for the
entire vector.

Regards
Liviu


 On Mon, Aug 6, 2012 at 9:25 AM, Liviu Andronic landronim...@gmail.com wrote:
 Dear all
 I'm pretty sure that I'm approaching the problem in a wrong way.
 Suppose the following character vector:
 (x[1:10] <- paste(x[1:10], sample(1:10, 10), sep=''))
  [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"
 x
  [1] "a10" "b7"  "c2"  "d3"  "e6"  "f1"  "g5"  "h8"  "i9"  "j4"  "k"   "l"   "m"   "n"
 [15] "o"   "p"   "q"   "r"   "s"   "t"   "u"   "v"   "w"   "x"   "y"   "z"   "1"   "2"
 [29] "3"   "4"   "5"   "6"   "7"   "8"   "9"   "10"  "11"  "12"  "13"  "14"  "15"  "16"
 [43] "17"  "18"  "19"  "20"  "21"  "22"  "23"  "24"  "25"  "26"


 How do you test whether the elements of the vector contain at least
 one letter (or at least one digit) and obtain a logical vector of the
 same dimension? I came up with the following awkward function:
 is_letter <- function(x, pattern=c(letters, LETTERS)){
     sapply(x, function(y){
         any(sapply(pattern, function(z) grepl(z, y, fixed=T)))
     })
 }

 is_letter(x)
   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
     p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
    20    21    22    23    24    25    26
 FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 is_letter(x, 0:9)  ##function slightly misnamed
   a10    b7    c2    d3    e6    f1    g5    h8    i9    j4     k     l     m     n     o
  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
     p     q     r     s     t     u     v     w     x     y     z     1     2     3     4
 FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
     5     6     7     8     9    10    11    12    13    14    15    16    17    18    19
  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
    20    21    22    23    24    25    26
  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE


 Is there a nicer way to do this? Regards
 Liviu





 --

 Bert Gunter
 Genentech Nonclinical Biostatistics

 Internal Contact Info:
 Phone: 467-7374
 Website:
 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm



-- 
Do you know how to read?
http://www.alienetworks.com/srtest.cfm
http://goodies.xfce.org/projects/applications/xfce4-dict#speed-reader
Do you know how to write?
http://garbl.home.comcast.net/~garbl/stylemanual/e.htm#e-mail


[R] Overlay Histogram

2012-08-06 Thread li li

Dear all,
  For two sets of random variables, say, x <- rnorm(1000, 10, 10) and
y <- rnorm(1000, 3, 20),
is there any way to overlay the histograms (and density curves) of x and y
on the plot of y vs. x?
The histogram of x would be on the x axis and that of y on the y axis.
  The density curve here is to approximate the shape of the distribution
and does not have to have area 1.
   Thank you in advance.
  Hannah


Re: [R] Overlay Histogram

See

example(layout)

for one idea. I think you might also want to look into rug plots.
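
Along the lines of the layout() help-page example, a scatterplot with marginal histograms could be sketched roughly as follows. The function name scatter_hist is made up for this illustration, the bars are only approximately aligned with the scatterplot axes, and density curves are left out.

```r
scatter_hist <- function(x, y) {
  # Histogram counts are computed first, without plotting
  xh <- hist(x, plot = FALSE)
  yh <- hist(y, plot = FALSE)
  top <- max(xh$counts, yh$counts)

  # Panel 1: scatterplot (bottom left), 2: top margin, 3: right margin
  layout(matrix(c(2, 0, 1, 3), 2, 2, byrow = TRUE),
         widths = c(3, 1), heights = c(1, 3))
  op <- par(mar = c(4, 4, 1, 1))
  on.exit({ layout(1); par(op) })

  plot(x, y, xlim = range(xh$breaks), ylim = range(yh$breaks))
  par(mar = c(0, 4, 1, 1))
  barplot(xh$counts, axes = FALSE, ylim = c(0, top), space = 0)
  par(mar = c(4, 0, 1, 1))
  barplot(yh$counts, axes = FALSE, xlim = c(0, top), space = 0, horiz = TRUE)

  invisible(list(xhist = xh, yhist = yh))
}

set.seed(42)
x <- rnorm(1000, 10, 10)
y <- rnorm(1000, 3, 20)
res <- scatter_hist(x, y)
```

Adding rug(x) and rug(y, side = 2) right after the plot() call would be a lighter alternative to the marginal panels.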

Best,
Michael


Re: [R] test if elements of a character vector contain letters

2012-08-06 Thread Yihui Xie

You probably mean grepl('[a-zA-Z]', x)
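
A short sketch of that suggestion on a made-up vector (the values and object names are only for illustration). grepl() is vectorised, so it returns one logical per element, and a digit test works the same way with a different character class.

```r
x <- c("a10", "b7", "k", "z", "1", "26")

# TRUE where the element contains at least one ASCII letter
letter_found <- grepl("[a-zA-Z]", x)
print(letter_found)

# TRUE where the element contains at least one digit
digit_found <- grepl("[0-9]", x)
print(digit_found)

# Elements consisting of digits only
digits_only <- grepl("^[0-9]+$", x)
print(digits_only)
```

If accented letters should count as letters too, the POSIX class "[[:alpha:]]" may be a better fit than "[a-zA-Z]", though its behaviour depends on the locale.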

Regards,
Yihui
--
Yihui Xie xieyi...@gmail.com
Phone: 515-294-2465 Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA



Re: [R] program of matrix

2012-08-06 Thread hafida

A OK I MAKE A MISTAKE
OK MR DAVID I WILL DO IT
  
THANK  YOU




Re: [R] program of matrix

2012-08-06 Thread hafida

I CANT FIND ANY ANSWER MR DAVID




Re: [R] Splitting Data Into Different Series

2012-08-06 Thread Henrique Andrade

Dear Rui and Arun,

Thanks a lot for your help. I will test all the proposed solutions ;-)

Best regards,
Henrique Andrade






-- 
Henrique Andrade


Re: [R] Splitting Data Into Different Series

HI,
You can subset the data


 dados433 <- subset(dados, dados[,3]==433)
 is.matrix(dados433)
#[1] TRUE
 dados433
  date         value  code 
1 "2012-01-01" "0.56" "433"
2 "2012-02-01" "0.45" "433"
3 "2012-03-01" "0.21" "433"
4 "2012-04-01" "0.64" "433"
5 "2012-05-01" "0.36" "433"
6 "2012-06-01" "0.08" "433"

dados2005 <- subset(dados, dados[,3]==2005)
dados3939 <- subset(dados, dados[,3]==3939)

#or split the data

dados1 <- as.data.frame(dados)
dados2 <- split(dados1, dados1$code)



- Original Message -
From: Henrique Andrade henrique.coe...@gmail.com
To: r-help@r-project.org
Cc: 
Sent: Monday, August 6, 2012 1:06 PM
Subject: [R] Splitting Data Into Different Series

Dear R Community,

I'm trying to write a loop to split my data into different series. I
need to make a
new matrix (or series) according to the series code.

For instance, every time the code column assumes the value 433 I need to
save date, value, and code into the dados433 matrix.

Please take a look at the following example:

dados <- matrix(c("2012-01-01","2012-02-01","2012-03-01","2012-04-01","2012-05-01","2012-06-01",
                  "2012-01-01","2012-02-01","2012-03-01","2012-04-01","2012-05-01","2012-06-01",
                  "2012-01-01","2012-02-01","2012-03-01","2012-04-01","2012-05-01","2012-06-01",
                  0.56,0.45,0.21,0.64,0.36,0.08,152136,153081,155872,158356,162157,166226,
                  33.47,34.48,35.24,38.42,35.33,34.43,433,433,433,433,433,433,2005,2005,2005,
                  2005,2005,2005,3939,3939,3939,3939,3939,3939),
                nrow=18, ncol=3, byrow=FALSE,
                dimnames=list(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18),
                              c("date", "value", "code")))

dados433 <- matrix(data = NA, nrow = 6, ncol = 3, byrow= FALSE)
dados2005 <- matrix(data = NA, nrow = 6, ncol = 3, byrow= FALSE)
dados3939 <- matrix(data = NA, nrow = 6, ncol = 3, byrow= FALSE)

for(i in seq(along=dados[,3])) {
    if(dados[i,3] == 433) {dados433[i,1:3] <- dados[i,1:3]}
}

for(i in seq(along=dados[,3])) {
    if(dados[i,3] == 2005) {dados2005[i,1:3] <- dados[i,1:3]}
}

for(i in seq(along=dados[,3])) {
    if(dados[i,3] == 3939) {dados3939[i,1:3] <- dados[i,1:3]}
}

Best regards,
Henrique Andrade

Re: [R] Overlay Histogram

Hi,

These links might be helpful for you:

http://stackoverflow.com/questions/8545035/scatterplot-with-marginal-histograms-in-ggplot2
http://stackoverflow.com/questions/11022675/rotate-histogram-in-r-or-overlay-a-density-in-a-barplot
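
For a base-graphics route, the marginal-histogram layout can be sketched with layout() and barplot(). This is a rough illustration, not taken from the links above: it draws the histograms only (no density curves), and the margins and panel sizes are arbitrary choices.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rnorm(1000, 10, 10)
y <- rnorm(1000, 3, 20)

# Compute the histograms without drawing them
hx <- hist(x, plot = FALSE)
hy <- hist(y, plot = FALSE)

op <- par(no.readonly = TRUE)    # remember graphics settings
# Panel 1: scatterplot (bottom left); 2: histogram of x (top); 3: histogram of y (right)
layout(matrix(c(2, 0, 1, 3), 2, 2, byrow = TRUE),
       widths = c(4, 1), heights = c(1, 4))

par(mar = c(4, 4, 1, 1))
plot(x, y, xlim = range(hx$breaks), ylim = range(hy$breaks))

par(mar = c(0, 4, 1, 1))
barplot(hx$counts, axes = FALSE, space = 0)

par(mar = c(4, 0, 1, 1))
barplot(hy$counts, axes = FALSE, space = 0, horiz = TRUE)

par(op)                          # restore graphics settings
```

Setting the scatterplot limits to the histogram break ranges keeps the bars roughly aligned with the axes.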
A.K.



- Original Message -
From: li li hannah@gmail.com
To: r-help r-help@r-project.org
Cc: 
Sent: Monday, August 6, 2012 3:40 PM
Subject: [R] Overlay Histogram

Dear all,
  For two sets of random variables, say, x <- rnorm(1000, 10, 10) and
y <- rnorm(1000, 3, 20), is there any way to overlay the histograms (and
density curves) of x and y on the plot of y vs. x, with the histogram of x
on the x axis and that of y on the y axis?
  The density curve here is to approximate the shape of the distribution
and does not have to have area 1.
   Thank you in advance.
      Hannah


Re: [R] AR vs ARMA model

2012-08-06 Thread Jie

Please consult an online reference or a textbook; this is covered in the
model assessment section.
AIC, BIC, or rolling prediction/forecasting error might be what you want.
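
As a rough illustration of an information-criterion comparison, the sketch below fits both candidate orders with arima() from the stats package (rather than timsac's autoarmafit) to a simulated series. The simulated coefficients and the seed are assumptions chosen only to make the example run.

```r
set.seed(42)  # arbitrary seed
# Simulated stand-in for the real series (assumed ARMA(1,1) coefficients)
z <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 500)

fit_ar2    <- arima(z, order = c(2, 0, 0))  # AR(2)
fit_arma11 <- arima(z, order = c(1, 0, 1))  # ARMA(1,1)

# Lower values indicate the preferred model under each criterion
aic <- c(AR2 = AIC(fit_ar2), ARMA11 = AIC(fit_arma11))
bic <- c(AR2 = BIC(fit_ar2), ARMA11 = BIC(fit_arma11))
print(rbind(AIC = aic, BIC = bic))
```

Neither criterion identifies the "correct" model; both only rank the candidates, and close values suggest the data cannot distinguish them well.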

Best wishes,
Jie

On Fri, Aug 3, 2012 at 4:07 AM, Soham soham.tommarvolorid...@gmail.com wrote:

 Hi, I am trying to fit a model to time series data. The ar function gives an
 AR(2) model, and the autoarmafit function in the timsac package gives an
 ARMA(1,1) model. How do I know which is the correct underlying model?
 Please help.




Re: [R] program of matrix



On Aug 6, 2012, at 11:16 AM, hafida wrote:


I CANT FIND ANY ANSWER MR DAVID


When I suggested that you learn to use the shift key, I was hoping for  
a sparing use of that key, such as at the beginning of sentences. The  
caps-lock key is different than the shift key.


You are also posting to a mailing list whose Posting Guide requests
that posters include context.





--

David Winsemius, MD
Alameda, CA, USA


Re: [R] cannot find function simpleRDA2

2012-08-06 Thread Schoenfeld, David Alan,Ph.D.,Biostatistics

Hi,

simpleRDA2 is still in the vegan package, but it is not exported.
I.e., the author only intends it for internal use and he doesn't make
it available to end users directly. If you need to get at it, you can
use

getAnywhere(simpleRDA2)

which will show it.

If you need to make it available to your scripts, you can add the line

simpleRDA2 <- vegan:::simpleRDA2 # Note three colons

which will make a copy in your workspace (global environment) that
your functions can access.
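
The same pattern can be tried without vegan installed. In the sketch below, `format.perc` from the stats namespace is used purely as a stand-in for an unexported function; that it exists in your R version is an assumption.

```r
# Fetch an unexported function by name from a package namespace.
# 'format.perc' is an internal helper in 'stats' (assumed present);
# it plays the role of vegan's simpleRDA2 here.
fp <- utils::getFromNamespace("format.perc", "stats")

# Equivalent access with the triple-colon operator
fp2 <- stats:::format.perc

# Both routes return the same function object
same <- identical(fp, fp2)
```

Unexported functions are not part of a package's public interface, so code relying on them may break when the package is updated.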

Best,
Michael

On Mon, Aug 6, 2012 at 10:40 AM, Lindsey Leigh Sloat
llsl...@email.arizona.edu wrote:
 Hi,

 I am trying to run the command forward.sel.par, however I receive
 the error message: Error: could not find function 'simpleRDA2'. I
 have the vegan library loaded. The documentation on varpart has not
 helped me to understand why I cannot call this function. Maybe I am
 missing something obvious because I am still an 'R' novice.

 Below is a reproducible example for you.

 Thank you always for all of your help.
 Lindsey

 example:

 X = matrix(rnorm(30),10,3)
 Y = matrix(rnorm(50),10,5)

 forward.sel.par <- function(Y, X, alpha = 0.05, K = nrow(X)-1,
     R2thresh = 0.99, R2more = 0.001, adjR2thresh = 0.99, Yscale = FALSE,
     verbose=TRUE)
 ##
 ## Parametric forward selection of explanatory variables in regression and RDA.
 ## Y is the response, X is the table of explanatory variables.
 ##
 ## If Y is univariate, this function implements FS in regression.
 ## If Y is multivariate, this function implements FS using the F-test described
 ## by Miller and Farr (1971). This test requires that
 ##   -- the Y variables be standardized,
 ##   -- the error in the response variables be normally distributed
 ##      (to be verified by the user).
 ##
 ## This function uses 'simpleRDA2' and 'RsquareAdj' developed for
 ## 'varpart' in 'vegan'.
 ##
 ##                Pierre Legendre & Guillaume Blanchet, May 2007
 ##
 ## Arguments --
 ##
 ## Y           Response data matrix with n rows and m columns containing
 ##             quantitative variables.
 ## X           Explanatory data matrix with n rows and p columns
 ##             containing quantitative variables.
 ## alpha       Significance level. Stop the forward selection procedure
 ##             if the p-value of a variable is higher than alpha. The
 ##             default is 0.05.
 ## K           Maximum number of variables to be selected. The default
 ##             is one minus the number of rows.
 ## R2thresh    Stop the forward selection procedure if the R-square of
 ##             the model exceeds the stated value. This parameter can vary
 ##             from 0.001 to 1.
 ## R2more      Stop the forward selection procedure if the difference in
 ##             model R-square with the previous step is lower than R2more.
 ##             The default setting is 0.001.
 ## adjR2thresh Stop the forward selection procedure if the adjusted
 ##             R-square of the model exceeds the stated value. This
 ##             parameter can take any value (positive or negative) smaller
 ##             than 1.
 ## Yscale      Standardize the variables in table Y to variance 1. The
 ##             default setting is FALSE. The setting is automatically
 ##             changed to TRUE if Y contains more than one variable. This
 ##             is a validity condition for the parametric test of
 ##             significance (Miller and Farr 1971).
 ##
 ## Reference:
 ## Miller, J. K., and S. D. Farr. 1971. Bimultivariate redundancy: a
 ##    comprehensive measure of interbattery relationship. Multivariate
 ##    Behavioral Research 6: 313-324.

 {
   require(vegan)
   FPval <- function(R2cum,R2prev,n,mm,p)
   ## Compute the partial F and p-value after adding a single
   ## explanatory variable to the model.
   ## In FS, the number of df of the numerator of F is always 1. See
   ## Sokal & Rohlf 1995, eq 16.14.
   ##
   ## The amendment, based on Miller and Farr (1971), consists in
   ## multiplying the numerator and denominator df by 'p', the number
   ## of variables in Y, when computing the p-value.
   ##
   ##                Pierre Legendre, May 2007
   {
     df2 <- (n-1-mm)
     Fstat <- ((R2cum-R2prev)*df2) / (1-R2cum)
     pval <- pf(Fstat,1*p,df2*p,lower.tail=FALSE)
     return(list(Fstat=Fstat,pval=pval))
   }

   Y <- as.matrix(Y)
   X <- apply(as.matrix(X),2,scale,center=TRUE,scale=TRUE)
   var.names = colnames(as.data.frame(X))
   n <- nrow(X)
   m <- ncol(X)
   if(nrow(Y) != n) stop("Numbers of rows not the same in Y and X")
   p <- ncol(Y)
   if(p > 1) {
     Yscale = TRUE
     if(verbose) cat("The variables in response matrix Y have been standardized",'\n')
   }
   Y <- apply(Y,2,scale,center=TRUE,scale=Yscale)
   SS.Y <- sum(Y^2)

   X.out <- c(1:m)

   ## Find the first variable X to include in the model
   R2prev <- 0
   R2cum <- 0
   for(j in 1:m) {
     toto <- simpleRDA2(Y,X[,j],SS.Y)
     if(toto$Rsquare > R2cum) {
       R2cum <- toto$Rsquare
       no.sup <- j
     }
   }
   mm <- 1
   FP <- FPval(R2cum,R2prev,n,mm,p)
   if(FP$pval <= alpha) {
     adjRsq <- RsquareAdj(R2cum,n,mm)
     res1 <- var.names[no.sup]
     res2 <- no.sup
     res3 <- R2cum
     res4 <- R2cum
     res5 <- adjRsq
     res6 <- FP$Fstat
     res7 <- FP$pval
     X.out[no.sup] <- 0
     delta <-

[R] Force evaluation of a symbol when a function is created


I am porting a program from MATLAB to R.
The problem is that MATLAB has a feature where symbols that aren't arguments
are evaluated immediately.
That is:
Y=3
F=@(x) x*Y

will yield a function such that F(2)=6.
If later, say, Y=4, then F(2) will still equal 6.

R, on the other hand, has lazy evaluation.
F <- function(x){x*Y}
will do the following:
Y=3
F(2)=6
Y=4
F(2)=8.
Does anyone know of a way to defeat lazy evaluation in R so that I can easily
simulate the MATLAB behavior? I know that I can live without this in ordinary
programming, but it would make my port much easier.

Thanks.





Re: [R] Force evaluation of a symbol when a function is created

2012-08-06 Thread William Dunlap

You could use local(), as in
   > F <- local({
   +    Y <- 3
   +    function(x) x * Y
   + })
   > F(7)
   [1] 21
   > Y <- 19
   > F(5)
   [1] 15

Look into 'environments' for more.
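
A related pattern is a function factory that captures the value at creation time. This is a minimal sketch, not part of the reply above; `make_scaler` is a hypothetical name introduced here for illustration.

```r
# Hypothetical factory: returns a function that multiplies by the value
# 'y' had when the factory was called.
make_scaler <- function(y) {
  force(y)            # evaluate the argument now instead of at first use
  function(x) x * y
}

Y <- 3
F <- make_scaler(Y)   # captures 3
Y <- 4                # later changes to Y do not affect F
result <- F(2)        # 6
```

Without force(), the argument would only be evaluated the first time the returned function runs, so a change to Y in between could leak in.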

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



Re: [R] R: Help xts object Subset Date by Day of the Week

On Sun, Aug 5, 2012 at 4:49 PM, Douglas Karabasz
doug...@sigmamonster.com wrote:
 I have a xts object made of daily closing prices I have acquired using
 quantmod.



 Here is my code:

 library(xts)
 library(quantmod)
 library(lubridate)

 # Gets SPY data
 getSymbols("SPY")

 # Subset Prices to just closing price
 SP500 <- Cl(SPY)

 # Show day of the week for each date using 2-6 for monday-friday
 SP500wd <- wday(SP500)

 # Add Price and days of week together
 SP500wd <- cbind(SP500, SP500wd)

 # subset Monday into one xts object
 SPmon <- subset(SP500wd, SP500wd$..2==2)





 I then used the package lubridate to show the days of the week. Because an
 xts object must be numeric, each day is represented as a number, so that
 Monday=2, Tuesday=3, Wednesday=4, Thursday=5, Friday=6, Saturday=7. Since
 this is a financial index you will only see the numbers 2-6, or
 Monday-Friday.

 I want to subset the data by using the day column.  I would like some help
 to figure out the best way to accomplish a few objectives.

 1.   Subset the data so that I only show Mondays in sequence.  However, I
 do want to make sure that it shows the date, price and the ..2 column (which
 is the day of week) after subsetting the data.  (I have it done but am not
 sure if it is the best way.)


I think what you do works; this might also work as a one-liner:

SPY[format(index(SPY), "%a") == "Mon", ]

Alternatively,

split.default(SPY, format(index(SPY), "%a"))

creates a list of xts objects split by day of the week. (Note that you need
split.default here because split.xts does something different.)
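
The same idea can be tried in base R without quantmod. The sketch below uses a plain data frame with made-up prices, and "%u" (ISO weekday number, 1 = Monday) instead of "%a", since "%a" returns locale-dependent abbreviations.

```r
# Business days in January 2009 (2009-01-05 was a Monday)
dates <- seq(as.Date("2009-01-05"), as.Date("2009-01-30"), by = "day")
dates <- dates[!format(dates, "%u") %in% c("6", "7")]   # drop weekends

# Made-up closing prices, for illustration only
prices <- data.frame(date = dates, close = seq_along(dates))

# Keep Mondays only
mondays <- prices[format(prices$date, "%u") == "1", ]

# Or split into a list with one data frame per weekday
by_day <- split(prices, format(prices$date, "%u"))
```

With an xts object, `index(SPY)` would take the place of `prices$date`.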


 2.   Rearrange the object (hopefully without destroying the xts object)
 so that my data lines up like a weekly calendar.   So it would look like the
 following.

Unfortunately, your formatting got all chewed up by the R-help server,
which doesn't like HTML, so I'm not quite sure what you want here.

Possibly some black magic like this?

SPY.CL <- Cl(SPY)

length(SPY.CL) <- 7*floor(length(SPY.CL)/7)

dim(SPY.CL) <- c(length(SPY.CL)/7, 7)

But note that this loses time stamps because each row can only have a
single time stamp.

You might also try

to.weekly()
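
A calendar-style layout that tolerates holidays can also be sketched in base R by cross-tabulating week against weekday. The prices below are made up, and the result is a plain matrix keyed by each week's Monday, not an xts object.

```r
dates <- seq(as.Date("2009-01-05"), as.Date("2009-01-30"), by = "day")
wd <- as.integer(format(dates, "%u"))              # 1 = Monday ... 7 = Sunday

# Drop weekends and the 2009-01-19 holiday
keep <- wd <= 5 & dates != as.Date("2009-01-19")
dates <- dates[keep]
wd <- wd[keep]

# Made-up closing prices, for illustration only
close <- round(80 + 10 * sin(seq_along(dates)), 2)

# Label each row by the Monday that starts its week
week_start <- format(dates - (wd - 1))

# One row per week, one column per weekday; missing days become NA
wide <- tapply(close, list(week = week_start, weekday = wd),
               function(v) v[1])
```

Unlike the fixed reshaping above, this keeps the columns aligned with weekdays when a trading day is missing.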

Cheers,

Michael







 Monday                   Tuesday                  Wednesday                Thursday                 Friday
 Long Date  Price  Index  Long Date  Price  Index  Long Date  Price  Index  Long Date  Price  Index  Long Date  Price  Index
 1/5/2009   92.85  2      1/6/2009   93.47  3      1/7/2009   90.67  4      1/8/2009   84.4   5      1/9/2009   89.09  6
 1/12/2009  86.95  2      1/13/2009  87.11  3      1/14/2009  84.37  4      1/15/2009  91.04  5      1/16/2009  85.06  6
 MLK Monday               1/20/2009  80.57  3      1/21/2009  84.05  4      1/22/2009  82.75  5      1/23/2009  83.11  6
 1/26/2009  83.68  2      1/27/2009  84.53  3      1/28/2009  87.39  4      1/29/2009  84.55  5      1/30/2009  82.83  6

 Thank you,

 Douglas



Re: [R] R: Help xts object Subset Date by Day of the Week

On Mon, Aug 6, 2012 at 4:30 PM, R. Michael Weylandt
michael.weyla...@gmail.com wrote:
 Possibly some black magic like this?

 SPY.CL <- Cl(SPY)

 length(SPY.CL) <- 7*floor(length(SPY.CL)/7)

 dim(SPY.CL) <- c(length(SPY.CL)/7, 7)

 But note that this loses time stamps because each row can only have a
 single time stamp.

To clarify: that's not _why_ it loses the time stamps (and the
xts-ness), just that it does happen. Technically, it's because
dim<-.xts doesn't exist; the reason it doesn't (I'd imagine) is
the time stamp issue.

M



Re: [R] test if elements of a character vector contain letters

2012-08-06 Thread Schoenfeld, David Alan,Ph.D.,Biostatistics

On Mon, Aug 6, 2012 at 7:35 PM, Marc Schwartz marc_schwa...@me.com wrote:
 is.letter <- function(x) grepl("[[:alpha:]]", x)
 is.number <- function(x) grepl("[[:digit:]]", x)



This does exactly what I wanted:
> x
 [1] a10 b8  c9  d2  e3  f4  g1  h7  i6  j5  k   l   m   n
[15] o   p   q   r   s   t   u   v   w   x   y   z   1   2
[29] 3   4   5   6   7   8   9   10  11  12  13  14  15  16
[43] 17  18  19  20  21  22  23  24  25  26
> xb <- grepl("[[:alpha:]]", x)
> x[xb]  ##extract all vector elements that contain a letter
 [1] a10 b8  c9  d2  e3  f4  g1  h7  i6  j5  k   l   m   n
[15] o   p   q   r   s   t   u   v   w   x   y   z
> xb <- grepl("[[:digit:]]", x)
> x[xb]  ##extract all vector elements that contain a digit
 [1] a10 b8  c9  d2  e3  f4  g1  h7  i6  j5  1   2   3   4
[15] 5   6   7   8   9   10  11  12  13  14  15  16  17  18
[29] 19  20  21  22  23  24  25  26
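
Note that both tests ask whether an element contains at least one letter or digit, so mixed elements such as a10 match both. A small sketch of the difference between "contains" and "consists only of", using a made-up vector:

```r
is.letter <- function(x) grepl("[[:alpha:]]", x)
is.number <- function(x) grepl("[[:digit:]]", x)

x <- c("a10", "k", "7", "")        # made-up test vector

has_letter <- is.letter(x)         # TRUE TRUE FALSE FALSE
has_digit  <- is.number(x)         # TRUE FALSE TRUE FALSE

# Anchoring the pattern restricts the match to digits-only elements
only_digits <- grepl("^[[:digit:]]+$", x)   # FALSE FALSE TRUE FALSE
```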

Thanks all for the suggestions! Regards
Liviu


Re: [R] Force evaluation of a symbol when a function is created

2012-08-06 Thread Bert Gunter

Thanks to both: Cute question, clever, informative answer.

However, Bill, I don't think you **quite** answered him, although the
modification needed is completely trivial. Of course, I could never
have figured it out without your response.

Anyway, I interpret the question as asking for the function definition
to _implicitly_ pick up the value of Y at the time the function is
defined, rather than explicitly assigning it in local(). The following
are two essentially identical approaches: I prefer the second, because
it's more transparent to me, but that's just a matter of taste.

Y <- 3
F <- local({y <- Y; function(x) x*y})
G <- evalq(function(x) x*y, env=list(y=Y))

Yielding:

> Y <- 3
> F <- local({y <- Y; function(x) x*y})
> G <- evalq(function(x) x*y, env=list(y=Y))
> F(5)
[1] 15
> G(5)
[1] 15
> Y <- 2
> F(5)
[1] 15
> G(5)
[1] 15
> F <- local({y <- Y; function(x) x*y})
> G <- evalq(function(x) x*y, env=list(y=Y))
> F(5)
[1] 10
> G(5)
[1] 10
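
To see where the captured value lives, the environment attached to each function can be inspected directly. A short sketch under the same definitions (with envir= spelled out in full rather than abbreviated to env=):

```r
Y <- 3
F <- local({ y <- Y; function(x) x * y })
G <- evalq(function(x) x * y, envir = list(y = Y))

Y <- 2   # changing Y afterwards does not touch the captured copies

# Each function carries its own environment holding 'y'
captured_F <- environment(F)$y
captured_G <- get("y", envir = environment(G))
```

Both captured values remain 3 here, which is why F(5) and G(5) keep returning 15 until the functions are redefined.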

Cheers,
Bert


Re: [R] Force evaluation of a symbol when a function is created

2012-08-06 Thread William Dunlap

Both of those approaches require the function to be created
at the same time that the environment containing some of its
bindings is created.  You can also take an existing function and
assign a new environment to it.  E.g.,

   > f <- function(x) y * x
   > ys <- c(2,3,5,7,11)
   > fs <- lapply(ys, function(y) {
   +    env <- new.env(parent=baseenv())
   +    env[["y"]] <- y
   +    environment(f) <- env ; f })
   > # fs is a list of functions, all identical except for their
   > # environments, which contain 'y'.
   > fs[[2]]
   function (x)
   y * x
   <environment: 0x05df1c38>
   > fs[[2]](10)
   [1] 30
   > fs[[3]]
   function (x)
   y * x
   <environment: 0x05def8c0>
   > fs[[3]](10)
   [1] 50
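
The rebinding step can be wrapped in a small helper. This is a sketch only: `with_bindings` is a hypothetical name, and unlike the code above it chains the new environment to the function's original one instead of baseenv(), so other free variables still resolve as before.

```r
# Hypothetical helper: return a copy of 'f' whose environment holds the
# supplied name = value pairs.
with_bindings <- function(f, ...) {
  environment(f) <- list2env(list(...), parent = environment(f))
  f
}

f <- function(x) y * x

f5 <- with_bindings(f, y = 5)
f7 <- with_bindings(f, y = 7)

r5 <- f5(10)   # 50
r7 <- f7(10)   # 70
```

Because R copies the function on assignment, the original `f` keeps its own environment and is not modified.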

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com



Re: [R] Force evaluation of a symbol when a function is created

Thank you both, this was very helpful.  I need to study environments more. Do 
either of you know a good source?


[R] Correct Place to Seek an R-Project Consultant?

2012-08-06 Thread Vik Rubenfeld

I would like to find out how to apply commands found in the bayesm package, to 
analyze data gathered via a choice-based conjoint study. Is there a web 
resource where I can seek an R-Project consultant experienced in this, who I 
could hire to walk me through the appropriate bayesm commands to use for this 
purpose?

Thanks in advance to all for any info.

Re: [R] Force evaluation of a symbol when a function is created

On Mon, Aug 6, 2012 at 9:03 PM, Schoenfeld, David
Alan,Ph.D.,Biostatistics dschoenf...@partners.org wrote:
 Thank you both, this was very helpful.  I need to study environments more. Do 
 either of you know a good source?

Disclaimer: I really have no idea what I'm talking about.

They are a somewhat subtle, but exceptionally powerful concept: see,
inter alia,

cran.r-project.org/doc/contrib/Fox-Companion/appendix-scope.pdf
http://www.lemnica.com/esotericR/Introducing-Closures/

If you know a little bit of C, it will go a long way in understanding
environments in R. You'll want to (eventually) start to associate R
names with C pointers and environments with symbol tables (hence the
fact that the printed environment is just a memory address), but that's
perhaps a little bit down the road. Because of this, though, environments
differ in their fundamental behavior from other R objects: they're the
best way to get pass-by-reference semantics in R.
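
To make the two ideas above concrete, here is a minimal sketch in base R.
All names in it (make_counter, bump_env, bump_list, state, lst) are made up
for illustration and do not come from the thread. It shows a closure
keeping state in its enclosing environment, and an environment being
modified in place by a function while a list is not.

```r
# Closure: the returned function keeps a reference to the environment in
# which it was created, so `count` persists between calls.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # assigns to `count` in the enclosing environment
    count
  }
}
counter <- make_counter()
first  <- counter()
second <- counter()

# Reference semantics: an environment is not copied when it is passed to a
# function, so the caller sees the change.
bump_env <- function(e) {
  e$n <- e$n + 1
  invisible(NULL)
}
state <- new.env()
state$n <- 0
bump_env(state)

# A list, by contrast, is modified only as a local copy inside the function,
# so the caller's object is unchanged.
bump_list <- function(l) {
  l$n <- l$n + 1
  invisible(NULL)
}
lst <- list(n = 0)
bump_list(lst)

print(c(first = first, second = second, env_n = state$n, list_n = lst$n))
```

After running this, state$n has been incremented while lst$n has not, which
is the pass-by-reference behavior described above.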

Best,
Michael


[R] GAM and interpolation?

2012-08-06 Thread Alex Hotmail

Hello fellow R users,

I need your help with GAM/GAMM models and interpolation on a marked
spatial point process (cases and controls).

I use the mgcv package to fit a GAMM with a binary outcome, a
parametric part (var1 + ... + varn), a spline for the spatial variation, and
a random effect coded through another spline, in this form:

gam(outcome ~ var1 + ... + varn + s(xlong+ylat) + s(var, bs="re"), data=MyData,
family=binomial(link="logit"))

My purpose is to calculate a risk map adjusted for my covariates, to
compare it with a risk map calculated by kernel ratio and look for obvious
differences.

However, the difficulty is interpolating my model to estimate the risk over
the area of interest: of course I don't have measurements of the
variables (except geographic coordinates) for the whole area, only for the
individuals in the dataset.

I am kind of lost. I have been searching for a couple of days now and I
tried the predict.gam function with the easy type="response" and the more
mysterious type="lpmatrix", and other possibilities, but cannot find what I
am looking for. I can only calculate the risk for my individuals. I thought
that the non-parametric spline component of the GAM/GAMM models could help
me interpolate and fill the gaps.

Did I miss something big? Is there a solution (without headache) or a
magical package I missed?

Thank you for any help you can offer!

 

Alex
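
One common way to approach the interpolation described above is to predict
on a regular grid of coordinates while holding the non-spatial covariates
at fixed reference values. The following is only a sketch under stated
assumptions, not the poster's model: the data are simulated, the spatial
term is assumed to be the bivariate smooth s(xlong, ylat), the single
covariate var1 is fixed at its sample mean, and the random effect (the
hypothetical factor grp) is zeroed out with the exclude argument of
predict.gam.

```r
library(mgcv)

# Simulated stand-in for the real data (all values are made up).
set.seed(1)
n <- 300
d <- data.frame(
  xlong = runif(n),
  ylat  = runif(n),
  var1  = rnorm(n),
  grp   = factor(sample(letters[1:5], n, replace = TRUE))
)
eta <- -0.5 + 0.8 * d$var1 + 2 * sin(2 * pi * d$xlong) * d$ylat
d$outcome <- rbinom(n, 1, plogis(eta))

# Binary-outcome GAM: parametric term, spatial smooth, random-effect smooth.
fit <- gam(outcome ~ var1 + s(xlong, ylat) + s(grp, bs = "re"),
           data = d, family = binomial(link = "logit"), method = "REML")

# Regular grid over the study area; covariates set to reference values.
grid <- expand.grid(
  xlong = seq(0, 1, length.out = 25),
  ylat  = seq(0, 1, length.out = 25)
)
grid$var1 <- mean(d$var1)
# The column must exist in newdata even though the term is excluded below.
grid$grp <- factor(levels(d$grp)[1], levels = levels(d$grp))

# Adjusted risk surface on the probability scale, random effect set to zero.
grid$risk <- as.numeric(
  predict(fit, newdata = grid, type = "response", exclude = "s(grp)")
)

# Rows index xlong and columns index ylat (expand.grid varies xlong fastest).
risk_matrix <- matrix(grid$risk, nrow = 25)
```

The resulting risk_matrix can be passed to image() or contour() together
with the two coordinate sequences. In practice the grid would normally be
clipped to the study region, since predictions far from the observed
points are extrapolations of the smooth.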



Re: [R] Force evaluation of a symbol when a function is created