[R] reorder() in the latticeExtra library

2012-11-03 Thread JDINIS
Hello all, thanks for your time and help. Below are my commands, and they
generate a really nice plot; however, I am not happy with the reorder()
function. I would like the order to be the same as the values appear in the
genotype variable,

genotype <- c("CJ1450 NW 4/25/12", "CJ1450 BAL 4/25/12", "CJ1450 NW 4/27/12",
              "CJ1450 BAL 4/27/12", "CJ1721 NW 4/27/12", "CJ1721 BAL 4/27/12",
              "CJ1721 NW 4/29/12", "CJ1721 BAL 4/29/12")

and not as it is currently coded.

Is there any way to turn off the reordering, or to set it up so the values
appear in the order above? Thank you again!

(I am open to all suggestions)

JD


genotype <- c("CJ1450 NW 4/25/12", "CJ1450 BAL 4/25/12", "CJ1450 NW 4/27/12",
              "CJ1450 BAL 4/27/12", "CJ1721 NW 4/27/12", "CJ1721 BAL 4/27/12",
              "CJ1721 NW 4/29/12", "CJ1721 BAL 4/29/12")
# paste("Animal", as.roman(1:8), sep = "-")
plant.height <- c(0.001173003, 0.001506127, 0.001361596, 0.001922572,
                  0.034272147, 0.030466017, 0.001654299, 0.001071724)
SE <- c(0.000444123, 0.000290096, 0.000372844, 0.00197687, 0.033945128,
        0.035231568, 0.001094518, 0.000423545)
lower <- plant.height - SE; upper <- plant.height + SE
x <- data.frame(group = genotype, lower = lower, est = plant.height,
                upper = upper)

library(latticeExtra)
segplot(reorder(genotype, est) ~ lower + upper, data = x, draw.bands = FALSE,
        centers = est, segments.fun = panel.arrows, ends = "both", angle = 90,
        length = 0, par.settings = simpleTheme(pch = 19, col = 1),
        xlab = expression("nucleotide diversity " %+-% " sd"),
        panel = function(x, y, z, ...) {
            panel.abline(h = z, col = "grey", lty = "dashed")
            panel.abline(v = 14.20, col = "grey")
            panel.segplot(x, y, z, ...)
        })



--
View this message in context: 
http://r.789695.n4.nabble.com/reorder-in-the-latticeExtra-library-tp4648299.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reorder() in the latticeExtra library

2012-11-03 Thread David Winsemius

On Nov 2, 2012, at 8:04 PM, JDINIS wrote:

 Hello all, thanks for your time and help. Below are my commands, and they
 generate a really nice plot; however, I am not happy with the reorder()
 function. I would like the order to be the same as the values appear in the
 genotype variable,
 
 genotype <- c("CJ1450 NW 4/25/12", "CJ1450 BAL 4/25/12", "CJ1450 NW 4/27/12",
               "CJ1450 BAL 4/27/12", "CJ1721 NW 4/27/12", "CJ1721 BAL 4/27/12",
               "CJ1721 NW 4/29/12", "CJ1721 BAL 4/29/12")
 
 and not as it is currently coded.
 
 Is there any way to turn off the reordering, or to set it up so the values
 appear in the order above? Thank you again!
 
 (I am open to all suggestions)
 
 JD
 
 
 genotype <- c("CJ1450 NW 4/25/12", "CJ1450 BAL 4/25/12", "CJ1450 NW 4/27/12",
               "CJ1450 BAL 4/27/12", "CJ1721 NW 4/27/12", "CJ1721 BAL 4/27/12",
               "CJ1721 NW 4/29/12", "CJ1721 BAL 4/29/12")
 # paste("Animal", as.roman(1:8), sep = "-")
 plant.height <- c(0.001173003, 0.001506127, 0.001361596, 0.001922572,
                   0.034272147, 0.030466017, 0.001654299, 0.001071724)
 SE <- c(0.000444123, 0.000290096, 0.000372844, 0.00197687, 0.033945128,
         0.035231568, 0.001094518, 0.000423545)
 lower <- plant.height - SE; upper <- plant.height + SE
 x <- data.frame(group = genotype, lower = lower, est = plant.height,
                 upper = upper)
 
 library(latticeExtra)
 segplot(reorder(genotype, est) ~ lower + upper, data = x, draw.bands = FALSE,
         centers = est, segments.fun = panel.arrows, ends = "both", angle = 90,
         length = 0, par.settings = simpleTheme(pch = 19, col = 1),
         xlab = expression("nucleotide diversity " %+-% " sd"),
         panel = function(x, y, z, ...) {
             panel.abline(h = z, col = "grey", lty = "dashed")
             panel.abline(v = 14.20, col = "grey")
             panel.segplot(x, y, z, ...)
         })

Wouldn't you just define genotype as a factor with the desired sequence of 
levels and remove the call to reorder?
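
A minimal sketch of that suggestion, assuming the objects from the posted
code (genotype, x, est) are in the workspace:

x$group <- factor(genotype, levels = unique(genotype))  # keep the typed order
segplot(group ~ lower + upper, data = x, draw.bands = FALSE, centers = est,
        par.settings = simpleTheme(pch = 19, col = 1))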

-- 
David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mergeing a large number of large .csvs

2012-11-03 Thread Benjamin Caldwell
Jeff,
If you're willing to educate, I'd be happy to learn what wide vs long
format means. I'll give rbind a shot in the meantime.
Ben
On Nov 2, 2012 4:31 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us wrote:

 I would first confirm that you need the data in wide format... many
 algorithms are more efficient in long format anyway, and rbind is way more
 efficient than merge.

 If you feel this is not negotiable, you may want to consider sqldf. Yes,
 you need to learn a bit of SQL, but it is very well integrated into R.
 ---
 Jeff NewmillerThe .   .  Go Live...
 DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live
 Go...
   Live:   OO#.. Dead: OO#..  Playing
 Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
 /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
 ---
 Sent from my phone. Please excuse my brevity.

 Benjamin Caldwell btcaldw...@berkeley.edu wrote:

 Dear R help;
 I'm currently trying to combine a large number (about 30 x 30) of large
 .csvs together (each at least 1 records). They are organized by plots,
 hence 30 x 30, with each group of csvs in a folder which corresponds to the
 plot. The unmerged csvs all have the same number of columns (5). The fifth
 column has a different name for each csv. The number of rows is different.
 
 The combined csvs are of course quite large, and the code I'm running is
 quite slow - I'm currently running it on a computer with 10 GB RAM, an SSD,
 and a quad-core 2.3 GHz processor; it's taken 8 hours and it's only 75% of
 the way through (it's hung up on one of the largest data groupings now for
 an hour, and using 3.5 gigs of RAM).
 
 I know that R isn't the most efficient way of doing this, but I'm not
 familiar with SQL or C. I wonder if anyone has suggestions for a different
 way to do this in the R environment. For instance, the key function now is
 merge, but I haven't tried join from the plyr package or rbind from base.
 I'm willing to provide a dropbox link to a couple of these files if you'd
 like to see the data. My code is as follows:
 
 
 # multmerge is based on code by Tony Cookson,
 # http://www.r-bloggers.com/merging-multiple-data-files-into-one-data-frame/
 # The function takes a path. This path should be the name of a folder that
 # contains all of the files you would like to read and merge together and
 # only those files you would like to merge.
 
 multmerge <- function(mypath){
   filenames <- list.files(path = mypath, full.names = TRUE)
   datalist <- try(lapply(filenames,
                          function(x){read.csv(file = x, header = TRUE)}))
   try(Reduce(function(x, y) {merge(x, y, all = TRUE)}, datalist))
 }
 
 # this function renames columns using a fixed list and outputs a .csv
 
 merepk <- function (path, nf.name) {
 
   output <- multmerge(mypath = path)
   name <- list("x", "y", "z", "depth", "amplitude")
   try(names(output) <- name)
 
   write.csv(output, nf.name)
 }
 
 # assumes all folders are in the same directory, with nothing else there
 
 merge.by.folder <- function (folderpath){
 
   foldernames <- list.files(path = folderpath)
   n <- length(foldernames)
   setwd(folderpath)
 
   for (i in 1:n){
     path <- paste(folderpath, foldernames[i], sep = "\\")
     nf.name <- as.character(paste(foldernames[i], ".csv", sep = ""))
     merepk(path, nf.name)
   }
 }
 
 folderpath <- "yourpath"
 
 merge.by.folder(folderpath)
 
 
 Thanks for looking, and happy friday!
 
 
 
 *Ben Caldwell*
 
 PhD Candidate
 University of California, Berkeley
 
[[alternative HTML version deleted]]
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] finding global variables in a function containing formulae

2012-11-03 Thread Gabor Grothendieck
On Thu, Nov 1, 2012 at 2:04 PM, Hafen, Ryan P ryan.ha...@pnnl.gov wrote:
 I need to find all global variables being used in a function and 
 findGlobals() in the codetools package works quite nicely.  However, I am not 
 able to find variables that are used in formulae.  Simply avoiding formulae 
 in functions is not an option because I do not have control over what 
 functions this will be applied to.

 Here is an example to illustrate:

 library(codetools)
 
 xGlobal <- rnorm(10)
 yGlobal <- rnorm(10)
 
 plotFn1 <- function() {
    plot(yGlobal ~ xGlobal)
 }
 
 plotFn2 <- function() {
    y <- yGlobal
    x <- xGlobal
    plot(y ~ x)
 }
 
 plotFn3 <- function() {
    plot(xGlobal, yGlobal)
 }
 
 findGlobals(plotFn1, merge=FALSE)$variables
 # character(0)
 findGlobals(plotFn2, merge=FALSE)$variables
 # [1] "xGlobal" "yGlobal"
 findGlobals(plotFn3, merge=FALSE)$variables
 # [1] "xGlobal" "yGlobal"

 I would like to find that plotFn1 also uses globals xGlobal and yGlobal.  Any 
 suggestions on how I might do this?

If this is only being applied to your own functions, then we can adopt a
convention when writing them in which we declare such variables so that
findGlobals can locate them:


plotFn1 <- function() {
   xGlobal; yGlobal
   plot(yGlobal ~ xGlobal)
}

findGlobals(plotFn1)

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] rgl package and animation

2012-11-03 Thread Duncan Murdoch

On 12-11-02 7:47 PM, Robert Baer wrote:

I am trying to figure out how to use rgl package for animation.  It
appears that this is done using the play3d() function.  Below I have
some sample code that plots a 3D path and puts a sphere at the point
farthest from the origin (which in this case also appears to be at the
end of the path).  What I would like to do is animate the movement of
another sphere along the length of the path while simultaneously
rotating the viewport.

Duncan Murdock's (wonderful) Braided Knot YouTube video:
   (http://www.youtube.com/watch?v=prdZWQD7L5c)
makes it clear that such things can be done, but I am having trouble
understanding how to construct the f(time) function that gets passed to
play3d().  The demo(flag) example is a little helpful, but I still can't
quite translate it to my problem.

Can anyone point to some some simple f(time) function examples that I
could use for reference or give me a little hint as to how to construct
f(time) for movement along the path while simultaneously rotating the
viewport?

Thanks,

Rob



library(rgl)
# Generate a 3D path
dat <-
structure(list(X = c(0, 0.06181308, 0.002235635,
-0.03080658, -0.1728054, -0.372467, -0.5877065,
-0.8814848, -1.103668, -1.366157, -1.625862, -1.948066,
-2.265388, -2.689826, -3.095001, -3.49749, -3.946068,
-4.395653, -4.772034, -5.111259, -5.410515, -5.649475, -5.73439,
-5.662201, -5.567145, -5.390334, -5.081581, -4.796631,
-4.496559, -4.457024, -4.459564, -4.641746, -4.849105,
-5.0899430001, -5.43129, -5.763724, -6.199448, -6.517578,
-6.864234, -6.907439), Y = c(0, -0.100724,
-0.1694719,
0.036505999886, -0.09299519, -0.222977, -0.3557596,
-0.3658229, -0.3299489, -0.2095574,
-0.08041446,
0.02013388, 0.295372, 0.1388314, 0.2811047,
0.2237614, 0.1419052, 0.06029464,
-0.09330875,
-0.2075969, -0.3286296, -0.4385684,
-0.4691093,
-0.6235059, -0.5254676, -0.568444, -0.6388859,
-0.727356, -1.073769, -1.0321350001, -1.203461, -1.438637,
-1.6502310001, -1.861351, -2.169083, -2.4314730001,
-2.6991430001,
-2.961258, -3.239381, -3.466103), Z = c(0, 0.1355290002,
0.40106200024, 1.216374, 1.5539550003, 1.7308050003,
1.8116760003, 2.185124, 2.5260320004, 3.034794,
3.4265440004, 3.822512, 4.7449040002, 4.644837,
5.4184880002, 5.8586730001, 6.378356, 6.8339540001,
7.216339, 7.5941160004, 7.9559020004, 8.352936, 8.709319,
9.0166930003, 9.4855350003, 9.9000550001, 10.397003,
10.932068, 11.025726, 12.334595, 13.177887, 13.741852, 14.61142,
15.351013, 16.161255, 16.932831, 17.897186, 18.826691, 19.776001,
20.735596), time = c(0, 0.0116, 0.0196,
0.0311, 0.0391, 0.0507,
0.0623,
0.0703, 0.0818, 0.0899,
0.101,
0.109, 0.121, 0.129, 0.141,
0.152, 0.16, 0.172, 0.18, 0.191,
0.199, 0.211, 0.222, 0.23,
0.242, 0.25, 0.262, 0.27, 0.281,
0.289, 0.301, 0.312, 0.32,
0.332, 0.34, 0.351, 0.359,
0.371, 0.379, 0.391)), .Names = c("X",
"Y", "Z", "time"), row.names = c(1844, 1845, 1846, 1847,
1848, 1849, 1850, 1851, 1852, 1853, 1854, 1855,
1856, 1857, 1858, 1859, 1860, 1861, 1862, 1863,
1864, 1865, 1866, 1867, 1868, 1869, 1870, 1871,
1872, 1873, 1874, 1875, 1876, 1877, 1878, 1879,
1880, 1881, 1882, 1883), class = "data.frame")


# Plot 3d path
with(dat, plot3d(X,Y,Z, type = 'l', col = 'blue', lty = 1))

# get absolute distance from origin
dat$r = sqrt(dat$X ^ 2 + dat$Y ^ 2 + dat$Z ^ 2)
mxpnt = dat[dat$r == mr,] # Coordinates of furthest point

# Plot a blue sphere at max distance
plot3d(mxpnt$X, mxpnt$Y, mxpnt$Z, type = 's', radius = 1, col = 'blue',
add = TRUE)



Your code didn't include the mr variable, but I assume it's just 
max(dat$r).  With that assumption, I'd do the animation function as follows:


First, draw the new sphere at the first point and save the object id:

sphereid <- spheres3d(dat[1, c("X", "Y", "Z")], col = "red", radius = 1)

# Also save the spinner that you like:

spin <- spin3d()  # maybe with different parms

# Now, the animation function:

f <- function(time) {
  par3d(skipRedraw = TRUE)            # stops intermediate redraws
  on.exit(par3d(skipRedraw = FALSE))  # redraw at the end

  rgl.pop(id = sphereid)              # delete the old sphere
  pt <- time %% 40 + 1                # compute which one to draw
  pnt <- dat[pt, c("X", "Y", "Z")]    # maybe interpolate instead?
  sphereid <- spheres3d(pnt, radius = 1,

Re: [R] How to make pch symbols thicker?

2012-11-03 Thread Ben Tupper

On Nov 2, 2012, at 10:06 PM, 21rosit wrote:

 Hi I need to know how to make pch symbols like pch=3 (+) or pch=4(x) or even
 the border of squares or triangles thicker without changing the size. I have
 a lot of symbols of different colors but you can't see the colors clearly
 and I don't want to change the symbol.
 Thanks! 
 
 

Hi,

Have you tried the lwd= argument?  Like this...

plot(1:10, pch = 3, lwd = 4)
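
For instance, a variation with several symbols and colours (a small added
example along the same lines) - the strokes thicken while cex, and so the
symbol size, stays fixed:

plot(1:8, pch = c(3, 4, 0, 2), col = 1:4, cex = 1, lwd = 3)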

Cheers,
Ben


Ben Tupper
Bigelow Laboratory for Ocean Sciences
180 McKown Point Rd. P.O. Box 475
West Boothbay Harbor, Maine   04575-0475 
http://www.bigelow.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bioconductor, merging annotation with list of probeids

2012-11-03 Thread Brawni
I will, sorry! Anyway, it's a data.frame object. Isn't that good?



--
View this message in context: 
http://r.789695.n4.nabble.com/Bioconductor-merging-annotation-with-list-of-probeids-tp4648251p4648305.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reorder() in the latticeExtra library

2012-11-03 Thread Jorge Dinis
Thanks David, I used your suggestion and it worked fine; please see below for
what I did.

segplot(reorder(factor(genotype), genotype) ~ lower + upper


On Nov 3, 2012, at 2:47 AM, David Winsemius wrote:

 define genotype as a factor


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Contrasts in manova

2012-11-03 Thread paola
Hi everybody

I am trying to find contrast in MANOVA. 
I used next code

contrasts(ffage) <- ctr
contrasts(ffage)
MANOVA.agec <- manova(Y1 ~ ffage, data = vol18.df)
summary(MANOVA.agec, split = list(ffage = list("0-17 v over 18" = 0,
    "18-25 v over 26" = 1, "26-31 v over 32" = 2, "32-42 v over 43" = 3,
    "43-65 v 66+" = 4)))
  
But the output was only the overall fit

 contrasts(ffage) <- ctr
 contrasts(ffage)
  [,1] [,2] [,3] [,4] [,5]
0    5    0    0    0    0
1   -1    4    0    0    0
2   -1   -1    3    0    0
3   -1   -1   -1    2    0
4   -1   -1   -1   -1    1
5   -1   -1   -1   -1   -1
 MANOVA.agec <- manova(Y1 ~ ffage, data = vol18.df)
 summary(MANOVA.agec, split = list(ffage = list("0-17 v over 18" = 0, "18-25 v
 over 26" = 1, "26-31 v over 32" = 2, "32-42 v over 43" = 3, "43-65 v 66+" = 4)))
           Df  Pillai approx F num Df den Df   Pr(>F)   
ffage       5 0.30607   2.1543     20    520 0.002681 **
Residuals 130   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

I tried to use linearHypothesis, but I couldn't find anything; maybe I didn't
use it in a correct way. *Any advice is welcome.*

I always get the error
Error in solve.default(wcrossprod(model.matrix(model), w = wts)) : 
  Lapack routine dgesv: system is exactly singular
 

Thanks in advance
Paola



--
View this message in context: 
http://r.789695.n4.nabble.com/Contrasts-in-manova-tp4648306.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Bioconductor, merging annotation with list of probeids

2012-11-03 Thread Uwe Ligges



On 03.11.2012 14:56, Brawni wrote:

i will sorry! anyway it's a data.frame object. isn't that good?



And what are you referring to? I do not see any citation in this message.

Ah, some Nabble-generated mail... Please do read the posting guide to
this mailing list.


Uwe Ligges




--
View this message in context: 
http://r.789695.n4.nabble.com/Bioconductor-merging-annotation-with-list-of-probeids-tp4648251p4648305.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mergeing a large number of large .csvs

2012-11-03 Thread Jeff Newmiller
In the absence of any data examples from you per the posting guidelines, I will
refer you to the help files for the melt function in the reshape2 package.
Note that there can be various mixtures of wide versus long... such as a wide
file with one date column and columns representing all stock prices and all
trade volumes. The longest format would be what melt gives (date, column name,
and value), but an in-between format would have one distinct column each for
dollar values and volume values, with a column indicating ticker label and of
course another for date.

If your csv files can be grouped according to those with similar column
types, then as you read them in you can use cbind(csvlabel = "somelabel",
csvdf) to distinguish each one and then rbind those data frames together to
create an intermediate-width data frame. When dealing with large amounts of
data you will want to minimize the amount of reshaping you do, but it would
require knowledge of your data and algorithms to say any more.
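
A rough sketch of that read-label-stack pattern (the file path, the "value"
column name, and the csvlabel column are assumptions for illustration):

files <- list.files("yourpath", pattern = "\\.csv$", full.names = TRUE)
pieces <- lapply(files, function(f) {
  d <- read.csv(f, header = TRUE)
  names(d)[5] <- "value"            # the 5th column name differs per file
  cbind(csvlabel = basename(f), d)  # remember which file each row came from
})
long <- do.call(rbind, pieces)      # one long data frame instead of a wide merge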
---
Jeff NewmillerThe .   .  Go Live...
DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
  Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
/Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
--- 
Sent from my phone. Please excuse my brevity.

Benjamin Caldwell btcaldw...@berkeley.edu wrote:

Jeff,
If you're willing to educate, I'd be happy to learn what wide vs long
format means. I'll give rbind a shot in the meantime.
Ben
On Nov 2, 2012 4:31 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us
wrote:

 I would first confirm that you need the data in wide format... many
 algorithms are more efficient in long format anyway, and rbind is way
more
 efficient than merge.

 If you feel this is not negotiable, you may want to consider sqldf.
Yes,
 you need to learn a bit of SQL, but it is very well integrated into
R.

---
 Jeff NewmillerThe .   .  Go
Live...
 DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live
 Go...
   Live:   OO#.. Dead: OO#.. 
Playing
 Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
 /Software/Embedded Controllers)   .OO#.   .OO#. 
rocks...1k

---
 Sent from my phone. Please excuse my brevity.

 Benjamin Caldwell btcaldw...@berkeley.edu wrote:

 Dear R help;
 I'm currently trying to combine a large number (about 30 x 30) of
large
 .csvs together (each at least 1 records). They are organized by
 plots,
 hence 30 X 30, with each group of csvs in a folder which corresponds
to
 the
 plot. The unmerged csvs all have the same number of columns (5). The
 fifth
 column has a different name for each csv. The number of rows is
 different.
 
 The combined csvs are of course quite large, and the code I'm
running
 is
 quite slow - I'm currently running it on a computer with 10 GB ram,
 ssd,
 and quad core 2.3 ghz processor; it's taken 8 hours and it's only 
75%
 of
 the way through (it's hung up on one of the largest data groupings
now
 for
 an hour, and using 3.5 gigs of RAM.
 
 I know that R isn't the most efficient way of doing this, but I'm
not
 familiar with sql or C. I wonder if anyone has suggestions for a
 different
 way to do this in the R environment. For instance, the key function
now
 is
 merge, but I haven't tried join from the plyr package or rbind from
 base.
 I'm willing to provide a dropbox link to a couple of these files if
 you'd
 like to see the data. My code is as follows:
 
 
 #multmerge is based on code by Tony cookson,
 

http://www.r-bloggers.com/merging-multiple-data-files-into-one-data-frame/
 ;
 The function takes a path. This path should be the name of a folder
 that
 contains all of the files you would like to read and merge together
and
 only those files you would like to merge.
 
 multmerge = function(mypath){
 filenames=list.files(path=mypath, full.names=TRUE)
 datalist = try(lapply(filenames,
 function(x){read.csv(file=x,header=T)}))
 try(Reduce(function(x,y) {merge(x, y, all=TRUE)}, datalist))
 }
 
 #this function renames files using a fixed list and outputs a .csv
 
 merepk - function (path, nf.name) {
 
 output-multmerge(mypath=path)
 name - list(x, y, z, depth, amplitude)
 try(names(output) - name)
 
 write.csv(output, nf.name)
 }
 
 #assumes all folders are in the same directory, with nothing else
there
 
 merge.by.folder - function (folderpath){
 
 foldernames-list.files(path=folderpath)
 n- length(foldernames)
 setwd(folderpath)
 
 for (i in 1:n){
 path-paste(folderpath,foldernames[i], 

Re: [R] rgl package and animation

2012-11-03 Thread Robert Baer

On 11/3/2012 6:47 AM, Duncan Murdoch wrote:

On 12-11-02 7:47 PM, Robert Baer wrote:

I am trying to figure out how to use rgl package for animation.  It
appears that this is done using the play3d() function.  Below I have
some sample code that plots a 3D path and puts a sphere at the point
farthest from the origin (which in this case also appears to be at the
end of the path).  What I would like to do is animate the movement of
another sphere along the length of the path while simultaneously
rotating the viewport.

Duncan Murdock's (wonderful) Braided Knot YouTube video:
   (http://www.youtube.com/watch?v=prdZWQD7L5c)
makes it clear that such things can be done, but I am having trouble
understanding how to construct the f(time) function that gets passed to
play3d().  The demo(flag) example is a little helpful, but I still can't
quite translate it to my problem.

Can anyone point to some some simple f(time) function examples that I
could use for reference or give me a little hint as to how to construct
f(time) for movement along the path while simultaneously rotating the
viewport?

Thanks,

Rob



library(rgl)
# Generate a 3D path
dat -
structure(list(X = c(0, 0.06181308, 0.002235635,
-0.03080658, -0.1728054, -0.372467, -0.5877065,
-0.8814848, -1.103668, -1.366157, -1.625862, -1.948066,
-2.265388, -2.689826, -3.095001, -3.49749, -3.946068,
-4.395653, -4.772034, -5.111259, -5.410515, -5.649475, -5.73439,
-5.662201, -5.567145, -5.390334, -5.081581, -4.796631,
-4.496559, -4.457024, -4.459564, -4.641746, -4.849105,
-5.0899430001, -5.43129, -5.763724, -6.199448, -6.517578,
-6.864234, -6.907439), Y = c(0, -0.100724,
-0.1694719,
0.036505999886, -0.09299519, -0.222977, -0.3557596,
-0.3658229, -0.3299489, -0.2095574,
-0.08041446,
0.02013388, 0.295372, 0.1388314, 0.2811047,
0.2237614, 0.1419052, 0.06029464,
-0.09330875,
-0.2075969, -0.3286296, -0.4385684,
-0.4691093,
-0.6235059, -0.5254676, -0.568444, -0.6388859,
-0.727356, -1.073769, -1.0321350001, -1.203461, -1.438637,
-1.6502310001, -1.861351, -2.169083, -2.4314730001,
-2.6991430001,
-2.961258, -3.239381, -3.466103), Z = c(0, 0.1355290002,
0.40106200024, 1.216374, 1.5539550003, 1.7308050003,
1.8116760003, 2.185124, 2.5260320004, 3.034794,
3.4265440004, 3.822512, 4.7449040002, 4.644837,
5.4184880002, 5.8586730001, 6.378356, 6.8339540001,
7.216339, 7.5941160004, 7.9559020004, 8.352936, 
8.709319,

9.0166930003, 9.4855350003, 9.9000550001, 10.397003,
10.932068, 11.025726, 12.334595, 13.177887, 13.741852, 14.61142,
15.351013, 16.161255, 16.932831, 17.897186, 18.826691, 19.776001,
20.735596), time = c(0, 0.0116, 0.0196,
0.0311, 0.0391, 0.0507,
0.0623,
0.0703, 0.0818, 0.0899,
0.101,
0.109, 0.121, 0.129, 
0.141,

0.152, 0.16, 0.172, 0.18, 0.191,
0.199, 0.211, 0.222, 0.23,
0.242, 0.25, 0.262, 0.27, 0.281,
0.289, 0.301, 0.312, 0.32,
0.332, 0.34, 0.351, 0.359,
0.371, 0.379, 0.391)), .Names = 
c(X,

Y, Z, time), row.names = c(1844, 1845, 1846, 1847,
1848, 1849, 1850, 1851, 1852, 1853, 1854, 1855,
1856, 1857, 1858, 1859, 1860, 1861, 1862, 1863,
1864, 1865, 1866, 1867, 1868, 1869, 1870, 1871,
1872, 1873, 1874, 1875, 1876, 1877, 1878, 1879,
1880, 1881, 1882, 1883), class = data.frame)


# Plot 3d path
with(dat, plot3d(X,Y,Z, type = 'l', col = 'blue', lty = 1))

# get absolute distance from origin
dat$r = sqrt(dat$X ^ 2 + dat$Y ^ 2 + dat$Z ^ 2)
mr = max(dat$r)  # yes sorry, didn't get copied to original 
email code

mxpnt = dat[dat$r == mr,] # Coordinates of furthest point

# Plot a blue sphere at max distance
plot3d(mxpnt$X, mxpnt$Y, mxpnt$Z, type = 's', radius = 1, col = 'blue',
add = TRUE)



Your code didn't include the mr variable, but I assume it's just 
max(dat$r).  With that assumption, I'd do the animation function as 
follows:


First, draw the new sphere at the first point and save the object id:

sphereid - sphere3d(dat[1,c(X, Y, Z)], col=red, radius=1)

# Also save the spinner that you like:

spin - spin3d( ) #maybe with different parms

# Now, the animation function:

f - function(time) {
  par3d(skipRedraw = TRUE) # stops intermediate redraws
  on.exit(par3d(skipRedraw=FALSE)) # redraw at the end

  rgl.pop(id=sphereid) # delete the old sphere
  pt - time %% 40 

Re: [R] override date in xts time series

2012-11-03 Thread Eric Morway
Hello Arun, 

I too am using R 2.15 and am unable to get the same result as you.  You 
will notice in the R code that follows that when I use 'update' the time 
in the xts object goes haywire.  For example, 2004-04-04 01:15:00 EST 
gets converted to 2004-01-03 22:15:00 PST (see below).  Because I can't 
get the result you showed me in your previous response, and maybe you 
won't be able to get this result, I've resorted back to your other 
suggestion using gsub.  I don't have a good handle on regular expressions
and was wondering: in the last line of code below, is the replacement month
'hardwired'?  In other words, could "\\101\\2" somehow be replaced with
unique(month(index(x.1))) in the last line of code below, so that x.1 is
providing the replacement month rather than having it fixed?  Or perhaps
I've misunderstood the regular expression, which is entirely possible.


sessionInfo()
#R version 2.15.2 (2012-10-26)
#Platform: x86_64-w64-mingw32/x64 (64-bit)

library(xts)
library(lubridate)

x.Date <- rep("1/1/2004", times = 5)
x.Times <- c("01:15:00", "01:30:00", "01:45:00",
             "02:00:00", "02:30:00", "03:00:00", "03:15:00")
x <- paste(x.Date, x.Times)

y.Date <- rep("4/4/2004", times = 4)
y.Times <- c("01:15:00", "01:30:00", "01:45:00",
             "02:00:00", "02:30:00", "03:30:00")

y <- paste(y.Date, y.Times)

fmt <- "%m/%d/%Y %H:%M:%S"
x.1 <- xts(1:7, as.POSIXct(x, format = fmt, tz = "EST"))
y.1 <- xts(1:6, as.POSIXct(y, format = fmt, tz = "EST"))

y.1

#                    [,1]
#2004-04-04 01:15:00    1
#2004-04-04 01:30:00    2
#2004-04-04 01:45:00    3
#2004-04-04 02:00:00    4
#2004-04-04 02:30:00    5
#2004-04-04 03:30:00    6
#Warning message:
#timezone of object (EST) is different than current timezone (). 

index(y.1)
# 2004-04-04 01:15:00 EST 2004-04-04 01:30:00 EST
# 2004-04-04 01:45:00 EST 2004-04-04 02:00:00 EST
# 2004-04-04 02:30:00 EST 2004-04-04 03:30:00 EST

index(y.1) <- update(index(y.1), month = unique(month(index(x.1))))
index(y.1)

# 2004-01-03 22:15:00 PST 2004-01-03 22:30:00 PST
# 2004-01-03 22:45:00 PST 2004-01-03 23:00:00 PST
# 2004-01-03 23:30:00 PST 2004-01-04 00:30:00 PST


index(y.1) <- as.POSIXct(gsub("(.*\\-).*(\\-.*)", "\\101\\2", index(y.1)))
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] AUTO: Alan Chalk has left RSA (returning 30/11/2012)

2012-11-03 Thread Alan Chalk
I am out of the office until 30/11/2012.

Please send work related emails to laura.jor...@uk.rsagroup.com or personal
emails to alanch...@gmail.com.


Note: This is an automated response to your message  R-help Digest, Vol
117, Issue 3 sent on 03/11/2012 11:00:07.

This is the only notification you will receive while this person is away.



Please consider the environment - Think before you print 
RSA -The UK's first carbon neutral insurer
***
Royal & Sun Alliance Insurance plc (No. 93792). Registered in England & Wales 
at St. Mark's Court, Chart Way, Horsham, West Sussex, RH12 1XL.
Authorised and Regulated by the Financial Services Authority.  For your 
protection, telephone calls may be recorded and monitored.  The information in 
this e-mail is confidential and may be read, copied or used only by the 
intended recipients. If you have received it in error please contact the sender 
immediately by returning the e-mail. Please delete the e-mail and do not 
disclose any of its contents to anyone. No responsibility is accepted for loss 
or damage arising from viruses or changes made to this message after it was 
sent.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] override date in xts time series

2012-11-03 Thread Eric Morway
Sys.setenv(TZ="GMT") did the trick!  Thank you very much.  I'll continue
to work the larger problem with this option.

Out of curiosity, however, can the following code be modified so that the
replacement argument is informed by the month of x.1?:

index(y.1) <- as.POSIXct(gsub("(.*\\-).*(\\-.*)", "\\101\\2", index(y.1)))

Something to the tune of the following seems to work, but is it robust?:

txt <- paste("\\10", as.character(unique(month(index(x.1)))), "\\2", sep = "")
index(y.1) <- as.POSIXct(gsub("(.*\\-).*(\\-.*)", txt, index(y.1)))
index(y.1)
# 2004-01-04 01:15:00 PST 2004-01-04 01:30:00 PST
# 2004-01-04 01:45:00 PST 2004-01-04 02:00:00 PST
# 2004-01-04 02:30:00 PST 2004-01-04 03:30:00 PST

What would the gsub 'pattern' string be to replace the day, if I may ask?
I'm not trying to push my luck, but the gsub approach is new to me and I
don't quite follow everything that is going on.

-Eric
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] override date in xts time series

2012-11-03 Thread arun
Hi,
Sorry, I forgot to answer the second question.
txt <- paste("\\10", unique(month(index(x.1))), "\\2", sep = "")  # without the
# as.character() it also should work
# because
str(paste("\\10", unique(month(index(x.1))), "\\2", sep = ""))  # it returns a
# character
# chr "\\101\\2"

# Here too:
str(paste(10, unique(month(index(x.1))), 2, sep = ""))
# chr "1012"
# According to the description in paste():
# "Concatenate vectors after converting to character."


as.POSIXct(gsub("(.*\\-).*(\\-.*)", txt, index(y.1)))
#[1] 2004-01-04 01:15:00 EST 2004-01-04 01:30:00 EST
#[3] 2004-01-04 01:45:00 EST 2004-01-04 02:00:00 EST
#[5] 2004-01-04 02:30:00 EST 2004-01-04 03:30:00 EST

# Now, suppose I want to change both the month and day from the original y.1
index(y.1) <- as.POSIXct(gsub("(.*\\-).*(\\-).*(\\s.*)", "\\101\\207\\3", index(y.1)))
 #Here, the month will be 01 and day 07
 y.1
#    [,1]
#2004-01-07 01:15:00    1
#2004-01-07 01:30:00    2
#2004-01-07 01:45:00    3
#2004-01-07 02:00:00    4
#2004-01-07 02:30:00    5
#2004-01-07 03:30:00    6
Hope it helps.
A.K.








From: Eric Morway emor...@usgs.gov
To: arun smartpink...@yahoo.com 
Cc: R help r-help@r-project.org 
Sent: Saturday, November 3, 2012 12:32 PM
Subject: Re: [R] override date in xts time series


Sys.setenv(TZ=GMT) did
the trick!  Thank you very much.  I'll continue to work the larger
problem with this option. 

Out of curiosity, however, can the following
code be modified so that the replacementargument is informed by the month
of x.1?: 

index(y.1)-as.POSIXct(gsub((.*\\-).*(\\-.*),\\101\\2,index(y.1)))  

Something to the tune of the following
seems to work, but is it robust?: 

txt-paste(\\10,as.character(unique(month(index(x.1,\\2,sep=) 
index(y.1)-as.POSIXct(gsub((.*\\-).*(\\-.*),txt,index(y.1)))  
index(y.1) 
# 2004-01-04 01:15:00 PST 2004-01-04
01:30:00 PST 
# 2004-01-04 01:45:00 PST 2004-01-04
02:00:00 PST 
# 2004-01-04 02:30:00 PST 2004-01-04
03:30:00 PST 

What would the gsub 'pattern' string
be to replace the day, if I may ask?  I'm not trying to push my luck,
but the gsub approach is new to me and don't quite follow everything that
is going on. 

-Eric

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Replacing NAs in long format

2012-11-03 Thread Christopher Desjardins
Hi,
I have the following data:

 data[1:20, c(1, 2, 20)]
   idr  schyear year
     1        8    0
     1        9    1
     1       10   NA
     2        4   NA
     2        5   -1
     2        6    0
     2        7    1
     2        8    2
     2        9    3
     2       10    4
     2       11   NA
     2       12    6
     3        4   NA
     3        5   -2
     3        6   -1
     3        7    0
     3        8    1
     3        9    2
     3       10    3
     3       11   NA

What I want to do is replace the NAs in the year variable with the
following:

   idr  schyear year
     1        8    0
     1        9    1
     1       10    2
     2        4   -2
     2        5   -1
     2        6    0
     2        7    1
     2        8    2
     2        9    3
     2       10    4
     2       11    5
     2       12    6
     3        4   -3
     3        5   -2
     3        6   -1
     3        7    0
     3        8    1
     3        9    2
     3       10    3
     3       11    4

I have no idea how to do this. What it needs to do is make sure that, for
each subject (idr), it either adds 1 if the NA is preceded by a value in
year or subtracts 1 if it comes before a year value.

Does that make sense? I could do this in Excel but I am at a loss for how
to do this in R. Please reply to me as well as the list if you respond.

Thanks!
Chris

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reorder() in the latticeExtra library

2012-11-03 Thread David Winsemius

On Nov 3, 2012, at 6:36 AM, Jorge Dinis wrote:

 Thanks David, I used you suggestion and it worked fine, please see below for 
 what I did.
 
 segplot(reorder(factor(genotype), genotype) ~ lower + upper
Perhaps a missing close-paren . ^
Although reading this as a formatted posting such as you sent might cause a 
registration error.

 
 
 On Nov 3, 2012, at 2:47 AM, David Winsemius wrote:
 
 define genotype as a factor
 
-- 
David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Replacing NAs in long format

2012-11-03 Thread Rui Barradas

Hello,

Try the following. I've called your data.frames 'dat' and 'dat2'

# First your datasets, see ?dput
dput(dat)
structure(list(idr = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), schyear = c(8L, 9L,
10L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L), year = c(0L, 1L, NA, NA, -1L, 0L, 1L, 2L, 3L,
4L, NA, 6L, NA, -2L, -1L, 0L, 1L, 2L, 3L, NA)), .Names = c("idr",
"schyear", "year"), class = "data.frame", row.names = c(NA, -20L
))
dput(dat2)
structure(list(idr = c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), schyear = c(8L, 9L,
10L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L), year = c(0L, 1L, 2L, -2L, -1L, 0L, 1L, 2L, 3L,
4L, 5L, 6L, -3L, -2L, -1L, 0L, 1L, 2L, 3L, 4L)), .Names = c("idr",
"schyear", "year"), class = "data.frame", row.names = c(NA, -20L
))

# Now the code
fun <- function(x){
    for(i in which(is.na(x$year))){
        if(i == 1)
            x$year[i] <- x$year[i + 1] - 1L
        else
            x$year[i] <- x$year[i - 1] + 1L
    }
    x
}

result <- do.call(rbind, lapply(split(dat, dat$idr), fun))
all.equal(result, dat2)

Hope this helps,

Rui Barradas
Em 03-11-2012 17:14, Christopher Desjardins escreveu:

Hi,
I have the following data:


data[1:20,c(1,2,20)]

idr  schyear year
1   80
1   91
1  10   NA
2   4   NA
2   5   -1
2   60
2   71
2   82
2   93
2  104
2  11   NA
2  126
3   4   NA
3   5   -2
3   6   -1
3   70
3   81
3   92
3  103
3  11   NA

What I want to do is replace the NAs in the year variable with the
following:

idr  schyear year
1   80
1   91
1  10   2
2   4   -2
2   5   -1
2   60
2   71
2   82
2   93
2  104
2  11   5
2  126
3   4   -3
3   5   -2
3   6   -1
3   70
3   81
3   92
3  103
3  11   4

I have no idea how to do this. What it needs to do is make sure that for
each subject (idr) that it either adds a 1 if it is preceded by a value in
year or subtracts a 1 if it comes before a year value.

Does that make sense? I could do this in Excel but I am at a loss for how
to do this in R. Please reply to me as well as the list if you respond.

Thanks!
Chris

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] finding global variables in a function containing formulae

2012-11-03 Thread William Dunlap
findGlobals must be explicitly ignoring calls to the ~ function.
You could poke through the source code of codetools and find
where this is happening.

Or, if you have the source code for the package you are investigating,
use sed to change all ~ to %TILDE% and then use findGlobals on
the resulting source code.  The messages will be a bit garbled but
should give you a start.  E.g., compare the following two, in which y
is defined in the function but x is not:
findGlobals(function(y)lm(y~x))
  [1] "~"  "lm"
   findGlobals(function(y)lm(y %TILDE% x))
  [1] "lm"      "%TILDE%" "x"

You will get false alarms, since in a call like lm(y~x+z, data=dat) findGlobals
cannot know if dat includes columns called 'x', 'y', and 'z' and the above
approach errs on the side of reporting the potential problem.

You could use code in codetools to analyze S code instead of source code
to globally replace all calls to ~ with calls to %TILDE% but that is more
work than using sed on the source code. 
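
A rough R-only variant of the same idea (a sketch, not Bill's code; the helper
name is made up): deparse the function, swap ~ for %TILDE%, re-parse, and run
findGlobals on the result.

library(codetools)
findGlobalsWithFormulae <- function(f) {
  txt <- gsub("~", "%TILDE%", deparse(f), fixed = TRUE)
  f2 <- eval(parse(text = txt))   # %TILDE% need not exist; findGlobals is static
  findGlobals(f2, merge = FALSE)$variables
}
# findGlobalsWithFormulae(plotFn1) should now report xGlobal and yGlobal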

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of Hafen, Ryan P
 Sent: Friday, November 02, 2012 4:28 PM
 To: Bert Gunter
 Cc: r-help@r-project.org
 Subject: Re: [R] finding global variables in a function containing formulae
 
 Thanks.  That works if I a have the formula expression handy.  But suppose
 I want a function, findGlobalVars() that takes a function as an argument
 and finds globals in it, where I have absolutely no idea what is in the
 supplied function:
 
  findGlobalVars <- function(f) {
     require(codetools)
     findGlobals(f, merge=FALSE)$variables
  }
 
 
 findGlobalVars(plotFn1)
 
 I would like findGlobalVars() to be able to find variables in formulae
 that might be present in f.
 
 
 
 
 On 11/1/12 1:19 PM, Bert Gunter gunter.ber...@gene.com wrote:
 
 Does
 
 ?all.vars
 ##as in
   all.vars(y~x)
  [1] "y" "x"
 
 help?
 
 -- Bert
 
 On Thu, Nov 1, 2012 at 11:04 AM, Hafen, Ryan P ryan.ha...@pnnl.gov
 wrote:
  I need to find all global variables being used in a function and
 findGlobals() in the codetools package works quite nicely.  However, I
 am not able to find variables that are used in formulae.  Simply
 avoiding formulae in functions is not an option because I do not have
 control over what functions this will be applied to.
 
  Here is an example to illustrate:
 
  library(codetools)
 
  xGlobal - rnorm(10)
  yGlobal - rnorm(10)
 
  plotFn1 - function() {
 plot(yGlobal ~ xGlobal)
  }
 
  plotFn2 - function() {
 y - yGlobal
 x - xGlobal
 plot(y ~ x)
  }
 
  plotFn3 - function() {
 plot(xGlobal, yGlobal)
  }
 
  findGlobals(plotFn1, merge=FALSE)$variables
  # character(0)
  findGlobals(plotFn2, merge=FALSE)$variables
  # [1] xGlobal yGlobal
  findGlobals(plotFn3, merge=FALSE)$variables
  # [1] xGlobal yGlobal
 
  I would like to find that plotFn1 also uses globals xGlobal and
 yGlobal.  Any suggestions on how I might do this?
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.
 
 
 
 --
 
 Bert Gunter
 Genentech Nonclinical Biostatistics
 
 Internal Contact Info:
 Phone: 467-7374
 Website:
 http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-bio
 statistics/pdb-ncb-home.htm
 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] some help

2012-11-03 Thread dattel_palme
Hi People!

I have the following task, consisting of some steps to do in R:

I have an ASCII file (a table) consisting of many columns and rows.
1. I would like to stack all values of the columns one under the other. It
will begin with column 1, then column 2 under column 1, column 3 under
column 2, etc., until at the end there is only 1 column. How do I do it?

2. The second problem is to make a scatterplot of two variables (I think after
further explanation the scatter plot itself will not be needed). I have two
columns of two different variables (that I produced before), column 1 with
variable 1 and column 2 with variable 2. I would like to order them by one
variable in 0.01 intervals (the variable values will range between 0 and 1).
From each 0.01 interval (100 intervals) I want to pick the maximum and
minimum value of variable 2.

3. From the obtained max and min values of each interval I would like to
fit a linear least-squares regression.
I hope someone can help me out!
Thanks
Stefan
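
For concreteness, a rough base-R sketch of steps 1-3 as described above (the
file name and which columns hold the two variables are assumptions):

tab <- read.table("yourfile.asc", header = TRUE)        # the ascii table
# 1. stack all columns into a single column, column 1 first, then column 2, ...
stacked <- data.frame(value = unlist(tab, use.names = FALSE))

# 2. cut variable 1 into 0.01-wide intervals and take min/max of variable 2
v1 <- tab[[1]]; v2 <- tab[[2]]
brk  <- cut(v1, breaks = seq(0, 1, by = 0.01), include.lowest = TRUE)
mins <- tapply(v2, brk, min)
maxs <- tapply(v2, brk, max)

# 3. least-squares fits through the interval extremes (against the midpoints)
mids <- seq(0.005, 0.995, by = 0.01)
fit.min <- lm(mins ~ mids)
fit.max <- lm(maxs ~ mids)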



--
View this message in context: 
http://r.789695.n4.nabble.com/some-help-tp4648316.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] reorder() in the latticeExtra library

2012-11-03 Thread arun
Hi,
Try this:
genotype1 <- factor(genotype, levels = c("CJ1450 NW 4/25/12", "CJ1450 BAL 4/25/12",
    "CJ1450 NW\n4/27/12",
    "CJ1450 BAL 4/27/12", "CJ1721 NW 4/27/12", "CJ1721 BAL\n4/27/12",
    "CJ1721 NW 4/29/12", "CJ1721 BAL 4/29/12"))

segplot(genotype1 ~ lower + upper, data = x, draw.bands = FALSE,
        centers = est, segments.fun = panel.arrows, ends = "both", angle = 90,
        length = 0, par.settings = simpleTheme(pch = 19, col = 1),
        xlab = expression("nucleotide diversity " %+-% " sd"),
        panel = function(x, y, z, ...) {
            panel.abline(h = z, col = "grey", lty = "dashed")
            panel.abline(v = 14.20, col = "grey")
            panel.segplot(x, y, z, ...)})
A.K.





- Original Message -
From: JDINIS jorgemdi...@gmail.com
To: r-help@r-project.org
Cc: 
Sent: Friday, November 2, 2012 11:04 PM
Subject: [R] reorder()  in the latticeExtra library

Hello all, thanks for your time and help. Below are my commands, and it
generates a really nice plot, however I am not happy with the reorder()
function. I would like the order to be the same as they appear in the
genotype variable  genotype - c(CJ1450 NW 4/25/12,CJ1450 BAL
4/25/12,CJ1450 NW 4/27/12,CJ1450 BAL 4/27/12,CJ1721 NW
4/27/12,CJ1721 BAL 4/27/12,CJ1721 NW 4/29/12,CJ1721 BAL 4/29/12 )
and not as it is currently coded.

Is there any way to turn off the reorder, or set it up so the values appear
in the order above, thank you again!

(I am open to all suggestions)

JD


genotype - c(CJ1450 NW 4/25/12,CJ1450 BAL 4/25/12,CJ1450 NW
4/27/12,CJ1450 BAL 4/27/12,CJ1721 NW 4/27/12,CJ1721 BAL
4/27/12,CJ1721 NW 4/29/12,CJ1721 BAL 4/29/12 )
#paste(Animal, as.roman(1:8), sep = -) 
plant.height - c(0.001173003, 0.001506127, 0.001361596, 0.001922572,
0.034272147, 0.030466017, 0.001654299, 0.001071724)
SE - c(0.000444123, 0.000290096, 0.000372844, 0.00197687, 0.033945128,
0.035231568, 0.001094518, 0.000423545) 
lower - plant.height - SE; upper - plant.height + SE 
x - data.frame(group = genotype, lower = lower, est = plant.height, upper =
upper)

library(latticeExtra) 
segplot(reorder(genotype, est) ~ lower + upper, data = x, draw.bands =
FALSE, centers = est, segments.fun = panel.arrows, ends = both, angle =
90, length = 0, par.settings = simpleTheme(pch = 19, col = 1), xlab =
expression(nucleotide diversity  %+-%  sd), panel = function(x, y, z,
...) { 
panel.abline(h = z, col = grey, lty = dashed) 
panel.abline(v = 14.20, col = grey) 
panel.segplot(x, y, z, ...)}) 



--
View this message in context: 
http://r.789695.n4.nabble.com/reorder-in-the-latticeExtra-library-tp4648299.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] optim .C / Crashing on run

2012-11-03 Thread Paul Browne
Hello,

I am attempting to use optim under the default Nelder-Mead algorithm for
model fitting, minimizing a Chi^2 statistic whose value is determined by a
.C call to an external shared library compiled from C & C++ code.

My problem has been that the R session will immediately crash upon starting
the simplex run, without it taking a single step.

This is strange, as the .C call itself works, is error-free (as far as I
can tell!) and does not return NaN or Inf under any initial starting
parameters that I have tested it with in R. It only ever crashes the R
session when the Chi^2 function to be minimized is called from optim, not
under any other circumstances.

In the interests of reproducibility, I attach R code that reads attached
data files  attempts a N-M optim run. The required shared library
containing the external code (compiled in Ubuntu 12.04 x64 with g++ 4.6.3)
is also attached. Calculating an initial Chi^2 value for a starting set of
model parameters works, then the R session crashes when the optim call is
made.

Is there something I'm perhaps doing wrong in the specification of the
optim run? Is it inadvisable to use external code with optim? There doesn't
seem to be a problem with the external code itself, so I'm very stumped as
to the source of the crashes.
__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] to print system.time always

2012-11-03 Thread mrzung
Hi all;

I want to print system.time whenever I execute any command.

It takes too much time to wrap every command in the system.time() function.

Is there any solution for it?

And,

the apply(matrix, 1, cumsum) command is too slow for some large matrices.

Is there any function like rowCumSums?

thank u!



--
View this message in context: 
http://r.789695.n4.nabble.com/to-print-system-time-always-tp4648314.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] override date in xts time series

2012-11-03 Thread arun
HI,

Could you check whether you are getting the same result with tz=GMT?
as.POSIXct(x, format = fmt, tz = "GMT")
#[1] 2004-01-01 01:15:00 GMT 2004-01-01 01:30:00 GMT
#[3] 2004-01-01 01:45:00 GMT 2004-01-01 02:00:00 GMT
#[5] 2004-01-01 02:30:00 GMT 2004-01-01 03:00:00 GMT
#[7] 2004-01-01 03:15:00 GMT
 as.POSIXct(y, format = fmt, tz = "GMT")
#[1] 2004-04-04 01:15:00 GMT 2004-04-04 01:30:00 GMT
#[3] 2004-04-04 01:45:00 GMT 2004-04-04 02:00:00 GMT
#[5] 2004-04-04 02:30:00 GMT 2004-04-04 03:30:00 GMT

x.1 <- xts(1:7, as.POSIXct(x, format = fmt, tz = "GMT"))
 x.1
#    [,1]
#2004-01-01 01:15:00    1
#2004-01-01 01:30:00    2
#2004-01-01 01:45:00    3
#2004-01-01 02:00:00    4
#2004-01-01 02:30:00    5
#2004-01-01 03:00:00    6
#2004-01-01 03:15:00    7
#Warning message:
#timezone of object (GMT) is different than current timezone (). 
 y.1 <- xts(1:6, as.POSIXct(y, format = fmt, tz = "GMT"))
 y.1
#    [,1]
#2004-04-04 01:15:00    1
#2004-04-04 01:30:00    2
#2004-04-04 01:45:00    3
#2004-04-04 02:00:00    4
#2004-04-04 02:30:00    5
#2004-04-04 03:30:00    6
#Warning message:
#timezone of object (GMT) is different than current timezone (). 

 update(index(y.1), month = unique(month(index(x.1))))
#[1] 2004-01-04 01:15:00 GMT 2004-01-04 01:30:00 GMT
#[3] 2004-01-04 01:45:00 GMT 2004-01-04 02:00:00 GMT
#[5] 2004-01-04 02:30:00 GMT 2004-01-04 03:30:00 GMT


#Here is where the problem occurs


index(y.1) <- update(index(y.1), month = unique(month(index(x.1))))
y.1
#    [,1]
#2004-01-03 20:15:00    1
#2004-01-03 20:30:00    2
#2004-01-03 20:45:00    3
#2004-01-03 21:00:00    4
#2004-01-03 21:30:00    5
#2004-01-03 22:30:00    6
#Warning message:
#timezone of object (GMT) is different than current timezone ().

#So I am going to change the timezone in the system and see what happens
Sys.setenv(TZ="GMT")

y.1 <- xts(1:6, as.POSIXct(y, format = fmt, tz = "GMT"))
update(index(y.1), month = unique(month(index(x.1))))
#[1] 2004-01-04 01:15:00 GMT 2004-01-04 01:30:00 GMT
#[3] 2004-01-04 01:45:00 GMT 2004-01-04 02:00:00 GMT
#[5] 2004-01-04 02:30:00 GMT 2004-01-04 03:30:00 GMT
 index(y.1) <- update(index(y.1), month = unique(month(index(x.1))))
 y.1
#    [,1]
#2004-01-04 01:15:00    1
#2004-01-04 01:30:00    2
#2004-01-04 01:45:00    3
#2004-01-04 02:00:00    4
#2004-01-04 02:30:00    5
#2004-01-04 03:30:00    6


A.K.


From: Eric Morway emor...@usgs.gov
To: arun smartpink...@yahoo.com 
Cc: R help r-help@r-project.org 
Sent: Saturday, November 3, 2012 11:44 AM
Subject: Re: [R] override date in xts time series


Hello Arun,  

I too am using R 2.15 and am unable
to get the same result as you.  You will notice in the R code that
follows that when I use 'update' the time in the xts object goes haywire.
 For example, 2004-04-04
01:15:00 ESTgets converted
to 2004-01-03 22:15:00 PST(see below).  Because I can't get the result you 
showed me in your
previous response, and maybe you won't be able to get this result, I've
resorted back to your other suggestion using gsub.  I don't have a
good handle on regular expressions and was wondering if in the last line
of code below, the replace month is 'hardwired'?  In other words,
could \\101\\2somehow be replaced with unique(month(index(x.1)))in
the last line of code below so that x.1 is providing the replacement month,
rather than have it fixed?  Or perhaps I've misunderstood the regular
expression, which is entirely possible. 


sessionInfo() 
#R version 2.15.2 (2012-10-26) 
#Platform: x86_64-w64-mingw32/x64 (64-bit) 

library(xts) 
library(lubridate) 

x.Date - rep(1/1/2004,times=5) 
x.Times- c(01:15:00,
01:30:00, 01:45:00, 

     02:00:00, 02:30:00, 03:00:00,
03:15:00) 
x-paste(x.Date,x.Times) 

y.Date - rep(4/4/2004,times=4) 
y.Times- c(01:15:00,
01:30:00, 01:45:00, 

     02:00:00, 02:30:00, 03:30:00) 

y-paste(y.Date,y.Times) 

fmt - %m/%d/%Y %H:%M:%S 
x.1-xts(1:7, as.POSIXct(x, format=fmt,
tz = EST)) 
y.1-xts(1:6, as.POSIXct(y, format=fmt,
tz = EST)) 

y.1 

#          
         [,1] 
#2004-04-04 01:15:00    1 
#2004-04-04 01:30:00    2 
#2004-04-04 01:45:00    3 
#2004-04-04 02:00:00    4 
#2004-04-04 02:30:00    5 
#2004-04-04 03:30:00    6 
#Warning message: 
#timezone of object (EST) is different
than current timezone ().  

index(y.1) 
# 2004-04-04 01:15:00 EST
2004-04-04 01:30:00 EST 
# 2004-04-04 01:45:00 EST
2004-04-04 02:00:00 EST 
# 2004-04-04 02:30:00 EST
2004-04-04 03:30:00 EST 

index(y.1)-update(index(y.1),month=unique(month(index(x.1  
index(y.1) 

# 2004-01-03 22:15:00 PST
2004-01-03 22:30:00 PST 
# 2004-01-03 22:45:00 PST
2004-01-03 23:00:00 PST 
# 2004-01-03 23:30:00 PST
2004-01-04 00:30:00 PST 


index(y.1)-as.POSIXct(gsub((.*\\-).*(\\-.*),\\101\\2,index(y.1)))          
         

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, 

Re: [R] Replacing NAs in long format

2012-11-03 Thread jim holtman
 x <- read.table(text = "idr  schyear year
+  1   8    0
+  1   9    1
+  1  10   NA
+  2   4   NA
+  2   5   -1
+  2   6    0
+  2   7    1
+  2   8    2
+  2   9    3
+  2  10    4
+  2  11   NA
+  2  12    6
+  3   4   NA
+  3   5   -2
+  3   6   -1
+  3   7    0
+  3   8    1
+  3   9    2
+  3  10    3
+  3  11   NA", header = TRUE)
  # you did not specify if there might be multiple contiguous NAs,
  # so there are a lot of checks to be made
  x.l <- lapply(split(x, x$idr), function(.idr){
+     # check for all NAs -- just return indeterminate state
+     if (sum(is.na(.idr$year)) == nrow(.idr)) return(.idr)
+     # repeat until all NAs have been fixed; takes care of contiguous ones
+     while (any(is.na(.idr$year))){
+         # find all the NAs
+         for (i in which(is.na(.idr$year))){
+             if ((i == 1L) && (!is.na(.idr$year[i + 1L]))){
+                 .idr$year[i] <- .idr$year[i + 1L] - 1
+             } else if ((i > 1L) && (!is.na(.idr$year[i - 1L]))){
+                 .idr$year[i] <- .idr$year[i - 1L] + 1
+             } else if ((i < nrow(.idr)) && (!is.na(.idr$year[i + 1L]))){
+                 .idr$year[i] <- .idr$year[i + 1L] - 1
+             }
+         }
+     }
+     return(.idr)
+ })
 do.call(rbind, x.l)
     idr schyear year
1.1    1       8    0
1.2    1       9    1
1.3    1      10    2
2.4    2       4   -2
2.5    2       5   -1
2.6    2       6    0
2.7    2       7    1
2.8    2       8    2
2.9    2       9    3
2.10   2      10    4
2.11   2      11    5
2.12   2      12    6
3.13   3       4   -3
3.14   3       5   -2
3.15   3       6   -1
3.16   3       7    0
3.17   3       8    1
3.18   3       9    2
3.19   3      10    3
3.20   3      11    4




On Sat, Nov 3, 2012 at 1:14 PM, Christopher Desjardins
cddesjard...@gmail.com wrote:
 Hi,
 I have the following data:

 data[1:20,c(1,2,20)]
 idr  schyear year
 1   80
 1   91
 1  10   NA
 2   4   NA
 2   5   -1
 2   60
 2   71
 2   82
 2   93
 2  104
 2  11   NA
 2  126
 3   4   NA
 3   5   -2
 3   6   -1
 3   70
 3   81
 3   92
 3  103
 3  11   NA

 What I want to do is replace the NAs in the year variable with the
 following:

 idr  schyear year
 1   80
 1   91
 1  10   2
 2   4   -2
 2   5   -1
 2   60
 2   71
 2   82
 2   93
 2  104
 2  11   5
 2  126
 3   4   -3
 3   5   -2
 3   6   -1
 3   70
 3   81
 3   92
 3  103
 3  11   4

 I have no idea how to do this. What it needs to do is make sure that for
 each subject (idr) that it either adds a 1 if it is preceded by a value in
 year or subtracts a 1 if it comes before a year value.

 Does that make sense? I could do this in Excel but I am at a loss for how
 to do this in R. Please reply to me as well as the list if you respond.

 Thanks!
 Chris

 [[alternative HTML version deleted]]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] to print system.time always

2012-11-03 Thread jim holtman
Here is a faster solution to your 'apply'; use 'sapply' instead:

 str(x)
 num [1:100, 1:30] 0.0346 0.4551 0.66 0.8528 0.5494 ...

 system.time(y <- apply(x, 1, cumsum))
    user  system elapsed
   13.24    0.61   14.02
 system.time(ys <- sapply(1:col, function(a) cumsum(x[,a])))
    user  system elapsed
    1.40    0.14    1.59


On Sat, Nov 3, 2012 at 11:52 AM, mrzung mrzun...@gmail.com wrote:
 Hi all;

 I want to print system.time whenever I execute any command.

 It takes too much time to type system.time() function to all command.

 is there any solution on it?

 And,

 apply(matrix,1,cumsum) command is too slow to some large matrix.

 is there any function like rowCumSums ?

 thank u!



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/to-print-system-time-always-tp4648314.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] to print system.time always

2012-11-03 Thread Uwe Ligges



On 03.11.2012 16:52, mrzung wrote:

Hi all;

I want to print system.time whenever I execute any command.

It takes too much time to type system.time() function to all command.

is there any solution on it?


See ?Rprof on how to profile your code.
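A minimal sketch of the ?Rprof workflow (the file name and the expression being
profiled are placeholders, not from the original post):

Rprof("profile.out")                       # start collecting timing samples
y <- apply(matrix(runif(3e5), ncol = 30), 1, cumsum)   # code being profiled
Rprof(NULL)                                # stop profiling
head(summaryRprof("profile.out")$by.self)  # where the time was spent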



And,

apply(matrix,1,cumsum) command is too slow to some large matrix.

is there any function like rowCumSums ?


You had:

result1 <- apply(matrix, 1, cumsum)


This is only slow if you have lots of rows. Now think in matrices about how 
to do that:


b <- sapply(1:ncol(matrix), function(i) c(rep(1, i), rep(0, ncol(matrix) - i)))
result2 <- t(matrix %*% b)

This is roughly 10 times faster on a 100 x 10 matrix.

Check the results:
all.equal(result1, result2)
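A small self-contained check of the matrix-product idea (the 100 x 10 test
matrix m below is made up for illustration, not the poster's data):

m <- matrix(runif(1000), nrow = 100, ncol = 10)
result1 <- apply(m, 1, cumsum)             # 10 x 100; column j is the cumsum of row j
b <- sapply(1:ncol(m), function(i) c(rep(1, i), rep(0, ncol(m) - i)))
result2 <- t(m %*% b)                      # same layout as result1
all.equal(result1, result2)                # TRUE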


Uwe Ligges







thank u!



--
View this message in context: 
http://r.789695.n4.nabble.com/to-print-system-time-always-tp4648314.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] to print system.time always

2012-11-03 Thread Uwe Ligges



On 03.11.2012 19:42, jim holtman wrote:

Here is a faster solution to your 'apply'; use 'sapply' instead:


str(x)

  num [1:100, 1:30] 0.0346 0.4551 0.66 0.8528 0.5494 ...


system.time(y <- apply(x, 1, cumsum))

    user  system elapsed
   13.24    0.61   14.02

system.time(ys <- sapply(1:col, function(a) cumsum(x[,a])))

    user  system elapsed
    1.40    0.14    1.59



Which solves another problem (cumsum of cols rather than rows). Applying 
it on rows won't be much faster.


Uwe Ligges




On Sat, Nov 3, 2012 at 11:52 AM, mrzung mrzun...@gmail.com wrote:

Hi all;

I want to print system.time whenever I execute any command.

It takes too much time to type system.time() function to all command.

is there any solution on it?

And,

apply(matrix,1,cumsum) command is too slow to some large matrix.

is there any function like rowCumSums ?

thank u!



--
View this message in context: 
http://r.789695.n4.nabble.com/to-print-system-time-always-tp4648314.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.






__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] to print system.time always

2012-11-03 Thread jim holtman
I use notepad++ on Windows, so it is easy to add a hotkey that will
surround a block of code that you want to execute with:

system.time({..code to run..})

Usually you don't want it around each statement.  I use the following
function to have it print out CPU and memory usage at various
locations in my script since it allows me to see the progress and
where time might be going:

my.stats <-
local({
    # local variables to hold the last times
    # first two variables are the elapsed and CPU times from the last report

    Level    <- 1
    MaxLevel <- 30
    Stack    <- matrix(0, ncol=2, nrow=MaxLevel)
    function(text = "stats", reset=FALSE, oper="")
    {
        procTime <- proc.time()[1:3]    # get current metrics
        if (reset){  # setup to mark timing from this point
            Level <<- 1  # reset the Level
            Stack[Level, ] <<- c(procTime[3], procTime[1] + procTime[2])
        }
        if (oper == "push"){
            if (Level < MaxLevel) Level <<- Level + 1
            Stack[Level, ] <<- c(procTime[3], procTime[1] + procTime[2])
        }
        .caller <- sys.calls()
        if (length(.caller) == 1) .caller <- "Rgui"
        else .caller <- as.character(.caller[[length(.caller) - 1]])[1]
        cat(sprintf("%s (%d) - %s : %s %.1f %.1f %.1f : %.1fMB\n",
            text,
            Level,
            .caller,
            format(Sys.time(), format="%H:%M:%S"),
            procTime[1] + procTime[2] - Stack[Level, 2],
            procTime[3] - Stack[Level, 1],
            procTime[3],
            memory.size()))
        if ((oper == "pop") && (Level > 1)) Level <<- Level - 1
        else if (oper == "reset") Level <<- 1
        invisible(flush.console())  # force a write to the console
    }
})

It produces output like this:

 my.stats('start')
start (1) - Rgui : 14:53:16 39.2 597822.1 597822.1 : 1213.8MB
 system.time(for(i in 1:col) ym[, i] <- cumsum(x[,i]))
   user  system elapsed
   1.77    0.01    1.80
 my.stats('done')
done (1) - Rgui : 14:53:23 41.0 597828.6 597828.6 : 1213.8MB


This says that between 'start' and 'done', 1.8 CPU seconds were used
(41.0 - 39.2), which is what system.time was reporting.




On Sat, Nov 3, 2012 at 11:52 AM, mrzung mrzun...@gmail.com wrote:
 Hi all;

 I want to print system.time whenever I execute any command.

 It takes too much time to type system.time() function to all command.

 is there any solution on it?

 And,

 apply(matrix,1,cumsum) command is too slow to some large matrix.

 is there any function like rowCumSums ?

 thank u!



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/to-print-system-time-always-tp4648314.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mergeing a large number of large .csvs

2012-11-03 Thread jim holtman
A faster way would be to use something like 'per', 'awk' or 'sed'.
You can strip off the header line of each CSV (if it has one) and then
concatenate the files together.  This is very efficient use of memory
since you are just reading one file at a time and then writing it out.
 Will probably be a lot faster since no conversions have to be done.
Once you have the one large file, then you can play with it (load it
if you have enough memory, or load it into a database).

On Sat, Nov 3, 2012 at 11:37 AM, Jeff Newmiller
jdnew...@dcn.davis.ca.us wrote:
 In the absence of any data examples from you per the posting guidelines, I 
 will refer you to the help files for the melt function in the reshape2 
 package.  Note that there can be various mixtures of wide versus long... such 
 as a wide file with one date column and columns representing all stock prices 
 and all trade volumes. The longest format would be what melt gives (date, 
 column name, and value) but an in-between format would have one distinct 
 column each for dollar values and volume values with a column indicating 
 ticker label and of course another for date.

 If your csv files can be grouped according to those with similar column 
 types, then as you read them in you can use cbind( csvlabel=somelabel, 
 csvdf) to distinguish it and then rbind those data frames together to create 
 an intermediate-width data frame. When dealing with large amounts of data you 
 will want to minimize the amount of reshaping you do, but it would require 
 knowledge of your data and algorithms to say any more.
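
A rough sketch of the long-format approach described above; the directory name,
the "date" id column, and the basename() label are illustrative assumptions,
not taken from the actual files:

library(reshape2)

read_one <- function(f) {
  d <- read.csv(f)
  long <- melt(d, id.vars = "date")      # date, variable (old column name), value
  cbind(csvlabel = basename(f), long)    # tag each row with its source file
}

files <- list.files("plots", pattern = "\\.csv$",
                    full.names = TRUE, recursive = TRUE)
combined <- do.call(rbind, lapply(files, read_one))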
 ---
 Jeff NewmillerThe .   .  Go Live...
 DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live Go...
   Live:   OO#.. Dead: OO#..  Playing
 Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
 /Software/Embedded Controllers)   .OO#.   .OO#.  rocks...1k
 ---
 Sent from my phone. Please excuse my brevity.

 Benjamin Caldwell btcaldw...@berkeley.edu wrote:

Jeff,
If you're willing to educate, I'd be happy to learn what wide vs long
format means. I'll give rbind a shot in the meantime.
Ben
On Nov 2, 2012 4:31 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us
wrote:

 I would first confirm that you need the data in wide format... many
 algorithms are more efficient in long format anyway, and rbind is way
more
 efficient than merge.

 If you feel this is not negotiable, you may want to consider sqldf.
Yes,
 you need to learn a bit of SQL, but it is very well integrated into
R.

---
 Jeff NewmillerThe .   .  Go
Live...
 DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live
 Go...
   Live:   OO#.. Dead: OO#..
Playing
 Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
 /Software/Embedded Controllers)   .OO#.   .OO#.
rocks...1k

---
 Sent from my phone. Please excuse my brevity.

 Benjamin Caldwell btcaldw...@berkeley.edu wrote:

 Dear R help;
 I'm currently trying to combine a large number (about 30 x 30) of
large
 .csvs together (each at least 1 records). They are organized by
 plots,
 hence 30 X 30, with each group of csvs in a folder which corresponds
to
 the
 plot. The unmerged csvs all have the same number of columns (5). The
 fifth
 column has a different name for each csv. The number of rows is
 different.
 
 The combined csvs are of course quite large, and the code I'm
running
 is
 quite slow - I'm currently running it on a computer with 10 GB ram,
 ssd,
 and quad core 2.3 ghz processor; it's taken 8 hours and it's only
75%
 of
 the way through (it's hung up on one of the largest data groupings
now
 for
 an hour, and using 3.5 gigs of RAM.
 
 I know that R isn't the most efficient way of doing this, but I'm
not
 familiar with sql or C. I wonder if anyone has suggestions for a
 different
 way to do this in the R environment. For instance, the key function
now
 is
 merge, but I haven't tried join from the plyr package or rbind from
 base.
 I'm willing to provide a dropbox link to a couple of these files if
 you'd
 like to see the data. My code is as follows:
 
 
 #multmerge is based on code by Tony cookson,
 

http://www.r-bloggers.com/merging-multiple-data-files-into-one-data-frame/
 ;
 The function takes a path. This path should be the name of a folder
 that
 contains all of the files you would like to read and merge together
and
 only those files you would like to merge.
 
 multmerge = function(mypath){
 filenames=list.files(path=mypath, full.names=TRUE)
 datalist = try(lapply(filenames,
 function(x){read.csv(file=x,header=T)}))
 

[R] Violin plot of categorical/binned data

2012-11-03 Thread Nathan Miller
Hi,

I'm trying to create a plot showing the density distribution of some
shipping data. I like the look of violin plots, but my data is not
continuous but rather binned and I want to make sure its binned nature (not
smooth) is apparent in the final plot. So for example, I have the number of
individuals per vessel, but rather than having the actual number of
individuals I have data in the format of: 7 values of zero, 11 values
between 1-10, 6 values between 10-100, 13 values between 100-1000, etc. To
plot this data I generated a new dataset with the first 7 values being 0,
representing the 7 values of 0, the next 11 values being 5.5, representing
the 11 values between 1-10, etc. Sample data below.

I can make a violin plot (code below) using a log y-axis, which looks
alright (though I do have to deal with the zeros still), but in its default
format it hides the fact that these are binned data, which seems a bit
misleading. Is it possible to make a violin plot that looks a bit more
angular (more corners, less smoothing) or in someway shows the
distribution, but also clearly shows the true nature of these data? I've
tried playing with the bandwidth adjustment and the kernel but haven't been
able to get a figure that seems to work.

Anyone have some thoughts on this?

Thanks,
Nate

library(ggplot2)
library(scales)

p=ggplot(data2, aes(vessel, values))
p+geom_violin()+
scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x), labels =
trans_format("log10", math_format(10^.x)))

data2 <- read.table(textConnection("
   vessel  values
 rec 0.0e+00
 rec 0.0e+00
 rec 0.0e+00
 rec 0.0e+00
 rec 0.0e+00
 rec 0.0e+00
 rec 0.0e+00
 rec 5.5e+00
 rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+00
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+01
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+02
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+03
rec 5.5e+04
rec 5.5e+04
rec 5.5e+04
rec 5.5e+05
rec 5.5e+05"), header=T)

[[alternative HTML version deleted]]
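
One possible direction, sketched only (the adjust value and jitter settings are
arbitrary choices, not a tested recommendation): shrink the kernel bandwidth and
overlay the raw binned values so the discreteness stays visible next to the
density outline; zeros are still dropped by the log scale, as noted above.

ggplot(data2, aes(vessel, values)) +
  geom_violin(adjust = 0.3) +                          # less smoothing, more corners
  geom_jitter(width = 0.1, height = 0, alpha = 0.4) +  # show the discrete bin values
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)))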

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] some help

2012-11-03 Thread David Winsemius

On Nov 3, 2012, at 9:07 AM, dattel_palme wrote:

 Hi People!
 
 I have following concern consisting of some steps to do in R: 
 
 I have an ascii file (table) consisting of many columns and rows. 
 1. I would like to order all values of the columns one under each other. It
 will begin with column 1, then column 2 under column 1, column 3 under
 column 2 etc. until at the end there is only 1 column. How do I do it?

something along the lines of 

dat <- read.table("filename", sep="separator", header=TRUE)
stacked <- do.call(rbind, dat)


 
 2. Second problem is to make a scatterplot of two variables (I think after
 further explanation scatter plot itself will not be needed). I have two
 columns of two different variables (that I produces before),

Did you now? But from the data produced above you only have one column. Is this 
another data-object where these column have names?

 column 1 with
 variable 1 and column 2 with variable 2. I would like to order them by one
 variable and 0,01 interval (the varibale values will range between 0 and 1).
 From each 0,01 interval (100 intervals) i want to pick the maximum and
 minimum value of variable 2. 
 

That is incoherent to this native speaker of English who is sometimes confused 
by presentations of problems without concrete references. An example would help 
greatly.


 3. From the obtained max and min of values of each interval i would like to
 make a linear least square regression. 

Definitely need an example. Please read the Posting Guide.

-- 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Alameda, CA, USA

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] some help

2012-11-03 Thread Rui Barradas

Hello,

Without data it's not easy to answer to your questions, but

1. Use ?unlist. If the data is in a file, read it with ?read.table and 
then unlist the result. All columns will be stacked.


dat <- read.table("filename", ...)
unlist(dat)

2. At best confusing. But to divide a vector into groups use ?cut or 
?findInterval and then, to find the maximum and minimum of each group, 
?tapply or ?ave.


3. Regress what on what?
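
A rough sketch of one reading of points 2 and 3 (regressing the per-interval
extremes on the interval midpoints); the data frame dat and its columns v1 and
v2 are assumed names standing in for the two variables:

brk  <- seq(0, 1, by = 0.01)
bin  <- cut(dat$v1, breaks = brk, include.lowest = TRUE)   # 100 intervals of width 0.01
mins <- tapply(dat$v2, bin, min)                           # minimum of v2 in each interval
maxs <- tapply(dat$v2, bin, max)                           # maximum of v2 in each interval
mids <- head(brk, -1) + 0.005                              # interval midpoints
fit_min <- lm(mins ~ mids)                                 # least-squares line through the minima
fit_max <- lm(maxs ~ mids)                                 # and through the maxima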


Provide a data example using dput for better answers:

dput( head(mydata, 30) )  # paste the output of this in a post


Hope this helps,

Rui Barradas
Em 03-11-2012 16:07, dattel_palme escreveu:

Hi People!

I have following concern consisting of some steps to do in R:

I have an ascii file (table) consisting of many columns and rows.
1. I would like to order all values of the columns one under each other. It
will begin with column 1, then column 2 under column 1, column 3 under
column 2 etc. until at the end there is only 1 column. How do I do it?

2. Second problem is to make a scatterplot of two variables (I think after
further explanation scatter plot itself will not be needed). I have two
columns of two different variables (that I produces before), column 1 with
variable 1 and column 2 with variable 2. I would like to order them by one
variable and 0,01 interval (the varibale values will range between 0 and 1).
From each 0,01 interval (100 intervals) i want to pick the maximum and
minimum value of variable 2.

3. From the obtained max and min of values of each interval i would like to
make a linear least square regression.

I hope someone can help me out!
Thanks
Stefan



--
View this message in context: 
http://r.789695.n4.nabble.com/some-help-tp4648316.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mergeing a large number of large .csvs

2012-11-03 Thread Benjamin Caldwell
Jim,

Where can I find documentation of the commands you mention?
Thanks





On Sat, Nov 3, 2012 at 12:15 PM, jim holtman jholt...@gmail.com wrote:

 A faster way would be to use something like 'per', 'awk' or 'sed'.
 You can strip off the header line of each CSV (if it has one) and then
 concatenate the files together.  This is very efficient use of memory
 since you are just reading one file at a time and then writing it out.
  Will probably be a lot faster since no conversions have to be done.
 Once you have the one large file, then you can play with it (load it
 if you have enough memory, or load it into a database).

 On Sat, Nov 3, 2012 at 11:37 AM, Jeff Newmiller
 jdnew...@dcn.davis.ca.us wrote:
  On the absence of any data examples from you per the posting guidelines,
 I will refer you to the help files for the melt function in the reshape2
 package.  Note that there can be various mixtures of wide versus long...
 such as a wide file with one date column and columns representing all stock
 prices and all trade volumes. The longest format would be what melt gives
 (date, column name, and value) but an in-between format would have one
 distinct column each for dollar values and volume values with a column
 indicating ticker label and of course another for date.
 
  If your csv files can be grouped according to those with similar column
 types, then as you read them in you can use cbind( csvlabel=somelabel,
 csvdf) to distinguish it and then rbind those data frames together to
 create an intermediate-width data frame. When dealing with large amounts of
 data you will want to minimize the amount of reshaping you do, but it would
 require knowledge of your data and algorithms to say any more.
 
 ---
  Jeff NewmillerThe .   .  Go
 Live...
  DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live
 Go...
Live:   OO#.. Dead: OO#..  Playing
  Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
  /Software/Embedded Controllers)   .OO#.   .OO#.
  rocks...1k
 
 ---
  Sent from my phone. Please excuse my brevity.
 
  Benjamin Caldwell btcaldw...@berkeley.edu wrote:
 
 Jeff,
 If you're willing to educate, I'd be happy to learn what wide vs long
 format means. I'll give rbind a shot in the meantime.
 Ben
 On Nov 2, 2012 4:31 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us
 wrote:
 
  I would first confirm that you need the data in wide format... many
  algorithms are more efficient in long format anyway, and rbind is way
 more
  efficient than merge.
 
  If you feel this is not negotiable, you may want to consider sqldf.
 Yes,
  you need to learn a bit of SQL, but it is very well integrated into
 R.
 

 ---
  Jeff NewmillerThe .   .  Go
 Live...
  DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live
  Go...
Live:   OO#.. Dead: OO#..
 Playing
  Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
  /Software/Embedded Controllers)   .OO#.   .OO#.
 rocks...1k
 

 ---
  Sent from my phone. Please excuse my brevity.
 
  Benjamin Caldwell btcaldw...@berkeley.edu wrote:
 
  Dear R help;
  I'm currently trying to combine a large number (about 30 x 30) of
 large
  .csvs together (each at least 1 records). They are organized by
  plots,
  hence 30 X 30, with each group of csvs in a folder which corresponds
 to
  the
  plot. The unmerged csvs all have the same number of columns (5). The
  fifth
  column has a different name for each csv. The number of rows is
  different.
  
  The combined csvs are of course quite large, and the code I'm
 running
  is
  quite slow - I'm currently running it on a computer with 10 GB ram,
  ssd,
  and quad core 2.3 ghz processor; it's taken 8 hours and it's only
 75%
  of
  the way through (it's hung up on one of the largest data groupings
 now
  for
  an hour, and using 3.5 gigs of RAM.
  
  I know that R isn't the most efficient way of doing this, but I'm
 not
  familiar with sql or C. I wonder if anyone has suggestions for a
  different
  way to do this in the R environment. For instance, the key function
 now
  is
  merge, but I haven't tried join from the plyr package or rbind from
  base.
  I'm willing to provide a dropbox link to a couple of these files if
  you'd
  like to see the data. My code is as follows:
  
  
  #multmerge is based on code by Tony cookson,
  
 
 
 http://www.r-bloggers.com/merging-multiple-data-files-into-one-data-frame/
  ;
  The function takes a path. This path should be the name of a folder
  that
  contains all of the files 

Re: [R] backreferences in gregexpr

2012-11-03 Thread Alexander Shenkin
On 11/2/2012 5:14 PM, Gabor Grothendieck wrote:
 On Fri, Nov 2, 2012 at 6:02 PM, Alexander Shenkin ashen...@ufl.edu wrote:
 Hi Folks,

 I'm trying to extract just the backreferences from a regex.

 temp = abcd1234abcd1234
 regmatches(temp, gregexpr((?:abcd)(1234), temp))
 [[1]]
 [1] abcd1234 abcd1234

 What I would like is:
 [1] 1234 1234

 Note: I know I can just match 1234 here, but the actual example is
 complicated enough that I have to match a larger string, and just want
 to pass out the backreferenced portion.

 Any help greatly appreciated!

 
 Try this:
 
 library(gsubfn)
 strapplyc(temp, abcd(1234))
 [[1]]
 [1] 1234 1234
 

Thanks Gabor.  Didn't find strapplyc in package gsubfn, but did find
strapply, and that worked well.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] mergeing a large number of large .csvs

2012-11-03 Thread jim holtman
These are not commands, but programs you can use.  Here is a file copy
program in perl (I spelt it wrong in the email);  This will copy all
the files that have daily in their names.  It also skips the first
line of each file assuming that it is the header.

perl  can be found on most systems.  www.activestate.com  has a
version that runs under Windows and that is what I am using.


chdir "/temp/csv";  # my directory with files
@files = glob "daily*csv";  # get files to copy (daily..csv)
open OUTPUT, ">combined.csv"; # output file
# loop for each file
foreach $file (@files) {
    print $file, "\n";  # print file being processed
    open INPUT, "<" . $file;
    # assume that the first line is a header, so skip it
    $header = <INPUT>;
    @all = <INPUT>;  # read rest of the file
    close INPUT;
    print OUTPUT @all;  # append to the output
}
close OUTPUT;

Here is what was printed on the console:


C:\Users\Owner>perl copyFiles.pl
daily.BO.csv
daily.C.csv
daily.CL.csv
daily.CT.csv
daily.GC.csv
daily.HO.csv
daily.KC.csv
daily.LA.csv
daily.LN.csv
daily.LP.csv
daily.LX.csv
daily.NG.csv
daily.S.csv
daily.SB.csv
daily.SI.csv
daily.SM.csv

Which was a list of all the files copied.

On Sat, Nov 3, 2012 at 4:08 PM, Benjamin Caldwell
btcaldw...@berkeley.edu wrote:
 Jim,

 Where can I find documentation of the commands you mention?
 Thanks





 On Sat, Nov 3, 2012 at 12:15 PM, jim holtman jholt...@gmail.com wrote:

 A faster way would be to use something like 'per', 'awk' or 'sed'.
 You can strip off the header line of each CSV (if it has one) and then
 concatenate the files together.  This is very efficient use of memory
 since you are just reading one file at a time and then writing it out.
  Will probably be a lot faster since no conversions have to be done.
 Once you have the one large file, then you can play with it (load it
 if you have enough memory, or load it into a database).

 On Sat, Nov 3, 2012 at 11:37 AM, Jeff Newmiller
 jdnew...@dcn.davis.ca.us wrote:
  On the absence of any data examples from you per the posting guidelines,
  I will refer you to the help files for the melt function in the reshape2
  package.  Note that there can be various mixtures of wide versus long...
  such as a wide file with one date column and columns representing all stock
  prices and all trade volumes. The longest format would be what melt gives
  (date, column name, and value) but an in-between format would have one
  distinct column each for dollar values and volume values with a column
  indicating ticker label and of course another for date.
 
  If your csv files can be grouped according to those with similar column
  types, then as you read them in you can use cbind( csvlabel=somelabel,
  csvdf) to distinguish it and then rbind those data frames together to 
  create
  an intermediate-width data frame. When dealing with large amounts of data
  you will want to minimize the amount of reshaping you do, but it would
  require knowledge of your data and algorithms to say any more.
 
  ---
  Jeff NewmillerThe .   .  Go
  Live...
  DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live
  Go...
Live:   OO#.. Dead: OO#..  Playing
  Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
  /Software/Embedded Controllers)   .OO#.   .OO#.
  rocks...1k
 
  ---
  Sent from my phone. Please excuse my brevity.
 
  Benjamin Caldwell btcaldw...@berkeley.edu wrote:
 
 Jeff,
 If you're willing to educate, I'd be happy to learn what wide vs long
 format means. I'll give rbind a shot in the meantime.
 Ben
 On Nov 2, 2012 4:31 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us
 wrote:
 
  I would first confirm that you need the data in wide format... many
  algorithms are more efficient in long format anyway, and rbind is way
 more
  efficient than merge.
 
  If you feel this is not negotiable, you may want to consider sqldf.
 Yes,
  you need to learn a bit of SQL, but it is very well integrated into
 R.
 

  ---
  Jeff NewmillerThe .   .  Go
 Live...
  DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live
  Go...
Live:   OO#.. Dead: OO#..
 Playing
  Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
  /Software/Embedded Controllers)   .OO#.   .OO#.
 rocks...1k
 

  ---
  Sent from my phone. Please excuse my brevity.
 
  Benjamin Caldwell btcaldw...@berkeley.edu wrote:
 
  Dear R help;
  I'm currently trying to combine a large number (about 30 x 30) of
 large
  .csvs together (each at least 1 records). They are organized 

Re: [R] mergeing a large number of large .csvs

2012-11-03 Thread jim holtman
It easier than that.  I forgot I can do it entirely within R:

setwd("/temp/csv")
files <- Sys.glob("daily*csv")
output <- file('Rcombined.csv', 'w')
for (i in files){
    cat(i, '\n')          # show which file is being processed
    input <- readLines(i)
    input <- input[-1L]   # delete header
    writeLines(input, output)
}
close(output)
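
If the combined file is later read back into R, the column names have to be
supplied, because the headers were skipped when the files were concatenated;
the names below are placeholders, not the real columns:

combined <- read.csv("Rcombined.csv", header = FALSE,
                     col.names = c("date", "open", "high", "low", "close"))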



On Sat, Nov 3, 2012 at 4:56 PM, jim holtman jholt...@gmail.com wrote:
 These are not commands, but programs you can use.  Here is a file copy
 program in perl (I spelt it wrong in the email);  This will copy all
 the files that have daily in their names.  It also skips the first
 line of each file assuming that it is the header.

 perl  can be found on most systems.  www.activestate.com  has a
 version that runs under Windows and that is what I am using.


 chdir /temp/csv;  # my directory with files
 @files = glob daily*csv;  # get files to copy (daily..csv)
 open OUTPUT, combined.csv; # output file
 # loop for each file
 foreach $file (@files) {
 print $file, \n;  # print file being processed
 open INPUT,  . $file;
 # assume that the first line is a header, so skip it
 $header = INPUT;
 @all = INPUT;  # read rest of the file
 close INPUT;
 print OUTPUT @all;  # append to the output
 }
 close OUTPUT;

 Here is what was printed on the console:


 C:\Users\Ownerperl copyFiles.pl
 daily.BO.csv
 daily.C.csv
 daily.CL.csv
 daily.CT.csv
 daily.GC.csv
 daily.HO.csv
 daily.KC.csv
 daily.LA.csv
 daily.LN.csv
 daily.LP.csv
 daily.LX.csv
 daily.NG.csv
 daily.S.csv
 daily.SB.csv
 daily.SI.csv
 daily.SM.csv

 Which was a list of all the files copied.

 On Sat, Nov 3, 2012 at 4:08 PM, Benjamin Caldwell
 btcaldw...@berkeley.edu wrote:
 Jim,

 Where can I find documentation of the commands you mention?
 Thanks





 On Sat, Nov 3, 2012 at 12:15 PM, jim holtman jholt...@gmail.com wrote:

 A faster way would be to use something like 'per', 'awk' or 'sed'.
 You can strip off the header line of each CSV (if it has one) and then
 concatenate the files together.  This is very efficient use of memory
 since you are just reading one file at a time and then writing it out.
  Will probably be a lot faster since no conversions have to be done.
 Once you have the one large file, then you can play with it (load it
 if you have enough memory, or load it into a database).

 On Sat, Nov 3, 2012 at 11:37 AM, Jeff Newmiller
 jdnew...@dcn.davis.ca.us wrote:
  On the absence of any data examples from you per the posting guidelines,
  I will refer you to the help files for the melt function in the reshape2
  package.  Note that there can be various mixtures of wide versus long...
  such as a wide file with one date column and columns representing all 
  stock
  prices and all trade volumes. The longest format would be what melt gives
  (date, column name, and value) but an in-between format would have one
  distinct column each for dollar values and volume values with a column
  indicating ticker label and of course another for date.
 
  If your csv files can be grouped according to those with similar column
  types, then as you read them in you can use cbind( csvlabel=somelabel,
  csvdf) to distinguish it and then rbind those data frames together to 
  create
  an intermediate-width data frame. When dealing with large amounts of data
  you will want to minimize the amount of reshaping you do, but it would
  require knowledge of your data and algorithms to say any more.
 
  ---
  Jeff NewmillerThe .   .  Go
  Live...
  DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live
  Go...
Live:   OO#.. Dead: OO#..  Playing
  Research Engineer (Solar/BatteriesO.O#.   #.O#.  with
  /Software/Embedded Controllers)   .OO#.   .OO#.
  rocks...1k
 
  ---
  Sent from my phone. Please excuse my brevity.
 
  Benjamin Caldwell btcaldw...@berkeley.edu wrote:
 
 Jeff,
 If you're willing to educate, I'd be happy to learn what wide vs long
 format means. I'll give rbind a shot in the meantime.
 Ben
 On Nov 2, 2012 4:31 PM, Jeff Newmiller jdnew...@dcn.davis.ca.us
 wrote:
 
  I would first confirm that you need the data in wide format... many
  algorithms are more efficient in long format anyway, and rbind is way
 more
  efficient than merge.
 
  If you feel this is not negotiable, you may want to consider sqldf.
 Yes,
  you need to learn a bit of SQL, but it is very well integrated into
 R.
 

  ---
  Jeff NewmillerThe .   .  Go
 Live...
  DCN:jdnew...@dcn.davis.ca.usBasics: ##.#.   ##.#.  Live
  Go...
Live:   OO#.. Dead: OO#..
 Playing
  Research Engineer (Solar/BatteriesO.O#.

[R] Can you turn a string into a (working) symbol?

2012-11-03 Thread andrewH
Dear folks--

Suppose I have an expression that evaluates to a string, and that that
string, were it not a character vector, would be a symbol.  I would like a
function, call it doppel(), that will take that expression as an argument
and produce something that functions exactly like the symbol would have if I
typed it in the place of the function of the expression.  It should go as
far along the path to evaluation as the symbol would have, and then stop,
and be available for subsequent manipulation.  For example, if 

aa <- 3.1416
bb <- function(x) {x^2}
r  <- 2
xx <- c("aa", "bb")

out <- doppel(xx[1])*doppel(xx[2])(r)

Then out should be 13.3664

Or similarly, after 
doppel(paste("a", "a", sep='')) <- 3
aa

typing aa should return 3.

Is there such a function? Can there be? 

I thought as.symbol would do this, but it does not.
 as.symbol (xx[1])*as.symbol (xx[2])(r)
Error: attempt to apply non-function

Looking forward to hearing from y'all.--andrewH




--
View this message in context: 
http://r.789695.n4.nabble.com/Can-you-turn-a-string-into-a-working-symbol-tp4648343.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] backreferences in gregexpr

2012-11-03 Thread Gabor Grothendieck
On Sat, Nov 3, 2012 at 4:08 PM, Alexander Shenkin ashen...@ufl.edu wrote:

 On 11/2/2012 5:14 PM, Gabor Grothendieck wrote:
  On Fri, Nov 2, 2012 at 6:02 PM, Alexander Shenkin ashen...@ufl.edu
 wrote:
  Hi Folks,
 
  I'm trying to extract just the backreferences from a regex.
 
  temp = abcd1234abcd1234
  regmatches(temp, gregexpr((?:abcd)(1234), temp))
  [[1]]
  [1] abcd1234 abcd1234
 
  What I would like is:
  [1] 1234 1234
 
  Note: I know I can just match 1234 here, but the actual example is
  complicated enough that I have to match a larger string, and just want
  to pass out the backreferenced portion.
 
  Any help greatly appreciated!
 
 
  Try this:
 
  library(gsubfn)
  strapplyc(temp, abcd(1234))
  [[1]]
  [1] 1234 1234
 

 Thanks Gabor.  Didn't find strapplyc in package gsubfun, but did find
 strapply, and that worked well.



You must have an old version of the package.  Time to upgrade.

-- 
Statistics  Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Logical vector-based extraction

2012-11-03 Thread Muhuri, Pradip (SAMHSA/CBHSQ)

Hello,

Most of the program works, except that the following logical variable does not 
seem to get created, although the second logical variable-based extraction 
works.

 I don't understand what I am doing wrong here.

state_pflt200 <- df$p_fatal < 200
df[state_pflt200, c("state.name","p_fatal")]


I would appreciate receiving your help.

Thanks,

Pradip Muhuri




# Below is the code that includes the reproducible example. 

df <- data.frame (state.name =
          c("Alabama","Alaska","Arizona","Arkansas","California","Colorado","Connecticut",
            "Delaware","DC","Florida","Georgia","Hawaii","Idaho","Illinois","Indiana",
            "Iowa","Kansas","Kentucky","Louisiana","Maine","Maryland","Massachusetts","Michigan",
            "Minnesota","Mississippi","Missouri","Montana","Nebraska","Nevada","New Hampshire",
            "New Jersey","New Mexico","New York","North Carolina","North Dakota","Ohio","Oklahoma",
            "Oregon","Pennsylvania","Rhode Island","South Carolina","South Dakota","Tennessee","Texas",
            "Utah","Vermont","Virginia","Washington","West Virginia","Wisconsin","Wyoming"),

          p_fatal = sample(200:500, 51, replace=TRUE),

          t_safety_score = sample(1:10, 51, replace=TRUE)
          )

options (width=120)



# The following logical variable does not get created - I don't understand what
# I am doing wrong
state_pflt200 <- df$p_fatal < 200
df[state_pflt200, c("state.name","p_fatal")]

# The following works
state_sslt5 <- df$t_safety_score < 5
df[state_sslt5, c("state.name","t_safety_score")]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Logical vector-based extraction

2012-11-03 Thread jim holtman
works fine for me, creating:

 state_pflt200 <- df$p_fatal < 200
  df[state_pflt200, c("state.name","p_fatal")]
[1] state.name p_fatal
<0 rows> (or 0-length row.names)


considering that there were no values less than 200 in your data, the
result is correct.

So what is the problem?
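
A quick way to see what is happening (the 300 cutoff below is arbitrary, just
to show a condition that can be TRUE with this simulated data):

range(df$p_fatal)                                 # sample(200:500, ...) never goes below 200
sum(df$p_fatal < 200)                             # 0 rows satisfy the original condition
df[df$p_fatal < 300, c("state.name", "p_fatal")]  # a cutoff that can actually match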

On Sat, Nov 3, 2012 at 5:41 PM, Muhuri, Pradip (SAMHSA/CBHSQ)
pradip.muh...@samhsa.hhs.gov wrote:

 Hello,

 The most part of the program works except that the following logical variable 
 does not get created although the second logical variable-based extraction 
 works.

  I don't understand what I am doing wrong here.

 state_pflt200 - df$p_fatal 200
 df[state_pflt200, c(state.name,p_fatal)]


 I would appreciate receiving your help.

 Thanks,

 Pradip Muhuri




 # Below is the code that includes the reproducible example.

 df - data.frame (state.name=
   
 c(Alabama,Alaska,Arizona,Arkansas,California,Colorado,Connecticut,
   Delaware,DC, 
 Florida,Georgia,Hawaii,Idaho,Illinois,Indiana,
   Iowa,Kansas,Kentucky,   
 Louisiana,Maine,Maryland,Massachusetts,Michigan,
   
 Minnesota,Mississippi,Missouri,Montana,Nebraska,Nevada,New 
 Hampshire,
   New Jersey,New Mexico,New York,North 
 Carolina,North Dakota,Ohio,Oklahoma,
   Oregon,Pennsylvania,Rhode Island,South 
 Carolina,South Dakota,Tennessee,Texas,
Utah, Vermont,Virginia,Washington,West 
 Virginia,Wisconsin,Wyoming),

p_fatal = sample(200:500,51,replace=TRUE),

t_safety_score = sample(1:10,51,replace=TRUE)
   )

 options (width=120)



 # The following logical variable does not get created - Don't understand what 
 I am doing wrong
 state_pflt200 - df$p_fatal 200
 df[state_pflt200, c(state.name,p_fatal)]

 # The following works
 state_sslt5 - df$t_safety_score 5
 df[state_sslt5,c(state.name, t_safety_score)]

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can you turn a string into a (working) symbol?

2012-11-03 Thread jim holtman
Is this what you want (the answer you wanted is not correct):

  aa <- 3.1416
  bb <- function(x) {x^2}
  r  <- 2
  xx <- c("aa", "bb")

 doppel <- function(x) get(x)

 out <- doppel(xx[1])*doppel(xx[2])(r)

 out
[1] 12.5664
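
For completeness: as.symbol() on its own only builds the language object;
evaluating it is the step the original attempt was missing. A sketch using the
objects defined above:

eval(as.symbol(xx[1])) * eval(as.symbol(xx[2]))(r)
# [1] 12.5664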



On Sat, Nov 3, 2012 at 5:31 PM, andrewH ahoer...@rprogress.org wrote:
 Dear folks--

 Suppose I have an expression that evaluates to a string, and that that
 string, were it not a character vector, would be a symbol.  I would like a
 function, call it doppel(), that will take that expression as an argument
 and produce something that functions exactly like the symbol would have if I
 typed it in the place of the function of the expression.  It should go as
 far along the path to evaluation as the symbol would have, and then stop,
 and be available for subsequent manipulation.  For example, if

 aa - 3.1416
 bb  - function(x) {x^2}
 r - 2
 xx - c(aa, bb)

 out - doppel(xx[1])*doppel(xx[2])(r)

 Then out should be 13.3664

 Or similarly, after
 doppel(paste(a,  a,  sep=''))  -  3
 aa

 typing aa should return 3.

 Is there such a function? Can there be?

 I thought as.symbol would do this, but it does not.
 as.symbol (xx[1])*as.symbol (xx[2])(r)
 Error: attempt to apply non-function

 Looking forward to hearing from y'all.--andrewH




 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Can-you-turn-a-string-into-a-working-symbol-tp4648343.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Can you turn a string into a (working) symbol?

2012-11-03 Thread jim holtman
for the second part use 'assign'

 assign(paste0('a', 'a'), 3)
 aa
[1] 3




On Sat, Nov 3, 2012 at 5:31 PM, andrewH ahoer...@rprogress.org wrote:
 Dear folks--

 Suppose I have an expression that evaluates to a string, and that that
 string, were it not a character vector, would be a symbol.  I would like a
 function, call it doppel(), that will take that expression as an argument
 and produce something that functions exactly like the symbol would have if I
 typed it in the place of the function of the expression.  It should go as
 far along the path to evaluation as the symbol would have, and then stop,
 and be available for subsequent manipulation.  For example, if

 aa - 3.1416
 bb  - function(x) {x^2}
 r - 2
 xx - c(aa, bb)

 out - doppel(xx[1])*doppel(xx[2])(r)

 Then out should be 13.3664

 Or similarly, after
 doppel(paste(a,  a,  sep=''))  -  3
 aa

 typing aa should return 3.

 Is there such a function? Can there be?

 I thought as.symbol would do this, but it does not.
 as.symbol (xx[1])*as.symbol (xx[2])(r)
 Error: attempt to apply non-function

 Looking forward to hearing from y'all.--andrewH




 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Can-you-turn-a-string-into-a-working-symbol-tp4648343.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Replacing NAs in long format

2012-11-03 Thread Christopher Desjardins
Hi Jim,
Thank you so much. That does exactly what I want.
Chris

On Sat, Nov 3, 2012 at 1:30 PM, jim holtman jholt...@gmail.com wrote:

  x - read.table(text = idr  schyear year
 +  1   80
 +  1   91
 +  1  10   NA
 +  2   4   NA
 +  2   5   -1
 +  2   60
 +  2   71
 +  2   82
 +  2   93
 +  2  104
 +  2  11   NA
 +  2  126
 +  3   4   NA
 +  3   5   -2
 +  3   6   -1
 +  3   70
 +  3   81
 +  3   92
 +  3  103
 +  3  11   NA, header = TRUE)
   # you did not specify if there might be multiple contiguous NAs,
   # so there are a lot of checks to be made
   x.l - lapply(split(x, x$idr), function(.idr){
 + # check for all NAs -- just return indeterminate state
 + if (sum(is.na(.idr$year)) == nrow(.idr)) return(.idr)
 + # repeat until all NAs have been fixed; takes care of contiguous ones
 + while (any(is.na(.idr$year))){
 + # find all the NAs
 + for (i in which(is.na(.idr$year))){
 + if ((i == 1L)  (!is.na(.idr$year[i + 1L]))){
 + .idr$year[i] - .idr$year[i + 1L] - 1
 + } else if ((i  1L)  (!is.na(.idr$year[i - 1L]))){
 + .idr$year[i] - .idr$year[i - 1L] + 1
 + } else if ((i  nrow(.idr))  (!is.na(.idr$year[i + 1L]))){
 + .idr$year[i] - .idr$year[i + 1L] -1
 + }
 + }
 + }
 + return(.idr)
 + })
  do.call(rbind, x.l)
  idr schyear year
 1.11   80
 1.21   91
 1.31  102
 2.42   4   -2
 2.52   5   -1
 2.62   60
 2.72   71
 2.82   82
 2.92   93
 2.10   2  104
 2.11   2  115
 2.12   2  126
 3.13   3   4   -3
 3.14   3   5   -2
 3.15   3   6   -1
 3.16   3   70
 3.17   3   81
 3.18   3   92
 3.19   3  103
 3.20   3  114
 
 


 On Sat, Nov 3, 2012 at 1:14 PM, Christopher Desjardins
 cddesjard...@gmail.com wrote:
  Hi,
  I have the following data:
 
  data[1:20,c(1,2,20)]
  idr  schyear year
  1   80
  1   91
  1  10   NA
  2   4   NA
  2   5   -1
  2   60
  2   71
  2   82
  2   93
  2  104
  2  11   NA
  2  126
  3   4   NA
  3   5   -2
  3   6   -1
  3   70
  3   81
  3   92
  3  103
  3  11   NA
 
  What I want to do is replace the NAs in the year variable with the
  following:
 
  idr  schyear year
  1   80
  1   91
  1  10   2
  2   4   -2
  2   5   -1
  2   60
  2   71
  2   82
  2   93
  2  104
  2  11   5
  2  126
  3   4   -3
  3   5   -2
  3   6   -1
  3   70
  3   81
  3   92
  3  103
  3  11   4
 
  I have no idea how to do this. What it needs to do is make sure that for
  each subject (idr) that it either adds a 1 if it is preceded by a value
 in
  year or subtracts a 1 if it comes before a year value.
 
  Does that make sense? I could do this in Excel but I am at a loss for how
  to do this in R. Please reply to me as well as the list if you respond.
 
  Thanks!
  Chris
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.



 --
 Jim Holtman
 Data Munger Guru

 What is the problem that you are trying to solve?
 Tell me what you want to do, not how you want to do it.


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Replacing NAs in long format

2012-11-03 Thread Christopher Desjardins
I have a similar sort of follow up and I bet I could reuse some of this
code but I'm not sure how.

Let's say I want to create a flag that will be equal to 1 if schyear <= 5
and year == 0 for a given idr. For example

 dat

idr   schyear   year
1         4      -1
1         5       0
1         6       1
1         7       2
2         9       0
2        10       1
2        11       2

How could I make the data look like this?

idr   schyear   year   flag
1         4      -1      1
1         5       0      1
1         6       1      1
1         7       2      1
2         9       0      0
2        10       1      0
2        11       2      0


I am not sure how to avoid ending up with a mix of 0s and 1s in the 'flag'
variable within a single idr. For example,

dat$flag = ifelse(schyear <= 5 & year == 0, 1, 0)

Does not work because it will create:

idr   schyear   year   flag
1         4      -1      0
1         5       0      1
1         6       1      0
1         7       2      0
2         9       0      0
2        10       1      0
2        11       2      0

And thus flag changes for an idr. Which it shouldn't.

Thanks,
Chris


On Sat, Nov 3, 2012 at 5:50 PM, Christopher Desjardins 
cddesjard...@gmail.com wrote:

 Hi Jim,
 Thank you so much. That does exactly what I want.
 Chris


 On Sat, Nov 3, 2012 at 1:30 PM, jim holtman jholt...@gmail.com wrote:

  x - read.table(text = idr  schyear year
 +  1   80
 +  1   91
 +  1  10   NA
 +  2   4   NA
 +  2   5   -1
 +  2   60
 +  2   71
 +  2   82
 +  2   93
 +  2  104
 +  2  11   NA
 +  2  126
 +  3   4   NA
 +  3   5   -2
 +  3   6   -1
 +  3   70
 +  3   81
 +  3   92
 +  3  103
 +  3  11   NA, header = TRUE)
   # you did not specify if there might be multiple contiguous NAs,
   # so there are a lot of checks to be made
   x.l - lapply(split(x, x$idr), function(.idr){
 + # check for all NAs -- just return indeterminate state
 + if (sum(is.na(.idr$year)) == nrow(.idr)) return(.idr)
 + # repeat until all NAs have been fixed; takes care of contiguous
 ones
 + while (any(is.na(.idr$year))){
 + # find all the NAs
 + for (i in which(is.na(.idr$year))){
 + if ((i == 1L)  (!is.na(.idr$year[i + 1L]))){
 + .idr$year[i] - .idr$year[i + 1L] - 1
 + } else if ((i  1L)  (!is.na(.idr$year[i - 1L]))){
 + .idr$year[i] - .idr$year[i - 1L] + 1
 + } else if ((i  nrow(.idr))  (!is.na(.idr$year[i +
 1L]))){
 + .idr$year[i] - .idr$year[i + 1L] -1
 + }
 + }
 + }
 + return(.idr)
 + })
  do.call(rbind, x.l)
  idr schyear year
 1.11   80
 1.21   91
 1.31  102
 2.42   4   -2
 2.52   5   -1
 2.62   60
 2.72   71
 2.82   82
 2.92   93
 2.10   2  104
 2.11   2  115
 2.12   2  126
 3.13   3   4   -3
 3.14   3   5   -2
 3.15   3   6   -1
 3.16   3   70
 3.17   3   81
 3.18   3   92
 3.19   3  103
 3.20   3  114
 
 


 On Sat, Nov 3, 2012 at 1:14 PM, Christopher Desjardins
 cddesjard...@gmail.com wrote:
  Hi,
  I have the following data:
 
  data[1:20,c(1,2,20)]
  idr  schyear year
  1   80
  1   91
  1  10   NA
  2   4   NA
  2   5   -1
  2   60
  2   71
  2   82
  2   93
  2  104
  2  11   NA
  2  126
  3   4   NA
  3   5   -2
  3   6   -1
  3   70
  3   81
  3   92
  3  103
  3  11   NA
 
  What I want to do is replace the NAs in the year variable with the
  following:
 
  idr  schyear year
  1   80
  1   91
  1  10   2
  2   4   -2
  2   5   -1
  2   60
  2   71
  2   82
  2   93
  2  104
  2  11   5
  2  126
  3   4   -3
  3   5   -2
  3   6   -1
  3   70
  3   81
  3   92
  3  103
  3  11   4
 
  I have no idea how to do this. What it needs to do is make sure that for
  each subject (idr) that it either adds a 1 if it is preceded by a value
 in
  year or subtracts a 1 if it comes before a year value.
 
  Does that make sense? I could do this in Excel but I am at a loss for
 how
  to do this in R. Please reply to me as well as the list if you respond.
 
  Thanks!
  Chris
 
  [[alternative HTML version deleted]]
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.



 --
 Jim Holtman
 Data 

Re: [R] Replacing NAs in long format

2012-11-03 Thread arun
Hi,
May be this helps:
dat2 <- read.table(text="
idr  schyear  year
1    4  -1
1    5    0
1    6    1
1    7    2
2    9    0
2    10    1
2    11  2
", sep="", header=TRUE)

 dat2$flag <- unlist(lapply(split(dat2, dat2$idr), function(x) 
     rep(ifelse(any(apply(x, 1, function(x) x[2] <= 5 && x[3] == 0)), 1, 0),
         nrow(x))), use.names = FALSE)
 dat2
#  idr schyear year flag
#1   1       4   -1    1
#2   1       5    0    1
#3   1       6    1    1
#4   1       7    2    1
#5   2       9    0    0
#6   2      10    1    0
#7   2      11    2    0
A.K.
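
An alternative sketch (checked only against this toy data) that avoids the
nested apply by computing the per-idr condition once with ave():

dat2$flag <- as.integer(ave(dat2$schyear <= 5 & dat2$year == 0,
                            dat2$idr, FUN = any))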




- Original Message -
From: Christopher Desjardins cddesjard...@gmail.com
To: jim holtman jholt...@gmail.com
Cc: r-help@r-project.org
Sent: Saturday, November 3, 2012 7:09 PM
Subject: Re: [R] Replacing NAs in long format

I have a similar sort of follow up and I bet I could reuse some of this
code but I'm not sure how.

Let's say I want to create a flag that will be equal to 1 if schyear   = 5
and year = 0 for a given idr. For example

 dat

idr   schyear   year
1         4           -1
1         5            0
1         6            1
1         7            2
2         9            0
2        10            1
2        11           2

How could I make the data look like this?

idr   schyear   year   flag
1         4           -1     1
1         5            0     1
1         6            1     1
1         7            2     1
2         9            0     0
2        10            1    0
2        11           2     0


I am not sure how to end up not getting both 0s and 1s for the 'flag'
variable for an idr. For example,

dat$flag = ifelse(schyear = 5  year ==0, 1, 0)

Does not work because it will create:

idr   schyear   year   flag
1         4           -1     0
1         5            0     1
1         6            1     0
1         7            2     0
2         9            0     0
2        10            1    0
2        11           2     0

And thus flag changes for an idr. Which it shouldn't.

Thanks,
Chris


On Sat, Nov 3, 2012 at 5:50 PM, Christopher Desjardins 
cddesjard...@gmail.com wrote:

 Hi Jim,
 Thank you so much. That does exactly what I want.
 Chris


 On Sat, Nov 3, 2012 at 1:30 PM, jim holtman jholt...@gmail.com wrote:

  x <- read.table(text = "idr  schyear year
 +  1       8    0
 +  1       9    1
 +  1      10   NA
 +  2       4   NA
 +  2       5   -1
 +  2       6    0
 +  2       7    1
 +  2       8    2
 +  2       9    3
 +  2      10    4
 +  2      11   NA
 +  2      12    6
 +  3       4   NA
 +  3       5   -2
 +  3       6   -1
 +  3       7    0
 +  3       8    1
 +  3       9    2
 +  3      10    3
 +  3      11   NA", header = TRUE)
   # you did not specify if there might be multiple contiguous NAs,
   # so there are a lot of checks to be made
   x.l <- lapply(split(x, x$idr), function(.idr){
 +     # check for all NAs -- just return indeterminate state
 +     if (sum(is.na(.idr$year)) == nrow(.idr)) return(.idr)
 +     # repeat until all NAs have been fixed; takes care of contiguous ones
 +     while (any(is.na(.idr$year))){
 +         # find all the NAs
 +         for (i in which(is.na(.idr$year))){
 +             if ((i == 1L) && (!is.na(.idr$year[i + 1L]))){
 +                 .idr$year[i] <- .idr$year[i + 1L] - 1
 +             } else if ((i > 1L) && (!is.na(.idr$year[i - 1L]))){
 +                 .idr$year[i] <- .idr$year[i - 1L] + 1
 +             } else if ((i < nrow(.idr)) && (!is.na(.idr$year[i + 1L]))){
 +                 .idr$year[i] <- .idr$year[i + 1L] - 1
 +             }
 +         }
 +     }
 +     return(.idr)
 + })
   do.call(rbind, x.l)
      idr schyear year
 1.1    1       8    0
 1.2    1       9    1
 1.3    1      10    2
 2.4    2       4   -2
 2.5    2       5   -1
 2.6    2       6    0
 2.7    2       7    1
 2.8    2       8    2
 2.9    2       9    3
 2.10   2      10    4
 2.11   2      11    5
 2.12   2      12    6
 3.13   3       4   -3
 3.14   3       5   -2
 3.15   3       6   -1
 3.16   3       7    0
 3.17   3       8    1
 3.18   3       9    2
 3.19   3      10    3
 3.20   3      11    4
 
 


 On Sat, Nov 3, 2012 at 1:14 PM, Christopher Desjardins
 cddesjard...@gmail.com wrote:
  Hi,
  I have the following data:
 
  data[1:20,c(1,2,20)]
  idr  schyear year
  1       8    0
  1       9    1
  1      10   NA
  2       4   NA
  2       5   -1
  2       6    0
  2       7    1
  2       8    2
  2       9    3
  2      10    4
  2      11   NA
  2      12    6
  3       4   NA
  3       5   -2
  3       6   -1
  3       7    0
  3       8    1
  3       9    2
  3      10    3
  3      11   NA
 
  What I want to do is replace the NAs in the year variable with the
  following:
 
  idr  schyear year
  1       8    0
  1       9    1
  1      10   2
  2       4   -2
  2       5   -1
  2       6    0
  2       7    1
  2       8    2
  2       9    3
  2      10    4
  2      11   5
  2      12    6
  3       4   -3
  3       5   -2
  3    

Re: [R] finding global variables in a function containing formulae

2012-11-03 Thread William Dunlap
 -Original Message-
 From: William Dunlap
 Sent: Saturday, November 03, 2012 11:23 AM
 To: 'Hafen, Ryan P'; Bert Gunter
 Cc: r-help@r-project.org
 Subject: RE: [R] finding global variables in a function containing formulae
 
 findGlobals must be explicitly ignoring calls to the ~ function.
 You could poke through the source code of codetools and find
 where this is happening.

I looked through some old notes and found you could
disable the special handler for ~ by removing it from
the environment codetools:::collectUsageHandlers:
   findGlobals(function(y)lm(y~x)) # doesn't note 'x' as a global reference
  [1] "~"  "lm"
   tildeHandler <- codetools:::collectUsageHandlers[["~"]]
   remove("~", envir=codetools:::collectUsageHandlers)
   findGlobals(function(y)lm(y~x)) # notes 'x'
  [1] "~"  "lm" "x"
   # reinstall the "~" handler to get the original behavior
   # or detach("package:codetools", unload=TRUE) and reattach
   assign("~", tildeHandler, envir=codetools:::collectUsageHandlers)
   findGlobals(function(y)lm(y~x)) # does not note 'x'
  [1] "~"  "lm"

You still have the false alarm problem.

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com
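
A minimal sketch (added for illustration, not part of the message above) that
wraps the handler-removal trick into a reusable helper; the name
findGlobalVars2 is made up, and the code relies on codetools internals, so
treat it as fragile:

    library(codetools)

    findGlobalVars2 <- function(f) {
      handlers <- codetools:::collectUsageHandlers        # environment of handlers
      tildeHandler <- handlers[["~"]]
      remove("~", envir = handlers)                       # stop skipping formula contents
      on.exit(assign("~", tildeHandler, envir = handlers))  # put the handler back afterwards
      findGlobals(f, merge = FALSE)$variables
    }

    xGlobal <- rnorm(10); yGlobal <- rnorm(10)
    plotFn1 <- function() plot(yGlobal ~ xGlobal)
    findGlobalVars2(plotFn1)
    # expected (with the false-alarm caveat noted above): "xGlobal" "yGlobal"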


 
 Or, if you have the source code for the package you are investigating,
 use sed to change all ~ to %TILDE% and then use findGlobals on
 the resulting source code.  The messages will be a bit garbled but
 should give you a start.  E.g., compare the following two, in which y
 is defined in the function but x is not:
  findGlobals(function(y)lm(y~x))
    [1] "~"  "lm"
 findGlobals(function(y)lm(y %TILDE% x))
    [1] "lm"      "%TILDE%" "x"
 
 You will get false alarms, since in a call like lm(y~x+z, data=dat) 
 findGlobals
 cannot know if dat includes columns called 'x', 'y', and 'z' and the above
 approach errs on the side of reporting the potential problem.
 
 You could use code in codetools to analyze S code instead of source code
 to globally replace all calls to ~ with calls to %TILDE% but that is more
 work than using sed on the source code.
 
 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com
 
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
  Behalf
  Of Hafen, Ryan P
  Sent: Friday, November 02, 2012 4:28 PM
  To: Bert Gunter
  Cc: r-help@r-project.org
  Subject: Re: [R] finding global variables in a function containing formulae
 
  Thanks.  That works if I a have the formula expression handy.  But suppose
  I want a function, findGlobalVars() that takes a function as an argument
  and finds globals in it, where I have absolutely no idea what is in the
  supplied function:
 
   findGlobalVars <- function(f) {
      require(codetools)
      findGlobals(f, merge=FALSE)$variables
   }
 
 
  findGlobalVars(plotFn1)
 
  I would like findGlobalVars() to be able to find variables in formulae
  that might be present in f.
 
 
 
 
  On 11/1/12 1:19 PM, Bert Gunter gunter.ber...@gene.com wrote:
 
  Does
  
  ?all.vars
  ##as in
    all.vars(y~x)
   [1] "y" "x"
  
  help?
  
  -- Bert
  
  On Thu, Nov 1, 2012 at 11:04 AM, Hafen, Ryan P ryan.ha...@pnnl.gov
  wrote:
   I need to find all global variables being used in a function and
  findGlobals() in the codetools package works quite nicely.  However, I
  am not able to find variables that are used in formulae.  Simply
  avoiding formulae in functions is not an option because I do not have
  control over what functions this will be applied to.
  
   Here is an example to illustrate:
  
    library(codetools)
   
    xGlobal <- rnorm(10)
    yGlobal <- rnorm(10)
   
    plotFn1 <- function() {
      plot(yGlobal ~ xGlobal)
    }
   
    plotFn2 <- function() {
      y <- yGlobal
      x <- xGlobal
      plot(y ~ x)
    }
   
    plotFn3 <- function() {
      plot(xGlobal, yGlobal)
    }
   
    findGlobals(plotFn1, merge=FALSE)$variables
    # character(0)
    findGlobals(plotFn2, merge=FALSE)$variables
    # [1] "xGlobal" "yGlobal"
    findGlobals(plotFn3, merge=FALSE)$variables
    # [1] "xGlobal" "yGlobal"
  
   I would like to find that plotFn1 also uses globals xGlobal and
  yGlobal.  Any suggestions on how I might do this?
  
   __
   R-help@r-project.org mailing list
   https://stat.ethz.ch/mailman/listinfo/r-help
   PLEASE do read the posting guide
  http://www.R-project.org/posting-guide.html
   and provide commented, minimal, self-contained, reproducible code.
  
  
  
  --
  
  Bert Gunter
  Genentech Nonclinical Biostatistics
  
  Internal Contact Info:
  Phone: 467-7374
  Website:
  http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-bio
  statistics/pdb-ncb-home.htm
 
  __
  R-help@r-project.org mailing list
  https://stat.ethz.ch/mailman/listinfo/r-help
  PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
  and provide commented, minimal, self-contained, reproducible code.


Re: [R] Replacing NAs in long format

2012-11-03 Thread William Dunlap
ave() or split<-() can make that easier to write, although it
may take some time to internalize the idiom.  E.g.,

   flag <- rep(NA, nrow(dat2)) # add as.integer if you prefer 1,0 over TRUE,FALSE
   split(flag, dat2$idr) <- lapply(split(dat2, dat2$idr), function(d) with(d,
       any(schyear <= 5 & year == 0)))
   data.frame(dat2, flag)
     idr schyear year  flag
   1   1       4   -1  TRUE
   2   1       5    0  TRUE
   3   1       6    1  TRUE
   4   1       7    2  TRUE
   5   2       9    0 FALSE
   6   2      10    1 FALSE
   7   2      11    2 FALSE
or
   ave(seq_len(nrow(dat2)), dat2$idr, FUN=function(i) with(dat2[i,],
       any(schyear <= 5 & year == 0)))
   [1] 1 1 1 1 0 0 0
   flag <- ave(seq_len(nrow(dat2)), dat2$idr, FUN=function(i) with(dat2[i,],
       any(schyear <= 5 & year == 0)))
   data.frame(dat2, flag)
     idr schyear year flag
   1   1       4   -1    1
   2   1       5    0    1
   3   1       6    1    1
   4   1       7    2    1
   5   2       9    0    0
   6   2      10    1    0
   7   2      11    2    0

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of arun
 Sent: Saturday, November 03, 2012 5:01 PM
 To: Christopher Desjardins
 Cc: R help
 Subject: Re: [R] Replacing NAs in long format
 
 Hi,
 May be this helps:
 dat2-read.table(text=
 idr  schyear  year
 1    4  -1
 1    5    0
 1    6    1
 1    7    2
 2    9    0
 2    10    1
 2    11  2
 ,sep=,header=TRUE)
 
  dat2$flag-unlist(lapply(split(dat2,dat2$idr),function(x)
 rep(ifelse(any(apply(x,1,function(x) x[2]=5  
 x[3]==0)),1,0),nrow(x))),use.names=FALSE)
  dat2
 #  idr schyear year flag
 #1   1   4   -1    1
 #2   1   5    0    1
 #3   1   6    1    1
 #4   1   7    2    1
 #5   2   9    0    0
 #6   2  10    1    0
 #7   2  11    2    0
 A.K.
 
 
 
 
 - Original Message -
 From: Christopher Desjardins cddesjard...@gmail.com
 To: jim holtman jholt...@gmail.com
 Cc: r-help@r-project.org
 Sent: Saturday, November 3, 2012 7:09 PM
 Subject: Re: [R] Replacing NAs in long format
 
 I have a similar sort of follow up and I bet I could reuse some of this
 code but I'm not sure how.
 
 Let's say I want to create a flag that will be equal to 1 if schyear   = 5
 and year = 0 for a given idr. For example
 
  dat
 
 idr   schyear   year
 1         4           -1
 1         5            0
 1         6            1
 1         7            2
 2         9            0
 2        10            1
 2        11           2
 
 How could I make the data look like this?
 
 idr   schyear   year   flag
 1         4           -1     1
 1         5            0     1
 1         6            1     1
 1         7            2     1
 2         9            0     0
 2        10            1    0
 2        11           2     0
 
 
 I am not sure how to end up not getting both 0s and 1s for the 'flag'
 variable for an idr. For example,
 
 dat$flag = ifelse(schyear = 5  year ==0, 1, 0)
 
 Does not work because it will create:
 
 idr   schyear   year   flag
 1         4           -1     0
 1         5            0     1
 1         6            1     0
 1         7            2     0
 2         9            0     0
 2        10            1    0
 2        11           2     0
 
 And thus flag changes for an idr. Which it shouldn't.
 
 Thanks,
 Chris
 
 
 On Sat, Nov 3, 2012 at 5:50 PM, Christopher Desjardins 
 cddesjard...@gmail.com wrote:
 
  Hi Jim,
  Thank you so much. That does exactly what I want.
  Chris
 
 
  On Sat, Nov 3, 2012 at 1:30 PM, jim holtman jholt...@gmail.com wrote:
 
   x - read.table(text = idr  schyear year
  +  1       8    0
  +  1       9    1
  +  1      10   NA
  +  2       4   NA
  +  2       5   -1
  +  2       6    0
  +  2       7    1
  +  2       8    2
  +  2       9    3
  +  2      10    4
  +  2      11   NA
  +  2      12    6
  +  3       4   NA
  +  3       5   -2
  +  3       6   -1
  +  3       7    0
  +  3       8    1
  +  3       9    2
  +  3      10    3
  +  3      11   NA, header = TRUE)
    # you did not specify if there might be multiple contiguous NAs,
    # so there are a lot of checks to be made
    x.l - lapply(split(x, x$idr), function(.idr){
  +     # check for all NAs -- just return indeterminate state
  +     if (sum(is.na(.idr$year)) == nrow(.idr)) return(.idr)
  +     # repeat until all NAs have been fixed; takes care of contiguous
  ones
  +     while (any(is.na(.idr$year))){
  +         # find all the NAs
  +         for (i in which(is.na(.idr$year))){
  +             if ((i == 1L)  (!is.na(.idr$year[i + 1L]))){
  +                 .idr$year[i] - .idr$year[i + 1L] - 1
  +             } else if ((i  1L)  (!is.na(.idr$year[i - 1L]))){
  +                 .idr$year[i] - .idr$year[i - 1L] + 1
  +             } else if ((i  nrow(.idr))  (!is.na(.idr$year[i +
  1L]))){
  +                 .idr$year[i] - 

Re: [R] Violin plot of categorical/binned data

2012-11-03 Thread Jim Lemon

On 11/04/2012 06:27 AM, Nathan Miller wrote:

Hi,

I'm trying to create a plot showing the density distribution of some
shipping data. I like the look of violin plots, but my data is not
continuous but rather binned and I want to make sure its binned nature (not
smooth) is apparent in the final plot. So for example, I have the number of
individuals per vessel, but rather than having the actual number of
individuals I have data in the format of: 7 values of zero, 11 values
between 1-10, 6 values between 10-100, 13 values between 100-1000, etc. To
plot this data I generated a new dataset with the first 7 values being 0,
representing the 7 values of 0, the next 11 values being 5.5, representing
the 11 values between 1-10, etc. Sample data below.

I can make a violin plot (code below) using a log y-axis, which looks
alright (though I do have to deal with the zeros still), but in its default
format it hides the fact that these are binned data, which seems a bit
misleading. Is it possible to make a violin plot that looks a bit more
angular (more corners, less smoothing) or in someway shows the
distribution, but also clearly shows the true nature of these data? I've
tried playing with the bandwidth adjustment and the kernel but haven't been
able to get a figure that seems to work.

Anyone have some thoughts on this?


Hi Nate,
I'm not exactly sure what you are doing in the data transformation, but 
you can display this type of information as a single polygon for each 
instance (kiteChart) or separate rectangles (battleship.plot).


library(plotrix)
vessels <- matrix(c(zero=sample(1:10,5), one2ten=sample(5:20,5),
 ten2hundred=sample(15:36,5), hundred2thousand=sample(10:16,5)),
 ncol=4)
battleship.plot(vessels, xlab="Number of passengers",
 yaxlab=c("Barnacle","Maelstrom","Poopdeck","Seasick","Wallower"),
 xaxlab=c("0","1-10","10-100","100-1000"))
kiteChart(vessels, xlab="Number of passengers", ylab="Vessel",
 varlabels=c("Barnacle","Maelstrom","Poopdeck","Seasick","Wallower"),
 timelabels=c("0","1-10","10-100","100-1000"))

Jim
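
A small sketch (an editorial addition, not part of Jim's reply) showing how
Nathan's example counts (7 zeros, 11 in 1-10, 6 in 10-100, 13 in 100-1000)
drop into the same kind of plot; the vessel names are invented:

    library(plotrix)
    counts <- rbind(ShipA = c(7, 11, 6, 13),
                    ShipB = c(3,  9, 8,  5))    # second, made-up vessel
    battleship.plot(counts, xlab = "Number of individuals",
                    yaxlab = rownames(counts),
                    xaxlab = c("0", "1-10", "10-100", "100-1000"))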



Re: [R] Replacing NAs in long format

2012-11-03 Thread William Dunlap
Or, even simpler,

 flag <- with(dat2, ave(schyear <= 5 & year == 0, idr, FUN=any))
 data.frame(dat2, flag)
  idr schyear year  flag
1   1       4   -1  TRUE
2   1       5    0  TRUE
3   1       6    1  TRUE
4   1       7    2  TRUE
5   2       9    0 FALSE
6   2      10    1 FALSE
7   2      11    2 FALSE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of William Dunlap
 Sent: Saturday, November 03, 2012 5:38 PM
 To: arun; Christopher Desjardins
 Cc: R help
 Subject: Re: [R] Replacing NAs in long format
 
 ave() or split-() can make that easier to write, although it
 may take some time to internalize the idiom.  E.g.,
 
flag - rep(NA, nrow(dat2)) # add as.integer if you prefer 1,0 over 
 TRUE,FALSE
split(flag, dat2$idr) - lapply(split(dat2, dat2$idr), function(d)with(d, 
 any(schyear=5 
 year==0)))
data.frame(dat2, flag)
 idr schyear year  flag
   1   1   4   -1  TRUE
   2   1   50  TRUE
   3   1   61  TRUE
   4   1   72  TRUE
   5   2   90 FALSE
   6   2  101 FALSE
   7   2  112 FALSE
 or
ave(seq_len(nrow(dat2)), dat2$idr, FUN=function(i)with(dat2[i,], 
 any(schyear=5 
 year==0)))
   [1] 1 1 1 1 0 0 0
flag - ave(seq_len(nrow(dat2)), dat2$idr, FUN=function(i)with(dat2[i,],
 any(schyear=5  year==0)))
data.frame(dat2, flag)
 idr schyear year flag
   1   1   4   -11
   2   1   501
   3   1   611
   4   1   721
   5   2   900
   6   2  1010
   7   2  1120
 
 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com
 
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
  Behalf
  Of arun
  Sent: Saturday, November 03, 2012 5:01 PM
  To: Christopher Desjardins
  Cc: R help
  Subject: Re: [R] Replacing NAs in long format
 
  Hi,
  May be this helps:
  dat2-read.table(text=
  idr  schyear  year
  1    4  -1
  1    5    0
  1    6    1
  1    7    2
  2    9    0
  2    10    1
  2    11  2
  ,sep=,header=TRUE)
 
   dat2$flag-unlist(lapply(split(dat2,dat2$idr),function(x)
  rep(ifelse(any(apply(x,1,function(x) x[2]=5 
 x[3]==0)),1,0),nrow(x))),use.names=FALSE)
   dat2
  #  idr schyear year flag
  #1   1   4   -1    1
  #2   1   5    0    1
  #3   1   6    1    1
  #4   1   7    2    1
  #5   2   9    0    0
  #6   2  10    1    0
  #7   2  11    2    0
  A.K.
 
 
 
 
  - Original Message -
  From: Christopher Desjardins cddesjard...@gmail.com
  To: jim holtman jholt...@gmail.com
  Cc: r-help@r-project.org
  Sent: Saturday, November 3, 2012 7:09 PM
  Subject: Re: [R] Replacing NAs in long format
 
  I have a similar sort of follow up and I bet I could reuse some of this
  code but I'm not sure how.
 
  Let's say I want to create a flag that will be equal to 1 if schyear   = 5
  and year = 0 for a given idr. For example
 
   dat
 
  idr   schyear   year
  1         4           -1
  1         5            0
  1         6            1
  1         7            2
  2         9            0
  2        10            1
  2        11           2
 
  How could I make the data look like this?
 
  idr   schyear   year   flag
  1         4           -1     1
  1         5            0     1
  1         6            1     1
  1         7            2     1
  2         9            0     0
  2        10            1    0
  2        11           2     0
 
 
  I am not sure how to end up not getting both 0s and 1s for the 'flag'
  variable for an idr. For example,
 
  dat$flag = ifelse(schyear = 5  year ==0, 1, 0)
 
  Does not work because it will create:
 
  idr   schyear   year   flag
  1         4           -1     0
  1         5            0     1
  1         6            1     0
  1         7            2     0
  2         9            0     0
  2        10            1    0
  2        11           2     0
 
  And thus flag changes for an idr. Which it shouldn't.
 
  Thanks,
  Chris
 
 
  On Sat, Nov 3, 2012 at 5:50 PM, Christopher Desjardins 
  cddesjard...@gmail.com wrote:
 
   Hi Jim,
   Thank you so much. That does exactly what I want.
   Chris
  
  
   On Sat, Nov 3, 2012 at 1:30 PM, jim holtman jholt...@gmail.com wrote:
  
x - read.table(text = idr  schyear year
   +  1       8    0
   +  1       9    1
   +  1      10   NA
   +  2       4   NA
   +  2       5   -1
   +  2       6    0
   +  2       7    1
   +  2       8    2
   +  2       9    3
   +  2      10    4
   +  2      11   NA
   +  2      12    6
   +  3       4   NA
   +  3       5   -2
   +  3       6   -1
   +  3       7    0
   +  3       8    1
   +  3       9    2
   +  3      10    3
   +  3      11   NA, header = TRUE)
     # you did not specify if there might be multiple contiguous NAs,
     # 

Re: [R] Replacing NAs in long format

2012-11-03 Thread arun
Hi Bill,
It is much simpler.
# with aggregate() and merge()

res1 <- with(dat2, aggregate(seq_len(nrow(dat2)), by=list(idr=idr),
    FUN=function(i) with(dat2[i,], any(schyear <= 5 & year == 0))))
res2 <- merge(dat2, res1, by="idr")
colnames(res2)[4] <- "flag"
within(res2, {flag <- as.integer(flag)})
#  idr schyear year flag
#1   1       4   -1    1
#2   1       5    0    1
#3   1       6    1    1
#4   1       7    2    1
#5   2       9    0    0
#6   2      10    1    0
#7   2      11    2    0


A.K.






- Original Message -
From: William Dunlap wdun...@tibco.com
To: arun smartpink...@yahoo.com; Christopher Desjardins 
cddesjard...@gmail.com
Cc: R help r-help@r-project.org
Sent: Saturday, November 3, 2012 9:21 PM
Subject: RE: [R] Replacing NAs in long format

Or, even simpler,

 flag - with(dat2, ave(schyear=5  year==0, idr, FUN=any))
 data.frame(dat2, flag)
  idr schyear year  flag
1   1       4   -1  TRUE
2   1       5    0  TRUE
3   1       6    1  TRUE
4   1       7    2  TRUE
5   2       9    0 FALSE
6   2      10    1 FALSE
7   2      11    2 FALSE

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
 Behalf
 Of William Dunlap
 Sent: Saturday, November 03, 2012 5:38 PM
 To: arun; Christopher Desjardins
 Cc: R help
 Subject: Re: [R] Replacing NAs in long format
 
 ave() or split-() can make that easier to write, although it
 may take some time to internalize the idiom.  E.g.,
 
    flag - rep(NA, nrow(dat2)) # add as.integer if you prefer 1,0 over 
TRUE,FALSE
    split(flag, dat2$idr) - lapply(split(dat2, dat2$idr), function(d)with(d, 
any(schyear=5 
 year==0)))
    data.frame(dat2, flag)
     idr schyear year  flag
   1   1       4   -1  TRUE
   2   1       5    0  TRUE
   3   1       6    1  TRUE
   4   1       7    2  TRUE
   5   2       9    0 FALSE
   6   2      10    1 FALSE
   7   2      11    2 FALSE
 or
    ave(seq_len(nrow(dat2)), dat2$idr, FUN=function(i)with(dat2[i,], 
any(schyear=5 
 year==0)))
   [1] 1 1 1 1 0 0 0
    flag - ave(seq_len(nrow(dat2)), dat2$idr, FUN=function(i)with(dat2[i,],
 any(schyear=5  year==0)))
    data.frame(dat2, flag)
     idr schyear year flag
   1   1       4   -1    1
   2   1       5    0    1
   3   1       6    1    1
   4   1       7    2    1
   5   2       9    0    0
   6   2      10    1    0
   7   2      11    2    0
 
 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com
 
 
  -Original Message-
  From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On 
  Behalf
  Of arun
  Sent: Saturday, November 03, 2012 5:01 PM
  To: Christopher Desjardins
  Cc: R help
  Subject: Re: [R] Replacing NAs in long format
 
  Hi,
  May be this helps:
  dat2-read.table(text=
  idr  schyear  year
  1    4  -1
  1    5    0
  1    6    1
  1    7    2
  2    9    0
  2    10    1
  2    11  2
  ,sep=,header=TRUE)
 
   dat2$flag-unlist(lapply(split(dat2,dat2$idr),function(x)
  rep(ifelse(any(apply(x,1,function(x) x[2]=5 
 x[3]==0)),1,0),nrow(x))),use.names=FALSE)
   dat2
  #  idr schyear year flag
  #1   1   4   -1    1
  #2   1   5    0    1
  #3   1   6    1    1
  #4   1   7    2    1
  #5   2   9    0    0
  #6   2  10    1    0
  #7   2  11    2    0
  A.K.
 
 
 
 
  - Original Message -
  From: Christopher Desjardins cddesjard...@gmail.com
  To: jim holtman jholt...@gmail.com
  Cc: r-help@r-project.org
  Sent: Saturday, November 3, 2012 7:09 PM
  Subject: Re: [R] Replacing NAs in long format
 
  I have a similar sort of follow up and I bet I could reuse some of this
  code but I'm not sure how.
 
  Let's say I want to create a flag that will be equal to 1 if schyear   = 5
  and year = 0 for a given idr. For example
 
   dat
 
  idr   schyear   year
  1         4           -1
  1         5            0
  1         6            1
  1         7            2
  2         9            0
  2        10            1
  2        11           2
 
  How could I make the data look like this?
 
  idr   schyear   year   flag
  1         4           -1     1
  1         5            0     1
  1         6            1     1
  1         7            2     1
  2         9            0     0
  2        10            1    0
  2        11           2     0
 
 
  I am not sure how to end up not getting both 0s and 1s for the 'flag'
  variable for an idr. For example,
 
  dat$flag = ifelse(schyear = 5  year ==0, 1, 0)
 
  Does not work because it will create:
 
  idr   schyear   year   flag
  1         4           -1     0
  1         5            0     1
  1         6            1     0
  1         7            2     0
  2         9            0     0
  2        10            1    0
  2        11           2     0
 
  And thus flag changes for an idr. Which it shouldn't.
 
  Thanks,
  Chris
 
 
  On Sat, Nov 3, 2012 at 5:50 PM, Christopher 

[R] sqldf Date problem

2012-11-03 Thread Andreas Recktenwald


Dear R-help readers,

I've created a database for quotes data (for 4 years; 2007 -- 2010)
with the sqldf package. This database contains a column "Date" in the
format mm/dd/yyyy.


The table in the database is called main.data and the database  
itself Honda. I tried to get the Data just for certain period, say  
from 01/01/2007 until 01/10/2007 with the following code:


sqldf("select * from main.data where Date <= '01/10/2007' and
       Date >= '01/01/2007'",
      dbname = "Honda")


I get the data for this period for every year (2007, 2008, 2009, 2010), not
only for 2007. It seems that the year is overlooked and only the matching
days and months are compared.


Because I haven't much experience with SQL, I decided to send my
problem to the list.


Many thanks in advance.



Re: [R] Date format conversion from 2012-09-20 to 2012:09:20

2012-11-03 Thread veepsirtt
Hi,
Thanks A.K. I tried this, but it is not working:

#*
# Load historical data
#**
library('quantmod')
endDate   <- Sys.Date()
startDate <- as.Date(endDate - 10)

dataspy <- getSymbols("SPY", from = startDate, to = endDate, auto.assign =
FALSE)

myStDt <- startDate
while (myStDt <= endDate) {
  startEndDate <- paste(startDate, myStDt, sep = "::")
  print(dataspy)
  dataspy <- Cl(dataspy[startEndDate])
  # display the subsetted data
  print(dataspy)
  myStDt <- myStDt + 1
}
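
A hedged sketch (an editorial addition, not the poster's code): the xts object
returned by getSymbols() can be subset directly with an ISO-8601 "from::to"
range string, which is usually what this kind of loop is after:

    library(quantmod)
    spy <- getSymbols("SPY", from = Sys.Date() - 30, auto.assign = FALSE)
    rng <- paste(Sys.Date() - 10, Sys.Date(), sep = "::")  # e.g. "2012-10-24::2012-11-03"
    Cl(spy[rng])   # closing prices inside that window only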




--
View this message in context: 
http://r.789695.n4.nabble.com/Date-format-conversion-from-2012-09-20-to-2012-09-20-tp4643710p4648327.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] lmPerm p-values and multiple testing

2012-11-03 Thread Pat
Even if you use perm = "Exact", the maximum number of observations allowed is
only 10.  If the data exceed this, perm = "Prob" is used instead of "Exact", so
the p-values change between runs.  The "Prob" method approximates the
permutation distribution by randomly exchanging pairs of Y elements.
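
A hedged sketch (an editorial addition): since perm = "Prob" estimates the
p-values from random exchanges, setting a seed should make the output
repeatable, assuming lmPerm draws its random swaps from R's RNG:

    library(lmPerm)
    set.seed(42)                  # fix the randomization
    fit <- lmp(Sepal.Length ~ Species, data = iris, perm = "Prob")
    summary(fit)                  # with the same seed, the permutation p-values should repeat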



--
View this message in context: 
http://r.789695.n4.nabble.com/lmPerm-p-values-and-multiple-testing-tp4643219p4648350.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] importing jpeg 2000

2012-11-03 Thread Fred909
Mike, thanks for your answer.

I thought GDAL was a library itself. You suggest underlying libraries: do you
know where I can find them?
The other option is to convert the jpeg2000 files outside R: is it possible in
R to execute an external program?

Btw, I'm working with OS X Lion and am still on the steep part of the R learning curve :)

Best Fred



--
View this message in context: 
http://r.789695.n4.nabble.com/importing-jpeg-2000-tp4648242p4648329.html
Sent from the R help mailing list archive at Nabble.com.





[R] Changing Date Variables as Continuous Variables

2012-11-03 Thread hoguejm
I am very new to R, so I apologize if this question is trivial.

I have a column in my data of dates in the format mm/dd/yyyy; about 3500 rows.

I am using this variable in a logistic regression model, and need to treat
it as continuous, not as the factor R has decided it is.

I tried the as.numeric function, but it resulted in all NAs and the message:
"NAs introduced by coercion"

If anyone knows a solution, I would greatly appreciate it. 

Cheers,
Jake



--
View this message in context: 
http://r.789695.n4.nabble.com/Changing-Date-Variables-as-Continuous-Variables-tp4648354.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Changing Date Variables as Continuous Variables

2012-11-03 Thread jim holtman
Here is how to convert your column of factors into Dates:

> x <- read.table(text = '2/10/2011
+ 2/20/2011
+ 3/4/2011')
> # read in as factors
> str(x)
'data.frame':   3 obs. of  1 variable:
 $ V1: Factor w/ 3 levels "2/10/2011","2/20/2011",..: 1 2 3
> # convert to Date
> x$date <- as.Date(as.character(x$V1), format = "%m/%d/%Y")
> x
         V1       date
1 2/10/2011 2011-02-10
2 2/20/2011 2011-02-20
3  3/4/2011 2011-03-04
> str(x)
'data.frame':   3 obs. of  2 variables:
 $ V1  : Factor w/ 3 levels "2/10/2011","2/20/2011",..: 1 2 3
 $ date: Date, format: "2011-02-10" "2011-02-20" "2011-03-04"
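
A short sketch (an editorial addition, not part of Jim's answer): once the
column is a Date, a plain numeric copy (days since 1970-01-01) can be used as
a continuous term; 'outcome' below is a hypothetical response variable:

> x$days <- as.numeric(x$date)
> # glm(outcome ~ days, family = binomial, data = x)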


On Sat, Nov 3, 2012 at 8:09 PM, hoguejm hogu...@gmail.com wrote:
 I am very new to R, so I apologize if this question is trivial.

 I have a row in my data of dates in the format mm/dd/; about 3500 rows.

 I am using this variable in a logistic regression model, and need to treat
 it as continuous, not a factor as r has decided it is.

 I tried the as.numeric function but it resulted in all NA's and the message:
 NAs introduced by coercion 

 If anyone knows a solution, I would greatly appreciate it.

 Cheers,
 Jake



 --
 View this message in context: 
 http://r.789695.n4.nabble.com/Changing-Date-Variables-as-Continuous-Variables-tp4648354.html
 Sent from the R help mailing list archive at Nabble.com.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] sqldf Date problem

2012-11-03 Thread jim holtman
Most likely your Date is either a character or a factor (you need to
provide an 'str' of the dataframe).  You are therefore most likely
doing a character compare and that is the reason for your problem.
You need to convert it to a character string in the format YYYY-MM-DD to
do the correct character comparison.

##
> x <- data.frame(Date = paste0('1/', 1:31, '/2011'))
> str(x)
'data.frame':   31 obs. of  1 variable:
 $ Date: Factor w/ 31 levels "1/1/2011","1/10/2011",..: 1 12 23 26 27
28 29 30 31 2 ...
> x
Date
1   1/1/2011
2   1/2/2011
3   1/3/2011
4   1/4/2011
5   1/5/2011
6   1/6/2011
7   1/7/2011
8   1/8/2011
9   1/9/2011
10 1/10/2011
11 1/11/2011
12 1/12/2011
13 1/13/2011
14 1/14/2011
15 1/15/2011
16 1/16/2011
17 1/17/2011
18 1/18/2011
19 1/19/2011
20 1/20/2011
21 1/21/2011
22 1/22/2011
23 1/23/2011
24 1/24/2011
25 1/25/2011
26 1/26/2011
27 1/27/2011
28 1/28/2011
29 1/29/2011
30 1/30/2011
31 1/31/2011

> require(sqldf)
> # not correct because of character compares
> sqldf("select * from x where Date > '1/13/2011' and Date < '1/25/2011'")
Date
1   1/2/2011
2  1/14/2011
3  1/15/2011
4  1/16/2011
5  1/17/2011
6  1/18/2011
7  1/19/2011
8  1/20/2011
9  1/21/2011
10 1/22/2011
11 1/23/2011
12 1/24/2011
> # convert the date to YYYY-MM-DD for character compares
> x$newDate <- as.character(as.Date(as.character(x$Date), format = "%m/%d/%Y"))
> # now do the select
> sqldf("select * from x where newDate between '2011-01-13' and '2011-01-25'")
        Date    newDate
1  1/13/2011 2011-01-13
2  1/14/2011 2011-01-14
3  1/15/2011 2011-01-15
4  1/16/2011 2011-01-16
5  1/17/2011 2011-01-17
6  1/18/2011 2011-01-18
7  1/19/2011 2011-01-19
8  1/20/2011 2011-01-20
9  1/21/2011 2011-01-21
10 1/22/2011 2011-01-22
11 1/23/2011 2011-01-23
12 1/24/2011 2011-01-24
13 1/25/2011 2011-01-25
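
A short sketch (an editorial addition, not part of Jim's reply): once the
column is converted to class Date, the same window can be taken in plain R
without any SQL at all:

> x$Date2 <- as.Date(as.character(x$Date), format = "%m/%d/%Y")
> subset(x, Date2 >= as.Date("2011-01-13") & Date2 <= as.Date("2011-01-25"))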


On Sat, Nov 3, 2012 at 4:22 PM, Andreas Recktenwald
a.recktenw...@mx.uni-saarland.de wrote:
 Dear R-help readers,

 i've created a database for quotes data (for 4 years; 2007 -- 2010) with the
 sqldf package. This database contains a column Date in the format
 mm/dd/.

 The table in the database is called main.data and the database itself
 Honda. I tried to get the Data just for certain period, say from
 01/01/2007 until 01/10/2007 with the following code:

 sqldf(select * from main.data where Date='01/10/2007' and
 Date='01/01/2007'),
dbname=Honda)


 I get the data for this period for every year(2007,2008,2009,2010) not only
 for 2007. It seems that the year is overlooked and just looked for the
 fitting days and months.

 Because I haven't really much experience with sql I decide to send my
 problem to the list.

 Many thanks in advance.

 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.



Re: [R] Changing Date Variables as Continuous Variables

2012-11-03 Thread Rolf Turner


There is a phenomenon that occurs here which, it seems to me,
merits some emphasis.  The glm() function appears to be perfectly
willing to take a variable of class Date and treat it as a continuous
variable.

Apparently what it does (on the basis of one little experiment that I
did) is convert the vector of dates to a vector of Julian date values,
taking the origin to be the minimum value of the Date vector.

I would have thought that the user would have to effect this conversion
her/himself.  But not so; the software is so cleverly written that it's all
handled for the user.
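
A sketch of the kind of experiment described (an editorial addition, not
Rolf's code): a Date covariate goes straight into glm() and behaves like a
number-of-days term, so the choice of origin only moves the intercept:

d <- as.Date("2011-01-01") + 0:9
set.seed(1)
y <- rbinom(10, 1, 0.5)
coef(glm(y ~ d, family = binomial))[["d"]]
coef(glm(y ~ I(as.numeric(d - min(d))), family = binomial))[[2]]  # same slope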

The designers of R have thought of just about *everything*!

We think it 'mazing! :-)

cheers,

Rolf Turner

On 04/11/12 14:58, jim holtman wrote:

Here is how to convert your column of factors into Dates:


x - read.table(text = '2/10/2011

+ 2/20/2011
+ 3/4/2011')

# read in as factors
str(x)

'data.frame':   3 obs. of  1 variable:
  $ V1: Factor w/ 3 levels 2/10/2011,2/20/2011,..: 1 2 3

# convert to Date
x$date - as.Date(as.character(x$V1), format = %m/%d/%Y)
x

  V1   date
1 2/10/2011 2011-02-10
2 2/20/2011 2011-02-20
3  3/4/2011 2011-03-04

str(x)

'data.frame':   3 obs. of  2 variables:
  $ V1  : Factor w/ 3 levels 2/10/2011,2/20/2011,..: 1 2 3
  $ date: Date, format: 2011-02-10 2011-02-20 2011-03-04


On Sat, Nov 3, 2012 at 8:09 PM, hoguejm hogu...@gmail.com wrote:

I am very new to R, so I apologize if this question is trivial.

I have a row in my data of dates in the format mm/dd/; about 3500 rows.

I am using this variable in a logistic regression model, and need to treat
it as continuous, not a factor as r has decided it is.

I tried the as.numeric function but it resulted in all NA's and the message:
NAs introduced by coercion

If anyone knows a solution, I would greatly appreciate it.





Re: [R] Can you turn a string into a (working) symbol?

2012-11-03 Thread andrewH
Ah!  Excellent! That will be most useful.  And sorry about the typo.

I found another function in a different discussion that also seems to work,
at least in most cases I have tried.  I do not at all understand the
difference between the two.

doppel <- function(x) {eval(parse(text=x))}

However, neither one seems to work on the left-hand side of a <-, a <<-,
or an =.

Again, my thanks. --andrewH



--
View this message in context: 
http://r.789695.n4.nabble.com/Can-you-turn-a-string-into-a-working-symbol-tp4648343p4648365.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Can you turn a string into a (working) symbol?

2012-11-03 Thread andrewH
Yes, the assign command goes a little way toward what I was hoping for.
But it requires a different syntax, and it does not in general let you use
quoted expressions that you could use with other assignment operators. For
instance,

> DD <- 1:3
> assign("DD[2]", 5)
> DD
[1] 1 2 3

So I am still looking for a function that produces an output that is fully
equivalent to the string without quotation marks.  Or for a definite
statement that no such function can exist.

Thanks so much for your attention to this problem.
andrewH
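
A short sketch (an editorial addition, not part of the thread): assign() only
rebinds whole names, so modifying an element through a character name means
fetching the object, changing it, and assigning it back:

nm <- "DD"
DD <- 1:3
tmp <- get(nm)
tmp[2] <- 5
assign(nm, tmp)
DD
# [1] 1 5 3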




--
View this message in context: 
http://r.789695.n4.nabble.com/Can-you-turn-a-string-into-a-working-symbol-tp4648343p4648366.html
Sent from the R help mailing list archive at Nabble.com.
