[R] timeseries highlighting

2012-01-30 Thread Alexy Khrabrov
I'd like to plot a given time series in a primary color but highlight
a segment of it in a different color.  Is there an elegant way to do
it?

A+

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] rbind a heterogeneous row

2011-03-22 Thread Alexy Khrabrov
I have a dataframe with many rows like this:

> df
        X1   X2   X3   X4   X5   X6   X7 week         d
sim1 FALSE TRUE TRUE TRUE TRUE TRUE TRUE    1 0.3064985

sim1 is the rowname, X1..X7,week,d are the column names.  X1..X7 are factors, 
booleans in this case.

I need to add another row, represented by the following list:

list(rep(T,7),5,0.0)

-- i.e, TRUE in all boolean columns, 5 in the week column, 0.0 in d.  The name 
of the new row is dreps.

I used to add fully numeric rows as follows:

df1 <- rbind(df,dreps=c(...)) # all numbers

But if I do this here,

df1 <- rbind(df,dreps=c(rep(T,7),5,0.0))

-- booleans are converted to 0/1, which is not what I want.

What's the recommended way to specify and bind a heterogeneous row above?
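
One possible approach, sketched under assumptions: build the new row as a one-row data frame, so each column keeps its own type, and rbind that.  The toy df below is a stand-in for the one in the post; it keeps only two flag columns and assumes they are logical rather than factor.

```r
# Toy stand-in for the data frame in the post (illustrative values only).
df <- data.frame(X1 = FALSE, X2 = TRUE, week = 1, d = 0.3064985,
                 row.names = "sim1")

# A one-row data frame keeps logical and numeric columns separate,
# whereas c(...) would coerce everything to a single type.
new.row <- data.frame(X1 = TRUE, X2 = TRUE, week = 5, d = 0.0,
                      row.names = "dreps")

df1 <- rbind(df, new.row)
str(df1)   # X1 and X2 stay logical, week and d stay numeric
```

The row name travels with the one-row data frame, so no dreps= argument is needed in the rbind call.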

Cheers,
Alexy


[R] applying to dataframe rows

2011-03-15 Thread Alexy Khrabrov
How do I apply a function to every row of a dataframe most naturally?  
Specifically, I'd like to filter out any row which contains an Inf in any 
column.  Since all columns are numeric, I guess max should work on a row...
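
A minimal sketch of the row-wise filter, assuming an all-numeric data frame; the data frame d and its values are made up for illustration.  It tests for Inf directly with is.infinite() rather than going through max, and shows a vectorised alternative next to the apply() form.

```r
# Hypothetical all-numeric data frame with Inf and -Inf in some rows.
d <- data.frame(a = c(1, Inf, 3), b = c(4, 5, -Inf), c = c(7, 8, 9))

# Row-wise: apply() hands each row to the function as a numeric vector.
has.inf <- apply(d, 1, function(r) any(is.infinite(r)))
clean <- d[!has.inf, ]

# Vectorised alternative: count infinite entries per row.
clean2 <- d[rowSums(is.infinite(as.matrix(d))) == 0, ]
```

Note that is.infinite() catches -Inf as well, which a test based on max alone would miss.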

-- Alexy


Re: [R] Developing a web crawler

2011-03-03 Thread Alexy Khrabrov

On Mar 3, 2011, at 4:22 AM, antujsrv wrote:
 
> I wish to develop a web crawler in R.

As Rex said, there are faster languages, but R string processing got better due 
to the stringr package (R Journal 2010-2).  When Hadley is done with it, it 
will be like having it all in R!

-- Alexy


[R] the features of the truth

2011-03-01 Thread Alexy Khrabrov
This is really a statistics problem, so I wonder which R packages can be 
employed best to solve and visualize it.

I run a lot of simulations to approach the truth.  The truth is a result of 
very complex computations, and is a real number.  The closer it is to 0, the 
truthier it is.

Each simulation has a set of features, some of which are not available for all 
simulations.  Some of the features are numeric (week), some boolean (utility), 
while others are factors.

Each simulation has the final value, the dm column in the data frame.  The 
names of the simulations are rownames of the data frame, and feature names are 
the column names.  Here's the dataframe:

http://dl.dropbox.com/u/9300701/Data/sf.dm.pos.r

You read it in R with

sf <- read.table("sf.dm.pos.r")

Seeking the truth questions:

-- What kinds of GLM and other models can we run to determine which features 
are most contributing to the truth, i.e. making dm closer to 0?

-- What kind of clustering can emphasize the most contributing features?

-- What kind of visualizations can be used to make it clear which features 
affect the truth the most, and in which combinations?  What kind of color 
visualizations are there to make the truth even clearer?

Cheers,
Alexy




[R] .libPaths(new) stopped working in 2.10

2009-12-06 Thread Alexy Khrabrov
I used to have the following in my .Rprofile:

if (length(.libPaths())==1)
.libPaths(paste(Sys.getenv("HOME"),"/Library/R/",paste(R.version$major,as.integer(R.version$minor),sep='.'),"/library",sep=''))

-- and it added my user-defined library directory.  Then I installed
packages there, so during an upgrade, I'd know exactly which packages
I installed and auto-upgrade with a script.

However, in R 2.10's Mac OSX GUI, .libPaths(new) does nothing...  Did
its behavior change?

Cheers,
Alexy



Re: [R] .libPaths(new) stopped working in 2.10

2009-12-06 Thread Alexy Khrabrov
I have the .libPaths() defined in my .Rprofile, and it stopped having any 
effect.  Even if I try to do it in the REPL, as:

> x <- paste(Sys.getenv("HOME"),"/Library/R/",paste(R.version$major,as.integer(R.version$minor),sep='.'),"/library",sep='')
> x
[1] "/Users/alexyk/Library/R/2.10/library"
> .libPaths(x)
> .libPaths()
[1] "/Library/Frameworks/R.framework/Resources/library"

Nothing GUI'sh.
(Changing back to r-help as it's by no means obvious yet it's a Mac-only 
problem.)
Cheers,
Alexy



Re: [R] .libPaths(new) stopped working in 2.10

2009-12-06 Thread Alexy Khrabrov
On Sun, Dec 6, 2009 at 3:04 PM, Duncan Murdoch murd...@stats.uwo.ca wrote:
> Does that directory exist?  .libPaths("foobar") silently does nothing,
> because (on my system) "foobar" is not a directory. (This is documented
> behaviour, though it would perhaps be friendlier if it gave a warning when
> it dropped a requested addition.)
>
> I don't think this is new behaviour...

Indeed.  Thanks for the clarification!  After mkdir it does add it to the list.
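
The behaviour described above can be reproduced with a small sketch.  The directory name demo-library under tempdir() is a hypothetical location chosen for illustration; the original library search path is restored at the end.

```r
old <- .libPaths()
lib <- file.path(tempdir(), "demo-library")   # hypothetical location

# The directory does not exist yet, so the request is silently dropped.
.libPaths(c(lib, old))
before <- any(grepl("demo-library", .libPaths(), fixed = TRUE))

# After creating the directory, the same call keeps it.
dir.create(lib, recursive = TRUE)
.libPaths(c(lib, old))
after <- any(grepl("demo-library", .libPaths(), fixed = TRUE))

.libPaths(old)   # restore the original search path
```

In an .Rprofile, a dir.create() guarded by a check that the directory is missing, placed before the .libPaths() call, would avoid the silent drop after an upgrade.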
Cheers,
Alexy



[R] comma-separated thousands in numbers on plot axes?

2009-06-28 Thread Alexy Khrabrov
How can I make R separate thousands, millions, etc., on the plot axes,
with commas?

Cheers,
Alexy



[R] compressing the plot's white space

2009-06-27 Thread Alexy Khrabrov
I need to fit a graph into a column of a 2-column paper.  I found that
just specifying width and height parameters (3.2in x 3.5in) to plot
doesn't decrease the fonts of the main title, axis titles, and
labeling numbers, and tick sizes.  So I have to add cex to all labels
and titles and manage ticks.  However, I can't decrease the space
between axis label and numbers on ticks.  Is there a way to place
those numbers inside the plot, and/or explicitly remove most of the
space between the numbers and the axis title?  Also, how should I
specify the margins to achieve best white space elimination?  On
Quartz, decreasing margins seem to squeeze the titles properly, on
postscript device, the axis titles simply move outside the plot.

Cheers,
Alexy



[R] sorting by creation time in ls()

2009-03-17 Thread Alexy Khrabrov
When trying to remember what I did in a session, especially after
coming back to it after a few days, I'd like to mimic Unix's ls -ltrh
-- does R retain the time at which a variable was created?  If not,
would it make a useful addition, to have ls with an option to sort by
creation time?


Cheers,
Alexy



[R] factors to integers preserving value in a dataframe

2009-02-27 Thread Alexy Khrabrov
I want to produce a dataframe with integer columns for elements of  
string pairs:


pairs <- c("10 21","23 45")
pairs.split <- lapply(pairs,function(x)strsplit(x," "))
pdf <- as.data.frame(pairs.split)
names(pdf) <- c("p","q")

-- at this point things look good, except the columns are factors, as  
I didn't change the default stringsAsFactors parameter to the  
as.data.frame.


Now if I want to convert columns to integers, I get

> typeof(pdf$p)
[1] "integer"
> pdf$p
[1] 10 21
Levels: 10 21
> as.integer(pdf$p)
[1] 1 2

-- being factor levels instead of the original values.  I could have  
used stringsAsFactors=F and then convert the strings to integers with  
as.integer all the same; what other ways are there -- e.g., is there a  
way to convert integer-looking factors to integers directly, without  
substituting levels?
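
For illustration, a small sketch of the two usual idioms for recovering the original values from an integer-looking factor; the factor f is made up for the example.

```r
f <- factor(c("10", "21", "10"))

codes <- as.integer(f)                 # level codes, not the values
via.chr <- as.integer(as.character(f)) # go through the labels
via.lev <- as.integer(levels(f))[f]    # convert each level once, then index
```

The second form converts only the distinct levels, which can matter for long factors with few levels.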


Cheers,
Alexy



[R] accessing and preserving list names in lapply

2009-02-26 Thread Alexy Khrabrov
Sometimes I'm iterating over a list where names are keys into another  
data structure, e.g. a related list.  Then I can't use lapply as it  
does [[]] and loses the name.  Then I do something like this:


do.one <- function(ldf) { # list-dataframe item
  key <- names(ldf)
  meat <- ldf[[1]]
  mydf <- some.df[[key]] # related data structure
  r.df <- cbind(meat,new.column=computed)
  r <- list(xxx=r.df)
  names(r) <- key
  r
}

then if I operate on the list L of those ldf's not as lapply(L,...), but

res <- lapply(1:length(L),do.one)

Can this procedure be simplified so that names are preserved?   
Specifically, can the xxx=..., xxx <- key part be eliminated -- how  
can we have a variable on the left-hand side of list(lhs=value)?


Cheers,
Alexy



Re: [R] accessing and preserving list names in lapply

2009-02-26 Thread Alexy Khrabrov

res <- lapply(1:length(L),do.one)


Actually, I do

res <- lapply(1:length(L),function(x)do.one(L[x]))

-- this is the price of needing the element's name, so I have to both  
make do.one extract the name and the meat separately inside, and  
lapply becomes ugly.  Yet the obvious alternatives -- extracting the  
names separately, attaching them back into list elements, etc., -- are  
even uglier.  Something pretty? :)
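
One possible shape for this, as a sketch: iterate over the names themselves, so the key is available inside the function and the result is named automatically.  The lists L and extra are hypothetical stand-ins for the list of data frames and the related keyed structure.

```r
# Hypothetical inputs: a named list of data frames and a related
# structure keyed by the same names.
L <- list(a = data.frame(v = 1:2), b = data.frame(v = 3:4))
extra <- list(a = 10, b = 20)

# sapply over the names; simplify = FALSE returns a list, and
# USE.NAMES = TRUE labels each element with its key.
res <- sapply(names(L), function(key) {
  cbind(L[[key]], new.column = extra[[key]])
}, simplify = FALSE, USE.NAMES = TRUE)

names(res)
```

With this shape the function body sees both the key and the element, and no names have to be reattached afterwards.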


Cheers,
Alexy



[R] growing dataframes with rbind

2009-02-24 Thread Alexy Khrabrov

I'm growing a large dataframe by composing new rows and then doing

row <- compute.new.row.somehow(...)
d <- rbind(d,row)

Is this a fast/preferred way?
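
A commonly suggested alternative, sketched here with a made-up row generator: collect the rows in a list and bind them once at the end, which avoids copying the growing data frame on every iteration.  make.row is a hypothetical stand-in for compute.new.row.somehow.

```r
# Hypothetical stand-in for compute.new.row.somehow(...).
make.row <- function(i) data.frame(id = i, sq = i^2)

# Build all rows first, then bind them in a single call.
rows <- lapply(1:5, make.row)
d <- do.call(rbind, rows)
```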
Cheers,
Alexy



[R] 1.095e+09 for integers

2009-02-22 Thread Alexy Khrabrov
I've had a very long file written out by R with write.table, with  
fields of time values, converted from POSIXlt as.numeric.  Among 2.5  
million values, very few had 6 trailing zeroes, and those were output  
in scientific notation as in the subject.  Is this the default  
behavior for long integers, and how can it be turned off (with all  
digits for any integer field in write.table)?  This is important to  
interoperate with other languages through such text dumps, as some do  
not expect scientific notation for integers, only for floats.
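
A minimal sketch of one workaround, assuming the time column can be written as text: format it explicitly in fixed notation before calling write.table.  The data frame and column names are made up for illustration.

```r
# Hypothetical data: the first time value has six trailing zeroes.
d <- data.frame(t = c(1095000000, 1095000123), v = c(0.5, 1.5))

# Force fixed notation; trim = TRUE avoids padding with spaces.
d$t <- format(d$t, scientific = FALSE, trim = TRUE)

out <- capture.output(write.table(d, quote = FALSE, row.names = FALSE))
```

The column becomes character inside R, so this conversion is best done on a copy that is used only for the dump.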


Cheers,
Alexy



[R] name scoping within dataframe index

2009-01-26 Thread Alexy Khrabrov
Every time I have to prefix a dataframe column inside the indexing  
brackets with the dataframe name, e.g.


df[df$colname==value,]

-- I am wondering, why isn't there an R scoping rule that search  
starts with the dataframe names, as if we'd said


with(df, df[colname==value,])

-- wouldn't that be a reasonable default to prepend to the name search  
path?


Cheers,
Alexy



Re: [R] name scoping within dataframe index

2009-01-26 Thread Alexy Khrabrov

On 1/26/2009 1:46 PM, Alexy Khrabrov wrote:
Every time I have to prefix a dataframe column inside the indexing   
brackets with the dataframe name, e.g.

df[df$colname==value,]
-- I am wondering, why isn't there an R scoping rule that search   
starts with the dataframe names, as if we'd said

with(df, df[colname==value,])
-- wouldn't that be a reasonable default to prepend to the name  
search  path?


If you did that, it would be quite difficult to get at a colname  
variable that *isn't* the column of df.  It would be something like


df[get(colname, parent.frame()) == value,]


Actually, what I propose is  a special search rule which simply looks  
at the enclosing dataframe.name[...] outside the brackets and looks up  
the columns first.


It would break legacy code which used the column names identical to  
variables in this context, but there's probably other ideas to enhance  
R readability which would break legacy code.  Perhaps when the next  
major overhaul occurs, this is something folks can voice opinions  
about.  I find the need for inner prefixing quite unnatural, FWIW.


Cheers,
Alexy



Re: [R] name scoping within dataframe index

2009-01-26 Thread Alexy Khrabrov


On Jan 26, 2009, at 2:12 PM, Duncan Murdoch wrote:

df[get(colname, parent.frame()) == value,]
Actually, what I propose is  a special search rule which simply  
looks  at the enclosing dataframe.name[...] outside the brackets  
and looks up  the columns first.


Yes, I understood that, and I explained why it would be a bad idea.


Well this is the case in all programming languages with scoping where  
inner-scope variables override the outer ones.  Usually it's solved  
with prefixing with the outer scope, outerscope.name or  
outerscope::name or so.  So it only underscores the need to improve  
scoping access in R.


Dataframe column names belong to the dataframe object and the natural  
thing would be to enable easy access to naming; you'd need to apply an  
extra effort to access an overridden unrelated external variable.   
Again, just an analogy from other programming languages.


Cheers,
Alexy



[R] Functional pattern-matching in R

2008-10-29 Thread Alexy Khrabrov
I found there's a very good functional set of operations in R, such as  
apply family, Hadley Wickham's lovely plyr, etc.  There's even a  
Reduce (a.k.a. fold).  Now I wonder how can we do pattern-matching?


E.g., now I split dimensions like this:

m <- dim(V)[1] # R
n <- dim(V)[2]  # still R

While even Matlab allows for

[m,n] = size(V) % MATLAB!

Ideally I'd be able to say,

x,y <- dim(V)

-- where .,. is some magic needed.

Similarly, to break lists, we'd need, in a MLish notation,

match L with
| head::tail = ...
| () = ;

What can be done in R now to simulate it, and/or how Rish is it to add  
something like that?


Cheers,
Alexy



Re: [R] svm models in a loop

2008-10-12 Thread Alexy Khrabrov


On Oct 12, 2008, at 2:26 AM, Prof Brian Ripley wrote:


You need to use substitute() on the call.  Something like

sapply(1:5,function(i)
  eval(substitute(svm(person_oid ~ ., data=zrr[1:N,]),
                  list(N=100*i)))
  )


Thanks!


On Sun, 12 Oct 2008, Alexy Khrabrov wrote:

I want to train svm models on increasingly large training data  
subsets of some zrr as follows:


m <- sapply(1:5,function(i) svm(person_oid~.,data=zrr[1:100*i,])) # (*)


However, when I inspect m[1], it literally shows


> m[1]

[[1]]
svm(formula = person_oid ~ ., data = zrr[1:N, ])


I suspect it shows '100*i' not 'N', but in the absence of a  
reproducible example, I cannot check.


Exactly -- I've mixed in an attempt to define N in sapply(...,  
function(i) { N <- 100*i, ...}), but now see I need to do it in  
substitute().
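
The substitute() idea can be tried without svm.  The sketch below swaps in lm() and the built-in cars data set (both assumptions for illustration, not the poster's model or data), and uses lapply so that each fit stays a whole object.

```r
# Fit on increasingly large subsets; substitute() replaces N with its
# value before the call is evaluated, so the stored call shows a number.
fits <- lapply(1:3, function(i)
  eval(substitute(lm(dist ~ speed, data = cars[1:N, ]),
                  list(N = 10 * i))))

call.text <- paste(deparse(fits[[2]]$call), collapse = "")
call.text
```

Each element of fits is then a complete model whose printed call names the actual subset size.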


Cheers,
Alexy



[R] subsetting dataframe by rownames to be excluded

2008-10-11 Thread Alexy Khrabrov
Is there a way to select a subset of a dataframe consisting of all  
those rows with rownames *except* from a subset of rownames to be  
excluded?  Example:


> a <- data.frame(x=1:10,y=10:1)
> a <- a[order(a$y),] # to make rownames differ visually

> a[8,]
  x y
3 3 8

> a["8",]
  x y
8 8 3

> a[-8,]
    x  y
10 10  1
9   9  2
8   8  3
7   7  4
6   6  5
5   5  6
4   4  7
2   2  9
1   1 10

> a[-"8",]
Error in -"8" : invalid argument to unary operator

-- is there a similar exclusion operator or simple way?  So far the  
best I can do is


> a[setdiff(rownames(a),"8"),]
    x  y
10 10  1
9   9  2
7   7  4
6   6  5
5   5  6
4   4  7
3   3  8
2   2  9
1   1 10
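
Another common idiom, shown as a sketch on the same toy data: negate a logical %in% test on the row names.  The vector drop is a made-up set of names to exclude.

```r
a <- data.frame(x = 1:10, y = 10:1)
a <- a[order(a$y), ]

drop <- c("8", "3")                    # row names to exclude
b <- a[!(rownames(a) %in% drop), ]     # keep everything else, in order
```

Unlike a negative numeric index, this works on the names themselves, and it returns all rows unchanged when drop is empty.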

Cheers,
Alexy



[R] svm models in a loop

2008-10-11 Thread Alexy Khrabrov
I want to train svm models on increasingly large training data subsets  
of some zrr as follows:


> m <- sapply(1:5,function(i) svm(person_oid~.,data=zrr[1:100*i,])) # (*)


However, when I inspect m[1], it literally shows

> m[1]
[[1]]
svm(formula = person_oid ~ ., data = zrr[1:N, ])

-- as opposed to

> m1 <- svm(person_oid~.,data=zrr[1:100,])
> m1
Call:
svm(formula = person_oid ~ ., data = zrr[1:100, ])
... -- actual parameters

How do I force actual model evaluation in (*) ?

Cheers,
Alexy



[R] R/OCaml?

2008-10-09 Thread Alexy Khrabrov
Did anyone try to write R extensions in OCaml?  What would it entail  
to enable it?


Cheers,
Alexy



[R] namespaces

2008-10-02 Thread Alexy Khrabrov
I'd like to control my namespace thoroughly, separated by task.  Is  
there a way, in R session, to introduce namespaces for tasks  
dynamically and switch them as needed?  Or, is there a combination of  
load/save workspace steps which can facilitate this?


Cheers,
Alexy



Re: [R] namespaces

2008-10-02 Thread Alexy Khrabrov
Yes, I could prefix everything with taskN$, but that's boring!  Can I  
now attach and detach tasks to have one prefix-less, since they look  
like dataframes?  What does proto buy us?


To respond to Uwe here as well, I want dynamic task switching and  
*also* an ability to load and save task from their workspaces into  
their namespaces in the same R session.


Cheers,
Alexy

On Oct 2, 2008, at 11:19 AM, Gabor Grothendieck wrote:


You could have an environment for each task and place your objects for
each task in its environment.

Note that this is getting close to object oriented ideas where each
environment/task is an object containing your R variables and methods,
and the proto package can be used to facilitate that:

library(proto)
task1 <- proto(var = 0, addx = function(this, x) this$var <- this$var + x)

task1$var # 0
task1$addx(3)
task1$var # 3
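
The environment-per-task suggestion can also be sketched in base R alone, including saving one task's objects to a file and loading them into another environment.  The names task1, task2 and the objects inside them are hypothetical.

```r
# One environment per task.
task1 <- new.env()
assign("x", 1:3, envir = task1)
local(total <- sum(x), envir = task1)   # evaluate inside the task

# Save the task's objects, then load them into a fresh environment.
f <- tempfile(fileext = ".RData")
save(list = ls(task1), envir = task1, file = f)

task2 <- new.env()
load(f, envir = task2)
sort(ls(task2))
```

Both save() and load() take an envir argument, so several task workspaces can coexist in one session without touching the global environment.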

On Thu, Oct 2, 2008 at 11:03 AM, Alexy Khrabrov  
[EMAIL PROTECTED] wrote:
I'd like to control my namespace thoroughly, separated by task.  Is  
there a
way, in R session, to introduce namespaces for tasks dynamically  
and switch
them as needed?  Or, is there a combination of load/save workspace  
steps

which can facilitate this?

Cheers,
Alexy



[R] partitioning vectors of intervals

2008-09-27 Thread Alexy Khrabrov
I have two pairs of time intervals: coarse- and fine-grained.  They're  
components of their respective dataframes, looking like,


coarse:  endtime  starttime
1        t1_end   t1_start
2        t2_end   t2_start
...

fine: is the same, except that its intervals presumably fall into the  
coarse's enclosing ones.


The problem is to partition the fine intervals into the coarse ones,  
adding a list to each row showing which fine intervals fall in it, like


coarse: ... fine
1           [1,2]
2           [3,4,5]

Is there a functional way to do this?
Cheers,
Alexy



[R] 10-minute presentation of R as a machine learning platform

2008-09-26 Thread Alexy Khrabrov
Greetings -- I'd like to present R to our university research group as  
a viable platform to do machine learning applications for human  
behavior modeling.  The actual research will be further specialized,  
but general activities -- acquiring/interfacing with the data,  
specifying/learning a model, detecting mean behaviors vs outliers,  
and of course visualizing it all, are perfectly done using R base and  
add-on packages.  I'm thinking of sticking some timeseries data into a  
MySQL database (real world), extracting it into an R session, and  
showing things emphasizing R strengths.  Do folks have some similar  
sessions saved, and what would you recommend to do in just the 10  
minutes I'll have?  Especially an example of bedazzling graphics which  
can be produced from models of data, 3d, colorful, and rotatable with  
rgl! :)


Cheers,
Alexy



[R] creating horizontal dataframes with column names

2008-09-16 Thread Alexy Khrabrov
Greetings -- in order to write back to SQL databases, one needs to  
create a dataframe with values.  I can get column names of an existing  
table with sqlColumns.  Say I have a vector of values (if they're all  
the same type), or a list (if different).  How do I create a dataframe  
with column names given by my sqlColumns?  To make it concrete, how do  
we make a dataframe


A B C
1 2 3

out of

column.names <- LETTERS[1:3]
values <- 1:3

?
Cheers,
Alexy



Re: [R] creating horizontal dataframes with column names

2008-09-16 Thread Alexy Khrabrov

Exactly -- also found creating horizontal vector helps:

> df <- data.frame(matrix(1:5,nrow=1))
> colnames(df) <- LETTERS[1:5]
> df
  A B C D E
1 1 2 3 4 5
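
The matrix route forces a single type on all values.  For the mixed-type case mentioned in the question, one option is to name a list and convert it, as in this sketch with made-up values.

```r
column.names <- LETTERS[1:3]
values <- list(1L, "two", 3.5)        # hypothetical mixed-type values

# Name the list elements, then convert: each element becomes a column.
df <- as.data.frame(setNames(values, column.names),
                    stringsAsFactors = FALSE)
str(df)
```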

Thanks,
Alexy

On Sep 17, 2008, at 1:17 AM, Moshe Olshansky wrote:


If df is your data.frame, then
colnames(df) <- c("col1","Col2","COL3")



--- On Wed, 17/9/08, Alexy Khrabrov [EMAIL PROTECTED] wrote:


From: Alexy Khrabrov [EMAIL PROTECTED]
Subject: [R] creating horizontal dataframes with column names
To: [EMAIL PROTECTED]
Received: Wednesday, 17 September, 2008, 1:52 PM
Greetings -- in order to write back to SQL databases, one
needs to
create a dataframe with values.  I can get column names of
an existing
table with sqlColumns.  Say I have a vector of values (if
they're all
the same type), or a list (if different).  How do I create
a dataframe
with column names given by my sqlColumns?  To make it
concrete, how do
we make a dataframe

A B C
1 2 3

out of

column.names <- LETTERS[1:3]
values <- 1:3

?
Cheers,
Alexy



[R] splitting time vector into days

2008-09-09 Thread Alexy Khrabrov
Greetings -- I have a dataframe a with one element a vector, time, of  
POSIXct values.  What's a good way to split the data frame into  
periods of a$time, e.g. days, and apply a function, e.g. mean, to some  
other column of the dataframe, e.g. a$value?
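
One way this might look in base R, as a sketch: derive a day column with as.Date() and aggregate on it.  The sample timestamps, the UTC time zone and the column name day are assumptions for illustration.

```r
# Hypothetical data frame with a POSIXct column and a value column.
a <- data.frame(
  time = as.POSIXct(c("2008-09-01 08:00:00", "2008-09-01 20:00:00",
                      "2008-09-02 09:30:00"), tz = "UTC"),
  value = c(1, 3, 10))

# Truncate each timestamp to its calendar day, then average per day.
a$day <- as.Date(a$time, tz = "UTC")
daily <- aggregate(value ~ day, data = a, FUN = mean)
daily
```

The tz argument matters: the day boundary is taken in that time zone, so it should match the zone in which the data are meant to be read.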


Cheers,
Alexy



[R] dealing with NAs in time series

2008-09-05 Thread Alexy Khrabrov
Certain timeseries I have had outliers, which I removed by assigning  
NA to their positions.  Now acf() refuses to go to work.  What's the  
right way to remove outliers from ts objects, and what are the  
standard ways to interpolate NAs in them?


Cheers,
Alexy



[R] annotating objects in workspace

2008-09-05 Thread Alexy Khrabrov
Is there a way to associate descriptions with the objects in the  
workspace, and later retrieve them to know what the object was created  
for?


Thanks,
Alexy



[R] modeling interval data, a.k.a. irregular timeseries

2008-09-03 Thread Alexy Khrabrov

Greetings -- I've got some sensor data of the form

t1_1, t1_2
t2_1, t2_2
...
tN_1,tN_2

-- time intervals measuring starts and stops of sensor activity.  I'd  
like to see whether there's any regularity in it.  Seems natural to  
consider these data timeseries -- except most of the timeseries  
packages and models assume regular ones, with a fixed frequency.  

I wonder what's a good way to apply existing regular timeseries  
packages to these data, and perhaps try some others?  I like David  
Stoffer's book a lot, yet he uses R's own ts methods (with some  
extras).  I also like the zoo package, which allows for irregular  
timeseries, yet I'm not sure how to apply the usual models to zoo  
objects -- even though zoo strives to be compatible with ts...  Is zoo  
directly usable for ts-like time domain and spectral analysis as per  
Stoffer?


Another way I was pondering is to map the above to an artificial  
index 1:n and consider it multivariate timeseries.  Is it something  
done in irregular timeseries analysis?


Cheers,
Alexy



Re: [R] figure margins too large for a barplot in png, pdf ok

2008-05-07 Thread Alexy Khrabrov

On May 6, 2008, at 10:30 PM, Prof Brian Ripley wrote:

What were H & W?  For png() they are (by default) in pixels, for  
pdf() in inches.


You haven't told us your OS, but I guess Mac OS.  Please update to R  
2.7.0: that offers you two new png() devices for higher-quality  
plots, and various other improvements.  (If you want to use the  
cairo-based png() at other than 72dpi, use R-patched or remember  
that the pointsize is mistakenly in pixels.)


For some reason, R 2.6.2 on Mac OSX (indeed) stopped doing png at all,  
even for plot(1:10).  I've been using a postcard printer with A5 size  
in between, was wondering if that affected things -- and changed that  
in Page Setup to US Letter, to no avail.  H and W were 600 x 800,  
resp., and most importantly this same code worked with R 2.6.0 before...


In any case, simply went and installed the 2.7.0, recompiled littler,  
and it all works fine now, just as before.  (Hope the 4/22 build for  
Mac OSX incorporates Simon's recentmost fixes.)  Thanks!


Cheers,
Alexy



[R] figure margins too large for a barplot in png, pdf ok

2008-05-06 Thread Alexy Khrabrov
I used to have a script with a barplot command in it, preceded by a  
png:


png(graph.file,height=H,width=W)
barplot(t,names.arg=breaks[2:(length(t)+1)],tck=gridlines)

-- worked before R 2.6.2.  When I tried it in R 2.6.2, which I have had
for a while but hadn't run with that script, it complained that the figure
margins were too large.  I've googled the messages from our list where
neither height nor width had been specified.  Yet here they are
specified.


Calling the same code from the R GUI draws the nice barplot in Quartz, and
replacing png with pdf or postscript works fine.  Other graphs work
fine with png.  jpeg also complains, and trying various values for H
or W changes nothing -- how can I subdue the png?


Cheers,
Alexy



[R] preallocating matrices and rda read-back objects

2008-04-09 Thread Alexy Khrabrov
I've read in Phil Spector's new book that it's a good idea to  
preallocate a big matrix, like

u <- matrix(0,nrow,ncol) # (1)

Now, I read contents of a huge matrix from a Fortran binary dump.

u <- readBin(con,what="double",n=nrow*ncol) # (2)

If I do (1) and then (2), u is a vector, obviously it's either
reallocated or its matrix nature is lost -- overridden?  overwritten?

Instead, I do it now as

u <- matrix(readBin(con,what="double",n=nrow*ncol),nrow=nrow,ncol=ncol) # (3)

What's going on with memory management here and what's the right way  
to make it efficient -- and how to preallocate?

After that, I'm saving u as R binary object in an rda file.  Does it  
make sense to preallocate u before reading it back now from the rda  
file?
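
A minimal sketch of what happens in (1)-(3), using a temporary file in place of the Fortran dump (the file name and sizes are made up for illustration, and the dump is assumed to hold raw doubles with no record markers). Assigning the result of readBin() to u rebinds the name to a new vector, so the earlier matrix() call gains nothing; setting dim on the vector that was read gives it the matrix shape, filled column by column as Fortran stores it.

```r
nrow <- 3
ncol <- 4
x <- as.double(1:(nrow * ncol))

# stand-in for the Fortran dump: raw doubles, no record markers (an assumption)
dump.file <- tempfile(fileext = ".bin")
writeBin(x, dump.file)

u <- matrix(0, nrow, ncol)                            # (1) preallocated matrix
con <- file(dump.file, "rb")
u <- readBin(con, what = "double", n = nrow * ncol)   # (2) u is now a plain vector
close(con)
was.matrix <- is.matrix(u)                            # FALSE: the name u was rebound

dim(u) <- c(nrow, ncol)                               # give the vector its matrix shape
u
```

When reading back from an rda file, load() creates u itself, so there is nothing to preallocate in that case either.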

Cheers,
Alexy



[R] reading in a Fortran binary array

2008-04-08 Thread Alexy Khrabrov
Greetings -- I'd like to avoid converting a Fortran array of floats
into ASCII and then reading it back in R.  Furthermore, it's much faster to
dump large arrays in binary, as they take up much less space at full
precision -- many decimal digits take up many bytes in ASCII versus
four or eight bytes per float in raw form.  Is there a way to read such a
Fortran unformatted file back into R?
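
One possible approach is readBin() on a binary connection. The sketch below assumes the file was written with sequential unformatted access by a compiler that wraps each record in 4-byte length markers (common, but compiler-dependent); with stream or direct access there are no markers and the two marker reads should be dropped. The file is simulated with writeBin() here, and read.record is a hypothetical helper name.

```r
# simulate one Fortran unformatted record holding 5 doubles:
# [4-byte length][payload][4-byte length]  (this layout is an assumption)
x <- c(1.5, 2.5, 3.5, 4.5, 5.5)
f <- tempfile(fileext = ".dat")
out <- file(f, "wb")
writeBin(as.integer(8 * length(x)), out, size = 4)
writeBin(x, out, size = 8)
writeBin(as.integer(8 * length(x)), out, size = 4)
close(out)

# hypothetical helper: read one record of 8-byte reals
read.record <- function(con) {
    nbytes <- readBin(con, what = "integer", n = 1, size = 4)
    vals   <- readBin(con, what = "double", n = nbytes / 8, size = 8)
    readBin(con, what = "integer", n = 1, size = 4)   # skip the trailing marker
    vals
}

con <- file(f, "rb")
y <- read.record(con)
close(con)
y
```

For 4-byte reals the payload would be read with size = 4 instead, and a file written on a machine with different byte order needs the endian argument of readBin().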

Cheers,
Alexy



[R] exporting a split list

2007-11-27 Thread Alexy Khrabrov
Using wk <- with(d, split(word, kind)), I get the following class table:

wk$`1`
[1] "a" "bra" ...  # (*)

wk$`10`
[1] "ca" "dabra" ...

Now I need to export it in the following format:

class   num_members   examples
1       23            a bra ...
10      4             ca dabra

For each class C such as `1`, I need to print the number of members,  
length(wk[[C]]), and show N examples as sample(wk[[C]], N), space- 
separated.  The columns themselves are tab-separated.

What's the R way to export such a list?
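
One way this could be done is to build a small data frame with one row per class and hand it to write.table(). A sketch with made-up data; the column names follow the format above, and N, out and out.file are names chosen for the example.

```r
set.seed(1)   # only so that the sampled examples are repeatable
d <- data.frame(word = c("a", "bra", "ca", "dabra", "e", "f"),
                kind = c(1, 1, 10, 10, 10, 1),
                stringsAsFactors = FALSE)
wk <- with(d, split(word, kind))

N <- 2   # number of examples to show per class
out <- data.frame(
    class       = names(wk),
    num_members = sapply(wk, length),
    # sample indices rather than values, and never ask for more than exist
    examples    = sapply(wk, function(w)
                      paste(w[sample.int(length(w), min(N, length(w)))],
                            collapse = " ")),
    stringsAsFactors = FALSE
)

out.file <- tempfile(fileext = ".tsv")
write.table(out, out.file, sep = "\t", quote = FALSE, row.names = FALSE)
```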
Cheers,
Alexy



[R] [:]

2007-11-24 Thread Alexy Khrabrov
What are idioms for taking a head or a tail of a vector, either up to  
an index, or from an index to the end?  Also -- is it necessary to  
use length(v) to refer to the last element? E.g., Python has

v[:3] # indices 0,1,2
v[3:] # indices 3,4,...
v[-1] # the last element of v
v[:-1] # all but last
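
For comparison, a sketch of the closest R idioms I know of, using head() and tail() with positive and negative counts. Note that R indices start at 1 and that v[-1] in R means "everything except the first element", not the last one.

```r
v <- c(10, 20, 30, 40, 50)

first3   <- head(v, 3)     # first three elements, like Python's v[:3]
rest     <- tail(v, -3)    # everything after the first three, like v[3:]
last     <- tail(v, 1)     # the last element, like v[-1]
but.last <- head(v, -1)    # all but the last, like v[:-1]
last.idx <- v[length(v)]   # the last element by explicit index
```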

Cheers,
Alexy



[R] ragged array with append

2007-11-24 Thread Alexy Khrabrov
I wonder what's the right way in R to do the following -- placing  
objects of the same kind together in subarrays of varying length.   
Here's what I mean:

> word <- c("a","b","c","d","e","f","g","h","i","j")
> kind <- c(1,1,1,2,3,4,5,5,7,7)
> d <- data.frame(word,kind)
> d
   word kind
1     a    1
2     b    1
3     c    1
4     d    2
5     e    3
6     f    4
7     g    5
8     h    5
9     i    7
10    j    7

Now from this data frame, I want to assemble words of the same kind  
into lists.  The result should look like (not R syntax):

1 = [a,b,c]
2 = [d]
3 = [e]
4 = [f]
5 = [g,h]
7 = [i,j]

What is the most appropriate data structure in R for this result and  
growing these sublists most effectively with append?
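
A named list of vectors is probably the natural structure here, and split() builds it in one step without any explicit appending. A minimal sketch on the data above (by.kind is a name chosen for the example):

```r
word <- c("a","b","c","d","e","f","g","h","i","j")
kind <- c(1,1,1,2,3,4,5,5,7,7)
d <- data.frame(word, kind, stringsAsFactors = FALSE)

# a named list of character vectors, one element per kind
by.kind <- split(d$word, d$kind)
by.kind[["5"]]

# growing one sublist later, if that is still needed
by.kind[["5"]] <- c(by.kind[["5"]], "k")
sapply(by.kind, length)
```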

Cheers,
Alexy



[R] printing levels as tuples

2007-11-22 Thread Alexy Khrabrov
I'm running rle() on a long vector, and get a result which looks like

> uc
Run Length Encoding
  lengths: int [1:16753] 1 1 1 1 1 1 1 1 1 1 ...
  values : int [1:16753] 29462748 22596107 18322820 14323315 12684505 9909036 7296916 6857692 5884755 5883697 ...


I can print uc$lengths or uc$values separately.  Is there any way to
print them together as tuples, looking like

(29462748, 1)   (22596107, 1) ...
(5883697, 1) ...
...
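
A small sketch of one way to do this, pairing the values and lengths components of the rle result with sprintf(); the input vector is made up.

```r
x <- c(5, 5, 3, 3, 3, 9)
uc <- rle(x)

# one "(value, length)" string per run
tuples <- sprintf("(%s, %d)", uc$values, uc$lengths)
cat(tuples, sep = "\t")
cat("\n")

# or as a two-column table
runs <- data.frame(value = uc$values, length = uc$lengths)
runs
```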

Cheers,
Alexy



[R] shrink a dataframe for plotting

2007-11-21 Thread Alexy Khrabrov
I get tables with millions of rows.  For plotting to a screen-size  
jpg, obviously just about 1000 points are enough.  Instead of feeding  
plot() the original millions of rows, I'd rather shrink the original
dataframe, using something like the following averaging scheme:

-- split dataframe into chunks of N rows each, e.g. 1000 rows each
-- compute average for each column
-- issue one new row of those averages into the shrunk result

Is there any existing package to do that in R?  Otherwise, which R  
idioms are most effective to achieve that?

Cheers,
Alexy



Re: [R] shrink a dataframe for plotting

2007-11-21 Thread Alexy Khrabrov

On Nov 21, 2007, at 1:24 PM, Thibaut Jombart wrote:

 Alexy Khrabrov wrote:

 I get tables with millions of rows.  For plotting to a screen-size
 jpg, obviously just about 1000 points are enough.  Instead of feeding
 plot() the original millions of rows, I'd rather shrink the original
 dataframe, using some kind of the following interpolation:

 -- split dataframe into chunks of N rows each, e.g. 1000 rows each
 -- compute average for each column
 -- issue one new row of those averages into the shrunk result

 Is there any existing package to do that in R?  Otherwise, which R
 idioms are most effective to achieve that?

 Cheers,
 Alexy

 if you want to extract relevant information from such a table,
 splitting
 rows in arbitrary chunks may not solve your problem. Ordinations in
 reduced space are designed for that kind of task, but hierarchical
 clustering may also help. See Legendre & Legendre (1998, Numerical
 Ecology, Elsevier) for examples of such methods in Ecology, and the R
 packages ade4 and vegan, and the function hclust.

Well, in this case the function is monotonically decreasing, and so  
averages would do fine just to plot the curve.  What I'm after is an  
R way -- preferably functional -- to split a vector into chunks of N  
elements and issue a new vector of the averages of those chunks.
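
A sketch of one functional way to do this: label each element with its chunk number and let tapply() average per label. chunk.means is a hypothetical helper name, and the last chunk is simply shorter when the length is not a multiple of n.

```r
# hypothetical helper: mean of each consecutive chunk of n elements
chunk.means <- function(x, n) {
    chunk <- ceiling(seq_along(x) / n)   # chunk ids 1,1,...,2,2,...
    as.vector(tapply(x, chunk, mean))
}

x <- c(10, 8, 6, 4, 3, 2, 1)
m <- chunk.means(x, 3)
m

# the same idea column by column for a data frame
df <- data.frame(a = x, b = 2 * x)
shrunk <- aggregate(df, by = list(chunk = ceiling(seq_len(nrow(df)) / 3)),
                    FUN = mean)
shrunk
```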

Cheers,
Alexy



[R] uniq -c

2007-11-21 Thread Alexy Khrabrov
Is there an R analog of the Unix command uniq -c:

http://en.wikipedia.org/wiki/Uniq

Given an array x, uniq -c replaces each contiguous subsequence of  
identical numbers with a tuple (count, number).  E.g.

$ cat > usample
10
10
9
8
8
7
7
7
6
3
1
1
1
0
$ uniq -c usample
   2 10
   1 9
   2 8
   3 7
   1 6
   1 3
   3 1
   1 0
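
For reference, rle() in base R computes exactly this run-length encoding. A minimal sketch on the same sample, with the columns arranged in uniq -c order:

```r
x <- c(10, 10, 9, 8, 8, 7, 7, 7, 6, 3, 1, 1, 1, 0)
r <- rle(x)

# count first, then the value, as uniq -c prints them
counts <- data.frame(count = r$lengths, value = r$values)
print(counts, row.names = FALSE)
```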

Cheers,
Alexy



Re: [R] reading graph metadata text from a file

2007-11-20 Thread Alexy Khrabrov
I've tried textplot(), but it fills the whole frame.  I could
use mfrow, but that would devote an unnecessarily large space to the text.
In fact I want it in a corner of my current plot that is not occupied by the
curves.  Here's what I ended up with -- it works, but I have to
manually move the top text down, as it is cut off by the upper boundary of
the graph.  Any better/shorter/nicer ways?

First I fetch the dimensions of the current plot:

max.xy <- function() {
    p <- par()
    x.max <- p$usr[2]
    y.max <- p$usr[4]
    xy.coords(x=x.max, y=y.max)
}

Then, after plotting, I say

plot(...)

xy.max <- max.xy()

# top-right -- need to move a bit down or letters are cut off on top:
text(xy.max$x,xy.max$y*0.99,story$text,adj=c(1,1),col="blue")

# bottom-right -- no cut-off, a small nice gap comes free:
# story.text is story$text with some more appended details
text(xy.max$x, 0, story.text,adj=c(1,0),col="blue")

As for the original question -- I ended up creating, in each data  
directory, a file story.r, looking like this:

-
# R titles, labels, and story text for the plot
# when I assigned color="blue" right in data.frame, it became an integer level!
story <- data.frame(title="",x="",y="",text="",color="")
story$title <- "graph title"
story$x <- "x units"
story$y <- "y units"
story$text <- "story text -- this data has come a long way.  Once upon a time there was R..."
# we can separate colors of title, labels, and text like
# title.color, x.color, y.color, text.color
story$color <- "dark blue"
-

First I create a data.frame and then assign to its components one by
one for readability.  When I tried to assign color right in
data.frame(..., color="blue"), it became integer levels!  So I had to move it
out along with the others.  What's the logic here?
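
The likely logic: in R versions before 4.0.0, data.frame() converts character columns to factors by default (stringsAsFactors = TRUE), and a factor is stored as integer codes plus a levels attribute. A small sketch of the effect and two ways around it; a plain list is arguably enough for this kind of metadata.

```r
# factor column: what data.frame() produced by default before R 4.0.0
f <- data.frame(color = "blue", stringsAsFactors = TRUE)
class(f$color)        # "factor"
as.integer(f$color)   # 1, the level code, not the color name

# keep strings as strings
s <- data.frame(color = "blue", stringsAsFactors = FALSE)
class(s$color)        # "character"

# or skip the data frame entirely
story <- list(title = "graph title", x = "x units", y = "y units",
              text = "story text", color = "dark blue")
story$color
```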

Cheers,
Alexy

On Nov 20, 2007, at 1:39 AM, Bert Gunter wrote:

 ... But if I understand correctly, this is certainly straightforward
 without
 textplot, too...

 e.g.
 mylegend <- "Some text...\n Some more text"
 mytitle <- "This is a title"
 plot(0:1,0:1, main = mytitle)
 legend(.2,.2,leg=mylegend, bty="n")

 Naturally, this could all be functionized and the various text  
 arguments
 passed as arguments to the function (see ?plot.default or its  
 code); or they
 could be components of a list, or ...


 Bert Gunter
 Genentech Nonclinical Statistics


 -Original Message-
 From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
 project.org] On
 Behalf Of Greg Snow
 Sent: Monday, November 19, 2007 2:18 PM
 To: Alexy Khrabrov; r-help@r-project.org
 Subject: Re: [R] reading graph metadata text from a file

 You may want to use the textplot function from the gplots package  
 rather
 than the legend.

 -- 
 Gregory (Greg) L. Snow Ph.D.
 Statistical Data Center
 Intermountain Healthcare
 [EMAIL PROTECTED]
 (801) 408-8111



 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Alexy Khrabrov
 Sent: Monday, November 19, 2007 1:17 PM
 To: r-help@r-project.org
 Subject: [R] reading graph metadata text from a file

 I'd like to produce graphs with titles, axis labels, and
 legend as parameters read from a separate text file.
 Moreover, I'd like to use the legend for a short summary of
 the data -- not necessarily for describing the line colors
 per se.  How do we do this?

 Cheers,
 Alexy







[R] xy.coords and log10

2007-11-20 Thread Alexy Khrabrov
Is there a way to teach xy.coords, when given log="xy", or just "x"
or "y" separately, to do a decimal log10 instead of the natural log?

Cheers,
Alexy



Re: [R] xy.coords and log10

2007-11-20 Thread Alexy Khrabrov
xy.coords can have a log="xy" parameter which plot then interprets to
use log scale.
I wonder whether plot can be instructed in a similar way to use log10  
scale instead of natural logs.
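
For what it's worth, a log axis drawn by plot(..., log="xy") is already in base 10: par("usr") reports the axis limits in log10 units, so no separate switch is needed. A small check, drawing to a null pdf device so that no file is written (the data are made up):

```r
pdf(NULL)                      # off-screen device, nothing is written to disk
x <- c(1, 10, 100, 1000)
y <- c(2, 20, 200, 2000)
plot(x, y, log = "xy")

is.log <- par("xlog")          # TRUE: the x axis is logarithmic
usr <- par("usr")              # on log axes the limits are in log10 units
x.limits <- 10^usr[1:2]        # x limits back on the data scale
dev.off()

is.log
x.limits
```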

Cheers,
Alexy

On Nov 20, 2007, at 7:01 PM, Duncan Murdoch wrote:

 On 11/20/2007 10:41 AM, Alexy Khrabrov wrote:
 Is there a way to teach xy.coords, when given log="xy", or just
 "x" or "y" separately, to do a decimal log10 instead of the
 natural log?

 xy.coords doesn't do any transformation other than setting non- 
 positive values to NA.  So your question doesn't make sense; could  
 you elaborate on what you're seeing that you don't want to see?



[R] asymptote

2007-11-20 Thread Alexy Khrabrov
I have a graph which looks like a hyperbola.  I'd like to fit a
straight line to the lower segment going to infinity, approaching the
X axis -- I'm interested in the angle.  If I did it manually, I'd
cut off a certain initial part of the range, [0..x_min], and then do
an lm on the rest.  Yet I wonder whether those curve fitting
packages mentioned earlier do something similar automatically?
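
The manual version described above can be sketched with the subset argument of lm(). The curve and the cutoff x.min are made up for illustration, and the angle is in data units, so it depends on the scales of the two axes.

```r
x <- seq(0.5, 50, by = 0.5)
y <- 100 / x                   # made-up hyperbola approaching the X axis

x.min <- 20                    # cutoff chosen by eye (an assumption)
tail.fit <- lm(y ~ x, subset = x > x.min)

slope <- coef(tail.fit)[["x"]]
angle <- atan(slope) * 180 / pi   # angle to the X axis in degrees
slope
angle
```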

Cheers,
Alexy



[R] reading graph metadata text from a file

2007-11-19 Thread Alexy Khrabrov
I'd like to produce graphs with titles, axis labels, and legend as  
parameters read from a separate text file.  Moreover, I'd like to use  
the legend for a short summary of the data -- not necessarily for  
describing the line colors per se.  How do we do this?
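
One possible sketch: keep the metadata as "field: value" lines and read them with read.dcf() from base R. The field names (title, xlab, ylab, summary) and the file itself are made up for the example, and the plot goes to a null pdf device so that nothing is written.

```r
# a hypothetical metadata file, written here only to keep the example self-contained
meta.file <- tempfile(fileext = ".dcf")
writeLines(c("title: graph title",
             "xlab: x units",
             "ylab: y units",
             "summary: 100 runs, monotonically decreasing"),
           meta.file)

meta <- read.dcf(meta.file)[1, ]    # named character vector, one entry per field

pdf(NULL)                           # off-screen device for the sketch
plot(1:10, (10:1)^2, type = "l",
     main = meta[["title"]], xlab = meta[["xlab"]], ylab = meta[["ylab"]])
# legend() used purely as a text box for the data summary
legend("topright", legend = meta[["summary"]], bty = "n")
dev.off()
```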

Cheers,
Alexy



[R] assign if not set

2007-11-19 Thread Alexy Khrabrov
What's the idiom of assigning a default value to a variable if it's  
not set?  In Ruby one can say

v ||= default

-- that's an or-assign, which triggers the assignment only if v is  
not set already.  Is there an R shorthand?

Cheers,
Alexy



Re: [R] assign if not set; stand-alone R script, source'able too?

2007-11-19 Thread Alexy Khrabrov
Marc -- thanks, very interesting.

I was in fact tinkering at a very simple default arguments assignment  
to a generic command-line R script header:

#!/bin/sh
# graph a fertility run
tail --lines=+4 $0 | R --vanilla --slave --args $*; exit
args <- commandArgs()[-(1:4)]

# the krivostroi library
source("/w/ct/r/kriv/krivostroi.r")

# NB: vector assignment in R? defaults?
file    <- args[1]
maxruns <- if (is.na(args[2])) 1 else args[2]
prefix  <- if (is.na(args[3])) file.basename(file) else args[3]

-- it turns out that args[N] here is NA if not supplied on the command
line.  I'd still like to see a nicer way to do it, without repeating
args[N] twice for each assignment.
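
One way to avoid the repetition is a tiny helper that returns the i-th argument or a default. This is only a sketch: arg.or is a hypothetical name, the args vector stands in for the real command line, and the default prefix is derived with base R's basename() and sub().

```r
# hypothetical helper: the i-th argument, or a default when it was not supplied
arg.or <- function(args, i, default) {
    if (length(args) >= i && !is.na(args[i])) args[i] else default
}

args <- c("run1.dat")                 # stand-in for commandArgs()[-(1:4)]

# defaults are evaluated lazily, so stop() fires only if the file is missing
file    <- arg.or(args, 1, stop("file argument is required"))
maxruns <- as.numeric(arg.or(args, 2, 1))
prefix  <- arg.or(args, 3, sub("\\.[^.]*$", "", basename(file)))
```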


Another thing about this snippet, unrelated to assignment, is that it was
the only way I found to pack an R script into a single file runnable
from the command line in a stand-alone way.  Yet when I want to source()
it, R obviously chokes on the shell command tail.  My previous
solution was to have a pair of files, script.r/script.sh for each R
script, where .sh would look like

echo "argv <- c('$1','$2'); source('main.r')" | R --vanilla --slave

Wonder if there's a way to have a single file which is a stand-alone  
command script, and can be source()'d in R.
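
One way this could look, assuming an R installation that ships Rscript: the shebang line is an ordinary comment to source(), and the top-level work is guarded so that it runs only when the file is executed directly. The sys.nframe() test is a commonly used idiom rather than an official API for this purpose, and parse.args is a hypothetical name.

```r
#!/usr/bin/env Rscript
# graph a fertility run; usable both as ./script.r file [maxruns] and via source()

# hypothetical helper: turn the raw argument vector into named settings
parse.args <- function(args) {
    list(file    = if (length(args) >= 1) args[1] else NA,
         maxruns = if (length(args) >= 2) as.numeric(args[2]) else 1)
}

# sys.nframe() is 0 at top level under Rscript, but positive inside source()
if (sys.nframe() == 0L) {
    opts <- parse.args(commandArgs(trailingOnly = TRUE))
    cat("file:", opts$file, " maxruns:", opts$maxruns, "\n")
}
```

After source("script.r") the functions are defined but nothing runs, so they can be called interactively with hand-made arguments.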

Cheers,
Alexy

On Nov 20, 2007, at 4:03 AM, Marc Schwartz wrote:

 On Tue, 2007-11-20 at 03:32 +0300, Alexy Khrabrov wrote:
 What's the idiom of assigning a default value to a variable if it's
 not set?  In Ruby one can say

 v ||= default

 -- that's an or-assign, which triggers the assignment only if v is  
 not set already.  Is there an R shorthand?

 Cheers,
 Alexy

 If 'v' is not set, then it does not exist, hence you can use exists() to
 check for it. However, you need to [potentially] distinguish where the
 variable might be located. Keep in mind that R uses lexical scoping,
 hence the exists() function has other arguments to define where to
 look.

 A simple example:

 > v
 Error: object "v" not found

 > if (!exists("v")) v <- "Not Set"

 > v
 [1] "Not Set"

 > v <- "Set"

 > if (!exists("v")) v <- "Not Set"

 > v
 [1] "Set"


 See ?exists for more information.

 That being said, just as an example of extending R, you could do the
 following, which is to create a new function %||=% (think %in% or %*%)
 which can then take two arguments, one preceding it and one following
 it, and then basically do the same thing as above. Again here, scoping
 is critical.


 > "%||=%" <- function(x, y)
 {
   Var <- deparse(substitute(x))
   if (!exists(Var))
     assign(Var, y, parent.frame())
 }

 > v
 Error: object "v" not found

 > v %||=% "Not Set"

 > v
 [1] "Not Set"

 > v <- "Set"

 > v %||=% "Not Set"

 > v
 [1] "Set"



Re: [R] R as a programming language

2007-11-07 Thread Alexy Khrabrov
On Nov 7, 2007, at 4:13 PM, Duncan Murdoch wrote:

 And, still no option processing as in GNU long options, or python  
 or  ruby's optparse.
 What's the semantics of parameter passing -- by value or by  
 reference?

 By value.

Thanks Duncan!  So if I have a huge table t, and the idea was to
write a function t.xy(t, ...) to select slices of it, will the copying
done by parameter passing forfeit all the aesthetic savings from
refactoring?  What I'm dreading is having to explicitly select x and
y from t,

if (t has some shape) {
plot(t$this, t$that, ...)
} else if (t has that shape) {
plot(t$smth_else, ...)
}

-- that way I do refer to parts of t and there's no copying except to  
plot (?), yet if indeed passing parameters by value copies them, one  
would have to refrain from writing functions!  Is that the state of  
things?
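
As far as I understand R's semantics, arguments behave as if passed by value, but the copy is made lazily: a function that only reads or subsets its argument does not duplicate the whole object, and a copy happens only when the function modifies it. A small sketch of the visible behavior (the table and the function names are made up):

```r
tab <- data.frame(run = c(1, 1, 2), ord = 1:3, new = c(641, 518, 488))

# only reads its argument: no modification, so the whole table is not copied
xy.of <- function(t, run) {
    r <- t[t$run == run, ]
    list(x = r$ord, y = r$new)
}

# modifies its argument: works on a private copy, the caller's table is untouched
zero.new <- function(t) {
    t$new <- 0
    t
}

xy <- xy.of(tab, 1)
z  <- zero.new(tab)
tab$new     # still 641 518 488
```

So writing functions like t.xy should not by itself double the memory use; the subset r is a new object either way.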

Cheers,
Alexy



[R] R as a programming language

2007-11-07 Thread Alexy Khrabrov
Greetings -- coming from Python/Ruby perspective, I'm wondering about  
certain features of R as a programming language.

Say I have a huge table t of the form

run ord unit    words   new
1   1   6939    1013    641
1   2   275     1001    518
1   3   3314    1008    488
1   4   14154   1018    463
1   5   2982    1006    421

Alternatively, it may have a part column in front.  For each run (in  
a part if present), I select ord and new columns as x and y and plot  
their functions in various ways.  t is huge.  So I want to select the  
subset to plot, as follows:

t.xy <- function(t,part=NA,run=NA) {
    if (is.na(run)) {
        # TODO does this entail a full copy -- or how do we do references in R?
        r <- t
    } else if (is.na(part)) {
        r <- t[t$run == run,]
    } else { # part present too
        r <- t[t$part == part & t$run == run,]
    }
    x <- r$ord
    y <- r$new
    xy.coords(x,y)
}

What I'm wondering about is whether r <- t will copy the complete t,
and how I can minimize copying in R.  I heard it's a functional
language -- is there lazy evaluation in place here?

Additionally, tried to use --args command line arguments, and found a  
way only due to David Brahm -- who helped with several important R  
points (thanks Dave!):

#!/bin/sh
# graph a fertility run
tail --lines=+4 $0 | R --vanilla --slave --args $*; exit
args <- commandArgs()[-(1:4)]
...

And, still no option processing as in GNU long options, or python or  
ruby's optparse.

What's the semantics of parameter passing -- by value or by reference?

Is there anything less ugly than

print(paste("x=",x,"y=",y))

-- for routine printing?  Can [1] be eliminated from such simple  
printing?  What about formatted printing?
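
For routine output, cat() prints without the [1] prefix or quotes, and sprintf() gives C-style formatting. A small sketch with made-up values:

```r
x <- 3
y <- 4.5

cat("x =", x, "y =", y, "\n")                     # no [1] prefix, no quotes

# C-style formatted printing; %d wants an integer, hence as.integer()
s <- sprintf("x = %d, y = %.2f", as.integer(x), y)
cat(s, "\n")
```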

Is there a way to assign all of

a <- args[1]
b <- args[2]
c <- args[3]

in one fell swoop, à la Python's

a,b,c = args
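
R has no built-in tuple assignment that I know of. One workaround, sketched below, is to name the values and copy them into the current environment with list2env() from base R; the argument values are made up.

```r
args <- c("in.dat", "10", "out")

# name the pieces, then create variables a, b, c in the current environment
invisible(list2env(setNames(as.list(args), c("a", "b", "c")),
                   envir = environment()))

a
b
c
```

A variable named c does not break calls to the function c(), since R skips non-functions when looking up a function name, but a less overloaded name is probably kinder to readers.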

What's the simplest way to check whether a filename ends in .rda?
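
A sketch of two ways to test the extension, using an anchored regular expression or file_ext() from the tools package; the file names are made up.

```r
f <- c("model.rda", "model.rda.bak", "notes.txt")

by.regex <- grepl("\\.rda$", f)                   # TRUE FALSE FALSE
by.ext   <- tolower(tools::file_ext(f)) == "rda"  # same test via the extension
```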

Will ask more as I go programming...

(Will someone here please write an O'Reilly's Programming in R?  :)

Cheers,
Alexy


[R] running sum of a vector

2007-11-07 Thread Alexy Khrabrov
I need a vector with the sums of the elements up to each position in the
original.  The imperative version is simple:

# running sum: the traditional imperative way
sumr.1 <- function(x) {
   s <- numeric(length(x))   # preallocate the result
   ss <- 0
   for (i in seq_along(x)) { # safe for empty x, unlike 1:length(x)
      ss <- ss + x[i]
      s[i] <- ss
   }
   s
}

Yet I want a functional way, which is shorter:

# running sum: functional way, but inefficient one!
sumr.2 <- function(x) {
    sapply(1:length(x), function(i) sum(x[1:i]))
}

-- the problem with the latter is, we need to create indices to run  
over them, and the sum is recomputed anew for each position, while  
the imperative version iterates without recomputing.  Is there a  
better functional solution?
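
For reference, base R has cumsum() for exactly this, and Reduce() with accumulate = TRUE expresses the same thing as a fold that carries the running total forward instead of recomputing it. A minimal sketch:

```r
x <- c(3, 1, 4, 1, 5)

s1 <- cumsum(x)                             # built-in running sum
s2 <- Reduce(`+`, x, accumulate = TRUE)     # the same as a functional fold
s1
s2
```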

Cheers,
Alexy



Re: [R] R as a programming language

2007-11-07 Thread Alexy Khrabrov
With all due respect to the great book -- of which I own 2 copies I
bought new -- it's not an O'Reilly "Programming in X" book.  The
idea of a programming book like that is to treat the language
thoroughly from a programmer's standpoint, in a fairly standard way,
as has been done for Ruby or Python.

As I'm learning more statistics with R, I prefer to do it with the
book by Crawley.  It looks like most R books are written by
statisticians who became programmers, not the other way around.  Over all
the years I have periodically followed R, I forget its programming spirit
in between, and there's no "Programming ..." book to help.
Statistics is hard to forget once you master it; syntax sugar melts
away...

"Programming with Data" is the closest to an O'Reilly, but more
advanced and esoteric than that.

Since R became a bona fide Open Source language with CRAN and all, an
O'Reilly book by a [Python and Ruby] programmer-turned-statistician is
long overdue!  If it systematically compared R with Ruby and Python,
its closest Open Source cousins, it would help even more.  RPy and
RRb are there to help, too.  Just my $0.01...

Cheers,
Alexy

On Nov 7, 2007, at 7:46 PM, Bert Gunter wrote:

 (Will someone here please write an O'Reilly's Programming in  
 R?  :)

 Someone already has ... see Venables and Ripley's S PROGRAMMING.

 **However** R is more than a general purpose programming language:  
 it is a
 programming language specifically designed for data analysis --  
 including
 statistical graphics -- and statistics. So, IMHO anyway, it's really
 impossible to discuss it without reference to the data structures and
 procedures underlying such tasks. Because it is targeted to do  
 those sorts
 of things well, it may handle poorly some things that general purpose
 languages do well (minimizing storage with the use of references, for
 example).

 My own experience is that one appreciates the power and beauty of the
 language and the wisdom of the designers the more one uses it in real
 applications. But I am not a computer scientist and have only a  
 limited
 exposure to standard CS concepts and algorithms, to say nothing of  
 real
 programming experience. So just my $.02.

 Best regards,

 Bert Gunter
 Genentech Nonclinical Statistics

