from:"Sam Steingold"

Re: [R] [R-pkgs] Release of ess 0.0.1

2017-11-09 Thread Sam Steingold

> * Jorge Cimentada <pvzragn...@tznvy.pbz> [2017-11-09 00:02:53 +0100]:
>
> I'm happy to announce the release of ess 0.0.1 a package designed to
> download data from the European Social Survey

Given the existence of ESS (Emacs Speaks Statistics -
https://ess.r-project.org/) the package name "ess" seems unfortunate.

-- 
Sam Steingold (http://sds.podval.org/) on darwin Ns 10.3.1504
http://steingoldpsychology.com http://www.childpsy.net http://iris.org.il
http://mideasttruth.com http://thereligionofpeace.com https://jihadwatch.org
MS Windows: error: the operation completed successfully.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] help with binom.power

2015-08-17 Thread Sam Steingold

> * Bert Gunter <othagre.4...@tznvy.pbz> [2015-08-17 10:27:58 -0700]:
>
> qbinom(.025,1000,.001,lower=FALSE)

I don't think this is what I need.
I am looking for an inverse of binom.confint.

Sorry that my question was not clear.

-- 
Sam Steingold (http://sds.podval.org/) on darwin Ns 10.3.1348
http://www.childpsy.net/ http://islamexposedonline.com http://jihadwatch.org
http://iris.org.il http://dhimmi.org http://americancensorship.org
Genius is immortal, but morons live longer.


[R] strsplit with a vector split argument

2013-09-18 Thread Sam Steingold

Hi,
I find this behavior unexpected:
--8---cut here---start-8---
> strsplit(c("a,b;c","d;e,f"),c(",",";"))
[[1]]
[1] "a"   "b;c"

[[2]]
[1] "d"   "e,f"
--8---cut here---end---8---
I thought that it should be identical to this:
--8---cut here---start-8---
> strsplit(c("a,b;c","d;e,f"),"[,;]")
[[1]]
[1] "a" "b" "c"

[[2]]
[1] "d" "e" "f"
--8---cut here---end---8---
Is this a bug or did I misunderstand the docs?
Thanks!
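
For what it is worth, the strsplit documentation says that a split argument of length greater than 1 is recycled along x, so the first string is split on "," only and the second on ";" only. A minimal sketch of the two calls; the variable names x, by_element and by_class are chosen here for illustration:

```r
x <- c("a,b;c", "d;e,f")

# A vector 'split' is recycled along x: element 1 uses ",", element 2 uses ";"
by_element <- strsplit(x, c(",", ";"))

# A single regular expression (character class) is applied to every element
by_class <- strsplit(x, "[,;]")

print(by_element)
print(by_class)
```

So the character-class form is the one to use when every string should be split on either delimiter.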

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 13.04 (raring) X 11.0.11303000
http://www.childpsy.net/ http://www.memritv.org http://truepeace.org
http://camera.org http://openvotingconsortium.org http://palestinefacts.org
Experience comes with debts.


[R] promise already under evaluation

2013-07-03 Thread Sam Steingold

Hi, I asked this question on SO but got no answers:
http://stackoverflow.com/questions/17310825/r-promise-already-under-evaluation

I understand that you are probably sick and tired of answering the same
question again, but I am still getting the error discussed in several
other questions:

 promise already under evaluation: recursive default argument reference or
 earlier problems?

even though I did follow the cumbersome advice of prepending ".":

--8---cut here---start-8---
show.large.objects.threshold <- 10
show.large.objects.exclude <- c("closure")
show.large.objects <- function (.envir = sys.frame(),
                                threshold = show.large.objects.threshold,
                                exclude = show.large.objects.exclude) {
  for (n in print(ls(.envir, all.names = TRUE))) tryCatch({
    o <- get(n,envir = .envir)
    s <- object.size(o)
    if (s > threshold && !(typeof(o) %in% exclude)) {
      cat(n,": ")
      print(s,units="auto")
    }
  }, error = function(e) { cat("n=",n,"\n"); print(e) })
}
show.large.objects.stack <- function (threshold = show.large.objects.threshold,
                                      skip.levels = 1, # do not examine the last level - this function
                                      exclude = show.large.objects.exclude) {
  for (level in 1:(sys.nframe()-skip.levels)) {
    cat("*** show.large.objects.stack(",level,") ")
    print(sys.call(level))
    show.large.objects(.envir = sys.frame(level))
  }
}
--8---cut here---end---8---

but I still get errors:

--8---cut here---start-8---
> f <- function () { c <- 1:1e7; d <- 1:1e6; print(system.time(show.large.objects.stack())) }
> f()
*** show.large.objects.stack( 1 ) f()
[1] "c" "d"
c : 38.1 Mb
d : 3.8 Mb
*** show.large.objects.stack( 2 ) print(system.time(show.large.objects.stack()))
[1] "..." "x"
n= ...
<simpleError in get(n, envir = .envir): argument "..." is missing, with no default>
n= x
<simpleError in get(n, envir = .envir): promise already under evaluation: recursive default argument reference or earlier problems?>
*** show.large.objects.stack( 3 ) system.time(show.large.objects.stack())
[1] "expr"    "gcFirst" "ppt"     "time"
n= expr
<simpleError in get(n, envir = .envir): promise already under evaluation: recursive default argument reference or earlier problems?>
   user  system elapsed
0 (0.00ms) 0 (0.00ms) 0.002 (2.00ms)
--8---cut here---end---8---

So, what am I still doing wrong?
Do I really need the . in .envir?
Why do I get the [[argument ... is missing, with no default]] error?
Why do I get the [[promise already under evaluation]] error?
What is the right way to pass threshold and exclude from
show.large.objects.stack to show.large.objects?

Thanks!

PS. I would prefer an answer on SO, but please feel free to reply using any
venue you like and I will copy your explanation to the other venues.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 13.04 (raring) X 11.0.11303000
http://www.childpsy.net/ http://iris.org.il http://mideasttruth.com
http://honestreporting.com http://openvotingconsortium.org
Linux - find out what you've been missing while you've been rebooting Windows.


Re: [R] promise already under evaluation

2013-07-03 Thread Sam Steingold

> * Sam Steingold <f...@tah.bet> [2013-07-03 11:33:47 -0400]:
>
> Hi, I asked this question on SO but got no answers:
> http://stackoverflow.com/questions/17310825/r-promise-already-under-evaluation

Backlin explained on SO that the errors are to be expected: "..." is a
formal argument which was not supplied, and "expr" and "x" were actually
being evaluated at the time of the get() call.

The bottom line is that I must catch and ignore errors.

The remaining problem is: how do I pass the same arguments down?

e.g.,

--8---cut here---start-8---
f <- function (... verbose=FALSE ...) { ... }
g <- function (... verbose=FALSE ...) { ... f(... verbose=verbose ...) ... }
--8---cut here---end---8---

results in promise already under evaluation (and, yes, I do understand
why).

is there anything better than

--8---cut here---start-8---
f <- function ( ... f.verbose=FALSE ... ) { ... }
g <- function ( ... g.verbose=FALSE ... ) { ... f(... f.verbose=g.verbose ...) ... }
--8---cut here---end---8---
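
A minimal sketch, assuming the elided parts of f and g do not matter: supplying verbose=verbose in a call is evaluated in the caller's frame and is not circular by itself. The error message appears when a default value refers to its own argument, as in the hypothetical h below. The function bodies here are made up for illustration.

```r
f <- function(verbose = FALSE) if (verbose) "loud" else "quiet"

# Same argument name passed down in the call: the right-hand 'verbose'
# is looked up in g's frame, so there is no self-reference.
g <- function(verbose = FALSE) f(verbose = verbose)
ok <- g(TRUE)

# Default value that refers to the argument itself: forcing the promise
# needs its own value, which triggers the error.
h <- function(verbose = verbose) f(verbose = verbose)
res <- tryCatch(h(), error = function(e) conditionMessage(e))

print(ok)
print(res)
```

If that matches the real code, the separate f.verbose/g.verbose names are only needed when the self-reference sits in a default value.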


-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 13.04 (raring) X 11.0.11303000
http://www.childpsy.net/ http://www.memritv.org http://mideasttruth.com
http://honestreporting.com http://think-israel.org http://jihadwatch.org
Incorrect time synchronization.


[R] cedta decided 'igraph' wasn't data.table aware

2013-04-21 Thread Sam Steingold

Hi, what does this mean?

--8---cut here---start-8---
> graph <- graph.data.frame(merged[!v,], vertices=ve, directed=FALSE)
cedta decided 'igraph' wasn't data.table aware
cedta decided 'igraph' wasn't data.table aware
cedta decided 'igraph' wasn't data.table aware
cedta decided 'igraph' wasn't data.table aware
cedta decided 'igraph' wasn't data.table aware
--8---cut here---end---8---

`merged' and `ve' are `data.table' objects, and thus `data.frame' objects too.
the igraph function graph.data.frame accepts data.frame as the first argument.

the igraph maintainers say that it is not coming from igraph.

thanks.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.10 (quantal) X 11.0.1130
http://www.childpsy.net/ http://www.PetitionOnline.com/tap12009/
http://memri.org http://thereligionofpeace.com http://jihadwatch.org
Growing Old is Inevitable; Growing Up is Optional.


[R] str on large data.frame is slow on factors with many levels

2013-04-09 Thread Sam Steingold

str() takes 2+ minutes to print
--8---cut here---start-8---
'data.frame':   9445743 obs. of  25 variables:
 $ share.id: Factor w/ 1641168 levels "387059b61ffef5cf",..: 7 118 118 209 242 242 254 254 263 291 ...
...
--8---cut here---end---8---
pausing for tens of seconds on each factor variable that has a lot of
levels.
Why?

(R version 2.15.3 (2013-03-01) -- "Security Blanket"
 Platform: x86_64-pc-linux-gnu (64-bit))

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.10 (quantal) X 11.0.1130
http://www.childpsy.net/ http://pmw.org.il http://palestinefacts.org
http://mideasttruth.com http://americancensorship.org http://camera.org
Garbage In, Gospel Out


Re: [R] !0 + !0 == !0 - !0

2013-03-18 Thread Sam Steingold

> * Bert Gunter <thagre.ore...@trar.pbz> [2013-03-17 20:30:56 -0700]:
>
> I also think it fair to say that all (??) languages have these sorts
> of malapropisms due to operator precedence.

Except for those languages which do _not_ have operator precedence.
Like, e.g., Lisp.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://openvotingconsortium.org http://jihadwatch.org
http://palestinefacts.org http://mideasttruth.com http://camera.org
DRM access management == prison freedom management.


Re: [R] select rows with identical columns from a data frame

2013-01-20 Thread Sam Steingold

> * Bert Gunter <thagre.ore...@trar.pbz> [2013-01-19 22:26:46 -0800]:
>
> But David W. and Bill Dunlap gave you solutions that also work and are
> much faster, no?!

Yes, indeed, and I am now using David's solution as it is fast
(enough), simple and concise.

Thanks a lot to David, Bill, Rui, and arun for their answers (to this
question, my many previous questions, and, I hope, my future questions
in advance)!

> On Sat, Jan 19, 2013 at 9:41 PM, Sam Steingold <s...@gnu.org> wrote:
>> * Rui Barradas <ehvconeen...@fncb.cg> [2013-01-18 21:02:20 +]:
>>
>>> Try the following.
>>>
>>> complete.cases(f) & apply(f, 1, function(x) all(x == x[1]))
>>
>> thanks, this works, but is horribly slow (dim(f) is 766,950x2)

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://americancensorship.org http://palestinefacts.org
http://thereligionofpeace.com http://camera.org http://think-israel.org
Lisp is a way of life.  C is a way of death.


Re: [R] select rows with identical columns from a data frame

2013-01-19 Thread Sam Steingold

> * Rui Barradas <ehvconeen...@fncb.cg> [2013-01-18 21:02:20 +]:
>
> Try the following.
>
> complete.cases(f) & apply(f, 1, function(x) all(x == x[1]))

thanks, this works, but is horribly slow (dim(f) is 766,950x2)
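
One vectorized alternative, offered as a sketch and not necessarily the solution posted by others in the thread: compare every column with the first one in a single matrix comparison and count the matches per row with rowSums(), which avoids the per-row function calls made by apply(). The names eq_first and same_all are hypothetical.

```r
f <- data.frame(a = c(1, NA, NA, 4), b = c(1, NA, 3, 40), c = c(1, NA, 5, 40))

# Column-major recycling compares each column with the first column
eq_first <- as.matrix(f) == f[[1]]

# A row qualifies when it has no NA and every column equals the first one;
# FALSE & NA is FALSE, so rows with NA drop out cleanly
same_all <- complete.cases(f) & rowSums(eq_first) == ncol(f)
print(unname(same_all))
```

This assumes all columns share a type, so that as.matrix() does not coerce them to character.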

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://truepeace.org http://palestinefacts.org
http://thereligionofpeace.com http://honestreporting.com http://ffii.org
usually: can't pay == don't buy. software: can't buy == don't pay


[R] select rows with identical columns from a data frame

2013-01-18 Thread Sam Steingold

I have a data frame with several columns.
I want to select the rows with no NAs (as with complete.cases)
and all columns identical.
E.g., for

--8---cut here---start-8---
> f <- data.frame(a=c(1,NA,NA,4),b=c(1,NA,3,40),c=c(1,NA,5,40))
> f
   a  b  c
1  1  1  1
2 NA NA NA
3 NA  3  5
4  4 40 40
--8---cut here---end---8---

I want the vector TRUE,FALSE,FALSE,FALSE selecting just the first
row because there all 3 columns are the same and none is NA.

thanks!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://memri.org http://mideasttruth.com
http://honestreporting.com http://pmw.org.il http://iris.org.il
All extremists should be taken out and shot.


Re: [R] select rows with identical columns from a data frame

2013-01-18 Thread Sam Steingold

I can do
  Reduce("==",f[complete.cases(f),])
but that creates an intermediate data frame which I would love to avoid
(to save memory).

> * Sam Steingold <f...@tah.bet> [2013-01-18 15:53:21 -0500]:
>
> I have a data frame with several columns.
> I want to select the rows with no NAs (as with complete.cases)
> and all columns identical.
> E.g., for
>
> > f <- data.frame(a=c(1,NA,NA,4),b=c(1,NA,3,40),c=c(1,NA,5,40))
> > f
>    a  b  c
> 1  1  1  1
> 2 NA NA NA
> 3 NA  3  5
> 4  4 40 40
>
> I want the vector TRUE,FALSE,FALSE,FALSE selecting just the first
> row because there all 3 columns are the same and none is NA.
>
> thanks!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://truepeace.org http://iris.org.il
http://www.PetitionOnline.com/tap12009/ http://ffii.org http://jihadwatch.org
War doesn't determine who's right, just who's left.


[R] non-consing count

2013-01-04 Thread Sam Steingold

Hi,
to count vector elements with some property, the standard idiom seems to
be length(which):
--8---cut here---start-8---
x <- c(1,1,0,0,0)
count.0 <- length(which(x == 0))
--8---cut here---end---8---
however, this approach allocates and discards 2 vectors: a logical
vector of length=length(x) and an integer vector in which.
is there a cheaper alternative?
Thanks!
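
A common shorter idiom is sum(x == 0), since TRUE counts as 1 when summed. It still allocates the logical vector but skips the integer vector built by which(). A minimal sketch:

```r
x <- c(1, 1, 0, 0, 0)

# Logical values are summed as 0/1, so this counts the TRUEs directly
count.0 <- sum(x == 0)
print(count.0)

# Same answer as the length(which()) form from the question
stopifnot(count.0 == length(which(x == 0)))
```

If x can contain NA, sum(x == 0, na.rm = TRUE) matches what length(which(x == 0)) returns.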

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://iris.org.il http://honestreporting.com
http://jihadwatch.org http://pmw.org.il http://www.PetitionOnline.com/tap12009/
War doesn't determine who's right, just who's left.


[R] vectorization modifying globals in functions

2012-12-27 Thread Sam Steingold

I have the following code:

--8---cut here---start-8---
d <- rep(10,10)
for (i in 1:100) {
  a <- sample.int(length(d), size = 2)
  if (d[a[1]] >= 1) {
    d[a[1]] <- d[a[1]] - 1
    d[a[2]] <- d[a[2]] + 1
  }
}
--8---cut here---end---8---

it does what I want, i.e., modifies vector d 100 times.

Now, if I want to repeat this 1e6 times instead of 1e2 times, I want to
vectorize it for speed, so I do this:

--8---cut here---start-8---
update <- function (i) {
  a <- sample.int(n.agents, size = 2)
  if (d[a[1]] >= delta) {
    d[a[1]] <- d[a[1]] - 1
    d[a[2]] <- d[a[2]] + 1
  }
  entropy(d, unit="log2")
}
system.time(entropy.history <- sapply(1:1e6,update))
--8---cut here---end---8---

however, the global d is not modified, apparently update modifies the
local copy.

so,
1. is there a way for a function to modify a global variable?
2. how would you vectorize this loop?

thanks!
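
On the first question: the superassignment operator `<<-` assigns in the enclosing environment where the variable is found (the global one when the function is defined at top level), whereas `<-` inside a function creates a local copy. A minimal sketch, with move_one as a hypothetical stand-in for update() and max(d) standing in for the entropy() call, which comes from a package not shown here:

```r
set.seed(1)
d <- rep(10, 10)

move_one <- function(i) {
  a <- sample.int(length(d), size = 2)
  if (d[a[1]] >= 1) {
    # '<<-' updates the d defined outside the function, not a local copy
    d[a[1]] <<- d[a[1]] - 1
    d[a[2]] <<- d[a[2]] + 1
  }
  max(d)  # placeholder statistic recorded after each step
}

history <- sapply(1:100, move_one)
print(d)
```

Each step depends on the state left by the previous one, so sapply() here is still a sequential loop in disguise and should not be expected to run much faster than the for loop.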

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://honestreporting.com
http://pmw.org.il http://www.PetitionOnline.com/tap12009/
A number problem solved with floats turns into 1.9998 problems.


[R] lattice::xyplot file output

2012-12-20 Thread Sam Steingold

Hi,
When I was using the regular plot() function, I added this:

--8---cut here---start-8---
  if (!is.null(file)) {
    do.call(tools::file_ext(file),list(file = file))
    on.exit(dev.off())
    cat("writing",file,"\n")
  }
--8---cut here---end---8---

to the beginning of each of my functions which plotted anything.
now that I am using lattice::xyplot to plot multiple lines, the above
code does NOT result in the plot being written to a file.
why?

I tried passing file=file to xyplot but that appears to be ignored too.

so, how do I tell lattice::xyplot to write charts in png files?

thanks!
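
A likely cause, per the R FAQ entry on lattice/trellis graphics not appearing: xyplot() returns a trellis object and draws nothing until that object is printed, which happens automatically at top level but not inside a function. A minimal sketch, assuming the lattice package is installed; the function name plot_to_file and the sample data are hypothetical, and pdf() stands in for the device chosen from the file extension:

```r
library(lattice)

plot_to_file <- function(file) {
  pdf(file)
  on.exit(dev.off())
  dat <- data.frame(x = 1:10, y = (1:10)^2)
  # Inside a function the trellis object must be printed explicitly
  print(xyplot(y ~ x, data = dat, type = "l"))
  invisible(file)
}

out <- plot_to_file(tempfile(fileext = ".pdf"))
```

The same print() wrapper should apply with png() or any other device opened through the do.call() idiom above.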


-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://honestreporting.com
http://jihadwatch.org http://think-israel.org http://mideasttruth.com
cogito cogito ergo cogito sum


[R] axes labeling

2012-12-20 Thread Sam Steingold

Is it possible to control formatting of the numbers which go along the
axes in plots?
e.g.
plot(x=1:100,y=1:100)
will label the X axis as 0e+00, 2e+05 &c.
I want that to read 0, 200k, 400k &c.
I know of the function axis(), but it offers far too much control for
this simple task.
thanks.
-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://www.memritv.org http://jihadwatch.org
http://pmw.org.il http://americancensorship.org http://think-israel.org
Why do we want intelligent terminals when there are so many stupid users?


Re: [R] axes labeling

2012-12-20 Thread Sam Steingold

> * David L Carlson <qpney...@gnzh.rqh> [2012-12-20 13:58:00 -0600]:
>
> It is possible, but only by using axis() since you can specify axis breaks
> in a plot command, but not the labels. You can ignore most of the axis()
> options so the commands are pretty simple:
>
> plot(x=c(1, 100), y=c(1, 100), xlab="x", ylab="y",
>      xaxt="n", yaxt="n", las=2)
> pos <- c(0, 20, 40, 60, 80, 100)
> lbl <- c("0", "200k", "400k", "600k", "800k", "1000k")
> axis(1, pos, lbl)
> axis(2, pos, lbl)

That's what I meant when I said too much control.
I am happy with the way R selects positions.
All I want is a say in the way R formats those positions.

Think in terms of 100 being a variable.
To use axis, I will need to write a map from variable range to axis tick
positions first, and then sapply my formatting to the positions.
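
One way to keep R's own tick positions and change only the labels is to ask for the positions with axTicks() after the plot has been drawn, then hand them to axis() with formatted labels. A sketch under that assumption; fmt_k is a hypothetical formatter, and the 0 to 1e6 range is made up to match the labels in the question:

```r
# Hypothetical formatter: 200000 -> "200k", values below 1000 left as they are
fmt_k <- function(x) ifelse(x >= 1000, paste0(x / 1000, "k"), as.character(x))

pdf(NULL)                         # null device, nothing is written to disk
plot(c(0, 1e6), c(0, 1e6), xaxt = "n", xlab = "x", ylab = "y")
pos <- axTicks(1)                 # tick positions R chose for the x axis
axis(1, at = pos, labels = fmt_k(pos))
invisible(dev.off())

print(fmt_k(c(0, 2e5, 4e5)))
```

Because pos comes from axTicks(), no map from the variable range to tick positions has to be written by hand.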

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://think-israel.org http://iris.org.il
http://mideasttruth.com http://www.memritv.org http://memri.org
All extremists should be taken out and shot.


[R] sitools: bug: f2si(0)=

2012-12-20 Thread Sam Steingold

Jonas,
I think f2si(0) should be "0", not "" as it is now.
Thanks.
-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://ffii.org http://mideasttruth.com
http://thereligionofpeace.com http://iris.org.il http://truepeace.org
Type louder, please.


Re: [R] the value of the last expression

2012-12-10 Thread Sam Steingold

> * Richard M. Heiberger <e...@grzcyr.rqh> [2012-02-09 21:48:50 -0500]:
>
> .Last.value

Thanks; it worked for a while, but not anymore:

http://stat.ethz.ch/R-manual/R-patched/library/base/html/Last.value.html
--8---cut here---start-8---
> gamma(1:15)
 [1]           1           1           2           6          24         120
 [7]         720        5040       40320      362880     3628800    39916800
[13]   479001600  6227020800 87178291200
> z <- .Last.value
> z
NULL
--8---cut here---end---8---

could my .Rprofile be at fault?
--8---cut here---start-8---
## breaks ess
## options(error = utils::recover)
options(max.print = 100, repos = c(CRAN = "http://lib.stat.cmu.edu/R/CRAN/"))
library(compiler)
compiler::enableJIT(3)
compiler::compilePKGS(1)
--8---cut here---end---8---


> On Thu, Feb 9, 2012 at 9:44 PM, Sam Steingold <s...@gnu.org> wrote:
>
>> Is there an analogue of common lisp * variable which contains the
>> value of the last expression?
>> E.g., in lisp:
>>  (+ 1 2)
>> 3
>>  *
>> 3
>>
>> I wish I could recover the value of the last expression without
>> re-evaluating it.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://mideasttruth.com http://thereligionofpeace.com
http://www.memritv.org http://iris.org.il http://americancensorship.org
Diplomacy is the art of saying nice doggy until you can find a nice rock.


[R] sum portions of a vector

2012-12-10 Thread Sam Steingold

How do I sum portions of a vector into another vector?
E.g., for
--8---cut here---start-8---
> vec <- 1:10
> breaks <- c(3,8,10)
--8---cut here---end---8---
I want to get a vector of length 3 with content
--8---cut here---start-8---
6 = 1+2+3
30 = 4+5+6+7+8
19 = 9+10
--8---cut here---end---8---
Obviously, I could write a loop, but I would rather have a vectorized
version.
Thanks!
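
One loop-free sketch, assuming breaks holds the increasing end positions of each portion as in the example: take the cumulative sum, pick it at the break positions, and difference the result. The name sums is hypothetical.

```r
vec <- 1:10
breaks <- c(3, 8, 10)

# Running totals at the end of each portion, then successive differences
sums <- diff(c(0, cumsum(vec)[breaks]))
print(sums)
```

For very long integer vectors the running total can overflow, in which case cumsum(as.numeric(vec)) is the safer starting point.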

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://palestinefacts.org http://ffii.org
http://jihadwatch.org http://www.PetitionOnline.com/tap12009/
One can find Holy Grail or Higgs boson, but not the second sock.


Re: [R] the value of the last expression

2012-12-10 Thread Sam Steingold

> * arun <fznegcvax...@lnubb.pbz> [2012-12-10 11:22:03 -0800]:
>
> It is working for me.

I do not claim to have found a bug.
I am merely pleading for help figuring out what could have gone wrong.
.Last.value works when I first start R under Emacs/ESS.
Then it stops working.
I can't figure out when or why...

--8---cut here---start-8---
> sessionInfo()
R version 2.15.2 (2012-10-26)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
[1] C

attached base packages:
[1] stats graphics  grDevices utils datasets  compiler  methods  
[8] base 

loaded via a namespace (and not attached):
[1] tools_2.15.2
--8---cut here---end---8---


 sessionInfo()
 R version 2.15.0 (2012-03-30)
 Platform: x86_64-pc-linux-gnu (64-bit)

 locale:
  [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C  
  [3] LC_TIME=en_US.UTF-8    LC_COLLATE=en_US.UTF-8    
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
  [7] LC_PAPER=C LC_NAME=C 
  [9] LC_ADDRESS=C   LC_TELEPHONE=C    
 [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C   

 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base 

 other attached packages:
 [1] matrixStats_0.6.2 stringr_0.6   reshape_0.8.4 plyr_1.7.1   

 loaded via a namespace (and not attached):
 [1] R.methodsS3_1.4.2 tools_2.15.0
 A.K.




 - Original Message -
 From: Sam Steingold s...@gnu.org
 To: r-help@r-project.org; Richard M. Heiberger r...@temple.edu
 Cc: 
 Sent: Monday, December 10, 2012 2:13 PM
 Subject: Re: [R] the value of the last expression

 * Richard M. Heiberger e...@grzcyr.rqh [2012-02-09 21:48:50 -0500]:

 .Last.value

 Thanks; it worked for a while, but not anymore:

 http://stat.ethz.ch/R-manual/R-patched/library/base/html/Last.value.html
 gamma(1:15) 
 [1]           1           1           2           6          24         120
 [7]         720        5040       40320      362880     3628800    39916800
 [13]   479001600  6227020800 87178291200
 z <- .Last.value
 z
 NULL

 could my .Rprofile be at fault?
 ## breaks ess
 ## options(error = utils::recover)
 options(max.print = 100, repos = c(CRAN = "http://lib.stat.cmu.edu/R/CRAN/"))
 library(compiler)
 compiler::enableJIT(3)
 compiler::compilePKGS(1)


 On Thu, Feb 9, 2012 at 9:44 PM, Sam Steingold s...@gnu.org wrote:

 Is there an analogue of common lisp * variable which contains the
 value of the last expression?
 E.g., in lisp:
  (+ 1 2)
 3
  *
 3

 I wish I could recover the value of the last expression without
 re-evaluating it.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://americancensorship.org http://pmw.org.il
http://www.memritv.org http://iris.org.il http://jihadwatch.org http://ffii.org
If it has syntax, it isn't user friendly.


[R] list to matrix?

2012-12-04 Thread Sam Steingold

How do I convert a list to a matrix?

--8---cut here---start-8---
list(c(5, 101), c(1e+05, 46), c(15, 31), c(2e+05, 17), 
c(25, 19), c(3e+05, 11), c(35, 12), c(4e+05, 25), 
c(45, 19), c(5e+05, 16))
as.matrix(a)
  [,1] 
 [1,] Numeric,2
 [2,] Numeric,2
 [3,] Numeric,2
 [4,] Numeric,2
 [5,] Numeric,2
 [6,] Numeric,2
 [7,] Numeric,2
 [8,] Numeric,2
 [9,] Numeric,2
--8---cut here---end---8---

thanks!
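
as.matrix() keeps the list structure, which is why every cell shows Numeric,2. If every element has the same length, binding the elements as rows is a common approach. A sketch with a small made-up list (the values are not the ones from the post):

```r
a <- list(c(1, 10), c(2, 20), c(3, 30))

# Each list element becomes one row of the resulting numeric matrix
m <- do.call(rbind, a)
print(m)
```

Using cbind instead of rbind would give one column per list element.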

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://palestinefacts.org http://dhimmi.com
http://jihadwatch.org http://www.PetitionOnline.com/tap12009/ http://memri.org
Rhinoceros has poor vision, but, due to his size, it's not his problem.


Re: [R] aggregate() runs out of memory

2012-11-27 Thread Sam Steingold

 * Steve Lianoglou znvyvatyvfg.ubarl...@tznvy.pbz [2012-11-26 19:47:25 
 -0500]:

 On Monday, November 26, 2012, Sam Steingold wrote:
 [snip]


 there is precisely one country for each id.
 i.e., unique(country) is the same as country[1].
 thanks a lot for the suggestion!

  R> result <- f[, list(min=min(delay), max=max(delay),
  count=.N,country=country[1L]), by="share.id"]


 And is it performant?

acceptable.

 It just occurred to me that this is even better:

 R> setkeyv(f, c("share.id", "delay"))
 R> result <- f[,  list(min=delay[1L], max=delay[.N], count=.N,
 country=country[1L]), by="share.id"]


this assumes that delays are sorted (like in my example)
which, in reality, they are not.
thanks for your help!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://honestreporting.com
http://americancensorship.org http://memri.org http://www.memritv.org
Illiterate?  Write today, for free help!


Re: [R] aggregate() runs out of memory

2012-11-27 Thread Sam Steingold

 * Steve Lianoglou znvyvatyvfg.ubarl...@tznvy.pbz [2012-11-27 12:53:23 
 -0500]:
 On Tue, Nov 27, 2012 at 11:29 AM, Sam Steingold s...@gnu.org wrote:
 * Steve Lianoglou znvyvatyvfg.ubarl...@tznvy.pbz [2012-11-26 19:47:25 
 -0500]:
 [snip]
 It just occurred to me that this is even better:

 R> setkeyv(f, c("share.id", "delay"))
 R> result <- f[,  list(min=delay[1L], max=delay[.N], count=.N,
 country=country[1L]), by="share.id"]


 this assumes that delays are sorted (like in my example)
 which, in reality, they are not.

 When you include delay in the call to `setkeyv` as I did above, it
 sorts low to high w/in each share.id group.

Ah, but then I would have to _sort_ (~n*log(n)) by delay within each ID
group, while all I care about is min/max (~n).
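
For comparison, a base-R sketch of the same per-group summary that takes min and max directly instead of sorting within groups. It is not the data.table solution discussed in this thread, and the small data frame is made up for illustration:

```r
f <- data.frame(share.id = c("a", "a", "a", "b", "b", "c"),
                delay    = c(5, 1, 9, 3, 7, 2))

# One pass per statistic over each group; groups come out in sorted id order
lo <- tapply(f$delay, f$share.id, min)
hi <- tapply(f$delay, f$share.id, max)
n  <- tapply(f$delay, f$share.id, length)

result <- data.frame(share.id = names(lo),
                     min = as.vector(lo),
                     max = as.vector(hi),
                     count = as.vector(n))
print(result)
```

On data of the size mentioned in the thread the data.table form without setkeyv() is likely the more practical choice; this only illustrates that no sort by delay is needed.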

thanks again!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://think-israel.org http://truepeace.org
http://thereligionofpeace.com http://mideasttruth.com http://www.memritv.org
If You Want Breakfast In Bed, Sleep In the Kitchen.


Re: [R] printing difftime summary

2012-11-26 Thread Sam Steingold

this overcomes the summary generation, but not printing:

--8---cut here---start-8---
summary.difftime <- function (v, ...) {
  s <- summary(as.numeric(v), ...)
  r <- as.data.frame(sapply(s,difftime2string),stringsAsFactors=FALSE)
  names(r) <- c("string")
  r[[units(v)]] <- s
  class(r) <- c("data.frame","summary.difftime")
  r
}
print.summary.difftime <- function (sd) print.data.frame(sd)
--8---cut here---end---8---

summary(infl), where infl$delay is a difftime vector, prints

...
   
     delay
 string:c("492.00 ms", "18.08 min", "1.77 hrs", "8.20 hrs", "8.13 hrs", "6.98 days")
 secs  :c(     0.5,   1085.1,   6370.2,  29534.4,  29254.0, 602949.7)


instead of something like

   delay
   Min.:492 ms
   1st Qu.: 18.08 min

&c.

so, how do I arrange for a proper printing of difftime summary as a part
of the data frame summary?

 * David Winsemius qjvafrz...@pbzpnfg.arg [2012-11-25 00:50:51 -0800]:

 On Nov 24, 2012, at 7:48 PM, Sam Steingold wrote:

 * David Winsemius qjvafrz...@pbzpnfg.arg [2012-11-23 13:14:17
 -0800]:

 See 
 http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-should-I-write-summary-methods_003f

 --8---cut here---start-8---
 summary.difftime <- function (v) {
   s <- summary(as.numeric(v))
   r <- as.data.frame(sapply(s,difftime2string),stringsAsFactors=FALSE)
   names(r) <- c("string")
   r[[units(v)]] <- s
   class(r) <- c("data.frame","summary.difftime")
   r
 }
 print.summary.difftime <- function (sd) print.data.frame(sd)
 --8---cut here---end---8---

 it appears to work for a single vector:

 --8---cut here---start-8---
 > r1 <- summary(infl$delay)
 > r1
             string     secs
 Min.     492.00 ms      0.5
 1st Qu.  18.08 min   1085.0
 Median    1.77 hrs   6370.0
 Mean      8.20 hrs  29530.0
 3rd Qu.   8.12 hrs  29250.0
 Max.     6.98 days 602900.0
 > str(r1)
 Classes 'summary.difftime' and 'data.frame': 6 obs. of  2 variables:
 $ string: chr  "492.00 ms" "18.08 min" "1.77 hrs" "8.20 hrs" ...
 $ secs  :Classes 'summaryDefault', 'table'  num [1:6] 4.92e-01 1.08e+03 6.37e+03 2.95e+04 2.92e+04 ...
 --8---cut here---end---8---

 but not as a part of data frame:

 --8---cut here---start-8---
 a - summary(infl)
 Error in summary.difftime(X[[22L]], ...) :
  unused argument(s) (maxsum = 7, digits = 12)
 --8---cut here---end---8---

 I guess I should somehow accept a list of options in
 summary.difftime()
 and pass them on to the inner call to summary() (or should it be
 explicitly summary.numeric()?)


 In the usual way. If you know that the function will be called with
 arguments from the summary.data.frame function then you should allow the
 argument list to accept them. You can ignore them or provide provisions
 for them. You just can't define your function to have only one argument
 if you expect (as you should since you passes summary a dataframe
 object) that it might be called within summary.data.frame.

 This is the argument list for summary.data.frame:

   summary.data.frame
 function (object, maxsum = 7, digits = max(3, getOption(digits) -
 3), ...)

 how do I do that?

 summary.difftime - function (v, ... ) { 

 There are many asked and answered questions on rhelp about how to deal
 with the dots arguments.
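
A minimal sketch of the dots advice quoted above. Assumptions: secs2string is a hypothetical stand-in for difftime2string (defined in another message of this thread), the numeric column is always named "secs" rather than units(v), and the first argument is called object to match the summary() generic. The only point is that the signature ends in ..., so the maxsum= and digits= arguments sent by summary.data.frame are accepted and forwarded instead of raising "unused argument(s)".

```r
# Hypothetical helper standing in for difftime2string from the thread:
# turns a number of seconds into a short human-readable string.
secs2string <- function(x) {
  if (x < 1) return(sprintf("%.2f ms", x * 1000))
  if (x < 100) return(sprintf("%.2f sec", x))
  if (x < 6000) return(sprintf("%.2f min", x / 60))
  if (x < 108000) return(sprintf("%.2f hrs", x / 3600))
  sprintf("%.2f days", x / 86400)
}

# Accepting '...' lets a caller such as summary.data.frame() pass
# maxsum= and digits= without an "unused argument" error.
summary.difftime <- function(object, ...) {
  s <- unclass(summary(as.numeric(object, units = "secs"), ...))
  r <- data.frame(string = vapply(s, secs2string, character(1)),
                  stringsAsFactors = FALSE)
  r$secs <- as.numeric(s)
  class(r) <- c("summary.difftime", "data.frame")
  r
}

d <- as.difftime(c(0.5, 90, 7200), units = "secs")
# Same extra arguments that summary.data.frame() sends to each column.
res <- summary(d, maxsum = 7, digits = 12)
```

Extra arguments that the inner summary() call does not understand are simply absorbed by its own dots, so nothing else has to change.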

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://www.memritv.org http://memri.org
http://honestreporting.com http://dhimmi.com http://openvotingconsortium.org
People with a good taste are especially appreciated by cannibals.


Re: [R] aggregate() runs out of memory

2012-11-26 Thread Sam Steingold

Hi,

> * Steve Lianoglou <znvyvatyvfg.ubarl...@tznvy.pbz> [2012-11-19 13:30:03 -0800]:
>
> For instance, if you want the min and max of `delay` within each group
> defined by `share.id`, and let's assume `infl` is a data.frame, you
> can do something like so:
>
> R> infl <- as.data.table(infl)
> R> setkey(infl, share.id)
> R> result <- infl[, list(min=min(delay), max=max(delay)), by=share.id]

perfect, thanks.
alas, the resulting table does not contain the share.id column.
do I need to add something like id=unique(share.id) to the list?
also, if there is a field in the original table infl which only depends
on share.id, how do I add this unique value to the summary?
it appears that count=unique(country) in list() does what I need, but
it slows down the process.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://openvotingconsortium.org http://jihadwatch.org
http://thereligionofpeace.com http://palestinefacts.org http://dhimmi.com
Why use Windows, when there are Doors?


Re: [R] aggregate() runs out of memory

2012-11-26 Thread Sam Steingold

hi Steve,

> * Steve Lianoglou <znvyvatyvfg.ubarl...@tznvy.pbz> [2012-11-26 16:08:59 -0500]:
> On Mon, Nov 26, 2012 at 3:13 PM, Sam Steingold <s...@gnu.org> wrote:
>>> * Steve Lianoglou <znvyvatyvfg.ubarl...@tznvy.pbz> [2012-11-19 13:30:03 -0800]:
>>>
>>> For instance, if you want the min and max of `delay` within each group
>>> defined by `share.id`, and let's assume `infl` is a data.frame, you
>>> can do something like so:
>>>
>>> R> infl <- as.data.table(infl)
>>> R> setkey(infl, share.id)
>>> R> result <- infl[, list(min=min(delay), max=max(delay)), by=share.id]
>>
>> perfect, thanks.
>> alas, the resulting table does not contain the share.id column.
>> do I need to add something like id=unique(share.id) to the list?
>> also, if there is a field in the original table infl which only depends
>> on share.id, how do I add this unique value to the summary?
>> it appears that count=unique(country) in list() does what I need, but
>> it slows down the process.
>
> Hmm ... I think it should be there, but I'm having a hard time
> remembering what you want.
>
> Could you please copy paste the output of `(head(infl, 20))` as
> well as an approximation of what the result is that you want.

this prints all the levels for all the factor columns and takes
megabytes.

--8---cut here---start-8---
> f <- data.frame(id=rep(1:3,4),country=rep(6:8,4),delay=1:12)
> f
   id country delay
1   1       6     1
2   2       7     2
3   3       8     3
4   1       6     4
5   2       7     5
6   3       8     6
7   1       6     7
8   2       7     8
9   3       8     9
10  1       6    10
11  2       7    11
12  3       8    12
> f <- as.data.table(f)
> setkey(f,id)
> delays <- f[,list(min=min(delay),max=max(delay),count=.N,country=unique(country)),by=id]
> delays
   id min max count country
1:  1   1  10     4       6
2:  2   2  11     4       7
3:  3   3  12     4       8
--8---cut here---end---8---

this is still too slow, apparently because of unique.
how do I speed it up?

Thanks.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://iris.org.il
http://ffii.org http://pmw.org.il http://mideasttruth.com
Programming is like sex: one mistake and you have to support it for a lifetime.


Re: [R] printing difftime summary

2012-11-26 Thread Sam Steingold

> * David Winsemius <qjvafrz...@pbzpnfg.arg> [2012-11-26 08:46:35 -0800]:
>
> On Nov 26, 2012, at 7:14 AM, Sam Steingold wrote:
>
>> summary(infl), where infl$delay is a difftime vector, prints
>>
>> ...
>>
>>     delay
>>  string:c("492.00 ms", "18.08 min", "1.77 hrs", "8.20 hrs", "8.13 hrs", "6.98 days")
>>  secs  :c(      0.5,   1085.1,   6370.2,  29534.4,  29254.0, 602949.7)
>>
>> instead of something like
>>
>>    delay
>>    Min.:492 ms
>>    1st Qu.: 18.08 min
>>
>> etc.
>>
>> so, how do I arrange for a proper printing of difftime summary as a
>> part of the data frame summary?
>
> If you like a particular format from an existing print method then why
> not look it up and copy the code?
>
> methods(print)

the problem is that I cannot figure out which function prints this:

    delay
 string:c("492.00 ms", "18.08 min", "1.77 hrs", "8.20 hrs", "8.13 hrs", "6.98 days")
 secs  :c(      0.5,   1085.1,   6370.2,  29534.4,  29254.0, 602949.7)

I added cat()s to print.summary.difftime and I do not see them, so it
appears that I have no direct control over how a summary.difftime is
printed as a part of a summary of a data.frame.


--8---cut here---start-8---
summary.difftime <- function (v, ...) {
  s <- summary(as.numeric(v), ...)
  r <- as.data.frame(sapply(s,difftime2string),stringsAsFactors=FALSE)
  names(r) <- c("string")
  r[[units(v)]] <- s
  class(r) <- c("summary.difftime","data.frame")
  invisible(r)
}
print.summary.difftime <- function (sd, ...) {
  cat("[[[print.summary.difftime]]]\n")
  print(list(...))
  print.data.frame(sd, ...)
}
--8---cut here---end---8---

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://palestinefacts.org http://think-israel.org
http://www.memritv.org http://openvotingconsortium.org http://mideasttruth.com
The force of gravity doubles when acting on a body on a couch.


Re: [R] aggregate() runs out of memory

2012-11-26 Thread Sam Steingold

Hi,

> * Steve Lianoglou <znvyvatyvfg.ubarl...@tznvy.pbz> [2012-11-26 17:32:21 -0500]:
>
>> --8---cut here---start-8---
>> > f <- data.frame(id=rep(1:3,4),country=rep(6:8,4),delay=1:12)
>> > f
>>    id country delay
>> 1   1       6     1
>> 2   2       7     2
>> 3   3       8     3
>> 4   1       6     4
>> 5   2       7     5
>> 6   3       8     6
>> 7   1       6     7
>> 8   2       7     8
>> 9   3       8     9
>> 10  1       6    10
>> 11  2       7    11
>> 12  3       8    12
>> > f <- as.data.table(f)
>> > setkey(f,id)
>> > delays <- f[,list(min=min(delay),max=max(delay),count=.N,country=unique(country)),by=id]
>> > delays
>>    id min max count country
>> 1:  1   1  10     4       6
>> 2:  2   2  11     4       7
>> 3:  3   3  12     4       8
>> --8---cut here---end---8---
>>
>> this is still too slow, apparently because of unique.
>> how do I speed it up?
>
> I think I'm missing something.
>
> Your call to `min(delay)` and `max(delay)` will return the minimum and
> maximum delays within the particular id you are grouping by. I guess
> there must be several values for country within each id group --
> do you really want the same min and max values to be replicated as
> many times as there are unique countries?

there is precisely one country for each id.
i.e., unique(country) is the same as country[1].
thanks a lot for the suggestion!

> R> result <- f[, list(min=min(delay), max=max(delay),
>                       count=.N,country=country[1L]), by=share.id]
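
For reference, a small sketch of the grouped query using the test data from this thread. It assumes the data.table package is installed; the column is called id here, as in the test data, rather than share.id. The by= column is carried into the result automatically, and country[1L] picks the per-group constant without calling unique().

```r
library(data.table)

# Same toy data as in the thread: exactly one country per id.
f <- data.table(id = rep(1:3, 4), country = rep(6:8, 4), delay = 1:12)

# Grouping column 'id' appears in the result without being listed in j;
# country[1L] is cheaper than unique(country) when the value is constant
# within each group.
delays <- f[, list(min = min(delay), max = max(delay),
                   count = .N, country = country[1L]), by = id]
print(delays)
```

If the extra column is known to be constant within each group, adding it to by= (for example by = list(id, country)) is another way to keep it in the output.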

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://thereligionofpeace.com http://pmw.org.il
http://honestreporting.com http://americancensorship.org
Why do you never call me back after I scream that I will never talk to you 
again?!


Re: [R] printing difftime summary

2012-11-26 Thread Sam Steingold

Thanks a lot - almost there!

--8---cut here---start-8---
format.summary.difftime <- function(sd, ...) {
  t <- matrix(sd$string)
  rownames(t) <- rownames(sd)
  print(t)
  format(as.table(t))
}
print.summary.difftime <- function (sd, ...) {
  print(format(sd), quote=FALSE)
  invisible(sd)
}
--8---cut here---end---8---

this almost works:

--8---cut here---start-8---
> summary(delays)
             share.id          min                max
 12cf12372b87cce9:  1   NULL:492.00 ms    NULL:492.00 ms
 12cf36060bdb9581:  1   NULL:3.70 min     NULL:21.80 min
 12d2665c906bb232:  1   NULL:20.32 min    NULL:3.26 hrs
 12d2802f1435b4cd:  1   NULL:5.52 hrs     NULL:13.78 hrs
 12d292988f5f8422:  1   NULL:2.81 hrs     NULL:16.20 hrs
 12d29dd2894e2790:  1   NULL:6.95 days    NULL:6.98 days
--8---cut here---end---8---

why do I see NULLs?!

--8---cut here---start-8---
> t <- matrix(sd$string)
> rownames(t) <- rownames(sd)
> t
        [,1]
Min.    "492.00 ms"
1st Qu. "3.70 min"
Median  "20.32 min"
Mean    "5.52 hrs"
3rd Qu. "2.81 hrs"
Max.    "6.95 days"
> as.table(t)
        A
Min.    492.00 ms
1st Qu. 3.70 min
Median  20.32 min
Mean    5.52 hrs
3rd Qu. 2.81 hrs
Max.    6.95 days
> format(as.table(t))
        A
Min.    "492.00 ms"
1st Qu. "3.70 min "
Median  "20.32 min"
Mean    "5.52 hrs "
3rd Qu. "2.81 hrs "
Max.    "6.95 days"
 --8---cut here---end---8---


> * William Dunlap <jqha...@gvopb.pbz> [2012-11-26 23:02:48 +]:
>
> It looks like summary.data.frame(d) calls format(d[[i]]) for i in
> seq_len(ncol(d)) and pastes the results together into a table object
> for printing.  Hence, write a format.summary.difftime if you want
> objects of class summary.difftime (which I assume summary.difftime
> produces) to be formatted as you wish when a difftime object is in a
> data.frame.  Once you've written it, have your print.summary.difftime
> call it too.
>
> E.g., with the following methods
>    summary.difftime <- function(x, ...) {
>       ret <- quantile(x, p=(0:2)/2, na.rm=TRUE)
>       class(ret) <- c("summary.difftime", class(ret))
>       ret
>    }
>    format.summary.difftime <- function(x, ...) c("Min.Med.Max" =
>       paste(collapse="...", NextMethod("format")))
>    print.summary.difftime <- function(x, ...){ print(format(x), quote=FALSE) ; invisible(x) }
>
> I get
>    > d <- data.frame(Num=1:5, Date=as.Date("2012-11-26")+(0:4),
>                      Delta=diff(as.Date("2012-11-26")+2^(0:5)))
>    > summary(d)
>          Num         Date                Delta
>     Min.   :1   Min.   :2012-11-26   Min.Med.Max: 1 days... 4 days...16 days
>     1st Qu.:2   1st Qu.:2012-11-27
>     Median :3   Median :2012-11-28
>     Mean   :3   Mean   :2012-11-28
>     3rd Qu.:4   3rd Qu.:2012-11-29
>     Max.   :5   Max.   :2012-11-30
>    > summary(d$Delta)
>                     Min.Med.Max
>     1 days... 4 days...16 days
>
> My summary.difftime inherits from difftime so the format method is not really
> needed, as format.difftime does a reasonable job (except that it does not copy
> the input names to its output).  I put it in to show how it gets called.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com


Re: [R] printing difftime summary

2012-11-26 Thread Sam Steingold

Looks like
format.summary.difftime <- function(sd, ...) structure(sd$string,
names=rownames(sd))
does the job.
any reason not to use it?
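
To make the idea concrete, here is a self-contained sketch of the three methods working together on a plain difftime vector. Assumptions: the string column is built with a simple sprintf() (a stand-in for difftime2string), and the first arguments are named object and x to match the generics. It only demonstrates that format() returns a named character vector, which is what the workaround suggested in this thread relies on; how summary.data.frame consumes it may differ between R versions.

```r
# Summary object: a data.frame carrying preformatted strings plus raw seconds.
summary.difftime <- function(object, ...) {
  s <- unclass(summary(as.numeric(object, units = "secs"), ...))
  r <- data.frame(string = sprintf("%.1f sec", s), secs = as.numeric(s),
                  row.names = names(s), stringsAsFactors = FALSE)
  class(r) <- c("summary.difftime", "data.frame")
  r
}

# format method: the strings, with the statistic names kept as names.
format.summary.difftime <- function(x, ...)
  structure(x$string, names = rownames(x))

# print method reuses format() so both code paths show the same text.
print.summary.difftime <- function(x, ...) {
  print(format(x), quote = FALSE)
  invisible(x)
}

x <- as.difftime(c(1, 2, 3, 10), units = "secs")
fs <- format(summary(x))
print(summary(x))
```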

On Mon, Nov 26, 2012 at 7:36 PM, William Dunlap <wdun...@tibco.com> wrote:
>> why do I see NULLs?!
>
> because
>
>>> ... format.difftime does a reasonable job (except that it does not copy
>>> the input names to its output).
>
> Replace your call of the form
>    format(difftimeObject)
> with
>    structure(format(difftimeObject), names=names(difftimeObject))
> to work around this.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com



Re: [R] printing difftime summary

2012-11-24 Thread Sam Steingold

> * David Winsemius <qjvafrz...@pbzpnfg.arg> [2012-11-23 13:14:17 -0800]:
>
> See
> http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-should-I-write-summary-methods_003f

--8---cut here---start-8---
summary.difftime <- function (v) {
  s <- summary(as.numeric(v))
  r <- as.data.frame(sapply(s,difftime2string),stringsAsFactors=FALSE)
  names(r) <- c("string")
  r[[units(v)]] <- s
  class(r) <- c("data.frame","summary.difftime")
  r
}
print.summary.difftime <- function (sd) print.data.frame(sd)
--8---cut here---end---8---

it appears to work for a single vector:

--8---cut here---start-8---
> r1 <- summary(infl$delay)
> r1
           string     secs
Min.    492.00 ms      0.5
1st Qu. 18.08 min   1085.0
Median   1.77 hrs   6370.0
Mean     8.20 hrs  29530.0
3rd Qu.  8.12 hrs  29250.0
Max.    6.98 days 602900.0
> str(r1)
Classes 'summary.difftime' and 'data.frame':    6 obs. of  2 variables:
 $ string: chr  "492.00 ms" "18.08 min" "1.77 hrs" "8.20 hrs" ...
 $ secs  :Classes 'summaryDefault', 'table'  num [1:6] 4.92e-01 1.08e+03 6.37e+03 2.95e+04 2.92e+04 ...
--8---cut here---end---8---

but not as a part of data frame:

--8---cut here---start-8---
> a <- summary(infl)
Error in summary.difftime(X[[22L]], ...) :
  unused argument(s) (maxsum = 7, digits = 12)
--8---cut here---end---8---

I guess I should somehow accept a list of options in summary.difftime()
and pass them on to the inner call to summary() (or should it be
explicitly summary.numeric()?)

how do I do that?

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://camera.org http://jihadwatch.org
http://americancensorship.org http://truepeace.org http://memri.org
Why do you never call me back after I scream that I will never talk to you 
again?!


Re: [R] printing difftime summary

2012-11-23 Thread Sam Steingold

> * R. Michael Weylandt <zvpunry.jrlyn...@tznvy.pbz> [2012-11-23 09:13:36 +]:
>
>> 2. because difftime.summary returns a data.frame and not a
>> Classes 'summaryDefault', 'table' as I assume summary must return.
>
> See
> http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-should-I-write-summary-methods_003f

what are the requirements on the class summary.foo?
does it have to inherit from some other class?
how do I define a class?

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://dhimmi.com http://honestreporting.com
http://thereligionofpeace.com http://iris.org.il http://americancensorship.org
In the race between idiot-proof software and idiots, the idiots are winning.


Re: [R] printing difftime summary

2012-11-22 Thread Sam Steingold

> * R. Michael Weylandt <zvpunry.jrlyn...@tznvy.pbz> [2012-11-22 12:11:55 +]:
>
>> I now think that what I want is
>> --8---cut here---start-8---
>> difftime.summary <- function (v) {
>>   s <- summary(as.numeric(v))
>>   r <- as.data.frame(sapply(s,difftime2string),stringsAsFactors=FALSE)
>>   names(r) <- c("string")
>>   r[[units(v)]] <- s
>>   r
>> }
>
> Any reason not summary.difftime to get S3 dispatch?

I hoped that someone will ask this :-)

1. because its argument has type vector of difftime, not difftime
(coming from CLOS, I do not expect summary(vector of difftime) to
dispatch to summary.difftime, but to summary.vector.of.difftime or something)

2. because difftime.summary returns a data.frame and not a
Classes 'summaryDefault', 'table' as I assume summary must return.

if these are not valid issues, then I wonder why my function should not
be the system default method.
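
On the first point, a short sketch may help readers coming from CLOS: in R a "vector of difftime" is a single object whose class attribute is "difftime", so S3 dispatch on the vector reaches a .difftime method directly. The generic describe and its methods below are hypothetical names used only for illustration.

```r
# One object, one class attribute: there is no separate class for a
# vector of difftime values.
x <- as.difftime(c(30, 90, 600), units = "secs")
class(x)          # "difftime"

# Hypothetical generic with a default and a difftime method.
describe <- function(obj, ...) UseMethod("describe")
describe.default <- function(obj, ...) "no special method"
describe.difftime <- function(obj, ...)
  sprintf("%d time differences in %s", length(obj), units(obj))

describe(x)       # "3 time differences in secs"
describe(1:3)     # "no special method"
```

On the second point, S3 places no requirement on what a summary method returns; by convention it is an object of class "summary.foo" that has its own print method.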

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://memri.org http://honestreporting.com
http://jihadwatch.org http://openvotingconsortium.org http://ffii.org
Sex is like air.  It's only a big deal if you can't get any.


[R] printing difftime summary

2012-11-21 Thread Sam Steingold

Hi,
I have a vector of difftime objects and I want to see its summary.
Alas:
--8---cut here---start-8---
> summary(infl$delay)
  Length    Class     Mode
 9008386 difftime  numeric
--8---cut here---end---8---
this is almost completely useless.
I can use as.numeric:
--8---cut here---start-8---
> s <- summary(as.numeric(infl$delay))
> dput(s)
structure(c(0.5, 1027, 5969, 29870, 28970, 603100), .Names = c("Min.",
"1st Qu.", "Median", "Mean", "3rd Qu.", "Max."), class = c("summaryDefault",
"table"))
> s
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
     0.5   1027.0   5969.0  29870.0  28970.0 603100.0
--8---cut here---end---8---
but the printed representation is very unreadable: the fact that
603100.0 is almost exactly 7 days is not obvious.
Okay, maybe as.difftime will help?
--8---cut here---start-8---
> as.difftime(s,units="secs")
Time differences in secs
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
     0.5   1027.0   5969.0  29870.0  28970.0 603100.0
> as.difftime(s/3600,units="hours")
Time differences in hours
        Min.      1st Qu.       Median         Mean      3rd Qu.         Max.
    1.39e-04 2.852778e-01 1.658056e+00 8.297222e+00 8.047222e+00 1.675278e+02
--8---cut here---end---8---
nope; still unreadable.

What I really want to see _printed_ is something like this:
--8---cut here---start-8---
> sapply(s,difftime2string)
       Min.     1st Qu.      Median        Mean     3rd Qu.        Max.
"500.00 ms" "17.12 min" "99.48 min"  "8.30 hrs"  "8.05 hrs" "6.98 days"
--8---cut here---end---8---
except that the quotes are not needed in the printed output.
Here I wrote:
--8---cut here---start-8---
difftime2string <- function (x) {
  if (x < 1) return(sprintf("%.2f ms",x*1000))
  if (x < 100) return(sprintf("%.2f sec",x))
  if (x < 6000) return(sprintf("%.2f min",x/60))
  if (x < 108000) return(sprintf("%.2f hrs",x/3600))
  if (x < 400*24*3600) return(sprintf("%.2f days",x/(24*3600)))
  sprintf("%.2f years",x/(365.25*24*3600))
}
--8---cut here---end---8---

So, what is The Right R Way to print a summary of difftime objects?
Thanks!
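
As a side note, the helper above handles one value at a time, which is why it is wrapped in sapply(). A vectorised variant can be sketched with findInterval(); the name secs2string and the lookup tables are assumptions for illustration, with the thresholds copied from difftime2string.

```r
# Vectorised sketch of the same unit selection: findInterval() picks the
# bucket for every element at once, so no sapply() is needed.
secs2string <- function(x) {
  breaks  <- c(1, 100, 6000, 108000, 400 * 24 * 3600)   # upper bounds, in seconds
  divisor <- c(1 / 1000, 1, 60, 3600, 24 * 3600, 365.25 * 24 * 3600)
  unit    <- c("ms", "sec", "min", "hrs", "days", "years")
  i <- findInterval(x, breaks) + 1L
  sprintf("%.2f %s", x / divisor[i], unit[i])
}

secs2string(c(0.5, 1027, 5969, 29870, 603100))
# "500.00 ms" "17.12 min" "99.48 min" "8.30 hrs" "6.98 days"
```

Wrapping the result in noquote(), or printing with print(..., quote=FALSE), drops the quotes in the printed output.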
-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://openvotingconsortium.org
http://memri.org http://camera.org http://mideasttruth.com http://pmw.org.il
MS Windows: error: the operation completed successfully.


Re: [R] printing difftime summary

2012-11-21 Thread Sam Steingold

Hi,

> * arun <fznegcvax...@lnubb.pbz> [2012-11-21 14:04:36 -0800]:
>
> Are you looking for some other function (difftime2string)
> or just remove the quotes from the printed output?

I am wondering what others do when they want to see a summary of difftime.

> If it is the latter, then this should do it.
> res<-do.call(data.frame,lapply(s,difftime2string))
>  names(res)<-names(s)
>  res
> #       Min.   1st Qu.    Median     Mean  3rd Qu.      Max.
> #1 500.00 ms 17.12 min 99.48 min 8.30 hrs 8.05 hrs 6.98 days

cool, thanks.
I now think that what I want is
--8---cut here---start-8---
difftime.summary <- function (v) {
  s <- summary(as.numeric(v))
  r <- as.data.frame(sapply(s,difftime2string),stringsAsFactors=FALSE)
  names(r) <- c("string")
  r[[units(v)]] <- s
  r
}
> difftime.summary(infl$delay)
           string     secs
Min.    500.00 ms      0.5
1st Qu. 17.12 min   1027.0
Median  99.48 min   5969.0
Mean     8.30 hrs  29870.0
3rd Qu.  8.05 hrs  28970.0
Max.    6.98 days 603100.0
--8---cut here---end---8---


-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://ffii.org http://jihadwatch.org http://memri.org
http://www.memritv.org http://camera.org http://mideasttruth.com
A computer scientist is someone who fixes things that aren't broken.


[R] generated list element names

2012-11-19 Thread Sam Steingold

How can I create lists with element names created on the fly?

--8---cut here---start-8---
> list ("foo" = 10)
$foo
[1] 10

> list (foo = 10)
$foo
[1] 10

> list (paste("f","oo",sep="") = 10)
Error: unexpected '=' in "list (paste("f","oo",sep="") ="
--8---cut here---end---8---

I understand that tags in list() are not evaluated, but is there a more
elegant way than

--8---cut here---start-8---
> z <- list(10)
> names(z) <- paste("f","oo",sep="")
> z
$foo
[1] 10
--8---cut here---end---8---

thanks!
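
For completeness, a few one-expression alternatives using only base and stats functions. All three build the same list as list(foo = 10); the variable names are just for illustration.

```r
key <- paste("f", "oo", sep = "")

# setNames() attaches the computed name in a single expression.
z1 <- setNames(list(10), key)

# structure() does the same by setting the names attribute directly.
z2 <- structure(list(10), names = key)

# Assigning through [[ ]] with a computed name also works.
z3 <- list()
z3[[key]] <- 10

str(z1)
```

setNames() is probably the closest thing to evaluating the tag inside list().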

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://www.memritv.org
http://thereligionofpeace.com http://truepeace.org
Unix roulette: `dd if=/dev/urandom of=/dev/kmem bs=1 count=1 seek=$RANDOM`


Re: [R] generated list element names

2012-11-19 Thread Sam Steingold

> * jim holtman <wubyg...@tznvy.pbz> [2012-11-19 13:14:05 -0500]:
>
> How about this (if you don't like writing two lines, encapsulate it in
> a function):
>
>> x <- list(10)
>> names(x) <- paste('f', 'oo', sep = '')
>> str(x)
> List of 1
>  $ foo: num 10


I am sorry, how is this different from my second snippet (except that
you use x and I use z and you use single quotes in paste and I use
double quotes)?



-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://memri.org http://truepeace.org http://ffii.org
http://think-israel.org http://jihadwatch.org http://palestinefacts.org
The only time you have too much fuel is when you're on fire.


Re: [R] aggregate() runs out of memory

2012-11-19 Thread Sam Steingold

Thanks Steve,
what is the analogue of .N for min and max?
i.e., what is the data.table's version of
aggregate(infl$delay,by=list(infl$share.id),FUN=min)
aggregate(infl$delay,by=list(infl$share.id),FUN=max)
thanks!
Sam.

On Fri, Sep 14, 2012 at 3:40 PM, Steve Lianoglou
mailinglist.honey...@gmail.com wrote:
 Hi,

 On Fri, Sep 14, 2012 at 3:26 PM, Sam Steingold s...@gnu.org wrote:
 I have a large data.frame Z (2,424,185,944 bytes, 10,256,441 rows, 17 
 columns).
 I want to get the result of
 table(aggregate(Z$V1, FUN = length, by = list(id=Z$V2))$x)
 alas, aggregate has been running for ~30 minute, RSS is 14G, VIRT is
 24.3G, and no end in sight.
 both V1 and V2 are characters (not factors).
 Is there anything I could do to speed this up?
 Thanks.

 You might find you'll get a lot of mileage out of data.table when
 working with such large data.frames ...

 To get something close to what you're after, you can try:

 R library(data.table)
 R Z - as.data.table(Z)
 R setkeyv(Z, 'V2')
 R agg - Z[, list(count=.N), by='V2']

 From here you might

 R tab1 - table(agg$count)

 I think that'll get you where you want to be ... I'm ashamed to say
 that I haven't really done much w/ aggregate since I mostly have used
 plyr and data.table like stuff, so I might be missing your end goal --
 providing a reproducible example with a small data.frame from you can
 help here (for me at least).

 HTH,
 -steve

 --
 Steve Lianoglou
 Graduate Student: Computational Systems Biology
  | Memorial Sloan-Kettering Cancer Center
  | Weill Medical College of Cornell University
 Contact Info: http://cbio.mskcc.org/~lianos/contact
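
For the min/max question at the top of this message: in data.table the j expression can hold any per-group summary, not only .N. A minimal sketch, assuming a toy stand-in for infl with the two columns named in the question (the values are made up for illustration); the base R tapply() lines are only there as a cross-check, and the data.table part runs only if that package is installed:

```r
# Toy stand-in for the poster's `infl` (column names from the question,
# values invented for this example).
infl <- data.frame(share.id = c("a", "a", "b", "b", "b"),
                   delay    = c(3, 1, 7, 2, 5),
                   stringsAsFactors = FALSE)

# Base R cross-check: one pass per statistic.
base.min <- tapply(infl$delay, infl$share.id, min)
base.max <- tapply(infl$delay, infl$share.id, max)

# data.table: several summaries in one grouped pass.
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  dt  <- as.data.table(infl)
  agg <- dt[, list(count = .N,
                   min.delay = min(delay),
                   max.delay = max(delay)),
            by = "share.id"]
  print(agg)
}
```

Computing count, min and max in one call avoids grouping the data three times.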



--
Sam Steingold http://sds.podval.org http://www.childpsy.net/


Re: [R] LiblineaR: accept sparse matrices

2012-11-09 Thread Sam Steingold

Hi,

 * Thibault Helleputte guvonhyg.uryyrch...@qanylgvpf.pbz [2012-11-09 09:22:11 +0100]:

 The next release of LiblineaR should offer the possibility of using
 sparse matrices. However, the next release date is not fixed yet...

thanks.

 On Thu, Nov 8, 2012 at 10:07 PM, Sam Steingold s...@gnu.org wrote:
  It would also be nice if there were functions to read/write files in the
  native liblinear file format; I am sure the original liblinear library
  provides at least the input code.

How about i/o?

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://palestinefacts.org http://dhimmi.com
http://think-israel.org http://www.memritv.org http://openvotingconsortium.org
Money does not bother me at all.  In fact, it calms me down.


[R] as.data.frame(do.call(rbind,lapply)) produces something weird

2012-11-09 Thread Sam Steingold

The following code:
--8---cut here---start-8---
> myfun <- function (x) list(x=x,y=x*x)
> z <- as.data.frame(do.call(rbind,lapply(1:3,function(x)
    c(a=paste("a",x,sep=""),as.list(unlist(list(b=myfun(x),c=myfun(x*x*x))))))))
> z
   a b.x b.y c.x c.y
1 a1   1   1   1   1
2 a2   2   4   8  64
3 a3   3   9  27 729
--8---cut here---end---8---
the appearance of z is good, but str() and summary betray some weirdness:
--8---cut here---start-8---
> str(z)
'data.frame':   3 obs. of  5 variables:
 $ a  :List of 3
  ..$ : chr "a1"
  ..$ : chr "a2"
  ..$ : chr "a3"
 $ b.x:List of 3
  ..$ : int 1
  ..$ : int 2
  ..$ : int 3
 $ b.y:List of 3
  ..$ : int 1
  ..$ : int 4
  ..$ : int 9
 $ c.x:List of 3
  ..$ : int 1
  ..$ : int 8
  ..$ : int 27
 $ c.y:List of 3
  ..$ : int 1
  ..$ : int 64
  ..$ : int 729
--8---cut here---end---8---
how do I ensure that the columns of z are vectors, as in
--8---cut here---start-8---
> z <- data.frame(a=c("a1","a2","a3"),b.x=c(1,2,3),b.y=c(1,4,9),c.x=c(1,8,27),c.y=c(1,64,729))
> z
   a b.x b.y c.x c.y
1 a1   1   1   1   1
2 a2   2   4   8  64
3 a3   3   9  27 729
> str(z)
'data.frame':   3 obs. of  5 variables:
 $ a  : Factor w/ 3 levels "a1","a2","a3": 1 2 3
 $ b.x: num  1 2 3
 $ b.y: num  1 4 9
 $ c.x: num  1 8 27
 $ c.y: num  1 64 729
--8---cut here---end---8---
thanks!
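
One way to get atomic columns without naming them one by one is to unlist each list column. A minimal sketch, rebuilding z exactly as in the first snippet of this message; it assumes every cell holds exactly one value, so that each column unlists to a vector of length nrow(z):

```r
myfun <- function(x) list(x = x, y = x * x)
z <- as.data.frame(do.call(rbind, lapply(1:3, function(x)
  c(a = paste("a", x, sep = ""),
    as.list(unlist(list(b = myfun(x), c = myfun(x * x * x))))))))

# Every column of z is a list of length-1 elements; flatten each one.
# use.names = FALSE keeps stray element names out of the result.
z2 <- as.data.frame(lapply(z, unlist, use.names = FALSE),
                    stringsAsFactors = FALSE)
str(z2)
```

The column names come along for free because lapply() over a data frame keeps them, and no value is converted to character and back.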

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://jihadwatch.org http://think-israel.org
http://www.PetitionOnline.com/tap12009/ http://honestreporting.com
Programming is like sex: one mistake and you have to support it for a lifetime.


Re: [R] as.data.frame(do.call(rbind, lapply)) produces something weird

2012-11-09 Thread Sam Steingold

 * arun fznegcvax...@lnubb.pbz [2012-11-09 11:33:43 -0800]:

 z2<-within(z1,{b.x<-as.numeric(as.character(b.x));b.y<-as.numeric(as.character(b.y));c.x<-as.numeric(as.character(c.x));c.y<-as.numeric(as.character(c.y))})

1. I don't want to have to list all the column names explicitly

2. I find the num-char-num conversion repugnant and unacceptable.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://www.PetitionOnline.com/tap12009/
http://truepeace.org http://honestreporting.com http://ffii.org
What was the best thing before sliced bread?


Re: [R] LiblineaR: accept sparse matrices

2012-11-08 Thread Sam Steingold

 * Ben Bolker ooby...@tznvy.pbz [2012-11-07 21:51:07 +]:

 Sam Steingold sds at gnu.org writes:

 It would be nice if LiblineaR() accepted data in the form of a sparse
 matrix (it does not accept whatever e1071::read.matrix.csr returns).
 
 It would also be nice if there were functions to read/write files in the
 native liblinear file format; I am sure the original liblinear library
 provides at least the input code.

   You appear to have sent this to the general R-help mailing list
 rather than to the maintainer (or maybe you Bcc'd the maintainer)?

It was CCed (not BCCed) to Thibault Helleputte thellepu...@gmail.com

   Sparse matrices are nice, but once you start using sparse matrices
 you have to start worrying about the details of which linear algebra
 operators have been defined for them (e.g. whether the available
 operators allow pivoting, or work on rank-deficient matrices, or ...)
 So it's not always as easy as flipping a switch ...

The library in question is merely a thin layer which passes the data to
the underlying C++ library. The original library comes with a command
line interface which accepts input file in sparse matrix format _ONLY_.


-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://mideasttruth.com http://dhimmi.com
http://honestreporting.com http://think-israel.org http://jihadwatch.org
XFM: Exit file manager? [Continue] [Cancel] [Abort]


Re: [R] matrix.csr %*% matrix --> matrix

2012-11-07 Thread Sam Steingold

 * Martin Maechler znrpu...@fgng.zngu.rgum.pu [2012-11-07 10:10:51 +0100]:

 Sam == Sam Steingold s...@gnu.org
 on Tue, 6 Nov 2012 13:08:30 -0500 writes:

 Sam The question is even more pressing for me now given that I no longer can
 Sam convert some csr matrices to the regular ones for scaling.
 Sam (http://article.gmane.org/gmane.comp.lang.r.general:279305)
 Sam Any suggestions? (the original csr matrix is too large to be converted
 Sam to a regular one, but the product is small enough).

  * Sam Steingold f...@tah.bet [2012-08-27 14:58:47 -0400]:
  
  When a sparse matrix is multiplied by a regular one, the result is
  usually not sparse. However, when matrix.csr is multiplied by a regular
  matrix in R, a matrix.csr is produced.
  Is there a way to avoid this?
  Thanks!

 Why don't you use the sparse matrix classes from the Matrix
 package .. which is part of every R distribution ?
 SparseM has been written as very first package to support
 sparse matrices, and is to be applauded for that,
 but it does lack many features nowadays (and also uses less
 modern algorithm for e.g. the sparse Cholesky decomposition).
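
To illustrate the suggestion above: with the Matrix classes, the product of a sparse and a regular matrix can be turned into an ordinary base matrix with as.matrix(). A minimal sketch on made-up data (values and dimensions are purely illustrative; which class the product has before as.matrix() is left to the Matrix package):

```r
library(Matrix)

# A small sparse matrix: 3 x 4 with three non-zero entries.
s <- sparseMatrix(i = c(1, 2, 3), j = c(2, 4, 1),
                  x = c(3.5, 6.7, 6.6), dims = c(3, 4))

# An ordinary dense 4 x 2 matrix.
d <- matrix(as.numeric(1:8), nrow = 4)

p <- s %*% d        # product computed by Matrix
m <- as.matrix(p)   # plain base-R matrix, ready for scale() and friends
print(m)
```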

Thank you very much for your advice.

I do not think I use SparseM directly.
I use e1071::read.matrix.csr and e1071::write.matrix.csr which use SparseM.
I.e., I need to be able to do i/o on files which are palatable to
libsvm/liblinear, specifically, read/write files like
--8---cut here---start-8---
1.2 2:3.5 6:5.1
2 4:6.7
8 7:6.6
--8---cut here---end---8---

As you can see from my other messages
(e.g., http://article.gmane.org/gmane.comp.lang.r.general:279387),
I am not happy with my current setup.
I would be delighted to learn that there is an alternative, but so far
the only matrix i/o I could find is Matrix::readHB and it does not
handle the libsvm/liblinear format.
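
The file layout shown above (a label followed by index:value pairs for the non-zero entries) is simple enough to write from plain R. A minimal sketch for a small dense matrix; write.libsvm.sketch is a hypothetical helper, not a function from e1071, SparseM or Matrix, and a row-by-row loop like this is not expected to be fast on matrices of the size discussed in this thread:

```r
# Hypothetical helper: write labels y and the non-zero entries of the
# dense matrix x as "label idx:val idx:val ..." lines.
write.libsvm.sketch <- function(x, y, con) {
  stopifnot(is.matrix(x), length(y) == nrow(x))
  lines <- vapply(seq_len(nrow(x)), function(i) {
    nz <- which(x[i, ] != 0)                      # 1-based column indices
    paste(c(y[i], paste(nz, x[i, nz], sep = ":")), collapse = " ")
  }, character(1))
  writeLines(lines, con)
}

# Rebuild the three-line example from the message above.
x <- matrix(0, nrow = 3, ncol = 7)
x[1, 2] <- 3.5; x[1, 6] <- 5.1
x[2, 4] <- 6.7
x[3, 7] <- 6.6
y <- c(1.2, 2, 8)

f <- tempfile()
write.libsvm.sketch(x, y, f)
out <- readLines(f)
print(out)
```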

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://openvotingconsortium.org http://truepeace.org
http://palestinefacts.org http://camera.org http://www.memritv.org
Heck is a place for people who don't believe in gosh.


[R] c weirdness

2012-11-07 Thread Sam Steingold

is there a way to avoid c() appending .0 and .1 to seed?
--8---cut here---start-8---
> c(nons=1, seed=3)
nons seed   ## good!
   1    3 
> c(nons=1, seed=tab[1])
   nons  seed.0 ## don't want .0!
      1 2344600 
> c(nons=1, seed=tab[2])
  nons seed.1   ## don't want .1!
     1   6843 
> tab
      0       1 
2344600    6843 

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://pmw.org.il
http://memri.org http://ffii.org http://openvotingconsortium.org
Islam is a religion of Peace. Its adherents will kill anyone who disagrees.


[R] LiblineaR: accept sparse matrices

2012-11-07 Thread Sam Steingold

Thibault,

It would be nice if LiblineaR() accepted data in the form of a sparse
matrix (it does not accept whatever e1071::read.matrix.csr returns).

It would also be nice if there were functions to read/write files in the
native liblinear file format; I am sure the original liblinear library
provides at least the input code.

Thanks!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://iris.org.il http://pmw.org.il
http://ffii.org http://dhimmi.com http://www.PetitionOnline.com/tap12009/
Sex is like air.  It's only a big deal if you can't get any.


Re: [R] no method for coercing this S4 class to a vector

2012-11-06 Thread Sam Steingold

The matrix z is save()d in http://sds.podval.org/data/z.
It is a product of a sparse matrix and a non-sparse matrix.
I need to scale it and write to a file in the sparse format for libsvm.

platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          2
minor          15.2
year           2012
month          10
day            26
svn rev        61015
language       R
version.string R version 2.15.2 (2012-10-26)

Package:            SparseM
Version:            0.96
Author:             Roger Koenker rkoen...@uiuc.edu and Pin Ng pin...@nau.edu
Maintainer:         Roger Koenker rkoen...@uiuc.edu
Depends:            R (>= 2.4.1), methods, stats, utils
Description:        Basic linear algebra for sparse matrices
License:            GPL (>= 2)
Title:              Sparse Linear Algebra
URL:                http://www.econ.uiuc.edu/~roger/research/sparse/sparse.html
Packaged:           2012-03-18 19:39:05 UTC; root
Repository:         CRAN
Date/Publication:   2012-03-18 20:55:08
Built:              R 2.15.2; x86_64-pc-linux-gnu; 2012-11-05 17:46:36 UTC; unix



 * Sam Steingold f...@tah.bet [2012-11-05 12:40:25 -0500]:

 all of a sudden, after a SparseM upgrade(?)
 I get this error:
 str(z)
 Formal class 'matrix.csr' [package "SparseM"] with 4 slots
   ..@ ra   : num [1:85372672] -0.4288 0.0397 0.0104 -0.1843 -0.1203 ...
   ..@ ja   : int [1:85372672] 1 2 3 4 5 6 7 8 9 10 ...
   ..@ ia   : int [1:699777] 1 123 245 367 489 611 733 855 977 1099 ...
   ..@ dimension: int [1:2] 699776 122
 z1<-as.matrix(z)
 Error in as.vector(data) : 
   no method for coercing this S4 class to a vector
 z1<-scale(z)
 Error in as.vector(data) : 
   no method for coercing this S4 class to a vector

 what has happened?
 how do I scale the matrix.csr object (to be written to a file)?

 PS. write.matrix.csr is very slow: it takes
 user   system  elapsed
 1137.058  510.615 1649.925
 to write the matrix z above.

 thanks!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://openvotingconsortium.org
http://www.memritv.org http://iris.org.il http://pmw.org.il
He who laughs last thinks slowest.


Re: [R] matrix.csr %*% matrix --> matrix

2012-11-06 Thread Sam Steingold

The question is even more pressing for me now given that I no longer can
convert some csr matrices to the regular ones for scaling.
(http://article.gmane.org/gmane.comp.lang.r.general:279305)
Any suggestions? (the original csr matrix is too large to be converted
to a regular one, but the product is small enough).

 * Sam Steingold f...@tah.bet [2012-08-27 14:58:47 -0400]:

 When a sparse matrix is multiplied by a regular one, the result is
 usually not sparse. However, when matrix.csr is multiplied by a regular
 matrix in R, a matrix.csr is produced.
 Is there a way to avoid this?
 Thanks!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://www.memritv.org http://think-israel.org
http://camera.org http://openvotingconsortium.org http://honestreporting.com
If you have no enemies, you are probably dead.


Re: [R] write.matrix.csr data conversion

2012-11-06 Thread Sam Steingold

David,
thanks for adding the feature.

read.matrix.csr and, especially, write.matrix.csr are extremely slow:

     user    system   elapsed
 8381.988  3810.396 12345.349

for a 2797634 x 224 matrix I have to deal with.

The help page
http://rss.acs.unt.edu/Rdoc/library/e1071/html/read.matrix.csr.html
says

David Meyer (based on C/C++-code by Chih-Chung Chang and Chih-Jen Lin)

is there any chance that you might consider replacing the R code with
the original C/C++?

Thanks a lot!


 * David Meyer qnivq.zr...@jh.np.ng [2012-08-27 22:57:17 +0200]:

 done, thanks for the suggestion.

 David

 On 2012-08-27 21:15, Sam Steingold wrote:
 * jim holtman wubyg...@tznvy.pbz [2012-08-27 14:55:08 -0400]:

 Most likely when 'y' is converted to a dataframe (not sure what the
 function 'write.matrix.csr' does since you did not say where you got
 it),

 sorry,
 library(e1071)

 '0' and '1' are converted to factors which probably show up as 1
 and 2 in the file.

 sounds reasonable, thanks.

 David, could you please add an option `fac' to `write.matrix.csr',
 similar to `read.matrix.csr' which already accepts `fac'?

 thanks!


-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://jihadwatch.org http://honestreporting.com
http://iris.org.il http://www.memritv.org http://mideasttruth.com
The only intuitive interface is the nipple.  The rest has to be learned.


Re: [R] write.matrix.csr data conversion

2012-11-06 Thread Sam Steingold

Dear David,

 * David Meyer zrl...@grpuavxhz-jvra.ng [2012-11-06 19:49:15 +0100]:

 there is C-code related to *reading* in such a file, but in the
 internal libsvm-format, not the matrix.csr format.

How does the libsvm-format differ from the matrix.csr format?
I actually use matrix.csr only because it prints to what libsvm can read.

 There is certainly a way to speed this up, but I am not likely to do
 this in the near future.

too bad.

 On 2012-11-06 19:15, Sam Steingold wrote:
 David,
 thanks for adding the feature.

 read.matrix.csr and, especially, write.matrix.csr are extremely slow:

       user    system   elapsed
   8381.988  3810.396 12345.349

 for a 2797634 x 224 matrix I have to deal with.

 The help page
 http://rss.acs.unt.edu/Rdoc/library/e1071/html/read.matrix.csr.html
 says

 David Meyer (based on C/C++-code by Chih-Chung Chang and Chih-Jen Lin)

 is there any chance that you might consider replacing the R code with
 the original C/C++?

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://truepeace.org http://mideasttruth.com
http://openvotingconsortium.org http://memri.org http://pmw.org.il
Programming is like sex: one mistake and you have to support it for a lifetime.


[R] no method for coercing this S4 class to a vector

2012-11-05 Thread Sam Steingold

all of a sudden, after a SparseM upgrade(?)
I get this error:
> str(z)
Formal class 'matrix.csr' [package "SparseM"] with 4 slots
  ..@ ra   : num [1:85372672] -0.4288 0.0397 0.0104 -0.1843 -0.1203 ...
  ..@ ja   : int [1:85372672] 1 2 3 4 5 6 7 8 9 10 ...
  ..@ ia   : int [1:699777] 1 123 245 367 489 611 733 855 977 1099 ...
  ..@ dimension: int [1:2] 699776 122
> z1<-as.matrix(z)
Error in as.vector(data) : 
  no method for coercing this S4 class to a vector
> z1<-scale(z)
Error in as.vector(data) : 
  no method for coercing this S4 class to a vector

what has happened?
how do I scale the matrix.csr object (to be written to a file)?

PS. write.matrix.csr is very slow: it takes
user   system  elapsed
1137.058  510.615 1649.925
to write the matrix z above.

thanks!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://thereligionofpeace.com
http://iris.org.il http://jihadwatch.org
A year spent in artificial intelligence is enough to make one believe in God.


Re: [R] R 2.15.2 is released

2012-11-04 Thread Sam Steingold

Cool.
I have some packages installed using install.packages().
Do I need to reinstall them?

https://r-forge.r-project.org/tracker/?func=detail&atid=294&aid=2224&group_id=61
   Not a bug: This only happens under the circumstance of a Matrix
   package installation *not* matching your R installation. In other
   words: One way to fix your problem is to re install the Matrix
   package in the version of R you are using.

So, will the bug reappear now?

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://openvotingconsortium.org
http://mideasttruth.com http://www.memritv.org
Lisp: Serious empowerment.


Re: [R] R 2.15.2 is released

2012-11-04 Thread Sam Steingold

 * Bert Gunter thagre.ore...@trar.pbz [2012-11-04 09:48:58 -0800]:

 ?update.packages

It is not obvious to me that this is the answer to my question.
Specifically, I have package X version 1.2.3 installed and built against
R version 2.15.1.
If 1.2.3 is the current latest version of X, then update.packages() will
_not_ try to update it, but, apparently, at least for some packages, I
do need to rebuild them against the new R version 2.15.2.

Thanks.

 On Sun, Nov 4, 2012 at 7:01 AM, Sam Steingold s...@gnu.org wrote:
 I have some packages installed using install.packages().
 Do I need to reinstall them?

 https://r-forge.r-project.org/tracker/?func=detail&atid=294&aid=2224&group_id=61
Not a bug: This only happens under the circumstance of a Matrix
package installation *not* matching your R installation. In other
words: One way to fix your problem is to re install the Matrix
package in the version of R you are using.

 So, will the bug reappear now?

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://americancensorship.org http://palestinefacts.org
http://www.PetitionOnline.com/tap12009/ http://www.memritv.org http://memri.org
If a woman is listening to a you without interrupting, do not wake her up!


Re: [R] R 2.15.2 is released

2012-11-04 Thread Sam Steingold

 * Marc Schwartz znep_fpujn...@zr.pbz [2012-11-04 12:33:20 -0600]:

 On Nov 4, 2012, at 12:22 PM, Sam Steingold s...@gnu.org wrote:

 * Bert Gunter thagre.ore...@trar.pbz [2012-11-04 09:48:58 -0800]:
 
 ?update.packages
 
 It is not obvious to me that this is the answer to my question.

 Take note of the 'checkBuilt' argument, which defaults to FALSE...

Thanks a lot!

So, what I need to do is:

update.packages(checkBuilt=TRUE, ask=FALSE,
lib.loc=.libPaths()[grep("^/home/",.libPaths())])
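
Before running the update, it can help to see which installed packages were built under an older R than the one currently running. A minimal sketch using the "Built" column of installed.packages(); the variable names are illustrative only:

```r
ip <- installed.packages()

# "Built" records the R version each package was built under.
built <- package_version(ip[, "Built"])

# Packages whose build version is older than the running R.
stale <- ip[built < getRversion(),
            c("Package", "LibPath", "Built"), drop = FALSE]
print(unname(stale))
```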

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://americancensorship.org http://pmw.org.il
http://iris.org.il http://camera.org http://jihadwatch.org http://dhimmi.com
Kleptomania: the ability to find stuff even before its owner loses it.


Re: [R] how to concatenate factor vectors?

2012-10-18 Thread Sam Steingold

 * Bert Gunter thagre.ore...@trar.pbz [2012-10-17 23:21:44 -0700]:

 However, Is level 5 in 'a' the same as level 5 in 'b' ?

yes, of course.
would anyone want two _different_ factors with identical string representations?!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://pmw.org.il http://americancensorship.org
http://memri.org http://think-israel.org http://camera.org
Lisp is a language for doing what you've been told is impossible. - Kent Pitman


Re: [R] how to concatenate factor vectors?

2012-10-18 Thread Sam Steingold

hi Jorge,

 * Jorge I Velez wbetrvinair...@tznvy.pbz [2012-10-18 16:43:58 +1100]:

 a <- factor(5:1,levels=1:9)
 b <- factor(9:1,levels=1:9)
 lev <- sort(unique(f <- c(a, b)))
 f <- factor(f, levels = lev)
 str(f)
  Factor w/ 9 levels "1","2","3","4",..: 5 4 3 2 1 9 8 7 6 5 ...

is sort(unique()) really necessary?
I think
lev <- levels(a)
should be enough.

However, this does not quite do what I want.
I want a function which will _NOT_ have a non-factor vector as an
intermediate value because that would waste a LOT of memory in my case.
I want a function which will check that a and b have identical levels
(in Lisp lingo, the levels are EQ, not just EQUALP).

--8---cut here---start-8---
> a <- factor(letters[sample(1:10,20,replace=TRUE)],levels=letters)
 [1] e e a b c e j d a b h i a e e g j a c e
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
> b <- factor(letters[sample(1:10,30,replace=TRUE)],levels=letters)
 [1] d d f c j b d e j j g i g j j g g a j a b e d c b i i a b f
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
> c(a,b)
 [1]  5  5  1  2  3  5 10  4  1  2  8  9  1  5  5  7 10  1  3  5  4  4  6  3 10
[26]  2  4  5 10 10  7  9  7 10 10  7  7  1 10  1  2  5  4  3  2  9  9  1  2  6
> factor(letters[c(a,b)],levels=letters)
 [1] e e a b c e j d a b h i a e e g j a c e d d f c j b d e j j g i g j j g g a
[39] j a b e d c b i i a b f
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
--8---cut here---end---8---

however, this is not a direct way (unlike my unlist(list(...))):
there is an intermediate integer vector c(a,b) which is mapped to a
character vector via letters, which is converted back to integers
(==factors).

IIUC, a factor is an integer vector which knows that the integers refer
to levels.

c(a,b) creates such an integer vector.
How do I tell it that it is a factor?
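
One direct way to do that is to attach the levels and the class to the combined integer codes. A minimal sketch; concat.codes is a hypothetical name, and the identical() check on the levels is what makes reusing the codes safe:

```r
# Hypothetical helper: combine two factors that share the same levels
# by concatenating their integer codes and re-attaching the levels.
concat.codes <- function(x, y) {
  stopifnot(is.factor(x), is.factor(y),
            identical(levels(x), levels(y)))
  structure(c(as.integer(x), as.integer(y)),
            levels = levels(x), class = "factor")
}

a <- factor(5:1, levels = 1:9)
b <- factor(9:1, levels = 1:9)
f <- concat.codes(a, b)
str(f)
```

No character vector is built along the way; only the integer codes are copied. Newer R releases (4.1.0 and later, as far as I know) give c() a factor method, so there c(a,b) already returns a factor.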

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://palestinefacts.org http://www.memritv.org
http://www.PetitionOnline.com/tap12009/ http://dhimmi.com
usually: can't pay == don't buy. software: can't buy == don't pay


Re: [R] how to concatenate factor vectors?

2012-10-18 Thread Sam Steingold

 * R. Michael Weylandt zvpunry.jrlyn...@tznvy.pbz [2012-10-18 16:01:37 +0100]:

 On Thursday, October 18, 2012, Sam Steingold wrote:

  * Bert Gunter thagre.ore...@trar.pbz [2012-10-17 23:21:44 -0700]:
 
  However, Is level 5 in 'a' the same as level 5 in 'b' ?

 yes, of course.
 would anyone want two _different_ factors with identical string
 representations?!

 Off the cuff, studying education and grades: F could be a grade or a
 gender.

would you ever want to concatenate a vector of grades with a vector of
genders?
as I said elsewhere, the function which concatenates factors must check
that the levels are identical before proceeding.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://pmw.org.il http://camera.org
http://openvotingconsortium.org http://truepeace.org http://jihadwatch.org
Ernqvat guvf ivbyngrf QZPN.


Re: [R] how to concatenate factor vectors?

2012-10-18 Thread Sam Steingold

 * Jeff Newmiller wqarj...@qpa.qnivf.pn.hf [2012-10-18 07:53:24 -0700]:

 If you HAVE defined your factors using explicit levels definitions, you
 should have no trouble combining them.

http://article.gmane.org/gmane.comp.lang.r.general:277719

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://memri.org http://iris.org.il http://pmw.org.il
http://think-israel.org http://honestreporting.com http://www.memritv.org
A person without flaws probably lacks strengths either.


Re: [R] how to concatenate factor vectors?

2012-10-18 Thread Sam Steingold

 * William Dunlap jqha...@gvopb.pbz [2012-10-18 15:33:38 +]:

 c() has an unfortunate history.

:-)
ISTR reading in the R manual ~15(?) years ago that the language was in a
flux and one could not expect code written for the current release to
work in the next release.  I was considering R as the graphing back end
at that time, so this note turned me off.
Now it turns out that R has a legacy it cannot shake. :-)

 Or, you can decide to write  a new concatenation function
 and stop using c().

 As for EQ vs. EQUALP, don't even think of EQ in R: it doesn't make
 sense there.  identical() is a pretty quick way to check that two
 objects have identical contents.

Good! That's what I was looking for!

concatenate.factors <- function (x, y) {
  stopifnot(identical(levels(x),levels(y)))
  unlist(list(x,y), use.names=FALSE)
}

This seems to do what I need.

I see that
identical(levels(concatenate.factors(a,b)),levels(a))
== TRUE
DIUC that concatenate.factors does NOT create an intermediate vector and
then re-factor it?

Thank you very much for your insight!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://jihadwatch.org http://openvotingconsortium.org
http://www.memritv.org http://memri.org http://truepeace.org
Live Lisp and prosper.


Re: [R] uniq -c

2012-10-17 Thread Sam Steingold

 * Sam Steingold f...@tah.bet [2012-10-16 11:03:27 -0400]:

 I need an analogue of uniq -c for a data frame.

Summary of options:

1. William:

isFirstInRun <- function(x) UseMethod("isFirstInRun")
isFirstInRun.default <- function(x) c(TRUE, x[-1] != x[-length(x)])
isFirstInRun.data.frame <- function(x) {
  stopifnot(ncol(x)>0)
  retval <- isFirstInRun(x[[1]])
  for(column in x) {
    retval <- retval | isFirstInRun(column)
  }
  retval
}
row.count.1 <- function (x) {
  i <- which(isFirstInRun(x))
  data.frame(x[i,], count=diff(c(i, 1L+nrow(x))))
}

147 seconds

2. http://orgmode.org/worg/org-contrib/babel/examples/Rpackage.html#sec-6-1
row.count.2 <- function (x) {
  equal.to.previous <- rowSums( x[2:nrow(x),] != x[1:(nrow(x)-1),] )==0
  tf.runs <- rle(equal.to.previous)
  counts <- c(1, unlist(mapply(function(x,y) if (y) x+1 else (rep(1,x)),
                               tf.runs$lengths, tf.runs$values)))
  counts <- counts[ c( diff( counts ) <= 0, TRUE ) ]
  unique.rows <- which( c(TRUE, !equal.to.previous ) )
  cbind(x[ unique.rows, ,drop=FALSE ], counts)
}

136 seconds

3. Michael: paste/strsplit

row.count.3 <- function (x) {
  pa <- do.call(paste,x)
  rl <- rle(pa)
  sp <- strsplit(as.character(rl$values), " ")
  data.frame(user = sapply(sp,"[",1),
             country = sapply(sp,"[",2),
             language = sapply(sp,"[",3),
             count = rl$lengths)
}

here I know the columns and rely on absence of spaces in values.

27 seconds.
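
Option 3 can be made independent of the column names, and the strsplit step avoided, by taking the first row of each run from the original data frame instead of splitting the pasted key. A minimal sketch; row.count.sketch and the "\r" separator are assumptions of this example (the separator must not occur in the data), no timing was measured for it, and like uniq -c it counts runs of adjacent equal rows:

```r
# Hypothetical variant of option 3: count runs of identical adjacent rows.
row.count.sketch <- function(x) {
  # One key string per row; unname() so no column is mistaken for `sep`.
  key <- do.call(paste, c(unname(as.list(x)), sep = "\r"))
  rl <- rle(key)
  # Index of the first row of every run.
  first <- cumsum(rl$lengths) - rl$lengths + 1L
  res <- data.frame(x[first, , drop = FALSE], count = rl$lengths)
  rownames(res) <- NULL
  res
}

x <- data.frame(user    = c("u1", "u1", "u2", "u2", "u2", "u1"),
                country = c("us", "us", "us", "fr", "fr", "us"),
                stringsAsFactors = FALSE)
res <- row.count.sketch(x)
print(res)
```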

Thanks to all who answered.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://www.PetitionOnline.com/tap12009/
http://thereligionofpeace.com http://ffii.org http://camera.org
A slave dreams not of Freedom, but of owning his own slaves.


[R] how to concatenate factor vectors?

2012-10-17 Thread Sam Steingold

How do I concatenate two vectors of factors?
--8---cut here---start-8---
> a <- factor(5:1,levels=1:9)
> b <- factor(9:1,levels=1:9)
> str(c(a,b))
 int [1:14] 5 4 3 2 1 9 8 7 6 5 ...
> str(unlist(list(a,b),use.names=FALSE))
 Factor w/ 9 levels "1","2","3","4",..: 5 4 3 2 1 9 8 7 6 5 ...
--8---cut here---end---8---
so, unlist(list()) works.
is there a better way or is this how this is supposed to be done?
Thanks!
-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://honestreporting.com
http://think-israel.org http://thereligionofpeace.com http://mideasttruth.com
(lisp programmers do it better)


[R] uniq -c

2012-10-16 Thread Sam Steingold

, 4475376L, 4475377L, 4475378L, 4475379L, 
5500564L, 7871329L, 7871330L, 8670694L), class = "data.frame")
--8---cut here---end---8---

thanks!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://thereligionofpeace.com http://dhimmi.com
http://ffii.org http://truepeace.org http://mideasttruth.com
Bus error -- please leave by the rear door.


Re: [R] uniq -c

2012-10-16 Thread Sam Steingold

 * R. Michael Weylandt zvpunry.jrlyn...@tznvy.pbz [2012-10-16 16:19:27 
 +0100]:

 Have you looked at using table() directly? If I understand what you
 want correctly something like:

 table(do.call(paste, x))

I wished to avoid paste (I will have to re-split later, so it will be a
performance nightmare).

 Also, if you take a look at the development version of R, changes are
 being put in place to allow much larger data sets.

 xtabs(), although dog slow, would have footed the bill nicely:
 --8---cut here---start-8---
 x <- data.frame(a=1:32,b=1:32,c=1:32,d=1:32,e=1:32)
 system.time(subset(as.data.frame(xtabs( ~. , x )), Freq != 0 ))
user  system elapsed
  12.788   4.288  17.224
 --8---cut here---end---8---

you should not need much larger data sets for this.
x is sorted.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://openvotingconsortium.org http://iris.org.il
http://www.memritv.org http://memri.org http://think-israel.org
Just because you're paranoid doesn't mean they AREN'T after you.


Re: [R] uniq -c

2012-10-16 Thread Sam Steingold

 * Duncan Murdoch zheqbpu.qha...@tznvy.pbz [2012-10-16 12:47:36 -0400]:

 On 16/10/2012 12:29 PM, Sam Steingold wrote:
 x is sorted.
 sparseby(data=x, INDICES=x, FUN=nrow)

this takes forever; apparently, it does not use the fact that x is
sorted (even then - it should not take more than a few minutes)...

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://openvotingconsortium.org http://www.memritv.org
http://think-israel.org http://pmw.org.il http://thereligionofpeace.com
Save the whales, feed the hungry, free the mallocs.


[R] cannot coerce class 'rle' into a data.frame

2012-10-16 Thread Sam Steingold

why?

> rle
Run Length Encoding
  lengths: int [1:1650061] 2 2 8 2 4 5 6 3 26 46 ...
  values : chr [1:1650061] "4bbf9e94cbceb70c BG bg" "4fbbf2c67e0fb867 SK sk" ...
> as.data.frame(rle)
Error in as.data.frame.default(vertices.rle) : 
  cannot coerce class 'rle' into a data.frame

it seems that

rle.df <- data.frame(values=rle$values,length=rle$length)

works and DTRT.
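
A small sketch of an alternative, assuming the only obstacle is the "rle" class attribute: an rle object is a plain list with components lengths and values, so dropping the class lets as.data.frame() use its list method. The data below are invented stand-ins for the real values.

```r
# Small stand-in for the real data (the values here are invented)
r <- rle(c("u1 BG bg", "u1 BG bg", "u2 SK sk",
           "u3 BG bg", "u3 BG bg", "u3 BG bg"))

# An rle object is a list with components 'lengths' and 'values';
# dropping the class lets the list method of as.data.frame() handle it
r.df <- as.data.frame(unclass(r), stringsAsFactors = FALSE)
r.df
```

The column names then come straight from the rle components (lengths, values) rather than being chosen by hand.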

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://iris.org.il http://memri.org
http://www.PetitionOnline.com/tap12009/ http://camera.org
char*a="char*a=%c%s%c;main(){printf(a,34,a,34);}";main(){printf(a,34,a,34);}


Re: [R] uniq -c

2012-10-16 Thread Sam Steingold

 * Duncan Murdoch zheqbpu.qha...@tznvy.pbz [2012-10-16 12:47:36 -0400]:
 sparseby(data=x, INDICES=x, FUN=nrow)

Error in `[<-.data.frame`(`*tmp*`, index, , value = list(user = c(2L,  : 
  missing values are not allowed in subscripted assignments of data frames



-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://camera.org http://mideasttruth.com
http://palestinefacts.org http://www.memritv.org http://thereligionofpeace.com
Diplomacy is the art of saying nice doggy until you can find a nice rock.


Re: [R] uniq -c

2012-10-16 Thread Sam Steingold

 * Duncan Murdoch zheqbpu.qha...@tznvy.pbz [2012-10-16 14:22:51 -0400]:

 On 16/10/2012 1:46 PM, Sam Steingold wrote:
  * Duncan Murdoch zheqbpu.qha...@tznvy.pbz [2012-10-16 12:47:36 -0400]:
 
  On 16/10/2012 12:29 PM, Sam Steingold wrote:
  x is sorted.
  sparseby(data=x, INDICES=x, FUN=nrow)

 this takes forever; apparently, it does not use the fact that x is
 sorted (even then - it should not take more than a few minutes)...

 It was more or less instantaneous on the examples you posted.  It
 would be a bit more honest to say it was fast on the examples, but it
 was very slow when I ran it on my real data, which consists of
 100 cases.

sure, I did not mean any insult to your code, sorry.
all I was saying was that it was too slow for my purposes because it
ignores the fact that the data is sorted.
it turned out that paste+sort+rle+strsplit is fast enough.
(although there should be a way to avoid paste/strsplit!)
Thanks!
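
One way to avoid paste/strsplit might be to compare each row with its predecessor column by column. This is only a sketch under stated assumptions: the data frame is already sorted so equal rows are adjacent, and no column contains NA. The function name count.runs and the sample data are hypothetical.

```r
# Count runs of identical adjacent rows in a sorted data frame.
# Assumes: rows are sorted so duplicates are adjacent, and no NAs.
count.runs <- function(x) {
  n <- nrow(x)
  if (n == 0L) return(cbind(x, count = integer(0)))
  # TRUE where a row differs from the previous row in any column
  changed <- Reduce(`|`, lapply(x, function(col) col[-1L] != col[-n]))
  starts <- which(c(TRUE, changed))       # first row of each run
  counts <- diff(c(starts, n + 1L))       # run lengths
  cbind(x[starts, , drop = FALSE], count = counts)
}

x <- data.frame(user    = c("u1", "u1", "u2", "u2", "u2"),
                country = c("BG", "BG", "BG", "SK", "SK"))
res <- count.runs(x)
res
```

Only vectorized column comparisons are used, so no strings are built or split; timing on the real data is untested here.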

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://camera.org http://truepeace.org
http://jihadwatch.org http://www.PetitionOnline.com/tap12009/
Every day above ground is a good day.


[R] what to use for sna/graphs?

2012-10-15 Thread Sam Steingold

What do people use for SNA/graph analysis in R?
So far I have been using igraph (it implements the Louvain community
detection algorithm as multilevel.community, which is the killer feature
for me).
However, igraph is severely lacking in visualization, which I also need.
graphviz & gephi are alleged to be good at visualization, but,
apparently, not so for analysis (specifically, community detection).
Also, it appears that there is no way to directly interface R to gephi
(apparently I am supposed to save graphs into csv and read them into
gephi separately), and the Rgraphviz package must be installed in a
quite unorthodox way (source("http://bioconductor.org/biocLite.R");
biocLite("Rgraphviz")); and then it is not clear how to turn an IGRAPH
graph object into an Ragraph object which Rgraphviz can handle.

So, what/how do people use/recommend?
Thanks!
-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://www.PetitionOnline.com/tap12009/
http://jihadwatch.org http://think-israel.org http://truepeace.org
You can have it good, soon or cheap.  Pick two...


[R] Rgraphviz: how to read a dot file?

2012-10-15 Thread Sam Steingold

The Rgraphviz package index says nothing about reading dot files.
(it has toFile to write them but no fromFile).
How do I create an Ragraph object?
(either by reading a dot file or from a list of edges with weights and
vertices with names and other attributes).
-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://truepeace.org http://americancensorship.org
http://honestreporting.com http://openvotingconsortium.org
Is there another word for synonym?


Re: [R] a merge() problem

2012-10-10 Thread Sam Steingold

 * Prof Brian Ripley evc...@fgngf.bk.np.hx [2012-10-08 06:37:07 +0100]:

 On 08/10/2012 02:57, Peter Ehlers wrote:
 On 2012-10-07 14:44, Sam Steingold wrote:
 * Peter Ehlers ruy...@hpnytnel.pn [2012-10-07 10:03:42 -0700]:

 On 2012-10-07 08:34, Sam Steingold wrote:
 I know it does not look very good - using the same column names to mean
 different things in different data frames, but here you go:
 --8---cut here---start-8---
 x <- data.frame(a=c(1,2,3),b=c(4,5,6))
 y <- data.frame(b=c(1,2),a=c("a","b"))
 merge(x,y,by.x="a",by.y="b",all.x=TRUE,suffixes=c("","y"))
   a b    a
 1 1 4    a
 2 2 5    b
 3 3 6 <NA>
 Warning message:
 In merge.data.frame(x, y, by.x = "a", by.y = "b", all.x = TRUE) :
   column name 'a' is duplicated in the result
 --8---cut here---end---8---
 why is the suffixes argument ignored?
 I mean, I expected that the second a to be a.y.

 The 'suffixes' argument refers to _non-by_ names only (as per ?merge).

 yes, but a in y is _not_ a by-name.

 Yes, it is.
 The set of by-names is the union of names specified by by.x and by.y,
 in your case: c("a", "b").
 I suppose that a case could be made that ?merge does not spell that
 out sufficiently explicitly.

 It does in 'Details' (and where else would there be such a detail?)
 E.g. in R 2.15.1:

  If the remaining columns in the data frames have any common names,
  these have ‘suffixes’ (‘.x’ and ‘.y’ by default) appended to
  try to make the names of the result unique.  If this is not
  possible, an error is thrown.

 Note *remaining*, and read what comes before that.

I read the docs and re-read them after seeing your message and, with all
due respect, I fail to interpret them the way you do:
The doc speaks about columns to merge on, not column names.
I specify both by.x and by.y, thus I do not specify the column y$b.

Note, however, that I do not want the doc fixed, I want the behavior modified.
I see no advantage in the current behavior (a warning + duplicate column
names) as opposed to the behavior I expected (renaming the column in the
result to b.y).

Thanks a lot for your kind replies and insight!
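
Until the behavior changes, one workaround is to rename the clashing column in y before merging. This is a sketch, not a statement about what merge() ought to do; the new name "a.y" is simply the one expected above.

```r
x <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
y <- data.frame(b = c(1, 2), a = c("a", "b"))

# Rename y's non-key column so it cannot collide with the key column of x
names(y)[names(y) == "a"] <- "a.y"

m <- merge(x, y, by.x = "a", by.y = "b", all.x = TRUE)
m
```

The result has columns a, b and a.y, with no duplicate names and no warning.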

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://americancensorship.org http://iris.org.il
http://jihadwatch.org http://ffii.org http://truepeace.org
Never argue with the person who is preparing your parachute.


[R] a merge() problem

2012-10-07 Thread Sam Steingold

I know it does not look very good - using the same column names to mean
different things in different data frames, but here you go:
--8---cut here---start-8---
> x <- data.frame(a=c(1,2,3),b=c(4,5,6))
> y <- data.frame(b=c(1,2),a=c("a","b"))
> merge(x,y,by.x="a",by.y="b",all.x=TRUE,suffixes=c("","y"))
  a b    a
1 1 4    a
2 2 5    b
3 3 6 <NA>
Warning message:
In merge.data.frame(x, y, by.x = "a", by.y = "b", all.x = TRUE) :
  column name 'a' is duplicated in the result
--8---cut here---end---8---
why is the suffixes argument ignored?
I mean, I expected the second a to be a.y.
(when I omit suffixes, the result is the same).
Thanks.
-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://palestinefacts.org http://honestreporting.com
http://truepeace.org http://openvotingconsortium.org
My name is Deja Vu. Have we met before?


Re: [R] a merge() problem

2012-10-07 Thread Sam Steingold

 * Peter Ehlers ruy...@hpnytnel.pn [2012-10-07 10:03:42 -0700]:

 On 2012-10-07 08:34, Sam Steingold wrote:
 I know it does not look very good - using the same column names to mean
 different things in different data frames, but here you go:
 --8---cut here---start-8---
 x <- data.frame(a=c(1,2,3),b=c(4,5,6))
 y <- data.frame(b=c(1,2),a=c("a","b"))
 merge(x,y,by.x="a",by.y="b",all.x=TRUE,suffixes=c("","y"))
   a b    a
 1 1 4    a
 2 2 5    b
 3 3 6 <NA>
 Warning message:
 In merge.data.frame(x, y, by.x = "a", by.y = "b", all.x = TRUE) :
   column name 'a' is duplicated in the result
 --8---cut here---end---8---
 why is the suffixes argument ignored?
 I mean, I expected that the second a to be a.y.

 The 'suffixes' argument refers to _non-by_ names only (as per ?merge).

yes, but a in y is _not_ a by-name.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://americancensorship.org
http://think-israel.org http://www.memritv.org http://mideasttruth.com
Save time: send elected officials to jail directly, bypassing the office.


[R] max & summary contradict each other

2012-09-28 Thread Sam Steingold

why does summary report max 27600 and not 27603?

> x <- c(27603, 1)
> max(x)
[1] 27603
> summary(x)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      1    6902   13800   13800   20700   27600 
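
A likely explanation, stated as an assumption about the R version in use here: summary() keeps only 4 significant digits by default, so 27603 is shown as 27600, while max() returns the exact value. The sketch below reproduces the rounding with signif() and shows quantile() as a way to see unrounded values.

```r
x <- c(27603, 1)

max(x)              # exact maximum
signif(max(x), 4)   # the same value kept to 4 significant digits
quantile(x)         # five-number summary without that rounding
```

If that assumption holds, the other entries (6902, 13800, 20700) are rounded in the same way.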

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://memri.org http://pmw.org.il
http://dhimmi.com http://iris.org.il http://mideasttruth.com
Vegetarians eat Vegetables, Humanitarians are scary.


Re: [R] aggregate help

2012-09-23 Thread Sam Steingold

Thanks.
Why does

> aggregate(z, list(id=z$id),FUN=list)
  id         id      a1      a2
1 10 10, 10, 10 a, a, b x, x, z
2 20     20, 20    b, b    y, y
3 30         30       c       z

work, but

aggregate(z, list(id=z$id),FUN=function(l) {
  t <- sort(table(l),decreasing=TRUE)
  list(length(t),t[1],names(t)[1],t[2],names(t)[2])
  })
   id id a1 a2
1 10  1  2  2
2 20  1  1  1
3 30  1  1  1
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs
  
does not?
(I do not want to put the whole list of all possible values into the
return value of aggregate because I am afraid of running out of ram)
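
My guess (an assumption, not something checked against the aggregate() source) is that aggregate() wants FUN to return an atomic vector per group, so a five-element list of mixed types does not fit. A sketch that sidesteps aggregate() with split() and rbind() follows; the helper name top2 is made up, and only the a1 column is summarized.

```r
z <- data.frame(id = c(10, 20, 10, 30, 10, 20),
                a1 = c("a", "b", "a", "c", "b", "b"),
                a2 = c("x", "y", "x", "z", "z", "y"),
                stringsAsFactors = FALSE)

# Summarize one attribute vector: total count plus the two most frequent values
top2 <- function(v) {
  t <- sort(table(v), decreasing = TRUE)
  data.frame(tot  = length(v),
             val1 = names(t)[1], num1 = as.integer(t[1]),
             val2 = names(t)[2],                       # NA if only one value
             num2 = if (length(t) > 1) as.integer(t[2]) else 0L,
             stringsAsFactors = FALSE)
}

# One row per id; row names carry the id
res <- do.call(rbind, lapply(split(z$a1, z$id), top2))
res
```

Each group returns a one-row data frame of fixed width, so only the top two values per id are kept rather than the full list of values.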

 * arun fznegcvax...@lnubb.pbz [2012-09-20 14:24:37 -0700]:

 Hi,
 Try this:

 z1<-aggregate(z,list(id=z$id),FUN=paste,sep=",")
 dat1<-data.frame(id=z1[,1],a1.total=unlist(lapply(z1[,3],length)),a1.val1=unique(z$a1),a1.num=unlist(lapply(lapply(z1[,3],table),`[`,1)),a1.val2=unlist(lapply(z1[,3],`[`,3)),a1.num2=unlist(lapply(lapply(z1[,3],table),`[`,2)),a2.total=unlist(lapply(z1[,4],length)),a2.val1=unique(z$a2),a2.num=unlist(lapply(lapply(z1[,4],table),`[`,1)),a2.val2=unlist(lapply(z1[,4],`[`,3)),a2.num2=unlist(lapply(lapply(z1[,4],table),`[`,2)))
 dat1

 # id a1.total a1.val1 a1.num a1.val2 a1.num2 a2.total a2.val1 a2.num a2.val2
 #0 10    3   a  2   b   1    3   x  2   z
 #1 20    2   b  2    NA  NA    2   y  2    NA
 #2 30    1   c  1    NA  NA    1   z  1    NA
 #  a2.num2
 #0   1
 #1  NA
 #2  NA
 #It is not an elegant way!


 A.K.



 - Original Message -
 From: Sam Steingold s...@gnu.org
 To: r-help@r-project.org
 Cc: 
 Sent: Thursday, September 20, 2012 2:06 PM
 Subject: [R] aggregate help

 I want to count attributes of IDs:
 z <- data.frame(id=c(10,20,10,30,10,20),
                 a1=c("a","b","a","c","b","b"),
                 a2=c("x","y","x","z","z","y"),
                 stringsAsFactors=FALSE)
 z
   id a1 a2
 1 10  a  x
 2 20  b  y
 3 10  a  x
 4 30  c  z
 5 10  b  z
 6 20  b  y
 I want to get something like
 id a1.tot a1.val1 a1.num1 a1.val2 a1.num2 a2.tot a2.val1 a2.num1 a2.val2 
 a2.num2
 10   3     a      2      b      1       3      x     2       z     1
 20   2     b      2      NA     0       2      y     2       NA    0
 30   1     c      1      NA     0       1      z     1       NA    0
 (except that I don't care what appears in the cells marked with NA)
 I tried this:
 aggregate(z,by=list(id=z$id),function (s) {
   t <- sort(table(s),decreasing=TRUE)
   if (length(t) == 1)
     list(length(s),names(t)[1],t[1],"junk",0)
   else
     list(length(s),names(t)[1],t[1],names(t)[2],t[2])
 })
   id id a1 a2
 1 10  3  3  3
 2 20  2  2  2
 3 30  1  1  1
 Warning message:
 In format.data.frame(x, digits = digits, na.encode = FALSE) :
   corrupt data frame: columns will be truncated or padded with NAs
 Thanks!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://mideasttruth.com http://think-israel.org
http://jihadwatch.org http://palestinefacts.org http://iris.org.il
Bill Gates is great, as long as `bill' is a verb.


[R] aggregate help

2012-09-20 Thread Sam Steingold

I want to count attributes of IDs:
--8---cut here---start-8---
z <- data.frame(id=c(10,20,10,30,10,20),
                a1=c("a","b","a","c","b","b"),
                a2=c("x","y","x","z","z","y"),
                stringsAsFactors=FALSE)
> z
  id a1 a2
1 10  a  x
2 20  b  y
3 10  a  x
4 30  c  z
5 10  b  z
6 20  b  y
--8---cut here---end---8---
I want to get something like
--8---cut here---start-8---
id a1.tot a1.val1 a1.num1 a1.val2 a1.num2 a2.tot a2.val1 a2.num1 a2.val2 a2.num2
10      3       a       2       b       1      3       x       2       z       1
20      2       b       2      NA       0      2       y       2      NA       0
30      1       c       1      NA       0      1       z       1      NA       0
--8---cut here---end---8---
(except that I don't care what appears in the cells marked with NA)
I tried this:
--8---cut here---start-8---
aggregate(z,by=list(id=z$id),function (s) {
  t <- sort(table(s),decreasing=TRUE)
  if (length(t) == 1)
    list(length(s),names(t)[1],t[1],"junk",0)
  else
    list(length(s),names(t)[1],t[1],names(t)[2],t[2])
})
  id id a1 a2
1 10  3  3  3
2 20  2  2  2
3 30  1  1  1
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
  corrupt data frame: columns will be truncated or padded with NAs
--8---cut here---end---8---
Thanks!
-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://www.memritv.org http://palestinefacts.org
http://pmw.org.il http://dhimmi.com http://jihadwatch.org http://ffii.org
I'm out of my mind, but feel free to leave a message...


[R] drop zero slots from table?

2012-09-19 Thread Sam Steingold

I find myself doing
--8---cut here---start-8---
tab <- table(...)
tab <- tab[tab > 0]
tab <- sort(tab,decreasing=TRUE)
--8---cut here---end---8---
all the time.
I am wondering if the drop 0 (and maybe even sort?) can be effected by
some magic argument to table() which I fail to discover in the docs?
Obviously, I could use droplevels() to avoid 0 counts in the first
place, but I do not want to drop the levels in the data.
-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://ffii.org http://truepeace.org
http://www.memritv.org http://honestreporting.com http://dhimmi.com
MS Windows: error: the operation completed successfully.


Re: [R] drop zero slots from table?

2012-09-19 Thread Sam Steingold

Function
--8---cut here---start-8---
sorted.table <- function (vec) {
  tab <- table(vec)
  tab <- tab[tab > 0]
  sort(tab, decreasing=TRUE)
}
--8---cut here---end---8---
does what I want but it prints vec instead of the name of its
argument:
--8---cut here---start-8---
> sorted.table(foo$bar)
vec
 A  B 
10  3 
--8---cut here---end---8---
how do I pass all arguments of sorted.table() on to table() as is?
thanks!

 * Sam Steingold f...@tah.bet [2012-09-19 11:51:08 -0400]:

 I find myself doing
 tab <- table(...)
 tab <- tab[tab > 0]
 tab <- sort(tab,decreasing=TRUE)
 all the time.
 I am wondering if the drop 0 (and maybe even sort?) can be effected by
 some magic argument to table() which I fail to discover in the docs?
 Obviously, I could use droplevels() to avoid 0 counts in the first
 place, but I do not want to drop the levels in the data.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://memri.org http://thereligionofpeace.com
http://www.PetitionOnline.com/tap12009/ http://dhimmi.com
Beauty is only a light switch away.


Re: [R] drop zero slots from table?

2012-09-19 Thread Sam Steingold

cool, thanks!
Still, I wonder if there is a way to pass all args as is from a function
downward (like in a lisp macro); something like

sorted.table <- function (...) { tab <- table(...); ... }

 * William Dunlap jqha...@gvopb.pbz [2012-09-19 16:26:08 +]:

 Here is one way:

 sorted.table <- function(x, name = if (is.list(x))names(x) else deparse(substitute(x))[1]) {
 +    tab <- table(x)
 +    names(dimnames(tab)) <- name
 +    tab <- tab[tab > 0]
 +    sort(tab, decreasing=TRUE)
 + }
 digits - factor(trunc(100*log2(1:20)%%.1), levels=0:9)
 sorted.table(digits)
 digits
 0 8 2 6 4 5
 9 4 3 2 1 1
 sorted.table(data.frame(DigitsColumn=digits))
 DigitsColumn
 0 8 2 6 4 5
 9 4 3 2 1 1
 sorted.table(digits, name="My Digits")
 My Digits
 0 8 2 6 4 5
 9 4 3 2 1 1

 Bill Dunlap
 Spotfire, TIBCO Software
 wdunlap tibco.com



-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://thereligionofpeace.com http://mideasttruth.com
http://palestinefacts.org http://openvotingconsortium.org http://truepeace.org
If you will not bring your husband coffee in bed, his day may start with a beer.


[R] where are these NAs coming from?

2012-09-19 Thread Sam Steingold

I see this:
--8---cut here---start-8---
> length(which(is.na(z$language)))
[1] 0
> locals <- z[z$country == mycountry,]
> length(which(is.na(locals$language)))
[1] 229
--8---cut here---end---8---
where are those locals without the language coming from?!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://ffii.org http://honestreporting.com
http://camera.org http://www.memritv.org http://dhimmi.com
I don't like cats! -- Come on, you just don't know how to cook them!


Re: [R] where are these NAs coming from?

2012-09-19 Thread Sam Steingold

Thanks, Sarah, your answer is, indeed, revealing:
--8---cut here---start-8---
> z <- data.frame(a=c(1,2,3),b=c(5,6,NA))
> z
  a  b
1 1  5
2 2  6
3 3 NA
> z[z$b==6,]
    a  b
2   2  6
NA NA NA
why do I get an extra all NA row?
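
A short sketch of what seems to be going on: comparing NA with == yields NA, and an NA in a logical row index selects an all-NA row instead of dropping it. The variable idx is introduced only for illustration.

```r
z <- data.frame(a = c(1, 2, 3), b = c(5, 6, NA))

idx <- z$b == 6            # FALSE TRUE NA: the NA comes from comparing NA
with.na <- z[idx, ]        # the match plus one all-NA row

# Three ways to drop the NA from the index
by.which  <- z[which(idx), ]            # which() ignores NA
by.isna   <- z[!is.na(idx) & idx, ]     # explicit NA test
by.subset <- subset(z, b == 6)          # subset() treats NA as FALSE

by.which
```

All three variants return only row 2.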


 * Sarah Goslee fnenu.tbf...@tznvy.pbz [2012-09-19 13:54:56 -0400]:

 Well, you have no reproducible example, but I suspect either of these
 will fix it:

 locals <- z[z$country == mycountry & !is.na(z$country),]

 locals <- subset(z, country == mycountry)

 Sarah

 On Wed, Sep 19, 2012 at 1:50 PM, Sam Steingold s...@gnu.org wrote:
 I see this:
 --8---cut here---start-8---
 length(which(is.na(z$language)))
 [1] 0
 locals <- z[z$country == mycountry,]
 length(which(is.na(locals$language)))
 [1] 229
 --8---cut here---end---8---
 where are those locals without the language coming from?!


-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://americancensorship.org
http://honestreporting.com http://truepeace.org http://ffii.org
.ACMD setaloiv siht gnidaeR


Re: [R] drop zero slots from table?

2012-09-19 Thread Sam Steingold

 * William Dunlap jqha...@gvopb.pbz [2012-09-19 18:20:50 +]:

 Why don't you try that and tell us if it works?

Because in my wildest dreams it did not occur to me that this could be
valid code in any programming language.
It appears to be valid R, which seems to be out-perling Perl at every turn.
However, it does not do what I want: it does not result in the right
name for the returned table.
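
One possible way to get the caller's expression as the table name is to rewrite the call itself and evaluate it in the caller's frame. This is a sketch, not a tested recommendation: it assumes a single factor argument, and it sets table()'s deparse.level to 2 so that expressions such as foo$bar are used as labels.

```r
sorted.table <- function(...) {
  cl <- match.call()
  cl[[1L]] <- quote(table)          # turn sorted.table(...) into table(...)
  if (is.null(cl$deparse.level))
    cl$deparse.level <- 2           # label with the deparsed argument
  tab <- eval(cl, parent.frame())   # evaluate where the caller's variables live
  tab <- tab[tab > 0]               # drop empty slots
  sort(tab, decreasing = TRUE)
}

foo <- data.frame(bar = factor(c("A", "A", "B"), levels = c("A", "B", "C")))
st <- sorted.table(foo$bar)
st
```

With more than one argument the tab[tab > 0] step would flatten the table, so this only covers the one-dimensional case.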

Thanks a lot for your insight!



-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://truepeace.org http://iris.org.il
http://thereligionofpeace.com http://palestinefacts.org
Feynman: 'Philosophy of science is as useful to scientists as ornithology is to 
birds'


Re: [R] where are these NAs coming from?

2012-09-19 Thread Sam Steingold

 * jim holtman wubyg...@tznvy.pbz [2012-09-19 13:58:08 -0400]:

 At least provide a reproducible example by creating the problem with a
 subset of 'z' and 'mycountry'

if I knew how to reproduce the problem, I would have known what was going on.

 Could something like this be happening?

precisely, thanks!

 x <- data.frame(country = 1:5, language = 1:5)
 mycountry <- NA
 z <- x[x$country == mycountry,]
 z
      country language
 NA        NA       NA
 NA.1      NA       NA
 NA.2      NA       NA
 NA.3      NA       NA
 NA.4      NA       NA


 On Wed, Sep 19, 2012 at 1:50 PM, Sam Steingold s...@gnu.org wrote:
 I see this:
 --8---cut here---start-8---
 length(which(is.na(z$language)))
 [1] 0
 locals <- z[z$country == mycountry,]
 length(which(is.na(locals$language)))
 [1] 229
 --8---cut here---end---8---
 where are those locals without the language coming from?!

 --
 Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 
 11.0.11103000
 http://www.childpsy.net/ http://ffii.org http://honestreporting.com
 http://camera.org http://www.memritv.org http://dhimmi.com
 I don't like cats! -- Come on, you just don't know how to cook them!


-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://ffii.org http://camera.org http://jihadwatch.org
http://americancensorship.org http://mideasttruth.com
Independence: nobody pays for you.  Liberty: nobody thinks for you.


[R] multi-column factor

2012-09-16 Thread Sam Steingold

I have a data frame with columns which draw on the same underlying
universe, so I want them to be factors with the same level set:

--8<---cut here---start->8---
> z <- data.frame(a=c("a","b","c"),b=c("b","c","d"),stringsAsFactors=FALSE)
> str(z)
'data.frame':   3 obs. of  2 variables:
 $ a: chr  "a" "b" "c"
 $ b: chr  "b" "c" "d"
> z$a <- factor(z$a,levels=union(z$a,z$b))
> z$b <- factor(z$b,levels=union(z$a,z$b))
> str(z)
'data.frame':   3 obs. of  2 variables:
 $ a: Factor w/ 4 levels "a","b","c","d": 1 2 3
 $ b: Factor w/ 4 levels "a","b","c","d": 2 3 4
--8<---cut here---end--->8---
Is factor(z$a,levels=union(z$a,z$b)) the right way to handle this?
Maybe there is a better way to extract levels than union()?
(bear in mind that I have ~10M rows and ~1M levels, so performance is an
issue).
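
One possible variation, sketched on the same toy frame: compute the shared level set once from the raw character columns and apply it to every column in a single pass. The variable names `cols` and `lev` are illustrative, and whether this is faster than union() on ~10M rows has not been measured here.

```r
z <- data.frame(a = c("a", "b", "c"), b = c("b", "c", "d"),
                stringsAsFactors = FALSE)
cols <- c("a", "b")   # columns that draw on the same universe

# shared level set, computed once while the columns are still character
lev <- unique(unlist(z[cols], use.names = FALSE))

# convert every column with the same levels
z[cols] <- lapply(z[cols], factor, levels = lev)
```

Computing `lev` before any column is converted avoids calling union() on a mix of factor and character vectors.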

Thanks!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://iris.org.il http://honestreporting.com
http://camera.org http://www.memritv.org http://jihadwatch.org
When you talk to God, it's prayer; when He talks to you, it's schizophrenia.


[R] sum(table(v)) == length(v)

2012-09-16 Thread Sam Steingold

Is it possible to violate the identity sum(table(v)) == length(v) ??
I see no way to do that and it holds in my small examples,
but it is violated in the huge set I have:

system.time(z <- unique(data.frame(u=U,s=S)))
tab1 <- table(z$u)
tab1 <- tab1[tab1 > 0] # S is factor so some counts were 0
tab2 <- table(z$s)
stopifnot(length(tab2) == nrow(z)) # yes
stopifnot(sum(tab1) == nrow(z))  ### no!
sum(tab1)
728587
length(tab1)
503374
length(tab2)
2112951
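
One thing worth checking, offered as an assumption since the post does not say whether U contains missing values: table() drops NA by default, which is enough to break the identity. A small sketch on a made-up vector:

```r
v <- c("a", "b", NA, "a", NA)

# default: NA values are not counted, so the total falls short of length(v)
n.default <- sum(table(v))

# useNA = "ifany" adds an NA cell when missing values are present
n.with.na <- sum(table(v, useNA = "ifany"))
```

Here n.default is 3 while length(v) is 5; n.with.na restores the identity.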

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://think-israel.org http://americancensorship.org
http://ffii.org http://memri.org http://jihadwatch.org http://pmw.org.il
Live Lisp and prosper.


[R] please comment on my function

2012-09-14 Thread Sam Steingold

this function is supposed to canonicalize the language:

--8<---cut here---start->8---
canonicalize.language <- function (s) {
  s <- tolower(s)
  long <- nchar(s) == 5
  s[long] <- sub("^([a-z]{2})[-_][a-z]{2}$","\\1",s[long])
  s[nchar(s) != 2 & s != "c"] <- "unknown"
  s
}
> canonicalize.language(c("aa","bb-cc","DD-abc","eee","ff_FF","C"))
[1] "aa"      "bb"      "unknown" "unknown" "ff"      "c"
--8<---cut here---end--->8---

it does what I want it to do, but it takes 4.5 seconds on a vector of
length 10,256,341 - I wonder if I might be doing something awfully stupid.
I thought that sub() was slow, but my second attempt:
--8<---cut here---start->8---
canonicalize.language <- function (s) {
  s <- tolower(s)
  good <- nchar(s) == 5 & substr(s,3,3) %in% c("_","-")
  s[good] <- substr(s[good],1,2)
  s[nchar(s) != 2 & s != "c"] <- "unknown"
  s
}
--8<---cut here---end--->8---
was even slower (6.4 sec).

My two concerns are:

1. avoid allocating many small objects which are never collected
2. run fast

Which would be the best implementation?
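
A sketch of one option that was not timed in this thread: run the string work only on the distinct values and map the result back with match(). The wrapper name `canonicalize.language.unique` is hypothetical, and the gain depends on how few distinct language codes the vector really holds.

```r
canonicalize.language <- function (s) {
  s <- tolower(s)
  long <- nchar(s) == 5
  s[long] <- sub("^([a-z]{2})[-_][a-z]{2}$", "\\1", s[long])
  s[nchar(s) != 2 & s != "c"] <- "unknown"
  s
}

# hypothetical wrapper: canonicalize each distinct value once, then look it up
canonicalize.language.unique <- function (s) {
  u <- unique(s)
  canonicalize.language(u)[match(s, u)]
}

x <- c("aa", "bb-cc", "DD-abc", "eee", "ff_FF", "C", "aa")
res <- canonicalize.language.unique(x)
```

With millions of rows but only a few hundred distinct codes, tolower(), nchar() and sub() then touch a short vector, and only unique() and match() see the full data.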

Thanks a lot for your insight!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://think-israel.org http://openvotingconsortium.org
http://memri.org http://camera.org http://truepeace.org
WHO ATE MY BREAKFAST PANTS?


Re: [R] please comment on my function

2012-09-14 Thread Sam Steingold

 * jim holtman wubyg...@tznvy.pbz [2012-09-14 13:10:37 -0400]:

 more than half the time is in 'tolower' and 'nchar', so it is not all
 'sub's problem.

aha, thanks!

 This version runs a little faster since it does not need the 'tolower':

 canonicalize.language <- function (s) {
   # s <- tolower(s)
   long <- nchar(s) == 5
   s[long] <- sub("^([[:alpha:]]{2})[-_][[:alpha:]]{2}$","\\1",s[long])
   s[nchar(s) != 2 & s != "c"] <- "unknown"
   s
 }

but it does not convert EN to en, so it is not good for my purposes.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://thereligionofpeace.com http://mideasttruth.com
http://iris.org.il http://honestreporting.com http://memri.org
Life is like Tetris: failures accumulate, successes fade.


[R] aggregate() runs out of memory

2012-09-14 Thread Sam Steingold

I have a large data.frame Z (2,424,185,944 bytes, 10,256,441 rows, 17 columns).
I want to get the result of
table(aggregate(Z$V1, FUN = length, by = list(id=Z$V2))$x)
alas, aggregate has been running for ~30 minutes, RSS is 14G, VIRT is
24.3G, and no end in sight.
both V1 and V2 are characters (not factors).
Is there anything I could do to speed this up?
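
Since FUN = length only counts rows per id, the same distribution can probably be obtained without aggregate(). A sketch on made-up data (the small `Z` below is an assumption standing in for the real frame):

```r
# toy stand-in for the large data frame
Z <- data.frame(V1 = c("x", "y", "z", "w", "v", "u", "t"),
                V2 = c("a", "b", "a", "c", "a", "b", "d"),
                stringsAsFactors = FALSE)

# the original formulation: rows per id, then the distribution of those counts
slow <- table(aggregate(Z$V1, FUN = length, by = list(id = Z$V2))$x)

# the same distribution computed directly from the id column
fast <- table(table(Z$V2))
```

The inner table() counts rows per id and the outer one tabulates those counts; both calls ignore rows whose id is NA, as aggregate() does.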
Thanks.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://www.PetitionOnline.com/tap12009/
http://dhimmi.com http://think-israel.org http://iris.org.il
WinWord 6.0 UNinstall: Not enough disk space to uninstall WinWord


[R] cannot read iso639 table

2012-09-13 Thread Sam Steingold

line 109 did not have 5 elements ... but it did!
empty beginning of file ... but it's not!

details:
--8<---cut here---start->8---
get.language.ISO.table <- function () {
  socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",
                open="r",encoding="utf-8");
  data <- read.table(socket, as.is = TRUE, sep = "|", header = FALSE,
                     col.names = c("a3bibliographic","a3terminologic",
                       "a2","english","french"));
  close(socket);
  data
}
language.ISO.table <- get.language.ISO.table()

Error in read.table(socket, as.is = TRUE, sep = "|", header = FALSE,
  col.names = c("a3bibliographic", : 
  empty beginning of file
--8<---cut here---end--->8---
the first line is _not_ blank, as one can see by downloading the
file with wget
  
In addition:
--8<---cut here---start->8---
Warning messages:
1: In read.table(socket, as.is = TRUE, sep = "|", header = FALSE, col.names = c("a3bibliographic",  :
  invalid input found on input connection 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
--8<---cut here---end--->8---
what is invalid there? libreoffice calc opened the file just fine.

--8<---cut here---start->8---
2: In read.table(socket, as.is = TRUE, sep = "|", header = FALSE, col.names = c("a3bibliographic",  :
  incomplete final line found by readTableHeader on 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'
--8<---cut here---end--->8---
indeed the final NL is missing. why is this a big deal?

when I download the file:

--8<---cut here---start->8---
> read.table("ISO-639-2_utf-8.csv",encoding="utf-8", as.is = TRUE,
             sep = "|", header = FALSE,
             col.names = c("a3bibliographic","a3terminologic",
               "a2","english","french"))
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 109 did not have 5 elements
--8<---cut here---end--->8---

however
--8<---cut here---start->8---
> l <- readLines("ISO-639-2_utf-8.csv",encoding="utf-8")
Warning message:
In readLines("ISO-639-2_utf-8.csv", encoding = "utf-8") :
  incomplete final line found on 'ISO-639-2_utf-8.csv'
> l[108:110]
[1] "dgr|||Dogrib|dogrib"
[2] "din|||Dinka|dinka"
[3] "div||dv|Divehi; Dhivehi; Maldivian|maldivien"
--8<---cut here---end--->8---
all lines look legit to me.

so, why can't I read the file?

thanks.

ps. ubuntu; R 2.15.1 (2012-06-22) installed from cran using aptitude.
-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://dhimmi.com http://memri.org
http://ffii.org http://think-israel.org http://honestreporting.com
The past is gone, the present is ephemeral, the future is a guess.


Re: [R] cannot read iso639 table

2012-09-13 Thread Sam Steingold

 * William Dunlap jqha...@gvopb.pbz [2012-09-13 19:50:21 +]:

 On Windows with R-2.15.1 in a 1252 locale, I had to read (and toss) out
 the initial 3 bytes (the byte-order mark?) to make things work:

 > socket <- url("http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt",open="r",encoding="utf-8")
 > readChar(socket, nchars=3, useBytes=TRUE)
 [1] "ï»¿"

confirmed - first 3 bytes are \357\273\277

 > d <- read.table(socket, quote="", sep="|", stringsAsFactors=FALSE)
 > dim(d)
 [1] 485   5
 > head(d)
    V1 V2 V3             V4      V5
 1 aar    aa           Afar    afar
 2 abk    ab      Abkhazian abkhaze
 3 ace             Achinese    aceh
 4 ach                Acoli   acoli
 5 ada              Adangme adangme
 6 ady       Adyghe; Adygei  adyghé

alas, this is all I get:

Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
  invalid input found on input connection 'http://www.loc.gov/standards/iso639-2/ISO-639-2_utf-8.txt'

  a3bibliographic a3terminologic a2        english  french
1             aar             NA aa           Afar    afar
2             abk             NA ab      Abkhazian abkhaze
3             ace             NA          Achinese    aceh
4             ach             NA             Acoli   acoli
5             ada             NA           Adangme adangme
6             ady             NA    Adyghe; Adygei   adygh

note that the first non-ASCII character terminates the input.

so, I still cannot read the data from the URL.

I can read the file though - with quote="" (thanks Peter!) -
except that the first record is "\357\273\277aar".
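
A sketch of reading such a file with both problems handled in one call. It assumes an R version whose connections accept the "UTF-8-BOM" encoding name (documented in ?file for later releases than the 2.15.1 used here), and it writes a small made-up sample to a temporary file instead of fetching the real table.

```r
# build a tiny sample: UTF-8 byte-order mark, '|' separators, one apostrophe
tmp <- tempfile(fileext = ".txt")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw(paste0(
  "aar||aa|Afar|afar\n",
  "cpe|||Creoles and pidgins, English based|creoles bases sur l'anglais\n",
  "din|||Dinka|dinka\n")), con)
close(con)

# quote = "" stops the apostrophe from opening a quoted field;
# fileEncoding = "UTF-8-BOM" (assumed available) strips the leading BOM
d <- read.table(tmp, sep = "|", quote = "", header = FALSE, as.is = TRUE,
                fileEncoding = "UTF-8-BOM",
                col.names = c("a3bibliographic", "a3terminologic",
                              "a2", "english", "french"))
unlink(tmp)
```

Without quote = "" an apostrophe inside a name swallows the following lines into one field, which would be consistent with the "line 109 did not have 5 elements" error even though line 109 itself looks fine.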


-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://thereligionofpeace.com
http://mideasttruth.com http://iris.org.il http://jihadwatch.org
The only thing worse than X Windows: (X Windows) - X


Re: [R] merge a list of data frames

2012-09-06 Thread Sam Steingold

 * David Winsemius qjvafrz...@pbzpnfg.arg [2012-09-05 21:02:16 -0700]:

 On Sep 5, 2012, at 8:51 PM, Sam Steingold wrote:

 I have a list of data frames:
 
 > str(data)
 List of 4
  $ :'data.frame': 700773 obs. of  3 variables:
   ..$ V1: chr [1:700773] "200130446465779" "200070050127778" "200030633708779" "200010587002779" ...
   ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
   ..$ V3: num [1:700773] 1 1 1 1 1 ...
  $ :'data.frame': 700773 obs. of  3 variables:
   ..$ V1: chr [1:700773] "200130446465779" "200070050127778" "200030633708779" "200010587002779" ...
   ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
   ..$ V3: num [1:700773] 1 1 1 1 1 ...
  $ :'data.frame': 700773 obs. of  3 variables:
   ..$ V1: chr [1:700773] "200130446465779" "200070050127778" "200030633708779" "200010587002779" ...
   ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
   ..$ V3: num [1:700773] 1 1 1 1 1 ...
  $ :'data.frame': 700773 obs. of  3 variables:
   ..$ V1: chr [1:700773] "200160325893778" "200130647544079" "200130446465779" "200120186959078" ...
   ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
   ..$ V3: num [1:700773] 1 1 1 1 1 1 1 1 1 1 ...
 
 I want to merge them.

 Why? What are you expecting?

these are the results of applying a model to the test data.
the first column is the ID
the second column is the actual value
the third column is the model score

after I merge the frames, I will
1. check that all the V2 columns are identical and drop all but one
(I guess I could just merge on c("V1","V2") instead, right?)

2. compute the sum (or the mean, whatever is easier) of all the V3
columns

3. sort by the sum/mean of the V3 columns and evaluate the combined
model using the lift quality metric
(http://dl.acm.org/citation.cfm?id=380995.381018)

I have many more score files (not just 4), so it is not practical for me
to rename the column to something unique.
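
The renaming does not have to be done by hand; a sketch of the plan above with generated column names. The three small frames and the "score" prefix are made up for illustration, and merging on both V1 and V2 assumes the actual values agree across files.

```r
# toy stand-in for the list of score frames (ids, actuals, scores are made up)
data <- list(
  data.frame(V1 = c("id1", "id2", "id3"), V2 = c(0L, 1L, 0L),
             V3 = c(0.1, 0.9, 0.3), stringsAsFactors = FALSE),
  data.frame(V1 = c("id3", "id1", "id2"), V2 = c(0L, 0L, 1L),
             V3 = c(0.5, 0.3, 0.7), stringsAsFactors = FALSE),
  data.frame(V1 = c("id2", "id3", "id1"), V2 = c(1L, 0L, 0L),
             V3 = c(0.8, 0.1, 0.2), stringsAsFactors = FALSE))

# give every score column a unique generated name: score1, score2, ...
for (i in seq_along(data)) names(data[[i]])[3] <- paste0("score", i)

# merge on id and actual value, so V2 appears only once in the result
merged <- Reduce(function (f1, f2) merge(f1, f2, by = c("V1", "V2"), all = TRUE),
                 data)

# average the scores and sort by the combined score
score.cols <- grep("^score", names(merged))
merged$mean.score <- rowMeans(merged[score.cols])
merged <- merged[order(merged$mean.score, decreasing = TRUE), ]
```

Because no two frames share a non-key column name any more, merge() has nothing to disambiguate and the duplicated-names warning should not appear.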



-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://www.memritv.org http://truepeace.org
http://jihadwatch.org http://mideasttruth.com http://americancensorship.org
To be popular with ladies one has to be smart, handsome & rich. Or to be a cat.


Re: [R] merge a list of data frames

2012-09-06 Thread Sam Steingold

 * David Winsemius qjvafrz...@pbzpnfg.arg [2012-09-06 10:30:16 -0700]:

 these are the results of applying a model to the test data.
 the first column is the ID

 In which case you should be using the 'by' argument to `merge`

I already do! see my initial message!

 3. sort by the sum/mean of the V3 columns and evaluate the combined
 model using the lift quality metric
 (http://dl.acm.org/citation.cfm?id=380995.381018)

 That's going to require more background (or more money, since they want $15.00
 for a pdf).

:-)
that I have already implemented, works just fine:

proficiency <- function (actual, prediction) {
  proficiency1(ea = entropy(table(actual)),
               ep = entropy(table(prediction)),
               ej = entropy(table(actual,prediction)))
}

proficiency1 <- function (ea, ep, ej) {
  mi <- ea + ep - ej
  list(joint = ej, actual = ea, prediction = ep, mutual = mi,
       proficiency = mi / ea, dependency = mi / ej)
}

detector.statistics <- function (tp,fn,fp,tn) {
  observationCount <- tp + fn + fp + tn
  predictedPositive <- tp + fp
  predictedNegative <- fn + tn
  actualPositive <- tp + fn
  actualNegative <- fp + tn
  correct <- tp + tn
  list(baseRate = actualPositive / observationCount,
       precision = if (tp == 0) 0 else tp / predictedPositive,
       specificity = if (tn == 0) 0 else tn / actualNegative,
       recall = if (tp == 0) 0 else tp / actualPositive,
       accuracy = correct / observationCount,
       lift = (tp * observationCount) / (predictedPositive * actualPositive),
       f1score = if (tp == 0) 0 else 2 * tp / (2 * tp + fp + fn),
       proficiency = proficiency1(ej = entropy(c(tp,fn,fp,tn)),
         ea = entropy(c(actualPositive,actualNegative)),
         ep = entropy(c(predictedPositive,predictedNegative))))
}

## v should be vector of 0&1 sorted according to some model
## Gregory Piatetsky-Shapiro, Samuel Steingold
## "Measuring Lift Quality in Database Marketing"
## http://sds.podval.org/data/l-quality.pdf
## http://www.sigkdd.org/explorations/issues/2-2-2000-12/piatetsky-shapiro.pdf
## SIGKDD Explorations, Vol. 2:2, (2000), 81-86
## tests: lift.quality(rbinom(1,size=1,prob=0.1)) == ~0
##        lift.quality(rev(round((1:1)/12000))) == 1
lift.quality <- function (v, plot = TRUE, file = NULL, main = "lift curve",
                          thresholds = NULL) {
  target.count <- sum(v)
  total.count <- length(v)
  base.rate <- target.count / total.count
  target.level <- cumsum(v)/target.count
  lq <- ((2*sum(target.level) - 1)/total.count - 1) / (1 - base.rate)
  if (plot) {
    if (!is.null(file)) {
      pdf(file = file)
      on.exit(dev.off())
    }
    plot(x=(1:total.count)/total.count,y=target.level,type="l",
         main=paste(main,"( lift quality ",lq,")"),
         xlab="% cutoff",ylab="cumulative % hit")
  }
  if (is.null(thresholds)) thresholds = c(base.rate)
  list(lift.quality = lq,
       detector.statistics = sapply(thresholds, function (l) {
         cutoff <- round(l * total.count)
         tp <- round(target.level[cutoff] * target.count) # = sum(v[1:cutoff])
         fn <- target.count - tp
         fp <- cutoff - tp
         tn <- total.count - target.count - cutoff + tp
         detector.statistics(tp, fn, fp, tn)
       }))
}
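
For readers who only want to see what the lift quality number does, here is the `lq` formula from lift.quality() pulled out into a stand-alone helper and applied to two hand-made vectors. The helper name `lq.only` is hypothetical; the arithmetic is copied from the function above with base.rate written as mean(v).

```r
# hypothetical helper: just the lift quality formula, no plotting or thresholds
lq.only <- function (v) {
  target.level <- cumsum(v) / sum(v)
  ((2 * sum(target.level) - 1) / length(v) - 1) / (1 - mean(v))
}

best  <- lq.only(c(1, 1, 0, 0, 0, 0, 0, 0, 0, 0))  # all targets ranked first
worst <- lq.only(c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1))  # all targets ranked last
```

For the first vector the cumulative hit rates sum to 9.5, giving (18/10 - 1) / 0.8 = 1; the reversed vector gives -1, so a perfect ordering scores 1 and a perfectly inverted one scores -1.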



 I have many more score files (not just 4), so it is not practical for me
 to rename the column to something unique.

 Which column?

the 3rd (score) column.

Meanwhile I realised that the fastest way is actually the shell:
sort+cut+paste produced the csv file which can be loaded into R much
faster than the individual score files, so this issue is now purely
academic.  However, I appreciate the replies I got so far, they were quite
educational, thanks!
(I would also appreciate comments on the code above)

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://www.memritv.org http://truepeace.org
http://openvotingconsortium.org http://ffii.org http://mideasttruth.com
Save your burned out bulbs for me, I'm building my own dark room.


[R] merge a list of data frames

2012-09-05 Thread Sam Steingold

I have a list of data frames:

> str(data)
List of 4
 $ :'data.frame':   700773 obs. of  3 variables:
  ..$ V1: chr [1:700773] "200130446465779" "200070050127778" "200030633708779" "200010587002779" ...
  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ V3: num [1:700773] 1 1 1 1 1 ...
 $ :'data.frame':   700773 obs. of  3 variables:
  ..$ V1: chr [1:700773] "200130446465779" "200070050127778" "200030633708779" "200010587002779" ...
  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ V3: num [1:700773] 1 1 1 1 1 ...
 $ :'data.frame':   700773 obs. of  3 variables:
  ..$ V1: chr [1:700773] "200130446465779" "200070050127778" "200030633708779" "200010587002779" ...
  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ V3: num [1:700773] 1 1 1 1 1 ...
 $ :'data.frame':   700773 obs. of  3 variables:
  ..$ V1: chr [1:700773] "200160325893778" "200130647544079" "200130446465779" "200120186959078" ...
  ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
  ..$ V3: num [1:700773] 1 1 1 1 1 1 1 1 1 1 ...

I want to merge them.
I tried to follow
http://rwiki.sciviews.org/doku.php?id=tips%3adata-frames%3amerge
and did:

> data.1 <- Reduce(function(f1,f2) merge(f1,f2,by=c("V1"),all=TRUE), data)
Warning message:
In merge.data.frame(f1, f2, by = c("V1"), all = TRUE) :
  column names 'V2.x', 'V3.x', 'V2.y', 'V3.y' are duplicated in the result
> str(data.1)
'data.frame':   700773 obs. of  9 variables:
 $ V1  : chr  "10001099079" "10001254078" "10001499078" "10001541779" ...
 $ V2.x: int  0 0 0 0 0 0 0 0 0 0 ...
 $ V3.x: num  0.476 0.748 0.442 0.483 0.577 ...
 $ V2.y: int  0 0 0 0 0 0 0 0 0 0 ...
 $ V3.y: num  0.476 0.748 0.442 0.483 0.577 ...
 $ V2.x: int  0 0 0 0 0 0 0 0 0 0 ...
 $ V3.x: num  0.476 0.752 0.443 0.485 0.578 ...
 $ V2.y: int  0 0 0 0 0 0 0 0 0 0 ...
 $ V3.y: num  0.47 0.733 0.57 0.416 0.616 ...

I don't like the warning and I don't like that I now have to use [n] to
access identically named columns, but, I guess, this is better than
this:

library('reshape')

> data.1 <- merge_all(data,by="V1",all=TRUE)
Error in merge.data.frame(dfs[[1]], Recall(dfs[-1]), all = TRUE, sort = FALSE,  : 
  formal argument "all" matched by multiple actual arguments
> data.1 <- merge_all(data,by="V1",sort=TRUE,all=TRUE)
Error in merge.data.frame(dfs[[1]], Recall(dfs[-1]), all = TRUE, sort = FALSE,  : 
  formal argument "all" matched by multiple actual arguments
> data.1 <- merge_all(data,by="V1",sort=TRUE)
Error in merge.data.frame(dfs[[1]], Recall(dfs[-1]), all = TRUE, sort = FALSE,  : 
  formal argument "sort" matched by multiple actual arguments
> data.1 <- merge_all(data,by="V1")
Error in `[.data.frame`(df, , match(names(dfs[[1]]), names(df))) : 
  undefined columns selected
> data.1 <- merge_all(data,by=c("V1"))
Error in `[.data.frame`(df, , match(names(dfs[[1]]), names(df))) : 
  undefined columns selected

what does 'formal argument "sort" matched by multiple actual arguments' mean?
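
The message can be reproduced in a few lines. Judging only from the call shown in the errors above, merge_all() appears to pass all = TRUE and sort = FALSE itself and then forward the caller's extra arguments as well; the two functions below are made-up stand-ins that mimic that pattern, not code from reshape.

```r
# stand-in for merge(): accepts a 'sort' argument
inner <- function (x, sort = FALSE) x

# stand-in for a wrapper that hard-codes 'sort' and also forwards '...'
wrapper <- function (x, ...) inner(x, sort = FALSE, ...)

ok <- wrapper(1)   # fine: 'sort' is supplied once

# the caller's sort = TRUE travels through '...', so inner() receives
# 'sort' twice and R refuses to choose between the two values
msg <- tryCatch(wrapper(1, sort = TRUE),
                error = function (e) conditionMessage(e))
```

So the error means the same named argument reached the inner function from two places: once hard-coded by the wrapper and once from the user's call.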

thanks.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://ffii.org http://pmw.org.il
http://dhimmi.com http://palestinefacts.org http://iris.org.il
I just forgot my whole philosophy of life!!!


Re: [R] apply -- data.frame

2012-08-31 Thread Sam Steingold

 * David Winsemius qjvafrz...@pbzpnfg.arg [2012-08-30 10:14:34 -0700]:

 > str( as.data.frame( do.call(rbind, strsplit(c("a,1","b,2","c,3"), ",") ) , stringsAsFactors=FALSE) )
 'data.frame': 3 obs. of  2 variables:
  $ V1: chr  "a" "b" "c"
  $ V2: chr  "1" "2" "3"

do.call/rbind appeared to be TRT. I tried it and got a data frame with
list columns (instead of vectors);

as.data.frame(do.call(rbind,lapply(list.files(...), function (name) {

    c(name,list(num1,num2,num3), # num* come from some calculations above
      strsplit(sub("[^-]*(train|test)[^-]*(-(S)?pca([0-9]*))?-s([0-9]*)c([0-9.]*)\\.score",
                   "\\1,\\3,\\4,\\5,\\6",name),",")[[1]])
  })), stringsAsFactors = FALSE)

'data.frame':   2 obs. of  8 variables:
 $ file        :List of 2
  ..$ : chr "zzz_test_0531_0630-Spca181-s0c10.score"
  ..$ : chr "zzz_train_0531_0630-Spca181-s0c10.score"
 $ lift.quality:List of 2
  ..$ : num 0.59
  ..$ : num 0.621
 $ proficiency :List of 2
  ..$ : num 0.0472
  ..$ : num 0.0472
 $ set         :List of 2
  ..$ : chr "test"
  ..$ : chr "train"
 $ scale       :List of 2
  ..$ : chr "S"
  ..$ : chr "S"
 $ pca         :List of 2
  ..$ : chr "181"
  ..$ : chr "181"
 $ s           :List of 2
  ..$ : chr "0"
  ..$ : chr "0"
 $ c           :List of 2
  ..$ : chr "10"
  ..$ : chr "10"

I guess the easiest way is to replace c(...list()...) with c(...) but
that would mean converting num1,num2,num3 to string and back which I
want to avoid for aesthetic reasons. Any better suggestions?
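
One way to keep numbers as numbers, sketched with made-up file names and a placeholder computation: have the function return a one-row data frame, so that rbind() combines columns by type rather than flattening everything through c().

```r
files <- c("test-s0c10.score", "train-s0c10.score")   # made-up names

rows <- lapply(files, function (name) {
  # nchar(name) / 100 is only a placeholder for a computed numeric score
  data.frame(file = name,
             lift.quality = nchar(name) / 100,
             set = sub("-.*$", "", name),
             stringsAsFactors = FALSE)
})

# rbind on data frames binds column by column and preserves each type
result <- do.call(rbind, rows)
```

This yields plain character and numeric vector columns instead of list columns; with many thousands of rows, repeated data.frame() construction may be slow, so this is a convenience rather than a performance recommendation.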

thanks a lot!

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://jihadwatch.org http://thereligionofpeace.com
http://palestinefacts.org http://ffii.org http://pmw.org.il
I don't have an attitude problem. You have a perception problem.


Re: [R] apply -- data.frame

2012-08-31 Thread Sam Steingold

 * William Dunlap jqha...@gvopb.pbz [2012-08-31 18:38:52 +]:

 Is the following something like what you are doing?

yes, absolutely, thanks a lot!


-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://pmw.org.il http://dhimmi.com
http://palestinefacts.org http://www.memritv.org http://mideasttruth.com
char*a="char*a=%c%s%c;main(){printf(a,34,a,34);}";main(){printf(a,34,a,34);}


[R] apply -- data.frame

2012-08-30 Thread Sam Steingold

Is there a way for an apply-type function to return a data frame?
the closest thing I think of is

  foo <- as.data.frame(sapply(...))
  names(foo) <- c()

is there a more elegant way?
Thanks!
-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://palestinefacts.org http://dhimmi.com
http://honestreporting.com http://ffii.org http://mideasttruth.com
Lisp: it's here to save your butt.


Re: [R] apply -- data.frame

2012-08-30 Thread Sam Steingold

 * Sam Steingold f...@tah.bet [2012-08-30 08:56:17 -0400]:

 Is there a way for an apply-type function to return a data frame?
 the closest thing I think of is

   foo <- as.data.frame(t(sapply(...)))
   names(foo) <- c()

alas, this has a problem of creating a homogeneous data frame, i.e.,
all the columns are numbers or characters, because the function passed
to sapply returns c() and
> c(1,2,"a")
[1] "1" "2" "a"

e.g.,
> as.data.frame(t(sapply(c("a,1","b,2","c,3"),function (n) strsplit(n,",")[[1]])))
    V1 V2
a,1  a  1
b,2  b  2
c,3  c  3

'data.frame':   3 obs. of  2 variables:
 $ V1: Factor w/ 3 levels "a","b","c": 1 2 3
  ..- attr(*, "names")= chr  "a,1" "b,2" "c,3"
 $ V2: Factor w/ 3 levels "1","2","3": 1 2 3
  ..- attr(*, "names")= chr  "a,1" "b,2" "c,3"

I wanted the V1 column to be a string, and V2 to be a number.
(I know stringsAsFactors=FALSE would replace factors with strings, but I
need a string and a number)

I could, of course, do ret$V2 <- as.numeric(ret$V2) but this would mean
a double conversion: from number to string first (by c()) and then back.
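
A sketch of one way to avoid the round trip: return a list() instead of c() from the worker function, so nothing is coerced, and then pull each field out with vapply() using its own type. The function `f` and its fields are hypothetical.

```r
# hypothetical worker: returns mixed types in a list, so nothing is coerced
f <- function (n) list(name = n, len = nchar(n))

res <- lapply(c("a", "bb", "ccc"), f)

# extract each field with its own declared type
ret <- data.frame(name = vapply(res, function (r) r$name, character(1)),
                  len  = vapply(res, function (r) r$len, integer(1)),
                  stringsAsFactors = FALSE)
```

The numeric field never passes through a string, and vapply() fails loudly if any element has an unexpected type or length.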

thanks.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://mideasttruth.com http://truepeace.org
http://openvotingconsortium.org http://ffii.org http://www.memritv.org
Diplomacy is the art of saying "nice doggy" until you can find a nice rock.


Re: [R] apply -- data.frame

2012-08-30 Thread Sam Steingold

 * Bert Gunter thagre.ore...@trar.pbz [2012-08-30 09:59:46 -0700]:

 You really should spend a little more time with the docs figuring out
 what R _does_ and a little less complaining about what you think R
 cannot do.

The only thing I think R cannot do is compact its memory, thus,
effectively, leaking it in _some_ situations.

The rest are just my humble questions...

PS. speaking about complaining, my pet peeve atm is the speed (or,
rather, lack thereof) of e1071 functions read.matrix.csr and
write.matrix.csr (they are implemented in R, not in C, and do a lot of
string manipulation, so their slowness is not surprising)

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://pmw.org.il http://thereligionofpeace.com
http://truepeace.org http://openvotingconsortium.org http://ffii.org
The best propaganda of atheism is done by organized religion.


Re: [R] apply -- data.frame

2012-08-30 Thread Sam Steingold

 * William Dunlap jqha...@gvopb.pbz [2012-08-30 17:35:08 +]:

 I don't agree with your analysis of what went wrong with your example
 a double conversion: from number to string first (by c()) and then back.
I did not make myself quite clear, sorry.
I should have written something like
c(1,2,"a") ==> "1" "2" "a" ==[as.numeric]==> 1 2 "a"

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://openvotingconsortium.org http://www.memritv.org
http://ffii.org http://truepeace.org http://palestinefacts.org
Those who can't write, write manuals.


Re: [R] variable scope

2012-08-29 Thread Sam Steingold

 * Duncan Murdoch zheqbpu.qha...@tznvy.pbz [2012-08-29 10:30:10 -0400]:

 On 29/08/2012 12:50 AM, Sam Steingold wrote:
  * Duncan Murdoch zheqbpu.qha...@tznvy.pbz [2012-08-28 21:06:33 -0400]:
 
  On 12-08-28 5:55 PM, Sam Steingold wrote:
 
  my observation is that gc in R sucks.
  (it cannot release small objects).
  this is not specific to R; ocaml suffers too.
 
  Sorry, I didn't realize you were just a troll

 I am not.

 I am referring here to a very specific deficiency which plagues all
 non-moving GCs.

I guess non-compacting GC might be a more common expression.

 I think you're a troll because you're making false statements, such as
 that gc in R cannot release small objects, without any evidence in
 support of them.

This is common knowledge, discussed, e.g., here:
http://article.gmane.org/gmane.comp.lang.r.general:256174

Whether R GC cannot release small objects or cannot reuse the
fragmented memory after it releases the small objects is
inconsequential: R consumes RAM which it cannot use.

Again, this is a common deficiency in all memory management systems
which do not compact their storage; something studied in CS101.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://iris.org.il http://americancensorship.org
http://dhimmi.com http://openvotingconsortium.org http://truepeace.org
Never underestimate the power of stupid people in large groups.


[R] variable scope

2012-08-28 Thread Sam Steingold

At the end of a for loop its variables are still present:

for (i in 1:10) {
  x <- vector(length=1)
}
ls()

will print i and x.
this means that at the end of the for loop body I have to write

  rm(x)
  gc()

is there a more elegant way to handle this?
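
One possibility, sketched with made-up names and sizes: wrap the loop body in local() (or move it into a function), so temporaries live in their own environment and become unreachable as soon as the body finishes.

```r
total <- 0
for (i in 1:10) {
  total <- total + local({
    tmp.buffer <- numeric(1000)   # temporary, visible only inside local()
    length(tmp.buffer) + i        # value of the block, returned by local()
  })
}
```

After the loop only `i` and `total` remain in the workspace; `tmp.buffer` never appears there, so no rm() is needed, and the garbage collector can reclaim it on its own schedule.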

Thanks.

-- 
Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 11.0.11103000
http://www.childpsy.net/ http://camera.org http://palestinefacts.org
http://iris.org.il http://www.PetitionOnline.com/tap12009/ http://truepeace.org
Computers are like air conditioners: they don't work with open windows!

